Household demographic metadata for households participating in the Customer Journey study. Due to the nature of the data, demographic information is not available for all households.

demographics

Format

A data frame with 801 rows and 8 variables

  • household_id: Uniquely identifies each household

  • age: Estimated age range

  • income: Household income range

  • home_ownership: Homeowner status (Homeowner, Renter, Unknown)

  • marital_status: Marital status (Married, Unmarried, Unknown)

  • household_size: Size of household up to 5+

  • household_comp: Household composition description

  • kids_count: Number of children present up to 3+

Source

84.51°, Customer Journey study, http://www.8451.com/area51/

Value

demographics

a tibble

Examples

# full data set
demographics
#> # A tibble: 801 x 8
#>    household_id age   income home_ownership marital_status household_size
#>    <chr>        <ord> <ord>  <ord>          <ord>          <ord>
#>  1 1            65+   35-49K Homeowner      Married        2
#>  2 1001         45-54 50-74K Homeowner      Unmarried      1
#>  3 1003         35-44 25-34K NA             Unmarried      1
#>  4 1004         25-34 15-24K NA             Unmarried      1
#>  5 101          45-54 Under… Homeowner      Married        4
#>  6 1012         35-44 35-49K NA             Married        5+
#>  7 1014         45-54 15-24K NA             Married        4
#>  8 1015         45-54 50-74K Homeowner      Unmarried      1
#>  9 1018         45-54 35-49K Homeowner      Married        5+
#> 10 1020         45-54 25-34K Homeowner      Married        2
#> # … with 791 more rows, and 2 more variables: household_comp <ord>,
#> #   kids_count <ord>
# Transaction line items that don't have household metadata
require("dplyr")
transactions_sample %>%
  anti_join(demographics, "household_id")
#> # A tibble: 32,801 x 11
#>    household_id store_id basket_id product_id quantity sales_value retail_disc
#>    <chr>        <chr>    <chr>     <chr>         <dbl>       <dbl>       <dbl>
#>  1 2261         309      31625220… 940996            1        3.86        0.43
#>  2 2131         368      32053127… 873902            1        1.59        0.9
#>  3 511          316      32445856… 847901            1        1           0.69
#>  4 918          340      32074655… 1085604           1        1.29        0
#>  5 1688         450      34850403… 1028715           1        2           1.79
#>  6 467          31782    31280745… 896613            2        6.55        4.44
#>  7 1947         32004    32744181… 978497            1        3.99        0
#>  8 568          446      32932232… 949023            1        3.49        0.5
#>  9 1783         369      33409764… 1079223           1        1           0
#> 10 401          31642    40955342… 839753            1        0.17        0
#> # … with 32,791 more rows, and 4 more variables: coupon_disc <dbl>,
#> #   coupon_match_disc <dbl>, week <int>, transaction_timestamp <dttm>
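The anti-join above isolates transactions without metadata. The complementary view, sketched below, estimates how many sampled households do have demographics and then totals sales by income range for those households. It assumes the completejourney package is installed and attached so that demographics and transactions_sample are available. The object names sample_households, coverage, and sales_by_income are illustrative only and are not part of the package.

```r
library(completejourney)  # assumed to provide demographics and transactions_sample
library(dplyr)

# One row per household that appears in the sampled transactions
sample_households <- distinct(transactions_sample, household_id)

# Count and share of those households that have demographic metadata
coverage <- sample_households %>%
  mutate(has_demographics = household_id %in% demographics$household_id) %>%
  summarise(
    households = n(),
    with_demographics = sum(has_demographics),
    share = mean(has_demographics)
  )
coverage

# Total sales by income range, restricted to households with metadata.
# income is an ordered factor, so arrange() sorts by level order.
sales_by_income <- transactions_sample %>%
  inner_join(demographics, by = "household_id") %>%
  group_by(income) %>%
  summarise(total_sales = sum(sales_value)) %>%
  arrange(income)
sales_by_income
```

Because inner_join() drops households that are missing from demographics, totals computed this way cover only the subset of households with metadata and should not be read as totals for the whole sample.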