Household demographic metadata for households participating in the Customer Journey study. Due to the nature of the data, demographic information is not available for all households.

demographics

Format

A data frame with 801 rows and 8 variables:

  • household_id: Uniquely identifies each household

  • age: Estimated age range

  • income: Household income range

  • home_ownership: Homeowner status (Homeowner, Renter, Unknown)

  • marital_status: Marital status (Married, Unmarried, Unknown)

  • household_size: Size of household up to 5+

  • household_comp: Household composition description

  • kids_count: Number of children present up to 3+

Source

84.51°, Customer Journey study, https://www.8451.com/area51/

Value

demographics: a tibble

Examples

# \donttest{
# full data set
demographics
#> # A tibble: 801 × 8
#>    household_id age   income    home_ownership marital_status household_size
#>    <chr>        <ord> <ord>     <ord>          <ord>          <ord>         
#>  1 1            65+   35-49K    Homeowner      Married        2             
#>  2 1001         45-54 50-74K    Homeowner      Unmarried      1             
#>  3 1003         35-44 25-34K    NA             Unmarried      1             
#>  4 1004         25-34 15-24K    NA             Unmarried      1             
#>  5 101          45-54 Under 15K Homeowner      Married        4             
#>  6 1012         35-44 35-49K    NA             Married        5+            
#>  7 1014         45-54 15-24K    NA             Married        4             
#>  8 1015         45-54 50-74K    Homeowner      Unmarried      1             
#>  9 1018         45-54 35-49K    Homeowner      Married        5+            
#> 10 1020         45-54 25-34K    Homeowner      Married        2             
#> # ℹ 791 more rows
#> # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
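
# A minimal sketch, not part of the original examples: because demographic
# information is not available for every household, it can help to check how
# the ordered factors are coded and how many values are missing per column.
# Assumption: the data ship with the completejourney package and dplyr
# (>= 1.0.0, for across()) is installed.
library(completejourney)
library(dplyr)
levels(demographics$marital_status)
na_counts <- demographics %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts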

# Transaction line items that don't have household metadata
library(dplyr)
transactions_sample %>%
  anti_join(demographics, by = "household_id")
#> # A tibble: 32,801 × 11
#>    household_id store_id basket_id   product_id quantity sales_value retail_disc
#>    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
#>  1 2261         309      31625220889 940996            1        3.86        0.43
#>  2 2131         368      32053127496 873902            1        1.59        0.9 
#>  3 511          316      32445856036 847901            1        1           0.69
#>  4 918          340      32074655895 1085604           1        1.29        0   
#>  5 1688         450      34850403304 1028715           1        2           1.79
#>  6 467          31782    31280745102 896613            2        6.55        4.44
#>  7 1947         32004    32744181707 978497            1        3.99        0   
#>  8 568          446      32932232291 949023            1        3.49        0.5 
#>  9 1783         369      33409764350 1079223           1        1           0   
#> 10 401          31642    40955342402 839753            1        0.17        0   
#> # ℹ 32,791 more rows
#> # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
#> #   transaction_timestamp <dttm>
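
# A minimal sketch, not part of the original examples: attach household
# demographics to each transaction line item. left_join() keeps every
# transaction, so line items from households without metadata get NA in the
# demographic columns and appear as an NA group in the count.
# Assumption: the data ship with the completejourney package.
library(completejourney)
library(dplyr)
by_income <- transactions_sample %>%
  left_join(demographics, by = "household_id") %>%
  count(income)
by_income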
# }