Product metadata for all products purchased by households participating in the Customer Journey study.
products
A data frame with 92,331 rows and 7 variables
product_id: Uniquely identifies each product
manufacturer_id: Uniquely identifies each manufacturer
department: Groups similar products together
brand: Indicates Private or National label brand
product_category: Groups similar products together at a lower level than department
product_type: Groups similar products together at the lowest level
package_size: Indicates package size (not available for all products)
84.51°, Customer Journey study, http://www.8451.com/area51/
a tibble
# full data set
products
#> # A tibble: 92,331 x 7
#>    product_id manufacturer_id department brand product_category product_type
#>    <chr>      <chr>           <chr>      <fct> <chr>            <chr>
#>  1 25671      2               GROCERY    Nati… FRZN ICE         ICE - CRUSH…
#>  2 26081      2               MISCELLAN… Nati… NA               NA
#>  3 26093      69              PASTRY     Priv… BREAD            BREAD:ITALI…
#>  4 26190      69              GROCERY    Priv… FRUIT - SHELF S… APPLE SAUCE
#>  5 26355      69              GROCERY    Priv… COOKIES/CONES    SPECIALTY C…
#>  6 26426      69              GROCERY    Priv… SPICES & EXTRAC… SPICES & SE…
#>  7 26540      69              GROCERY    Priv… COOKIES/CONES    TRAY PACK/C…
#>  8 26601      69              DRUG GM    Priv… VITAMINS         VITAMIN - M…
#>  9 26636      69              PASTRY     Priv… BREAKFAST SWEETS SW GDS: SW …
#> 10 26691      16              GROCERY    Priv… PNT BTR/JELLY/J… HONEY
#> # … with 92,321 more rows, and 1 more variable: package_size <chr>

# Transaction line items that don't have product metadata
require("dplyr")
transactions_sample %>%
  anti_join(products, "product_id")
#> # A tibble: 222 x 11
#>    household_id store_id basket_id product_id quantity sales_value retail_disc
#>    <chr>        <chr>    <chr>     <chr>         <dbl>       <dbl>       <dbl>
#>  1 1166         408      31969185… 5978656           0           0           0
#>  2 867          369      40436331… 5978656           0           0           0
#>  3 40           406      40085429… 5978648           0           0           0
#>  4 1633         32004    32187016… 5978656           0           0           0
#>  5 2305         450      35000781… 5978648           0           0           0
#>  6 910          299      35293721… 5978648           0           0           0
#>  7 2178         343      32557004… 5978656           0           0           0
#>  8 115          329      40968842… 5978648           0           0           0
#>  9 367          368      35840831… 5978648           0           0           0
#> 10 2462         403      40853241… 5978648           0           0           0
#> # … with 212 more rows, and 4 more variables: coupon_disc <dbl>,
#> #   coupon_match_disc <dbl>, week <int>, transaction_timestamp <dttm>
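
The anti_join() example above finds line items with no product metadata. The same idea can be sketched in base R on a small made-up table, which may help when checking the logic without loading the full data. The data frames toy_products and toy_transactions and all of their values are hypothetical stand-ins, not rows from the real data set; only the product_id key column mirrors the documented structure.

```r
# Toy stand-ins for the real tables (all values are made up for illustration).
toy_products <- data.frame(
  product_id = c("1", "2", "3"),
  department = c("GROCERY", "PASTRY", "GROCERY"),
  stringsAsFactors = FALSE
)
toy_transactions <- data.frame(
  basket_id   = c("a", "a", "b", "c"),
  product_id  = c("1", "9", "3", "9"),
  sales_value = c(2.5, 0, 1.25, 0),
  stringsAsFactors = FALSE
)

# Base R counterpart of an anti join: keep line items whose product_id
# has no match in the product table.
has_metadata <- toy_transactions$product_id %in% toy_products$product_id
unmatched <- toy_transactions[!has_metadata, ]

# Left join: attach metadata to every line item; items without a match
# keep their row and get NA in the metadata columns.
joined <- merge(toy_transactions, toy_products, by = "product_id", all.x = TRUE)
```

Here unmatched holds the two line items with product_id "9", and joined keeps all four line items with department set to NA for those two. With the real tables, dplyr's anti_join() and left_join() express the same operations more directly.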