Per capita income and expenditures provide crucial insight into the average standard of living in specified areas. Disposable per capita income measures the average income earned after taxes per person in a given area (city, state, country, etc.) in a specified year. It is calculated by dividing the area’s total income after tax by its total population. Per capita expenditures, on the other hand, measures the average outlay for goods and services by person and provides insight into spending patterns across a given area. Together, the assessment of per capita income versus expenditures can provide better understanding of regional economies, differences in standard of living, and approximate savings rates.

This post involves exploring Bureau of Economic Analysis data regarding per capita disposable income (hereafter referred to as PCI) and per capita personal expenditures (hereafter referred to as PCE). The PCI data provides annual (non-inflation adjusted) per capita disposable income at the national and state-level from 1948-2015 and the PCE data provides annual (non-inflation adjusted) per capita personal consumption expenditures at the national and state-level from 1997-2014. Consequently, this research seeks to identify how the national and state-level savings rates defined as $Savings = PCI - PCE$ has changed over time and by geographic location.

The analysis finds that the national-level and average state-level savings rates have remained around 7-8% since 1997. Furthermore, we find that American’s are not making fundamental shifts in their earnings and expenditure rates. However, the analysis does uncover a noticable shift in the disparity of savings rates across the states in recent years with much of the growth in savings rates being concentrated in the central U.S. states - from the Dakotas down to Oklahoma, Texas and Louisiana. Consequently, it appears that the often neglected fly-over states offer Americans greater opportunities to save than the eastern and western states.

## Packages Required

To reproduce the code and results throughout this project you will need to load the following packages.

library(rvest)        # scraping data
library(tidyr)        # creating tidy data
library(dplyr)        # transforming (joining, summarizing, etc.) data
library(tibble)       # coercing data to tibbles
library(magrittr)     # for piping capabilities
library(DT)           # for printing nice HTML output tables
library(ggplot2)      # visualizing data
library(ggrepel)      # Repel overlapping text labels in plots


## Data Preparation

Prior to assessing how PCI, PCE, and savings rates have behaved over time and by geographic location we must acquire and clean the data.

The data for this project originated from the following sources:

To identify the HTML link to scrape this data follow these steps:

1. Using the links above go to the page displaying either the PCI or PCE data
3. Paste copied link into browser window
#####################
#####################
# url for PCI HTML table

# download PCI table and extract the data frame from the list
pci_raw <- url_pci %>%
html_nodes("table") %>%
.[2] %>%
html_table(fill = TRUE) %>%
.[[1]]

#####################
#####################
# url for PCE HTML table

# download PCE table and extract the data frame from the list
pce_raw <- url_pce %>%
html_nodes("table") %>%
.[2] %>%
html_table(fill = TRUE) %>%
.[[1]]


### Creating Tidy Data

Once the basic data has been acquired we need to pre-process it to get the data into a tidy format. This includes removing punctuations, changing the income and expenditure data from character to a numeric data type, reducing the data sets to the same time period (1997-2014), making sure the common variables share the same names, and changing the data from a wide format to a long format. Once this has been done for both the PCI and PCE data we can merge the clean data frames into one common data frame (titled data_clean) and create a new Savings variable ($Savings = Income - Expenditures$). I also remove the District of Columbia location since this is more comparable to metropolitan-level geographic areas than state-level geographic areas. We now have the data cleaned, in a tidy format, and ready to analyze as Table 1 illustrates.

# create tidy PCI data
pci_clean <- pci_raw %>%
apply(2, function(x) gsub("[[:punct:]]", "", x)) %>%
as_tibble(.) %>%
group_by(GeoFips, GeoName) %>%
mutate_each(funs(as.numeric)) %>%
ungroup() %>%
select(Fips = GeoFips, Location = GeoName, 1997:2014) %>%
gather(Year, Income, -c(Fips, Location))

# create tidy PCE data
pce_clean <- pce_raw %>%
apply(2, function(x) gsub("[[:punct:]]", "", x)) %>%
as_tibble(.) %>%
group_by(Fips, Area) %>%
mutate_each(funs(as.numeric)) %>%
ungroup() %>%
rename(Location = Area) %>%
gather(Year, Expenditures, -c(Fips, Location))

# create tidy merged data frame
data_clean <- pci_clean %>%
left_join(pce_clean) %>%
mutate(Savings = Income - Expenditures,
Year = as.numeric(Year)) %>%
filter(Location != "District of Columbia")

datatable(data_clean, caption = 'Table 1: Clean and tidy data.')


## Exploratory Data Analysis

The primary purpose of this analysis is to assess how national and state-level PCI, PCE, and savings rates have changed over time and by geographic location. Thus, we will proceed by first assessing the national-level trends and then move on to assessing state-level trends.

### National-Level Patterns

At the national-level PCI grew by 79.6% from $22,536 in 1997 to$40,471 in 2014. Expenditures (PCE), on the other hand, grew 82.5% from $20,384 in 1997 to$37,186. Although we are assessing non-inflation adjusted dollars, we can still observe that since 1997 the rate of growth in PCE has only slightly outpaced PCI. Figure 1 illustrates the growing trends (not surprising since inflation has not been removed) and also captures the decrease in both PCI and PCE from 2008 to 2009 due to the Great Recession.

data_clean %>%
filter(Location == "United States") %>%
ggplot(aes(x = Year)) +
geom_line(aes(y = Income, group = 1), color = "darkseagreen4") +
geom_line(aes(y = Expenditures, group = 1), color = "firebrick3") +
geom_ribbon(aes(ymin = Expenditures, ymax = Income, group = 1), fill = "darkseagreen1", alpha = .5) +
annotate("text", x = 2014.15, y = 41000, label = "2014 PCI: $40.5K", color = "darkseagreen4", hjust = 0, size = 3) + annotate("text", x = 2014.15, y = 37000, label = "2014 PCE:$37.2K",
color = "firebrick3", hjust = 0, size = 3) +
annotate("text", x = 2014.15, y = 39000, label = "2014 Savings: $3.3K", color = "darkseagreen2", hjust = 0, size = 3) + scale_x_continuous(NULL, limits = c(1997, 2016.5), breaks = seq(1998, 2014, by = 4)) + scale_y_continuous(NULL, labels = scales::dollar) + ggtitle("Figure 1: Growth in PCI and PCE", subtitle = "Growth represented as current year dollars from 1997-2014 (not adjusted for inflation)") + theme_minimal() + theme(panel.grid.minor = element_blank(), text = element_text(family = "Georgia", size = 12), plot.title = element_text(size = 28, margin = margin(b = 10)), plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)))  However, a closer look at just the savings rate ($Savings Rate = \frac{Savings}{Income}$) depicted in Figure 2 illustrates that no consistent trend has been established. In other words, the aggregate per capita savings rate has not consistently increased or decreased year-over-year. In 1998 the savings rate was 10% but reduced over the next few years to 6.5% in 2005 before peaking at 10.9% in 2012 and then dipping back down to about 8-9% in recent years. Bottom-line is that since 1997 the national-level per capita savings rate has ranged between 6.5% and 10.9% with an average of 8.6%. data_clean %>% filter(Location == "United States") %>% mutate(Savings_Rate = Savings / Income) %>% ggplot(aes(Year, Savings_Rate)) + geom_line() + geom_hline(aes(yintercept = mean(Savings_Rate)), linetype = "dashed", alpha = .5) + annotate("text", x = 2010, y = .08, label = "Average: 8.6%", size = 3) + scale_y_continuous(NULL, labels = scales::percent, limits = c(0, .115)) + scale_x_continuous(NULL, breaks = seq(1998, 2014, by = 4)) + ggtitle("Figure 2: National-level savings rate", subtitle = "Changes in state-level savings rates from 1997-2014") + theme_minimal() + theme(panel.grid.minor = element_blank(), text = element_text(family = "Georgia", size = 12), plot.title = element_text(size = 28, margin = margin(b = 10)), plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)))  However, understanding aggregate ratios and trends provides limited insight regarding lower-level activity 1. Consequently, next we turn to investigating state-level trends. ### State-Level Patterns To get a quick understanding of how U.S. states have progressed over the years we can map the savings rates over time. Figure 3 highlights a few attributes: 1. Note how the earlier years have less diverging colors suggesting that there was more “equality” in the savings rates across the states; however, the latter years appear to have more disparity in the savings rates 2. As the years have progressed it appears that a growth in savings rates has been concentrated in the central states; primarily from the Dakotas down to Texas 3. A few individual states stand out: • Main, Vermont & Montana for savings rates that are consistently less than 0% • Massachusetts for consistently being a top savings rate state data_clean %>% mutate(region = tolower(Location), Savings_Rate = Savings / Income) %>% right_join(map_data("state")) %>% select(-subregion) %>% filter(Year %in% seq(1998, 2014, by = 2)) %>% ggplot(aes(x = long, y = lat, group = group)) + geom_polygon(aes(fill = Savings_Rate)) + facet_wrap(~ Year, ncol = 3) + scale_fill_gradient2(name = "Savings Rate", labels = scales::percent) + ggtitle("Figure 3: Savings rate changes over time", subtitle = "Temporal map assessment of state-level savings rates (1998-2014)") + expand_limits() + theme_void() + theme(strip.text.x = element_text(size = 12), text = element_text(family = "Georgia"), plot.title = element_text(size = 28, margin = margin(b = 10)), plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)))  A closer look at the state-level trends provides more insight. We can see that the average savings rate over time has remained around 7%; however, confirming our assessment of the map it appears that the variance (or disparity in savings rates) has increased in recent years. Moreover, the trend lines illustrate that with a few exceptions, states that are leading the way as top or bottom savings rate states have, historically, always been near the top or bottom. However, this should not be too surprising as it takes decades for states to change their industrial and economic infrastructure. savings_rate <- data_clean %>% mutate(Savings_Rate = Savings / Income) %>% filter(Location != "United States") top5 <- savings_rate %>% arrange(desc(Savings_Rate)) %>% filter(Year == 2014) %>% slice(1:5) bottom5 <- savings_rate %>% arrange(Savings_Rate) %>% filter(Year == 2014) %>% slice(1:5) avg <- savings_rate %>% group_by(Year) %>% summarise(Avg_mn = mean(Savings_Rate), Avg_md = median(Savings_Rate)) %>% mutate(Avg = "Average") ggplot(savings_rate, aes(Year, Savings_Rate, group = Location)) + geom_line(alpha = .1) + geom_line(data = filter(savings_rate, Location %in% top5$Location),
aes(Year, Savings_Rate, group = Location), color = "dodgerblue") +
geom_line(data = filter(savings_rate, Location %in% bottom5\$Location),
aes(Year, Savings_Rate, group = Location), color = "red") +
geom_line(data = avg, aes(Year, Avg_mn, group = 1), linetype = "dashed") +
annotate("text", x = 2014.25, y = .071, label = "Average", hjust = 0, size = 3) +
geom_text_repel(data = top5, aes(label = Location), nudge_x = .5, size = 3) +
geom_point(data = top5, aes(Year, Savings_Rate), color = "dodgerblue") +
geom_text_repel(data = bottom5, aes(label = Location), nudge_x = 0.5, size = 3) +
geom_point(data = bottom5, aes(Year, Savings_Rate), color = "red") +
scale_x_continuous(NULL, limits = c(1997, 2015.25), breaks = seq(1998, 2014, by = 2)) +
scale_y_continuous(NULL, labels = scales::percent) +
ggtitle("Figure 4: Savings rate changes over time",
subtitle = "Temporal assessment of state-level savings rates (1997-2014)") +
theme_minimal() +
theme(panel.grid.minor = element_blank(),
text = element_text(family = "Georgia"),
plot.title = element_text(size = 28, margin = margin(b = 10)),
plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)))


However, we can also look at those states that have had the largest change in their savings rate since 1997. As Table 2 displays, three of the four states with the largest change in their savings rate were Wyoming, Oklahoma and North Dakota; all having a savings rate increase close to, or more than, 10%. The remaining states with the largest changes have all experienced declining savings rates, led by Nevada.

largest_change <- savings_rate %>%
filter(Year == 1997 | Year == 2014) %>%
select(Location, Year, Savings_Rate) %>%
mutate(Change = 2014 - 1997) %>%
arrange(desc(abs(Change))) %>%
mutate(1997 = paste0(round(1997 * 100, 1), "%"),
2014 = paste0(round(2014 * 100, 1), "%"),
Change = paste0(round(Change * 100, 1), "%")) %>%
slice(1:10)

knitr::kable(largest_change, caption = 'Table 2: Top 10 states with the largest change in their savings rate since 1997',
align = c('l', 'r', 'r', 'r'))


Table 2: Top 10 states with the largest change in their savings rate since 1997

Location 1997 2014 Change
Wyoming 7.9% 21.1% 13.2%
Oklahoma 9.7% 21.9% 12.1%
North Dakota -2.2% 6.3% 8.5%
Maine -2.6% -9.8% -7.2%
Michigan 6.7% -0.3% -7.1%
New York 14.4% 7.5% -6.9%
West Virginia 6% -0.5% -6.5%
Montana -1% -7% -6%
New Jersey 13.9% 8% -5.9%

This may lead us to wonder if one component (PCI vs PCE) is driving the changes in savings rate. In other words, for those states that are growing above the average level, is their PCI level growing at a greater level than those states below the average? Or could it be that the those states with above average savings rates are experiencing a slower increase in their expenditures than those states below average. Figure 5 helps to illustrate this issue.

Figure 5 shows that, concerning PCE (left pane), the states that have had above average savings rates have not experienced, on average, any difference in PCE growth since 1997. However, the states with below average savings rates have experienced greater variance in their PCE growth rates. Concerning PCI (right pane), the states that have had above average savings rates have experienced, on average, slightly greater PCE growth since 1997; however, this difference is likely not to be statistically significant (though validation would be required to confirm). Again, those states with below average savings rates have experienced slightly greater variance in their growth rates than the above average savings rate states.

Thus, it appears that those states with below average savings rates have greater variability in their PCI and PCE growth rates whereas those states with above average savings rates have more consistency. However, no significant differences appear to exist in the average PCI & PCE growth rates among states with above versus below average savings rate. This is likely why we are seeing the average savings rate remain relatively steady but the variability in savings rates among the states growing.

changes <- savings_rate %>%
filter(Year == 1997 | Year == 2014) %>%
arrange(Location) %>%
select(Location, Year, Income, Expenditures, Savings_Rate) %>%
group_by(Location) %>%
mutate(PCI = diff(Income) / lag(Income),
PCE = diff(Expenditures) / lag(Expenditures)) %>%
na.omit() %>%
ungroup() %>%
mutate(Group = ifelse(Savings_Rate > mean(Savings_Rate), "Above Average", "Below Average")) %>%
gather(Metric, Value, PCI:PCE)

ggplot(changes, aes(Group, Value)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = .1, alpha = .5) +
geom_text(data = filter(changes, Value > 1.1 | Value < .6), aes(label = Location), size = 3, hjust = 0, nudge_x = .05) +
facet_wrap(~ Metric) +
scale_y_continuous("Percent change from 1997 to 2014", labels = scales::percent) +
xlab(NULL) +
ggtitle("Figure 5: Percent change in PCE & PCI",
subtitle = "Comparing the change in PCE & PCI from 1997 to 2014 for those states with above versus below \naverage savings rates") +
theme_bw() +
theme(panel.grid.minor = element_blank(),
text = element_text(family = "Georgia", size = 12),
strip.text.x = element_text(size = 14),
plot.title = element_text(size = 28, margin = margin(b = 10)),
plot.subtitle = element_text(size = 12, color = "darkslategrey", margin = margin(b = 25)))


## Summary

Consequently, our analysis finds that the national-level and average state-level savings rates have remained around 7-8% since 1997. Furthermore, we find that PCI and PCE have grown at a relatively similiar rates at the national, state-levels, and among those states that have experienced above versus below average savings rates. This suggests that the U.S. has not experienced a fundamental shift in PCI or PCE behavior.

The noticable change that we have seen is a greater disparity in savings rates among the states in recent years. Although the average savings rate has remained around 7-8%, the variance in state-level savings rates has grown since 1997. Moreover, much of the above average growth in savings rates has been concentrated in the central U.S. states from the Dakotas down to Oklahoma, Texas and Louisiana; whereas much of the below average growth has been concentrated in more eastern and western states. Thus, if you are looking to save more of your hard-earned income you may have greater opportunities by seeking refuge in one of the fly-over states.