{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lession 4a: Tidy data\n",
    "\n",
    "[Tidy data](https://vita.had.co.nz/papers/tidy-data.pdf) is about “linking the structure of a dataset with its semantics (its meaning)”. It is defined by:\n",
    "\n",
    "1. Each variable forms a column\n",
    "2. Each observation forms a row\n",
    "3. Each type of observational unit forms a table\n",
    "\n",
    "Often you’ll need to reshape a dataframe to make it tidy (or for some other purpose).\n",
    "\n",
    "<center>\n",
    "<img src=\"https://github.com/bradleyboehmke/uc-bana-6043/blob/main/book/images/tidy.png?raw=true\" alt=\"tidy-data\" width=\"80%\" height=\"80%\">\n",
    "</center>\n",
    "\n",
    "Source: [R4DS](https://r4ds.had.co.nz/tidy-data.html#fig:tidy-structure)\n",
    "\n",
    "Once a DataFrame is tidy, it becomes much easier to compute summary statistics, join with other datasets, visualize, apply machine learning models, etc. In this lesson we will focus on ways to reshape DataFrames so that they meet the tidy guidelines.\n",
    "\n",
    "```{admonition} Video 🎥:\n",
    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/A27KVLjFc5M?si=dxIqtSEuoF_atJVj\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen></iframe>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learning objectives\n",
    "\n",
    "By the end of this lesson you will be able to:\n",
    "\n",
    "- Reshape data from wide to long\n",
    "- Reshape data from long to wide"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tools for reshaping\n",
    "\n",
    "Pandas provides multiple methods that help to reshape DataFrames:\n",
    "\n",
    "* `.melt()`: make wide data long.\n",
    "* `.pivot()`: make long data width.\n",
    "* `.pivot_table()`: same as `.pivot()` but can handle multiple indexes.\n",
    "\n",
    "<center>\n",
    "<img src=\"https://github.com/bradleyboehmke/uc-bana-6043/blob/main/book/images/melt_pivot.gif?raw=true\" alt=\"melt-pivot\" width=\"60%\" height=\"60%\">\n",
    "</center>\n",
    "\n",
    "Source: [Garrick Aden-Buie](https://github.com/gadenbuie/tidyexplain#spread-and-gather)\n",
    "\n",
    "The following will illustrate each of these for their unique purpose."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Melting wide data\n",
    "\n",
    "The below data shows how many homes were sold within each neighborhood across each year. Is this considered 'tidy data'?  No because we have a variable (year sold) that is represented as the columns and another variable (number of homes sold) filled in as the element values. \n",
    "\n",
    "If you wanted to answer questions like: “Does the number of homes sold vary depending on year?” then the below data is not in the appropriate form to answer this question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>neighborhood</th>\n",
       "      <th>2006</th>\n",
       "      <th>2007</th>\n",
       "      <th>2008</th>\n",
       "      <th>2009</th>\n",
       "      <th>2010</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Blmngtn</td>\n",
       "      <td>11.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Blueste</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>BrDale</td>\n",
       "      <td>9.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>BrkSide</td>\n",
       "      <td>19.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>31.0</td>\n",
       "      <td>23.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ClearCr</td>\n",
       "      <td>10.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  neighborhood  2006  2007  2008  2009  2010\n",
       "0      Blmngtn  11.0   4.0   5.0   6.0   2.0\n",
       "1      Blueste   NaN   2.0   2.0   4.0   2.0\n",
       "2       BrDale   9.0   5.0   7.0   6.0   3.0\n",
       "3      BrkSide  19.0  24.0  31.0  23.0  11.0\n",
       "4      ClearCr  10.0   9.0  11.0   6.0   8.0"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "ames_wide = pd.read_csv('../data/ames_wide.csv')\n",
    "ames_wide.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example we would consider this data \"wide\" and our objective is to convert it into a DataFrame with three variables:\n",
    "\n",
    "1. neighborhood\n",
    "2. year\n",
    "3. homes_sold\n",
    "\n",
    "To do so we'll use the `.melt()` method. `.melt()` arguments include:\n",
    "\n",
    "- `id_vars`: Identifier column\n",
    "- `var_name`: Name to give the new variable represented by the old column headers\n",
    "- `value_name`: Name to give the new variable represented by the old element values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>neighborhood</th>\n",
       "      <th>year</th>\n",
       "      <th>homes_sold</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Blmngtn</td>\n",
       "      <td>2006</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Blueste</td>\n",
       "      <td>2006</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>BrDale</td>\n",
       "      <td>2006</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>BrkSide</td>\n",
       "      <td>2006</td>\n",
       "      <td>19.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ClearCr</td>\n",
       "      <td>2006</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>135</th>\n",
       "      <td>SawyerW</td>\n",
       "      <td>2010</td>\n",
       "      <td>18.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>136</th>\n",
       "      <td>Somerst</td>\n",
       "      <td>2010</td>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>137</th>\n",
       "      <td>StoneBr</td>\n",
       "      <td>2010</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>138</th>\n",
       "      <td>Timber</td>\n",
       "      <td>2010</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139</th>\n",
       "      <td>Veenker</td>\n",
       "      <td>2010</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>140 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    neighborhood  year  homes_sold\n",
       "0        Blmngtn  2006        11.0\n",
       "1        Blueste  2006         NaN\n",
       "2         BrDale  2006         9.0\n",
       "3        BrkSide  2006        19.0\n",
       "4        ClearCr  2006        10.0\n",
       "..           ...   ...         ...\n",
       "135      SawyerW  2010        18.0\n",
       "136      Somerst  2010        21.0\n",
       "137      StoneBr  2010         6.0\n",
       "138       Timber  2010         8.0\n",
       "139      Veenker  2010         NaN\n",
       "\n",
       "[140 rows x 3 columns]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ames_melt = ames_wide.melt(id_vars='neighborhood', var_name='year', value_name='homes_sold')\n",
    "ames_melt\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `value_vars` argument allows us to select which specific variables we want to “melt” (if you don’t specify `value_vars`, all non-identifier columns will be used). For example, below I’m omitting the 2006 column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>neighborhood</th>\n",
       "      <th>year</th>\n",
       "      <th>homes_sold</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Blmngtn</td>\n",
       "      <td>2007</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Blueste</td>\n",
       "      <td>2007</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>BrDale</td>\n",
       "      <td>2007</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>BrkSide</td>\n",
       "      <td>2007</td>\n",
       "      <td>24.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ClearCr</td>\n",
       "      <td>2007</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>107</th>\n",
       "      <td>SawyerW</td>\n",
       "      <td>2010</td>\n",
       "      <td>18.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>108</th>\n",
       "      <td>Somerst</td>\n",
       "      <td>2010</td>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>109</th>\n",
       "      <td>StoneBr</td>\n",
       "      <td>2010</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>110</th>\n",
       "      <td>Timber</td>\n",
       "      <td>2010</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>111</th>\n",
       "      <td>Veenker</td>\n",
       "      <td>2010</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>112 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    neighborhood  year  homes_sold\n",
       "0        Blmngtn  2007         4.0\n",
       "1        Blueste  2007         2.0\n",
       "2         BrDale  2007         5.0\n",
       "3        BrkSide  2007        24.0\n",
       "4        ClearCr  2007         9.0\n",
       "..           ...   ...         ...\n",
       "107      SawyerW  2010        18.0\n",
       "108      Somerst  2010        21.0\n",
       "109      StoneBr  2010         6.0\n",
       "110       Timber  2010         8.0\n",
       "111      Veenker  2010         NaN\n",
       "\n",
       "[112 rows x 3 columns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ames_wide.melt(\n",
    "    id_vars='neighborhood',\n",
    "    value_vars=['2007', '2008', '2009', '2010'],\n",
    "    var_name='year',\n",
    "    value_name='homes_sold'\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Knowledge check\n",
    "\n",
    "```{admonition} Questions:\n",
    ":class: attention\n",
    "Given the following DataFrame, reshape the DataFrame from the current \"wide\" format to a \"longer\" format made up of the following variables:\n",
    "\n",
    "- `Name`: will contain the same values in the current `Name` column, \n",
    "- `Year`: will contain the year values which are currently column names, and \n",
    "- `Courses`: will contain the values that are currently listed under each year variable.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({\"Name\": [\"Tom\", \"Mike\", \"Tiffany\", \"Varada\", \"Joel\"],\n",
    "                   \"2018\": [1, 3, 4, 5, 3],\n",
    "                   \"2019\": [2, 4, 3, 2, 1],\n",
    "                   \"2020\": [5, 2, 4, 4, 3]})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{admonition} Video 🎥:\n",
    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/65TxCizdvfQ?si=0wOv2nR6h8blh38p\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen></iframe>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Pivoting long data\n",
    "\n",
    "Sometimes, you want to make long data wide, which we can do with `.pivot()`. When using `.pivot()` we need to specify the index to pivot on, and the columns that will be used to make the new columns of the wider dataframe. Let's convert our `ames_melt` DataFrame back to the wide format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>year</th>\n",
       "      <th>2006</th>\n",
       "      <th>2007</th>\n",
       "      <th>2008</th>\n",
       "      <th>2009</th>\n",
       "      <th>2010</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>neighborhood</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Blmngtn</th>\n",
       "      <td>11.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Blueste</th>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>BrDale</th>\n",
       "      <td>9.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>BrkSide</th>\n",
       "      <td>19.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>31.0</td>\n",
       "      <td>23.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ClearCr</th>\n",
       "      <td>10.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>CollgCr</th>\n",
       "      <td>60.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>58.0</td>\n",
       "      <td>63.0</td>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Crawfor</th>\n",
       "      <td>20.0</td>\n",
       "      <td>34.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Edwards</th>\n",
       "      <td>43.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>42.0</td>\n",
       "      <td>23.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Gilbert</th>\n",
       "      <td>38.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>26.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Greens</th>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>GrnHill</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>IDOTRR</th>\n",
       "      <td>20.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>26.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Landmrk</th>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MeadowV</th>\n",
       "      <td>10.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mitchel</th>\n",
       "      <td>23.0</td>\n",
       "      <td>28.0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>17.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NAmes</th>\n",
       "      <td>99.0</td>\n",
       "      <td>105.0</td>\n",
       "      <td>86.0</td>\n",
       "      <td>95.0</td>\n",
       "      <td>58.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NPkVill</th>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NWAmes</th>\n",
       "      <td>25.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NoRidge</th>\n",
       "      <td>17.0</td>\n",
       "      <td>17.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NridgHt</th>\n",
       "      <td>32.0</td>\n",
       "      <td>43.0</td>\n",
       "      <td>31.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OldTown</th>\n",
       "      <td>51.0</td>\n",
       "      <td>48.0</td>\n",
       "      <td>56.0</td>\n",
       "      <td>55.0</td>\n",
       "      <td>29.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SWISU</th>\n",
       "      <td>12.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sawyer</th>\n",
       "      <td>38.0</td>\n",
       "      <td>37.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>23.0</td>\n",
       "      <td>23.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SawyerW</th>\n",
       "      <td>20.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>18.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Somerst</th>\n",
       "      <td>29.0</td>\n",
       "      <td>51.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>StoneBr</th>\n",
       "      <td>15.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Timber</th>\n",
       "      <td>11.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Veenker</th>\n",
       "      <td>4.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "year          2006   2007  2008  2009  2010\n",
       "neighborhood                               \n",
       "Blmngtn       11.0    4.0   5.0   6.0   2.0\n",
       "Blueste        NaN    2.0   2.0   4.0   2.0\n",
       "BrDale         9.0    5.0   7.0   6.0   3.0\n",
       "BrkSide       19.0   24.0  31.0  23.0  11.0\n",
       "ClearCr       10.0    9.0  11.0   6.0   8.0\n",
       "CollgCr       60.0   65.0  58.0  63.0  21.0\n",
       "Crawfor       20.0   34.0  21.0  21.0   7.0\n",
       "Edwards       43.0   41.0  45.0  42.0  23.0\n",
       "Gilbert       38.0   45.0  26.0  41.0  15.0\n",
       "Greens         4.0    1.0   NaN   1.0   2.0\n",
       "GrnHill        1.0    1.0   NaN   NaN   NaN\n",
       "IDOTRR        20.0   25.0  26.0  13.0   9.0\n",
       "Landmrk        1.0    NaN   NaN   NaN   NaN\n",
       "MeadowV       10.0    8.0   7.0   5.0   7.0\n",
       "Mitchel       23.0   28.0  22.0  24.0  17.0\n",
       "NAmes         99.0  105.0  86.0  95.0  58.0\n",
       "NPkVill        3.0    3.0   3.0  10.0   4.0\n",
       "NWAmes        25.0   30.0  30.0  35.0  11.0\n",
       "NoRidge       17.0   17.0  14.0  13.0  10.0\n",
       "NridgHt       32.0   43.0  31.0  45.0  15.0\n",
       "OldTown       51.0   48.0  56.0  55.0  29.0\n",
       "SWISU         12.0    5.0  10.0  10.0  11.0\n",
       "Sawyer        38.0   37.0  30.0  23.0  23.0\n",
       "SawyerW       20.0   21.0  25.0  41.0  18.0\n",
       "Somerst       29.0   51.0  41.0  40.0  21.0\n",
       "StoneBr       15.0   12.0  10.0   8.0   6.0\n",
       "Timber        11.0   21.0  18.0  14.0   8.0\n",
       "Veenker        4.0    9.0   7.0   4.0   NaN"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ames_pivot = ames_melt.pivot(index='neighborhood', columns='year', values='homes_sold')\n",
    "ames_pivot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You’ll notice that Pandas set our specified index as the index of the new DataFrame and preserved the label of the columns. We can easily remove these names and reset the index to make our DataFrame look like it originally did:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>neighborhood</th>\n",
       "      <th>2006</th>\n",
       "      <th>2007</th>\n",
       "      <th>2008</th>\n",
       "      <th>2009</th>\n",
       "      <th>2010</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Blmngtn</td>\n",
       "      <td>11.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Blueste</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>BrDale</td>\n",
       "      <td>9.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>BrkSide</td>\n",
       "      <td>19.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>31.0</td>\n",
       "      <td>23.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ClearCr</td>\n",
       "      <td>10.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>CollgCr</td>\n",
       "      <td>60.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>58.0</td>\n",
       "      <td>63.0</td>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Crawfor</td>\n",
       "      <td>20.0</td>\n",
       "      <td>34.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Edwards</td>\n",
       "      <td>43.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>42.0</td>\n",
       "      <td>23.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Gilbert</td>\n",
       "      <td>38.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>26.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Greens</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>GrnHill</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>IDOTRR</td>\n",
       "      <td>20.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>26.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Landmrk</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>MeadowV</td>\n",
       "      <td>10.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Mitchel</td>\n",
       "      <td>23.0</td>\n",
       "      <td>28.0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>17.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>NAmes</td>\n",
       "      <td>99.0</td>\n",
       "      <td>105.0</td>\n",
       "      <td>86.0</td>\n",
       "      <td>95.0</td>\n",
       "      <td>58.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>NPkVill</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>NWAmes</td>\n",
       "      <td>25.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>NoRidge</td>\n",
       "      <td>17.0</td>\n",
       "      <td>17.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>NridgHt</td>\n",
       "      <td>32.0</td>\n",
       "      <td>43.0</td>\n",
       "      <td>31.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>OldTown</td>\n",
       "      <td>51.0</td>\n",
       "      <td>48.0</td>\n",
       "      <td>56.0</td>\n",
       "      <td>55.0</td>\n",
       "      <td>29.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>SWISU</td>\n",
       "      <td>12.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Sawyer</td>\n",
       "      <td>38.0</td>\n",
       "      <td>37.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>23.0</td>\n",
       "      <td>23.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>SawyerW</td>\n",
       "      <td>20.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>18.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Somerst</td>\n",
       "      <td>29.0</td>\n",
       "      <td>51.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>StoneBr</td>\n",
       "      <td>15.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Timber</td>\n",
       "      <td>11.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Veenker</td>\n",
       "      <td>4.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   neighborhood  2006   2007  2008  2009  2010\n",
       "0       Blmngtn  11.0    4.0   5.0   6.0   2.0\n",
       "1       Blueste   NaN    2.0   2.0   4.0   2.0\n",
       "2        BrDale   9.0    5.0   7.0   6.0   3.0\n",
       "3       BrkSide  19.0   24.0  31.0  23.0  11.0\n",
       "4       ClearCr  10.0    9.0  11.0   6.0   8.0\n",
       "5       CollgCr  60.0   65.0  58.0  63.0  21.0\n",
       "6       Crawfor  20.0   34.0  21.0  21.0   7.0\n",
       "7       Edwards  43.0   41.0  45.0  42.0  23.0\n",
       "8       Gilbert  38.0   45.0  26.0  41.0  15.0\n",
       "9        Greens   4.0    1.0   NaN   1.0   2.0\n",
       "10      GrnHill   1.0    1.0   NaN   NaN   NaN\n",
       "11       IDOTRR  20.0   25.0  26.0  13.0   9.0\n",
       "12      Landmrk   1.0    NaN   NaN   NaN   NaN\n",
       "13      MeadowV  10.0    8.0   7.0   5.0   7.0\n",
       "14      Mitchel  23.0   28.0  22.0  24.0  17.0\n",
       "15        NAmes  99.0  105.0  86.0  95.0  58.0\n",
       "16      NPkVill   3.0    3.0   3.0  10.0   4.0\n",
       "17       NWAmes  25.0   30.0  30.0  35.0  11.0\n",
       "18      NoRidge  17.0   17.0  14.0  13.0  10.0\n",
       "19      NridgHt  32.0   43.0  31.0  45.0  15.0\n",
       "20      OldTown  51.0   48.0  56.0  55.0  29.0\n",
       "21        SWISU  12.0    5.0  10.0  10.0  11.0\n",
       "22       Sawyer  38.0   37.0  30.0  23.0  23.0\n",
       "23      SawyerW  20.0   21.0  25.0  41.0  18.0\n",
       "24      Somerst  29.0   51.0  41.0  40.0  21.0\n",
       "25      StoneBr  15.0   12.0  10.0   8.0   6.0\n",
       "26       Timber  11.0   21.0  18.0  14.0   8.0\n",
       "27      Veenker   4.0    9.0   7.0   4.0   NaN"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ames_pivot = ames_pivot.reset_index()\n",
    "ames_pivot.columns.name = None\n",
    "ames_pivot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Knowledge check\n",
    "\n",
    "```{admonition} Questions:\n",
    ":class: attention\n",
    "Given the following DataFrame, reshape the DataFrame from the current \"long\" format to a \"wider\" format made up of the following variables:\n",
    "\n",
    "- `Name`: will contain the same values in the current `Name` column, \n",
    "- `Year`: will contain the year values which are currently column names, and \n",
    "- `Courses`: will contain the values that are currently listed under each year variable.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({\n",
    "    \"Name\": [\"Tom\", \"Mike\", \"Tiffany\", \"Tom\", \"Mike\", \"Tiffany\"],\n",
    "    \"Variable\": [\"Year\", \"Year\", \"Year\", \"Courses\", \"Courses\", \"Courses\"],\n",
    "    \"Value\": [2018, 2018, 2018, 1, 3, 4]\n",
    "})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{admonition} Video 🎥:\n",
    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/9OfiqZfwFME?si=9PIO_JrHuw76OCXJ\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen></iframe>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pivoting with special needs\n",
    "\n",
    "`.pivot()` will often get you what you want, but it won’t work if you want to:\n",
    "\n",
    "* Use multiple indexes or\n",
    "* Have duplicate index/column labels\n",
    "\n",
    "For example, let's look at pivoting the below data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>neighborhood</th>\n",
       "      <th>year_sold</th>\n",
       "      <th>bedrooms</th>\n",
       "      <th>homes_sold</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Blmngtn</td>\n",
       "      <td>2006</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Blmngtn</td>\n",
       "      <td>2006</td>\n",
       "      <td>2</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Blmngtn</td>\n",
       "      <td>2007</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Blmngtn</td>\n",
       "      <td>2008</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Blmngtn</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>430</th>\n",
       "      <td>Veenker</td>\n",
       "      <td>2007</td>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>431</th>\n",
       "      <td>Veenker</td>\n",
       "      <td>2008</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>432</th>\n",
       "      <td>Veenker</td>\n",
       "      <td>2008</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>433</th>\n",
       "      <td>Veenker</td>\n",
       "      <td>2009</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>434</th>\n",
       "      <td>Veenker</td>\n",
       "      <td>2009</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>435 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    neighborhood  year_sold  bedrooms  homes_sold\n",
       "0        Blmngtn       2006         1           1\n",
       "1        Blmngtn       2006         2          10\n",
       "2        Blmngtn       2007         2           4\n",
       "3        Blmngtn       2008         2           5\n",
       "4        Blmngtn       2009         1           2\n",
       "..           ...        ...       ...         ...\n",
       "430      Veenker       2007         3           6\n",
       "431      Veenker       2008         1           3\n",
       "432      Veenker       2008         3           4\n",
       "433      Veenker       2009         2           3\n",
       "434      Veenker       2009         4           1\n",
       "\n",
       "[435 rows x 4 columns]"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ames2 = pd.read_csv('../data/ames_wide2.csv')\n",
    "ames2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example, say you wanted to pivot `ames_wide2` so that the `year_sold` is represented as columns and `homes_sold` values are the elements. If we try to do this similar to the last section's example we get an error stating `ValueError: Index contains duplicate entries, cannot reshape`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "Index contains duplicate entries, cannot reshape",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m/var/folders/8f/c06lv6q17tjbyjv2nkt0_s4s1sh0tg/T/ipykernel_5256/2062246961.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mames2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpivot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'neighborhood'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcolumns\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'year_sold'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalues\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'homes_sold'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36mpivot\u001b[0;34m(self, index, columns, values)\u001b[0m\n\u001b[1;32m   6877\u001b[0m         \u001b[0;32mfrom\u001b[0m \u001b[0mpandas\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpivot\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpivot\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   6878\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 6879\u001b[0;31m         \u001b[0;32mreturn\u001b[0m \u001b[0mpivot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcolumns\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalues\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   6880\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   6881\u001b[0m     _shared_docs[\n",
      "\u001b[0;32m~/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pandas/core/reshape/pivot.py\u001b[0m in \u001b[0;36mpivot\u001b[0;34m(data, index, columns, values)\u001b[0m\n\u001b[1;32m    459\u001b[0m         \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    460\u001b[0m             \u001b[0mindexed\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_constructor_sliced\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_values\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 461\u001b[0;31m     \u001b[0;32mreturn\u001b[0m \u001b[0mindexed\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munstack\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    462\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    463\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pandas/core/series.py\u001b[0m in \u001b[0;36munstack\u001b[0;34m(self, level, fill_value)\u001b[0m\n\u001b[1;32m   3827\u001b[0m         \u001b[0;32mfrom\u001b[0m \u001b[0mpandas\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0munstack\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   3828\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3829\u001b[0;31m         \u001b[0;32mreturn\u001b[0m \u001b[0munstack\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfill_value\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   3830\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   3831\u001b[0m     \u001b[0;31m# ----------------------------------------------------------------------\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pandas/core/reshape/reshape.py\u001b[0m in \u001b[0;36munstack\u001b[0;34m(obj, level, fill_value)\u001b[0m\n\u001b[1;32m    428\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mis_extension_array_dtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    429\u001b[0m             \u001b[0;32mreturn\u001b[0m \u001b[0m_unstack_extension_series\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfill_value\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 430\u001b[0;31m         unstacker = _Unstacker(\n\u001b[0m\u001b[1;32m    431\u001b[0m             \u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlevel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconstructor\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_constructor_expanddim\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    432\u001b[0m         )\n",
      "\u001b[0;32m~/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pandas/core/reshape/reshape.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, index, level, constructor)\u001b[0m\n\u001b[1;32m    116\u001b[0m             \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Unstacked DataFrame is too big, causing int32 overflow\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    117\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 118\u001b[0;31m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_selectors\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    119\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    120\u001b[0m     \u001b[0;34m@\u001b[0m\u001b[0mcache_readonly\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/.pyenv/versions/3.9.4/lib/python3.9/site-packages/pandas/core/reshape/reshape.py\u001b[0m in \u001b[0;36m_make_selectors\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    165\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    166\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mmask\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 167\u001b[0;31m             \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Index contains duplicate entries, cannot reshape\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    168\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    169\u001b[0m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroup_index\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcomp_index\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mValueError\u001b[0m: Index contains duplicate entries, cannot reshape"
     ]
    }
   ],
   "source": [
    "ames2.pivot(index='neighborhood', columns='year_sold', values='homes_sold')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The reason is we have duplicate values in our neighborhood column and Pandas doesn't know how to isolate the index values to properly align the pivoted data. In such a case, we’d use `.pivot_table()`. It will apply an aggregation function to our duplicates, in this case, we’ll `sum()` them up:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>year_sold</th>\n",
       "      <th>2006</th>\n",
       "      <th>2007</th>\n",
       "      <th>2008</th>\n",
       "      <th>2009</th>\n",
       "      <th>2010</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>neighborhood</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Blmngtn</th>\n",
       "      <td>11.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Blueste</th>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>BrDale</th>\n",
       "      <td>9.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>BrkSide</th>\n",
       "      <td>19.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>31.0</td>\n",
       "      <td>23.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ClearCr</th>\n",
       "      <td>10.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>CollgCr</th>\n",
       "      <td>60.0</td>\n",
       "      <td>65.0</td>\n",
       "      <td>58.0</td>\n",
       "      <td>63.0</td>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Crawfor</th>\n",
       "      <td>20.0</td>\n",
       "      <td>34.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Edwards</th>\n",
       "      <td>43.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>42.0</td>\n",
       "      <td>23.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Gilbert</th>\n",
       "      <td>38.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>26.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Greens</th>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>GrnHill</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>IDOTRR</th>\n",
       "      <td>20.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>26.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Landmrk</th>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MeadowV</th>\n",
       "      <td>10.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mitchel</th>\n",
       "      <td>23.0</td>\n",
       "      <td>28.0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>17.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NAmes</th>\n",
       "      <td>99.0</td>\n",
       "      <td>105.0</td>\n",
       "      <td>86.0</td>\n",
       "      <td>95.0</td>\n",
       "      <td>58.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NPkVill</th>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NWAmes</th>\n",
       "      <td>25.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NoRidge</th>\n",
       "      <td>17.0</td>\n",
       "      <td>17.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NridgHt</th>\n",
       "      <td>32.0</td>\n",
       "      <td>43.0</td>\n",
       "      <td>31.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OldTown</th>\n",
       "      <td>51.0</td>\n",
       "      <td>48.0</td>\n",
       "      <td>56.0</td>\n",
       "      <td>55.0</td>\n",
       "      <td>29.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SWISU</th>\n",
       "      <td>12.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sawyer</th>\n",
       "      <td>38.0</td>\n",
       "      <td>37.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>23.0</td>\n",
       "      <td>23.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SawyerW</th>\n",
       "      <td>20.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>18.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Somerst</th>\n",
       "      <td>29.0</td>\n",
       "      <td>51.0</td>\n",
       "      <td>41.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>21.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>StoneBr</th>\n",
       "      <td>15.0</td>\n",
       "      <td>12.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>8.0</td>\n",
       "      <td>6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Timber</th>\n",
       "      <td>11.0</td>\n",
       "      <td>21.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Veenker</th>\n",
       "      <td>4.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>7.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "year_sold     2006   2007  2008  2009  2010\n",
       "neighborhood                               \n",
       "Blmngtn       11.0    4.0   5.0   6.0   2.0\n",
       "Blueste        NaN    2.0   2.0   4.0   2.0\n",
       "BrDale         9.0    5.0   7.0   6.0   3.0\n",
       "BrkSide       19.0   24.0  31.0  23.0  11.0\n",
       "ClearCr       10.0    9.0  11.0   6.0   8.0\n",
       "CollgCr       60.0   65.0  58.0  63.0  21.0\n",
       "Crawfor       20.0   34.0  21.0  21.0   7.0\n",
       "Edwards       43.0   41.0  45.0  42.0  23.0\n",
       "Gilbert       38.0   45.0  26.0  41.0  15.0\n",
       "Greens         4.0    1.0   NaN   1.0   2.0\n",
       "GrnHill        1.0    1.0   NaN   NaN   NaN\n",
       "IDOTRR        20.0   25.0  26.0  13.0   9.0\n",
       "Landmrk        1.0    NaN   NaN   NaN   NaN\n",
       "MeadowV       10.0    8.0   7.0   5.0   7.0\n",
       "Mitchel       23.0   28.0  22.0  24.0  17.0\n",
       "NAmes         99.0  105.0  86.0  95.0  58.0\n",
       "NPkVill        3.0    3.0   3.0  10.0   4.0\n",
       "NWAmes        25.0   30.0  30.0  35.0  11.0\n",
       "NoRidge       17.0   17.0  14.0  13.0  10.0\n",
       "NridgHt       32.0   43.0  31.0  45.0  15.0\n",
       "OldTown       51.0   48.0  56.0  55.0  29.0\n",
       "SWISU         12.0    5.0  10.0  10.0  11.0\n",
       "Sawyer        38.0   37.0  30.0  23.0  23.0\n",
       "SawyerW       20.0   21.0  25.0  41.0  18.0\n",
       "Somerst       29.0   51.0  41.0  40.0  21.0\n",
       "StoneBr       15.0   12.0  10.0   8.0   6.0\n",
       "Timber        11.0   21.0  18.0  14.0   8.0\n",
       "Veenker        4.0    9.0   7.0   4.0   NaN"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ames2.pivot_table(index='neighborhood', columns='year_sold', values='homes_sold', aggfunc='sum')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we wanted to keep the numbers per bedroom, we could specify both `neighborhood` and `bedrooms` as multiple indexes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year_sold</th>\n",
       "      <th>2006</th>\n",
       "      <th>2007</th>\n",
       "      <th>2008</th>\n",
       "      <th>2009</th>\n",
       "      <th>2010</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>neighborhood</th>\n",
       "      <th>bedrooms</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">Blmngtn</th>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">Blueste</th>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">Veenker</th>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>125 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "year_sold              2006  2007  2008  2009  2010\n",
       "neighborhood bedrooms                              \n",
       "Blmngtn      1          1.0   NaN   NaN   2.0   NaN\n",
       "             2         10.0   4.0   5.0   4.0   2.0\n",
       "Blueste      1          NaN   1.0   NaN   NaN   1.0\n",
       "             2          NaN   1.0   1.0   4.0   1.0\n",
       "             3          NaN   NaN   1.0   NaN   NaN\n",
       "...                     ...   ...   ...   ...   ...\n",
       "Veenker      0          1.0   NaN   NaN   NaN   NaN\n",
       "             1          NaN   1.0   3.0   NaN   NaN\n",
       "             2          1.0   2.0   NaN   3.0   NaN\n",
       "             3          2.0   6.0   4.0   NaN   NaN\n",
       "             4          NaN   NaN   NaN   1.0   NaN\n",
       "\n",
       "[125 rows x 5 columns]"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ames2.pivot(index=['neighborhood', 'bedrooms'], columns='year_sold', values='homes_sold')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result above is a mutlti-index or “hierarchically indexed” DataFrame, which we haven't really talked about up to this point. However, we can easily flatten this with `.reset_index()` and removing the column's index name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>neighborhood</th>\n",
       "      <th>bedrooms</th>\n",
       "      <th>2006</th>\n",
       "      <th>2007</th>\n",
       "      <th>2008</th>\n",
       "      <th>2009</th>\n",
       "      <th>2010</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Blmngtn</td>\n",
       "      <td>1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Blmngtn</td>\n",
       "      <td>2</td>\n",
       "      <td>10.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Blueste</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Blueste</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Blueste</td>\n",
       "      <td>3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  neighborhood  bedrooms  2006  2007  2008  2009  2010\n",
       "0      Blmngtn         1   1.0   NaN   NaN   2.0   NaN\n",
       "1      Blmngtn         2  10.0   4.0   5.0   4.0   2.0\n",
       "2      Blueste         1   NaN   1.0   NaN   NaN   1.0\n",
       "3      Blueste         2   NaN   1.0   1.0   4.0   1.0\n",
       "4      Blueste         3   NaN   NaN   1.0   NaN   NaN"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ames2_reshaped = (\n",
    "    ames2\n",
    "    .pivot(index=['neighborhood', 'bedrooms'], columns='year_sold', values='homes_sold')\n",
    "    .reset_index()\n",
    ")\n",
    "ames2_reshaped.columns.name = None\n",
    "ames2_reshaped.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Additional video\n",
    "\n",
    "```{admonition} Video 🎥:\n",
    "Here's a webinar that provides a thorough discussion around tidy data principles along with illustrating examples of reshaping data with Pandas. It is longer (50 minutes) but is worth a watch if you are still trying to get your arms around the above lesson conceps.\n",
    "\n",
    "<iframe id=\"kaltura_player\" src=\"https://cdnapisec.kaltura.com/p/1492301/sp/149230100/embedIframeJs/uiconf_id/49148882/partner_id/1492301?iframeembed=true&playerId=kaltura_player&entry_id=1_ors8hblh&flashvars[streamerType]=auto&amp;flashvars[localizationCode]=en_US&amp;flashvars[forceMobileHTML5]=true&amp;flashvars[scrubber.sliderPreview]=false&amp;flashvars[Kaltura.addCrossoriginToIframe]=true&amp;&wid=1_zvr62kj3\" width=\"640\" height=\"610\" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow=\"autoplay *; fullscreen *; encrypted-media *\" sandbox=\"allow-downloads allow-forms allow-same-origin allow-scripts allow-top-navigation allow-pointer-lock allow-popups allow-modals allow-orientation-lock allow-popups-to-escape-sandbox allow-presentation allow-top-navigation-by-user-activation\" frameborder=\"0\" title=\"BANA 6043 : Webinar -  Data Reshaping with Pandas in Python\"></iframe>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "For this exercise, we're going to work with this data set from this paper by [Reeves, et al.](https://www.cell.com/developmental-cell/fulltext/S1534-5807(11)00573-9?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS1534580711005739%3Fshowall%3Dtrue) in which they measured the width of the gradient in the morphogen Dorsal in Drosophila embryos for various genotypes using different method. Don't get hung up in what this means, our object is to simply tidy this dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">wt</th>\n",
       "      <th colspan=\"3\" halign=\"left\">dl1/+; dl-venus/+</th>\n",
       "      <th colspan=\"3\" halign=\"left\">dl1/+; dl-gfp/+</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>wholemounts</th>\n",
       "      <th>cross-sections</th>\n",
       "      <th>anti-Dorsal</th>\n",
       "      <th>anti-Venus</th>\n",
       "      <th>Venus (live)</th>\n",
       "      <th>anti-Dorsal</th>\n",
       "      <th>anti-GFP</th>\n",
       "      <th>GFP (live)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.1288</td>\n",
       "      <td>0.1327</td>\n",
       "      <td>0.1482</td>\n",
       "      <td>0.1632</td>\n",
       "      <td>0.1666</td>\n",
       "      <td>0.2248</td>\n",
       "      <td>0.2389</td>\n",
       "      <td>0.2412</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.1554</td>\n",
       "      <td>0.1457</td>\n",
       "      <td>0.1503</td>\n",
       "      <td>0.1671</td>\n",
       "      <td>0.1753</td>\n",
       "      <td>0.1891</td>\n",
       "      <td>0.2035</td>\n",
       "      <td>0.1942</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.1306</td>\n",
       "      <td>0.1447</td>\n",
       "      <td>0.1577</td>\n",
       "      <td>0.1704</td>\n",
       "      <td>0.1705</td>\n",
       "      <td>0.1705</td>\n",
       "      <td>0.1943</td>\n",
       "      <td>0.2186</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.1413</td>\n",
       "      <td>0.1282</td>\n",
       "      <td>0.1711</td>\n",
       "      <td>0.1779</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.1735</td>\n",
       "      <td>0.2000</td>\n",
       "      <td>0.2104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.1557</td>\n",
       "      <td>0.1487</td>\n",
       "      <td>0.1342</td>\n",
       "      <td>0.1483</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.2135</td>\n",
       "      <td>0.2560</td>\n",
       "      <td>0.2463</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>147</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.1466</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>148</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.1671</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.1265</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>150</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.1448</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>151</th>\n",
       "      <td>NaN</td>\n",
       "      <td>0.1740</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>152 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             wt                dl1/+; dl-venus/+                          \\\n",
       "    wholemounts cross-sections       anti-Dorsal anti-Venus Venus (live)   \n",
       "0        0.1288         0.1327            0.1482     0.1632       0.1666   \n",
       "1        0.1554         0.1457            0.1503     0.1671       0.1753   \n",
       "2        0.1306         0.1447            0.1577     0.1704       0.1705   \n",
       "3        0.1413         0.1282            0.1711     0.1779          NaN   \n",
       "4        0.1557         0.1487            0.1342     0.1483          NaN   \n",
       "..          ...            ...               ...        ...          ...   \n",
       "147         NaN         0.1466               NaN        NaN          NaN   \n",
       "148         NaN         0.1671               NaN        NaN          NaN   \n",
       "149         NaN         0.1265               NaN        NaN          NaN   \n",
       "150         NaN         0.1448               NaN        NaN          NaN   \n",
       "151         NaN         0.1740               NaN        NaN          NaN   \n",
       "\n",
       "    dl1/+; dl-gfp/+                      \n",
       "        anti-Dorsal anti-GFP GFP (live)  \n",
       "0            0.2248   0.2389     0.2412  \n",
       "1            0.1891   0.2035     0.1942  \n",
       "2            0.1705   0.1943     0.2186  \n",
       "3            0.1735   0.2000     0.2104  \n",
       "4            0.2135   0.2560     0.2463  \n",
       "..              ...      ...        ...  \n",
       "147             NaN      NaN        NaN  \n",
       "148             NaN      NaN        NaN  \n",
       "149             NaN      NaN        NaN  \n",
       "150             NaN      NaN        NaN  \n",
       "151             NaN      NaN        NaN  \n",
       "\n",
       "[152 rows x 8 columns]"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv(\"../data/reeves_gradient_width_various_methods.csv\", comment='#', header=[0,1])\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As can happen with spreadsheets, we have a multiindex, where we have three main groups:\n",
    "\n",
    "* `wt` which refers to wild type\n",
    "* `dl1/+; dl-venus/+` which we'll refer to as simply Venus \n",
    "* `\tdl1/+; dl-gfp/+` which we'll refer to as simply GFP\n",
    "\n",
    "For each of these main groups we have multiple sub-columns: two for wild type (`wholemounts`, `cross-sections`), three for Venus (`anti-Dorsal`, `Anti-Venus`, `Venus (live)`), and three for GFP (`anti-Dorsal`, `anti-GFP`, `GFP (live)`). The rows here are the gradient width values recorded for each of the categories. Clearly these data are not tidy.\n",
    "\n",
    "For this exercise your objective is to:\n",
    "\n",
    "1. Reshape this data so that it looks like the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>genotype</th>\n",
       "      <th>method</th>\n",
       "      <th>gradient width</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>wt</td>\n",
       "      <td>wholemounts</td>\n",
       "      <td>0.1288</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>wt</td>\n",
       "      <td>wholemounts</td>\n",
       "      <td>0.1554</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>wt</td>\n",
       "      <td>wholemounts</td>\n",
       "      <td>0.1306</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>wt</td>\n",
       "      <td>wholemounts</td>\n",
       "      <td>0.1413</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>wt</td>\n",
       "      <td>wholemounts</td>\n",
       "      <td>0.1557</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1211</th>\n",
       "      <td>dl1/+; dl-gfp/+</td>\n",
       "      <td>GFP (live)</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1212</th>\n",
       "      <td>dl1/+; dl-gfp/+</td>\n",
       "      <td>GFP (live)</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1213</th>\n",
       "      <td>dl1/+; dl-gfp/+</td>\n",
       "      <td>GFP (live)</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1214</th>\n",
       "      <td>dl1/+; dl-gfp/+</td>\n",
       "      <td>GFP (live)</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1215</th>\n",
       "      <td>dl1/+; dl-gfp/+</td>\n",
       "      <td>GFP (live)</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1216 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             genotype       method  gradient width\n",
       "0                  wt  wholemounts          0.1288\n",
       "1                  wt  wholemounts          0.1554\n",
       "2                  wt  wholemounts          0.1306\n",
       "3                  wt  wholemounts          0.1413\n",
       "4                  wt  wholemounts          0.1557\n",
       "...               ...          ...             ...\n",
       "1211  dl1/+; dl-gfp/+   GFP (live)             NaN\n",
       "1212  dl1/+; dl-gfp/+   GFP (live)             NaN\n",
       "1213  dl1/+; dl-gfp/+   GFP (live)             NaN\n",
       "1214  dl1/+; dl-gfp/+   GFP (live)             NaN\n",
       "1215  dl1/+; dl-gfp/+   GFP (live)             NaN\n",
       "\n",
       "[1216 rows x 3 columns]"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "expected_result = pd.read_csv('../data/tidy_reeves_gradients.csv')\n",
    "expected_result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2\\. Now that you have a tidy data frame you will notice that you have many `NaN`s in the `gradient width` column because there were many of them in the data set. Drop all observations that contain `NaN` values.\n",
    "\n",
    "3\\. Now compute summary statistics via `.describe()` for the `gradient width` variable grouped by `genotype` and `method`. Which `genotype` and `method` has the narrowest `gradient width`?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Computing environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%load_ext watermark\n",
    "%watermark -v -p jupyterlab,pandas"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "fd8d25807323b6a73e4e6e484dd361ea1af80a43f200f9cf712ca91de587529f"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}