{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Lesson 3a: Subsetting data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "> “Every second of every day, our senses bring in way too much data than we can possibly process in our brains.”\n",
    ">\n",
    "> \\- Peter Diamandis, Founder of the X-Prize for human-AI collaboration\n",
    "\n",
    "When performing data analysis tasks, rarely do we want to use _all_ our data. Often, we want to focus in on specific variables of interest and/or observations of interest. This requires us to be able to subset our DataFrame in various ways, which is the emphasis of this lesson.\n",
    "\n",
    "## Learning objectives\n",
    "\n",
    "By the end of this lesson you'll be able to:\n",
    "\n",
    "* Differentiate between the different ways to subset DataFrames.\n",
    "* Select columns of a DataFrame.\n",
    "* Slice and filter specific rows of a DataFrame.\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "To illustrate selecting and filtering let's go ahead and load the pandas library and import some data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>N10156</td>\n",
       "      <td>2004.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>N102UW</td>\n",
       "      <td>1998.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>N103US</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>N104UW</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>N10575</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145LR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  tailnum    year                     type      manufacturer      model  \\\n",
       "0  N10156  2004.0  Fixed wing multi engine           EMBRAER  EMB-145XR   \n",
       "1  N102UW  1998.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "2  N103US  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "3  N104UW  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "4  N10575  2002.0  Fixed wing multi engine           EMBRAER  EMB-145LR   \n",
       "\n",
       "   engines  seats  speed     engine  \n",
       "0        2     55    NaN  Turbo-fan  \n",
       "1        2    182    NaN  Turbo-fan  \n",
       "2        2    182    NaN  Turbo-fan  \n",
       "3        2    182    NaN  Turbo-fan  \n",
       "4        2     55    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "planes_df = pd.read_csv('../data/planes.csv')\n",
    "planes_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Subsetting dimensions\n",
    "\n",
    "```{admonition} Video 🎥:\n",
    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/SJDr1kdBsTA?si=bLP-ZXTNH_MyrlVr\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen></iframe>\n",
    "```\n",
    "\n",
    "We don't always want all of the data in a DataFrame, so we need to take subsets of the DataFrame. In general, **subsetting** is extracting a small portion of a DataFrame -- making the DataFrame smaller. Since the DataFrame is two-dimensional, there are two dimensions on which to subset.\n",
    "\n",
    "**Dimension 1:** We may only want to consider certain *variables*. For example, we may only care about the `year` and `engines` variables:\n",
    "\n",
    "[![](../images/selecting_columns.png)](https://github.com/bradleyboehmke/uc-bana-6043/blob/main/book/images/selecting_columns.png?raw=true)\n",
    "\n",
    "We call this **selecting** columns/variables -- this is similar to SQL's `SELECT` or R's dplyr package's `select()`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "**Dimension 2:** We may only want to consider certain *cases*. For example, we may only care about the cases where the manufacturer is Embraer.\n",
    "\n",
    "[![](../images/selecting_rows.png)](https://github.com/bradleyboehmke/uc-bana-6043/blob/main/book/images/selecting_rows.png?raw=true)\n",
    "\n",
    "We call this **filtering** or **slicing** -- this is similar to SQL's `WHERE` or R's dplyr package's `filter()` or `slice()`. And we can combine these two options to subset in both dimensions -- the `year` and `engines` variables where the manufacturer is Embraer:\n",
    "\n",
    "[![](../images/selecting_rows_columns.png)](https://github.com/bradleyboehmke/uc-bana-6043/blob/main/book/images/selecting_rows_columns.png?raw=true)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "In the previous example, we want to do two things using `planes_df`:\n",
    "\n",
    "  1. **select** the `year` and `engines` variables\n",
    "  2. **filter** to cases where the manufacturer is Embraer\n",
    "\n",
    "But we also want to return a new DataFrame -- not just highlight certain cells. In other words, we want to turn this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>N10156</td>\n",
       "      <td>2004.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>N102UW</td>\n",
       "      <td>1998.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>N103US</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>N104UW</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>N10575</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145LR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  tailnum    year                     type      manufacturer      model  \\\n",
       "0  N10156  2004.0  Fixed wing multi engine           EMBRAER  EMB-145XR   \n",
       "1  N102UW  1998.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "2  N103US  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "3  N104UW  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "4  N10575  2002.0  Fixed wing multi engine           EMBRAER  EMB-145LR   \n",
       "\n",
       "   engines  seats  speed     engine  \n",
       "0        2     55    NaN  Turbo-fan  \n",
       "1        2    182    NaN  Turbo-fan  \n",
       "2        2    182    NaN  Turbo-fan  \n",
       "3        2    182    NaN  Turbo-fan  \n",
       "4        2     55    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Into this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>engines</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2004.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     year  engines\n",
       "0  2004.0        2\n",
       "4  2002.0        2"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df.head().loc[planes_df['manufacturer'] == 'EMBRAER', ['year', 'engines']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "So we really have a third need: return the resulting DataFrame so we can continue our analysis:\n",
    "\n",
    "  1. **select** the `year` and `engines` variables\n",
    "  2. **filter** to cases where the manufacturer is Embraer\n",
    "  3. Return a DataFrame to continue the analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Subsetting variables\n",
    "\n",
    "Recall that the subsetting of variables/columns is called **selecting** variables/columns. In a simple example, we can select a single variable using bracket subsetting notation:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2004.0\n",
       "1    1998.0\n",
       "2    1999.0\n",
       "3    1999.0\n",
       "4    2002.0\n",
       "Name: year, dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df['year'].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Notice the `head()` method also works on `planes_df['year']` to return the first five elements."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "```{admonition} Question:\n",
    ":class: attention\n",
    "What is the data type of `planes_df['year']`?\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "This returns `pandas.core.series.Series`, referred to simply as a \"Series\", rather than a DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(planes_df['year'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "This is okay -- the Series is a popular data structure in Python. Recall from a previous lesson:\n",
    "\n",
    "* A Series is a one-dimensional data structure -- this is similar to a Python `list`\n",
    "* Note that all objects in a Series are usually of the same type (but this isn't a strict requirement)\n",
    "* Each DataFrame can be thought of as a list of equal-length Series (plus an Index)\n",
    "\n",
    "<center>\n",
    "<img src=\"https://github.com/bradleyboehmke/uc-bana-6043/blob/main/book/images/dataframe-series.png?raw=true\" alt=\"dataframe-series.png\" width=\"300\" height=\"300\">\n",
    "</center>\n",
    "\n",
    "Series can be useful, but for now, we are interested in *returning a DataFrame* rather than a series. We can select a single variable and return a DataFrame by still using bracket subsetting notation, but this time we will <span style=\"color: blue\">pass a <u>`list`</u> of variables names</span>:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2004.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1998.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1999.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1999.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2002.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     year\n",
       "0  2004.0\n",
       "1  1998.0\n",
       "2  1999.0\n",
       "3  1999.0\n",
       "4  2002.0"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df[['year']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "And we can see that we've returned a DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(planes_df[['year']].head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "```{admonition} Question:\n",
    ":class: attention\n",
    "What's another advantage of this passing a `list`?\n",
    "```\n",
    "\n",
    "Passing a list into the bracket subsetting notation allows us to select multiple variables at once:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>engines</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2004.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1998.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1999.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1999.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     year  engines\n",
       "0  2004.0        2\n",
       "1  1998.0        2\n",
       "2  1999.0        2\n",
       "3  1999.0        2\n",
       "4  2002.0        2"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df[['year', 'engines']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "In another example, assume we are interested in the `model` of plane, number of `seats` and `engine` type:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>model</th>\n",
       "      <th>seats</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>55</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>A320-214</td>\n",
       "      <td>182</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>A320-214</td>\n",
       "      <td>182</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>A320-214</td>\n",
       "      <td>182</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>EMB-145LR</td>\n",
       "      <td>55</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       model  seats     engine\n",
       "0  EMB-145XR     55  Turbo-fan\n",
       "1   A320-214    182  Turbo-fan\n",
       "2   A320-214    182  Turbo-fan\n",
       "3   A320-214    182  Turbo-fan\n",
       "4  EMB-145LR     55  Turbo-fan"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df[['model', 'seats', 'engine']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Knowledge check\n",
    "\n",
    "```{admonition} Questions:\n",
    ":class: attention\n",
    "\n",
    "1. ______ is a common term for subsetting DataFrame variables.\n",
    "2. What type of object is a DataFrame column?\n",
    "3. What will be returned by the following code?\n",
    "\n",
    "   ```python\n",
    "   planes_df['type', 'model']\n",
    "   ```  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{admonition} Video 🎥:\n",
    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/1l19iK1DzoE?si=Mzt73NBLcQOJ_FY_\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen></iframe>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Subsetting rows\n",
    "\n",
    "When we subset rows (aka cases, records, observations) we primarily use two names: **slicing** and **filtering**, but *these are not the same*:\n",
    "\n",
    "  * **slicing**, similar to row **indexing**, subsets observations by the value of the Index\n",
    "  * **filtering** subsets observations using a conditional test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Slicing rows\n",
    "\n",
    "Remember that all DataFrames have an Index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>N10156</td>\n",
       "      <td>2004.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>N102UW</td>\n",
       "      <td>1998.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>N103US</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>N104UW</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>N10575</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145LR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  tailnum    year                     type      manufacturer      model  \\\n",
       "0  N10156  2004.0  Fixed wing multi engine           EMBRAER  EMB-145XR   \n",
       "1  N102UW  1998.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "2  N103US  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "3  N104UW  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "4  N10575  2002.0  Fixed wing multi engine           EMBRAER  EMB-145LR   \n",
       "\n",
       "   engines  seats  speed     engine  \n",
       "0        2     55    NaN  Turbo-fan  \n",
       "1        2    182    NaN  Turbo-fan  \n",
       "2        2    182    NaN  Turbo-fan  \n",
       "3        2    182    NaN  Turbo-fan  \n",
       "4        2     55    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We can **slice** cases/rows using the values in the Index and bracket subsetting notation. It's common practice to use `.loc` to slice cases/rows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>N10156</td>\n",
       "      <td>2004.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>N102UW</td>\n",
       "      <td>1998.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>N103US</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>N104UW</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>N10575</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145LR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>N105UW</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  tailnum    year                     type      manufacturer      model  \\\n",
       "0  N10156  2004.0  Fixed wing multi engine           EMBRAER  EMB-145XR   \n",
       "1  N102UW  1998.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "2  N103US  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "3  N104UW  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "4  N10575  2002.0  Fixed wing multi engine           EMBRAER  EMB-145LR   \n",
       "5  N105UW  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "\n",
       "   engines  seats  speed     engine  \n",
       "0        2     55    NaN  Turbo-fan  \n",
       "1        2    182    NaN  Turbo-fan  \n",
       "2        2    182    NaN  Turbo-fan  \n",
       "3        2    182    NaN  Turbo-fan  \n",
       "4        2     55    NaN  Turbo-fan  \n",
       "5        2    182    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df.loc[0:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "```{note}\n",
    "Note that since this is ***not*** \"indexing\", the last element is inclusive.\n",
    "```\n",
    "\n",
    "We can also pass a `list` of Index values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>N10156</td>\n",
       "      <td>2004.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>N103US</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>N10575</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145LR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>N107US</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>N109UW</td>\n",
       "      <td>1999.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>AIRBUS INDUSTRIE</td>\n",
       "      <td>A320-214</td>\n",
       "      <td>2</td>\n",
       "      <td>182</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  tailnum    year                     type      manufacturer      model  \\\n",
       "0  N10156  2004.0  Fixed wing multi engine           EMBRAER  EMB-145XR   \n",
       "2  N103US  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "4  N10575  2002.0  Fixed wing multi engine           EMBRAER  EMB-145LR   \n",
       "6  N107US  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "8  N109UW  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214   \n",
       "\n",
       "   engines  seats  speed     engine  \n",
       "0        2     55    NaN  Turbo-fan  \n",
       "2        2    182    NaN  Turbo-fan  \n",
       "4        2     55    NaN  Turbo-fan  \n",
       "6        2    182    NaN  Turbo-fan  \n",
       "8        2    182    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df.loc[[0, 2, 4, 6, 8]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Filtering rows\n",
    "\n",
    "We can **filter** rows using a logical sequence equal in length to the number of rows in the DataFrame.\n",
    "\n",
    "Continuing our example, assume we want to determine whether each case's `manufacturer` is Embraer. We can use the `manufacturer` Series and a logical equivalency test to find the result for each row:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0        True\n",
       "1       False\n",
       "2       False\n",
       "3       False\n",
       "4        True\n",
       "        ...  \n",
       "3317    False\n",
       "3318    False\n",
       "3319    False\n",
       "3320    False\n",
       "3321    False\n",
       "Name: manufacturer, Length: 3322, dtype: bool"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df['manufacturer'] == 'EMBRAER'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We can use this resulting logical sequence to test **filter** cases -- rows that are `True` will be returned while those that are `False` will be removed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>N10156</td>\n",
       "      <td>2004.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>N10575</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145LR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>N11106</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>N11107</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>N11109</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   tailnum    year                     type manufacturer      model  engines  \\\n",
       "0   N10156  2004.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "4   N10575  2002.0  Fixed wing multi engine      EMBRAER  EMB-145LR        2   \n",
       "10  N11106  2002.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "11  N11107  2002.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "12  N11109  2002.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "\n",
       "    seats  speed     engine  \n",
       "0      55    NaN  Turbo-fan  \n",
       "4      55    NaN  Turbo-fan  \n",
       "10     55    NaN  Turbo-fan  \n",
       "11     55    NaN  Turbo-fan  \n",
       "12     55    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df[planes_df['manufacturer'] == 'EMBRAER'].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "This also works with `.loc`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>N10156</td>\n",
       "      <td>2004.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>N10575</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145LR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>N11106</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>N11107</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>N11109</td>\n",
       "      <td>2002.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   tailnum    year                     type manufacturer      model  engines  \\\n",
       "0   N10156  2004.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "4   N10575  2002.0  Fixed wing multi engine      EMBRAER  EMB-145LR        2   \n",
       "10  N11106  2002.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "11  N11107  2002.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "12  N11109  2002.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "\n",
       "    seats  speed     engine  \n",
       "0      55    NaN  Turbo-fan  \n",
       "4      55    NaN  Turbo-fan  \n",
       "10     55    NaN  Turbo-fan  \n",
       "11     55    NaN  Turbo-fan  \n",
       "12     55    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df.loc[planes_df['manufacturer'] == 'EMBRAER'].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Any conditional test can be used to **filter** DataFrame rows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>N10156</td>\n",
       "      <td>2004.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>N11121</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>N11127</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>N11137</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>N11140</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   tailnum    year                     type manufacturer      model  engines  \\\n",
       "0   N10156  2004.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "15  N11121  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "16  N11127  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "17  N11137  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "18  N11140  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "\n",
       "    seats  speed     engine  \n",
       "0      55    NaN  Turbo-fan  \n",
       "15     55    NaN  Turbo-fan  \n",
       "16     55    NaN  Turbo-fan  \n",
       "17     55    NaN  Turbo-fan  \n",
       "18     55    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df.loc[planes_df['year'] > 2002].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "And multiple conditional tests can be combined using logical operators:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>N11121</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>N11127</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>N11137</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>N11140</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>N11150</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   tailnum    year                     type manufacturer      model  engines  \\\n",
       "15  N11121  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "16  N11127  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "17  N11137  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "18  N11140  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "19  N11150  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "\n",
       "    seats  speed     engine  \n",
       "15     55    NaN  Turbo-fan  \n",
       "16     55    NaN  Turbo-fan  \n",
       "17     55    NaN  Turbo-fan  \n",
       "18     55    NaN  Turbo-fan  \n",
       "19     55    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df.loc[(planes_df['year'] > 2002) & (planes_df['year'] < 2004)].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "```{note}\n",
    "Note that each condition is wrapped in parentheses -- this is required.\n",
    "```\n",
    "\n",
    "Often, as your condition gets more complex, it can be easier to read if you separate out the condition:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>N11121</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>N11127</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>N11137</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>N11140</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>N11150</td>\n",
       "      <td>2003.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   tailnum    year                     type manufacturer      model  engines  \\\n",
       "15  N11121  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "16  N11127  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "17  N11137  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "18  N11140  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "19  N11150  2003.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "\n",
       "    seats  speed     engine  \n",
       "15     55    NaN  Turbo-fan  \n",
       "16     55    NaN  Turbo-fan  \n",
       "17     55    NaN  Turbo-fan  \n",
       "18     55    NaN  Turbo-fan  \n",
       "19     55    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cond = (planes_df['year'] > 2002) & (planes_df['year'] < 2004)\n",
    "planes_df.loc[cond].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Knowledge check\n",
    "\n",
    "```{admonition} Questions:\n",
    ":class: attention\n",
    "\n",
    "1. What's the difference between **slicing** cases and **filtering** cases?\n",
    "2. Fill in the blanks to fix the following code to find planes that have more than three engines:\n",
    "\n",
    "   ```python\n",
    "    planes_df.loc[______['______'] > 3]\n",
    "   ```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{admonition} Video 🎥:\n",
    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/u174sBhrivQ?si=LwDGxnvMim8ny6nh\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen></iframe>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Selecting variables and filtering rows\n",
    "\n",
    "If we want to select variables and filter cases at the same time, we have a few options:\n",
    "\n",
    "1. Sequential operations\n",
    "2. Simultaneous operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Sequential Operations\n",
    "\n",
    "We can use what we've previously learned to select variables and filter cases in multiple steps:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>engines</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2004.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      year  engines\n",
       "0   2004.0        2\n",
       "4   2002.0        2\n",
       "10  2002.0        2\n",
       "11  2002.0        2\n",
       "12  2002.0        2"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df_filtered = planes_df.loc[planes_df['manufacturer'] == 'EMBRAER']\n",
    "planes_df_filtered_and_selected = planes_df_filtered[['year', 'engines']]\n",
    "planes_df_filtered_and_selected.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "This is a good way to learn how to select and filter independently, and it also reads very clearly."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Simultaneous operations\n",
    "\n",
    "However, we can also do both selecting and filtering in a single step with `.loc`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>engines</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2004.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      year  engines\n",
       "0   2004.0        2\n",
       "4   2002.0        2\n",
       "10  2002.0        2\n",
       "11  2002.0        2\n",
       "12  2002.0        2"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df.loc[planes_df['manufacturer'] == 'EMBRAER', ['year', 'engines']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "This option is more succinct and also reduces programming time. As before, as your filtering and selecting conditions get longer and/or more complex, it can make it easier to read to break it up into separate lines:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>engines</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2004.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>2002.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      year  engines\n",
       "0   2004.0        2\n",
       "4   2002.0        2\n",
       "10  2002.0        2\n",
       "11  2002.0        2\n",
       "12  2002.0        2"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rows = planes_df['manufacturer'] == 'EMBRAER'\n",
    "cols = ['year', 'engines']\n",
    "planes_df.loc[rows, cols].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Knowledge check\n",
    "\n",
    "```{admonition} Question:\n",
    ":class: attention\n",
    "Subset `planes_df` to only include planes made by Boeing and the `seats` and `model` variables.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{admonition} Video 🎥:\n",
    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/gfjn9jc09dg?si=6hwVJp4sLR81wHRQ\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen></iframe>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Views vs copies\n",
    "\n",
    "One thing to be aware of, as you will likely experience it eventually, is the concept of returning a ***view*** (“looking” at a part of an existing object) versus a ***copy*** (making a new copy of the object in memory). This can be a bit abstract and even [this section in the Pandas docs](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy) states _\"...it’s very hard to predict whether it will return a view or a copy.\"_\n",
    "\n",
    "```{tip}\n",
    "The concept of views and copies is confusing and you can read more about it [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy). \n",
    "```\n",
    "\n",
    "The main takeaway is that the most common warning you’ll encounter in Pandas is the `SettingWithCopyWarning`; Pandas raises it as a warning that you might not be doing what you think you’re doing or because the operation you are performing may behave unpredictably.\n",
    "\n",
    "Let's look at an example. Say the number of seats on this particular plane was recorded incorrectly. Instead of 55 seats it should actually be 60 seats."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>N10156</td>\n",
       "      <td>2004.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  tailnum    year                     type manufacturer      model  engines  \\\n",
       "0  N10156  2004.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "\n",
       "   seats  speed     engine  \n",
       "0     55    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tailnum_of_interest = planes_df['tailnum'] == 'N10156'\n",
    "planes_df[tailnum_of_interest]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of using `.iloc`, we could actually filter and select this element in our DataFrame with the following bracket notation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    55\n",
       "Name: seats, dtype: int64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df[tailnum_of_interest]['seats']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we use this approach to then assign our new value to this element we'll get a `SettingWithCopyWarning`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/8f/c06lv6q17tjbyjv2nkt0_s4s1sh0tg/T/ipykernel_39982/2190037627.py:1: SettingWithCopyWarning: \n",
      "A value is trying to be set on a copy of a slice from a DataFrame.\n",
      "Try using .loc[row_indexer,col_indexer] = value instead\n",
      "\n",
      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
      "  planes_df[tailnum_of_interest]['seats'] = 60\n"
     ]
    }
   ],
   "source": [
    "planes_df[tailnum_of_interest]['seats'] = 60"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So what's going on? Did our DataFrame get changed?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>N10156</td>\n",
       "      <td>2004.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  tailnum    year                     type manufacturer      model  engines  \\\n",
       "0  N10156  2004.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "\n",
       "   seats  speed     engine  \n",
       "0     55    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df[tailnum_of_interest]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "No it didn’t, even though you probably thought it did. What happened above is that `planes_df[tailnum_of_interest]['seats']` was executed first and returned a copy of the DataFrame, which is an entirely different object. We can confirm by using `id()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The id of the original dataframe is: 4621780544\n",
      " The id of the indexed dataframe is: 4622373600\n"
     ]
    }
   ],
   "source": [
    "print(f\"The id of the original dataframe is: {id(planes_df)}\")\n",
    "print(f\" The id of the indexed dataframe is: {id(planes_df[tailnum_of_interest])}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then tried to set a value on this new object by appending `['seats'] = 60`. Pandas is warning us that we are doing that operation on a copy of the original dataframe, which is probably not what we want. To fix this, you need to index in a single go, using `.loc[]` for example:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "planes_df.loc[tailnum_of_interest, 'seats'] = 60"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "No error this time! And let’s confirm the change:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tailnum</th>\n",
       "      <th>year</th>\n",
       "      <th>type</th>\n",
       "      <th>manufacturer</th>\n",
       "      <th>model</th>\n",
       "      <th>engines</th>\n",
       "      <th>seats</th>\n",
       "      <th>speed</th>\n",
       "      <th>engine</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>N10156</td>\n",
       "      <td>2004.0</td>\n",
       "      <td>Fixed wing multi engine</td>\n",
       "      <td>EMBRAER</td>\n",
       "      <td>EMB-145XR</td>\n",
       "      <td>2</td>\n",
       "      <td>60</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Turbo-fan</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  tailnum    year                     type manufacturer      model  engines  \\\n",
       "0  N10156  2004.0  Fixed wing multi engine      EMBRAER  EMB-145XR        2   \n",
       "\n",
       "   seats  speed     engine  \n",
       "0     60    NaN  Turbo-fan  "
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planes_df[tailnum_of_interest]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{tip}\n",
    "When in doubt, always use `.loc[]` for combined filtering and selecting!\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Exercises\n",
    "\n",
    "```{admonition} Questions:\n",
    ":class: attention\n",
    "1. Import the heart.csv\n",
    "2. Select the age column and return a Series. Select the age column and return a DataFrame. Select the age, sex, and max_hr columns.\n",
    "3. Slice the DataFrame to get the first 25 rows and save as `first_25`. What is the age of the last observation in this sliced DataFrame?\n",
    "4. Using the original DataFrame (not the sliced DataFrame), filter for all observations where the person is 50 years or older. How many observations are there?\n",
    "5. Using the original DataFrame, filter for those observations that are male and 50 years or older. How many observations are there.\n",
    "6. Using the original DataFrame, filter for those observations that are female, 50 years or younger, and have the disease (disease = 1). Select `chest_pain`, `chol`, and `max_hr` columns. How many rows and columns are in the resulting DataFrame?\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Computing environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python implementation: CPython\n",
      "Python version       : 3.9.4\n",
      "IPython version      : 7.26.0\n",
      "\n",
      "jupyterlab: 3.1.4\n",
      "pandas    : 1.2.4\n",
      "\n"
     ]
    }
   ],
   "source": [
    "%load_ext watermark\n",
    "%watermark -v -p jupyterlab,pandas"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "interpreter": {
   "hash": "fd8d25807323b6a73e4e6e484dd361ea1af80a43f200f9cf712ca91de587529f"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  },
  "rise": {
   "autolaunch": true,
   "transition": "none"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}