Lession 6a: Conditional statements#
Conditional statements are a key part of any programming language. Their purpose is to evaluate if some condition holds and then execute a particular task depending on the result. The most common conditional statements are if
, elif
, and else
statements. In this lesson we’ll cover each of these along with a few additional applications of conditional statements.
Video 🎥:
Get introduced to conditional statements with this intro video and then read the lesson that follows to learn more and to get some hands-on practice.
Learning objectives#
By the end of this lesson you will be able to:
Apply basic
if
statements along with additional multi-branching statements (i.e.if...elif...else
).Create Pythonic versions of switch statements.
Apply vectorized conditional statements on Pandas DataFrames.
Prerequisites#
Most of the examples in this lesson use base Python code without any modules; however, we will illustrate some examples of integrating control statements within Pandas. We also illustrate a couple examples that use Numpy.
import pandas as pd
import numpy as np
Also, most of the examples use toy data; however, when illustrating concepts integrated with Pandas we will use the Complete Journey transaction data:
from completejourney_py import get_data
df = get_data()['transactions']
if statement#
The conditional if
statement is used to test an expression. If the test_expression
is True
, the statement gets executed. But if it’s False
, nothing happens.
if test_expression:
statement
The following is an example that tests if a particular object is negative. Notice that there are no braces or “begin/end” delimiters around the block of code. Python uses the indentation to determine blocks of code. Consequently, you can include multiple lines in the if statement as long as they have the same indentation.
x = -8
if x < 0:
print('x contains a negative number')
x contains a negative number
# multiple lines in the statement are fine as long as they have the
# same indentation
if x < 0:
new_value = abs(x)
print(f'absolute value of x is {new_value}')
absolute value of x is 8
It is possible to write very short if
statements on one line. This can be useful in limited situations but as soon as your resulting statement become more verbose it is best practice to switch to a multi-line approach.
# single line approach
if x < 0: print('x contains a negative number')
x contains a negative number
Its helpful to remember that everything in Python has some form of truthiness. In fact, any nonzero number or nonempty object is True
. This allows you to evaluate the object directly:
# a conditional statement on an empty object is equivalent to False
empty_list = []
if empty_list:
print("since empty_list is False this won't exectute")
# a conditional statement on a non-empty object is equivalent to True
non_empty_list = ['not', 'empty']
if non_empty_list:
print("This list is not empty")
This list is not empty
Python uses and
and or
operators to evaluate multiple expressions. They always return a single True
or False
. Moreover, Python will stop evaluating multiple expressions as soon as the result is known.
x = -1
y = 4
if x < 0 or y < 0:
print('At least one of these objects are less than zero.')
At least one of these objects are less than zero.
if x < 0 and y < 0:
print('Both x and y or less than zero')
Knowledge check#
Tasks:
Fill in the following code chunk so that:
If month has value 1-9 the file name printed out will be “data/Month-0X.csv”
What happens if the month value is 10-12?
month = 4 if month ________ : print(f'data/Month-0{month}.csv')
Video 🎥:
Multiway branching#
Multiway branching is when we want to have multiple return statement options based on the input conditions. The general form of multiway branch if
statements is as follows.
Note
The elif
block is not necessary. If you want only two output branches then just use if
followed by else
. However, if you have many branches, you can use as many elif
statements as necessary.
if test_1:
statement_1
elif test2:
statement_2
else:
statement_3
The following illustrates with a simple example. Python will perform this code in sequence and execute the statements nested under the first test that is True
or the else
if all tests are False
.
x = 22.50
if 0 <= x < 10:
print('low')
elif 10 <= x < 20:
print('medium-low')
elif 20 <= x < 30:
print('medium')
else:
print('preferred')
medium
Knowledge check#
Tasks:
Fill in the following code chunk so that:
if month has value 1-9 the file name printed out will be “data/month-0X.csv”
if month has value 10-12 the file name printed out will be “data/month-1X.csv”
if month is an invalid month number (not 1-12), the result printed out is “Invalid month”
test it out for when month equals 6, 10, & 13
month = 4 if month ________: print(f'data/Month-0{month}.csv') _____: print(f'data/Month-{month}.csv') _____: print('________')
Video 🎥:
Switch statements#
Many other languages have a switch
or case
statement that allows you to evaluate an expression and return the statement that aligns with the value. For example, in R, the switch
statement looks like the following:
choice <- 'ham'
switch(choice,
'spam' = 1.25,
'ham' = 1.99,
'eggs' = 0.99,
'bacon' = 1.10,
)
## [1] 1.99
Python does not have switch
statement but has some handy alternatives. In the most basic approach, you could just use a multiway branch if
statement:
choice = 'ham'
if choice == 'spam':
print(1.25)
elif choice == 'ham':
print(1.99)
elif choice == 'eggs':
print(0.99)
elif choice == 'bacon':
print(1.10)
else:
print('Bad choice')
1.99
However, this approach is a bit verbose. An efficient alternative is to use a dictionary that provides the same key-value matching as a switch
statement.
options = {'spam': 1.25, 'ham': 1.99, 'eggs': 0.99, 'bacon': 1.10}
You can either index this dictionary for the matching key:
options[choice]
1.99
Or, a more trustworthy approach is to use the get()
method. This allows you to provide a default response in the case that the key you are looking for is not in the dictionary
Tip
Using the get()
method allows you to supply a value to provide if there is no matching key (or as in if
statements if there are no other conditions that equate to True
).
options.get(choice, 'Bad choice')
1.99
choice = 'broccoli'
options.get(choice, 'Bad choice')
'Bad choice'
Dictionaries are good for associating values with keys, but what about the more complicated actions you can code in the statement blocks associated with if
statements? Fortunately, dictionaries can also hold functions (both named functions and lambda functions) which can allow you to perform more sophisticated switch-like execution.
Note
Don’t worry, you will learn more about functions in an upcoming lesson.
def produce_revenue(sqft, visits, trend):
total = 9.91 * sqft * visits * trend
return round(total, 2)
def frozen_revenue(sqft, visits, trend):
prod = produce_revenue(sqft, visits, trend)
total = 3.28 * sqft * visits * trend - prod * .005
return round(total, 2)
expected_annual_revenue = {
'produce': produce_revenue,
'frozen': frozen_revenue,
'pharmacy': lambda: 16.11 * visits * trend
}
choice = 'frozen'
expected_annual_revenue.get(choice, 'Bad choice')(sqft=937, visits=465, trend=0.98)
1379372.75
Knowledge check#
Tasks:
Convert the following multi-branch if-else
statement into a dict
where you get the month path file with path_files.get(month)
. In this case, which approach seems more reasonable?
```python
month = 4
if month <= 9:
print(f'data/Month-0{month}.csv')
elif month >= 10 and month <= 12:
print(f'data/Month-{month}.csv')
else:
print('Invalid month')
Video 🎥:
Applying in Pandas#
When data mining, we often want to perform conditional statements to not only filter observations, but also to create new variables. For example, say we want to create a new variable that classifies transactions above $10 as “high value” otherwise they are “low value”. There are several methods we can use to perform this but a simple one is to use the apply
method:
df['value'] = df['sales_value'].apply(lambda x: 'high value' if x > 10 else 'low value')
df.head()
household_id | store_id | basket_id | product_id | quantity | sales_value | retail_disc | coupon_disc | coupon_match_disc | week | transaction_timestamp | value | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 900 | 330 | 31198570044 | 1095275 | 1 | 0.50 | 0.00 | 0.0 | 0.0 | 1 | 2017-01-01 11:53:26 | low value |
1 | 900 | 330 | 31198570047 | 9878513 | 1 | 0.99 | 0.10 | 0.0 | 0.0 | 1 | 2017-01-01 12:10:28 | low value |
2 | 1228 | 406 | 31198655051 | 1041453 | 1 | 1.43 | 0.15 | 0.0 | 0.0 | 1 | 2017-01-01 12:26:30 | low value |
3 | 906 | 319 | 31198705046 | 1020156 | 1 | 1.50 | 0.29 | 0.0 | 0.0 | 1 | 2017-01-01 12:30:27 | low value |
4 | 906 | 319 | 31198705046 | 1053875 | 2 | 2.78 | 0.80 | 0.0 | 0.0 | 1 | 2017-01-01 12:30:27 | low value |
df.groupby('value').size()
value
high value 45265
low value 1424042
dtype: int64
An alternative, and much faster approach is to use np.where()
, which requires numpy to be loaded. np.where
has been show to be over 2.5 times faster than apply()
:
df['value'] = np.where(df['sales_value'] > 10, 'high value', 'low value')
df.head()
household_id | store_id | basket_id | product_id | quantity | sales_value | retail_disc | coupon_disc | coupon_match_disc | week | transaction_timestamp | value | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 900 | 330 | 31198570044 | 1095275 | 1 | 0.50 | 0.00 | 0.0 | 0.0 | 1 | 2017-01-01 11:53:26 | low value |
1 | 900 | 330 | 31198570047 | 9878513 | 1 | 0.99 | 0.10 | 0.0 | 0.0 | 1 | 2017-01-01 12:10:28 | low value |
2 | 1228 | 406 | 31198655051 | 1041453 | 1 | 1.43 | 0.15 | 0.0 | 0.0 | 1 | 2017-01-01 12:26:30 | low value |
3 | 906 | 319 | 31198705046 | 1020156 | 1 | 1.50 | 0.29 | 0.0 | 0.0 | 1 | 2017-01-01 12:30:27 | low value |
4 | 906 | 319 | 31198705046 | 1053875 | 2 | 2.78 | 0.80 | 0.0 | 0.0 | 1 | 2017-01-01 12:30:27 | low value |
As our conditions get more complex, it often becomes useful to create a separate function and use apply
. This approach is probably the most legible; however, see this post for a much faster approach if you are working with significantly large data.
def flag(df):
if (df['quantity'] > 20) or (df['sales_value'] > 10):
return 'Large purchase'
elif (df['quantity'] > 10) or (df['sales_value'] > 5):
return 'Medium purchase'
elif (df['quantity'] > 0) or (df['sales_value'] > 0):
return 'Small purchase'
else:
return 'Alternative transaction'
df['purchase_flag'] = df.apply(flag, axis = 1)
df.head()
household_id | store_id | basket_id | product_id | quantity | sales_value | retail_disc | coupon_disc | coupon_match_disc | week | transaction_timestamp | value | purchase_flag | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 900 | 330 | 31198570044 | 1095275 | 1 | 0.50 | 0.00 | 0.0 | 0.0 | 1 | 2017-01-01 11:53:26 | low value | Small purchase |
1 | 900 | 330 | 31198570047 | 9878513 | 1 | 0.99 | 0.10 | 0.0 | 0.0 | 1 | 2017-01-01 12:10:28 | low value | Small purchase |
2 | 1228 | 406 | 31198655051 | 1041453 | 1 | 1.43 | 0.15 | 0.0 | 0.0 | 1 | 2017-01-01 12:26:30 | low value | Small purchase |
3 | 906 | 319 | 31198705046 | 1020156 | 1 | 1.50 | 0.29 | 0.0 | 0.0 | 1 | 2017-01-01 12:30:27 | low value | Small purchase |
4 | 906 | 319 | 31198705046 | 1053875 | 2 | 2.78 | 0.80 | 0.0 | 0.0 | 1 | 2017-01-01 12:30:27 | low value | Small purchase |
df.groupby('purchase_flag').size()
purchase_flag
Alternative transaction 8820
Large purchase 46730
Medium purchase 128361
Small purchase 1285396
dtype: int64
Video 🎥:
Here is a more thorough introduction to the apply
method; plus, you’ll also be introduced to the map
and applymap
methods.
Knowledge check#
Tasks:
Fill in the blanks below to assign each transaction to a power rating of 1, 2, 3, or 4 based on the sales_value
variable:
power_rating = 1: if
sales_value
< 25th percentilepower_rating = 2: if
sales_value
< 50th percentilepower_rating = 3: if
sales_value
< 75th percentilepower_rating = 4: if
sales_value
>= 75th percentile
Hint: use the .quantile(perc_value)
low = df['sales_value'].quantile(____)
med = df['sales_value'].quantile(____)
hig = df['sales_value'].quantile(____)
def power_rater(df):
if (df['sales_value'] < _____):
return ___
elif (df['sales_value'] < _____):
return ___
elif (df['sales_value'] < _____):
return ___
else:
return ___
df['power_rating'] = df.apply(power_rater, axis = 1)
Video 🎥:
Exercises#
Tasks:
Using the Complete Journey transactions
data:
Create a new column titled
total_disc
that is the sum of all discounts applied to each transaction.Create a new column
disc_rating
that classifies each transaction as:‘none’: if
total_disc
== 0‘low’: if
total_disc
< 25th percentile‘medium’: if
total_disc
< 75th percentile‘high’: if
total_disc
>= 75th percentile‘other’: for any transaction that doesn’t meet any of the above conditions
How many transactions are in each of the above
disc_rating
categories?
Computing environment#
%load_ext watermark
%watermark -v -p pandas,jupyterlab,completejourney_py
Python implementation: CPython
Python version : 3.12.4
IPython version : 8.26.0
pandas : 2.2.2
jupyterlab : 4.2.3
completejourney_py: 0.0.3