Lesson 6b: Iteration statement#
Often, we need to execute repetitive code statements a particular number of times. Or, we may even need to execute code for an undetermined number of times until a certain condition no longer holds. There are multiple ways we can achieve this and in this lesson we will cover several of the more common approaches to perform iteration.
Learning objectives#
By the end of this lesson you will be able to:
Apply for and while loops to execute repetitive code statements.
Incorporate break and continue to control looping statements.
Explain what a list comprehension is and implement variations of them.
Discuss the concept of iterables.
Video 🎥:
First, check out this video for a simple introduction to for and while loops. Then move on to the lesson that follows, which will reiterate and build upon these basic concepts.
Prerequisites#
import glob
import os
import random
import pandas as pd
for loop#
The for loop is used to execute repetitive code statements a particular number of times. The general syntax is provided below, where i is the counter; as i assumes each sequential value, the code in the body is performed for that ith value.
# syntax of for loop
for i in sequence:
    <do stuff here with i>
There are three main components of a for loop to consider:
Sequence: The sequence represents each element in a list or tuple, each key in a dictionary, or each column in a DataFrame.
Body: Apply some function(s) to the object we are iterating over.
Output: You must specify what to do with the result. This may include printing out a result or modifying the object in place.
For example, say we want to iterate N times. We can perform a for loop using the range() function:
for number in range(10):
    print(number)
0
1
2
3
4
5
6
7
8
9
We can add multiple lines to our for loop; we just need to ensure that each line follows the same indentation pattern:
for number in range(10):
    squared = number * number
    print(f'{number} squared = {squared}')
0 squared = 0
1 squared = 1
2 squared = 4
3 squared = 9
4 squared = 16
5 squared = 25
6 squared = 36
7 squared = 49
8 squared = 64
9 squared = 81
Rather than just print out some result, we can also assign the computation to an object. For example, say we wanted to assign the squared result in the previous for loop to a dictionary where the key is the original number and the value is the squared value.
squared_values = {}
for number in range(10):
    squared = number * number
    squared_values[number] = squared
squared_values
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
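The same three components (sequence, body, output) apply when the sequence is something other than range(). The following is a minimal sketch, assuming a made-up list called fruits that is not part of the lesson's data, which loops over a list and stores each result in a dictionary:

```python
# hypothetical example data, not part of the lesson's data sets
fruits = ['apple', 'kiwi', 'banana']

name_lengths = {}
for fruit in fruits:                 # sequence: each element of the list
    n_chars = len(fruit)             # body: compute something for the element
    name_lengths[fruit] = n_chars    # output: store the result in a dictionary

print(name_lengths)
```

Here the loop variable takes on each string in turn rather than a number, but the structure of the loop is unchanged.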
Knowledge check#
We can see all data sets that we have in the “data/monthly_data” folder with glob.glob:
monthly_data_files = glob.glob("../data/monthly_data/*")
monthly_data_files
['../data/monthly_data/Month-08.csv',
'../data/monthly_data/Month-09.csv',
'../data/monthly_data/Month-01.csv',
'../data/monthly_data/Month-02.csv',
'../data/monthly_data/Month-03.csv',
'../data/monthly_data/Month-07.csv',
'../data/monthly_data/Month-06.csv',
'../data/monthly_data/Month-10.csv',
'../data/monthly_data/Month-04.csv',
'../data/monthly_data/Month-05.csv',
'../data/monthly_data/Month-11.csv']
If we want to get just the file name from the string path, we can use os.path.basename:
file_name = os.path.basename(monthly_data_files[0])
file_name
'Month-08.csv'
And if we wanted to just get the name minus the file extension we can apply some simple string indexing to remove the last four characters (.csv):
file_name[:-4]
'Month-08'
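Slicing off the last four characters only works when every extension is exactly four characters long. As a hedged alternative, the sketch below uses os.path.splitext, which splits a name at its final dot, and sorted(), since glob.glob returns files in arbitrary order. The paths list is a hypothetical stand-in for monthly_data_files so the example runs without the lesson's data folder:

```python
import os

# hypothetical paths that mirror the lesson's folder layout
paths = ['../data/monthly_data/Month-08.csv', '../data/monthly_data/Month-01.csv']

stems = []
for path in sorted(paths):                        # sorted() gives a predictable order
    base = os.path.basename(path)                 # e.g. 'Month-01.csv'
    stem, extension = os.path.splitext(base)      # split the name at the last dot
    stems.append(stem)
    print(stem, extension)
```

The file name without its extension ends up in stem regardless of how long the extension is.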
Tasks:
Use this knowledge to:
Create an empty dictionary called monthly_data.
Loop over monthly_data_files and assign the file name as the dictionary key and assign the file path as the value.
Loop over monthly_data_files and assign the file name as the dictionary key, import the data with pd.read_csv(), and assign the imported DataFrame as the value in the dictionary.
Controlling sequences#
There are two ways to control the progression of a loop:
continue: terminates the current iteration and advances to the next.
break: exits the entire for loop.
Both are used in conjunction with if statements. For example, this for loop will iterate over each element in years; however, when it gets to the element that equals the year of covid (2020) it will break out and end the for loop process.
# range will produce numbers starting at 2018 and up to but not including 2023
years = range(2018, 2023)
list(years)
[2018, 2019, 2020, 2021, 2022]
covid = 2020
for year in years:
    if year == covid: break
    print(year)
2018
2019
The continue statement is useful when we want to skip the current iteration of a loop without terminating it. On encountering continue, the Python interpreter skips the rest of the loop body and starts the next iteration of the loop. In this example, the for loop will iterate over each element in years; however, when it gets to the element that equals covid it will skip the rest of the code execution and simply jump to the next iteration.
for year in years:
    if year == covid: continue
    print(year)
2018
2019
2021
2022
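Both statements can appear in the same loop. The sketch below is an illustration only: it reuses the years example, adds a hypothetical cutoff called last_year, and writes each if statement across two lines, which is the more common layout:

```python
years = range(2018, 2023)
covid = 2020
last_year = 2022   # hypothetical cutoff, used only for this illustration

printed = []
for year in years:
    if year == covid:
        continue           # skip 2020 but keep looping
    if year == last_year:
        break              # stop the loop entirely once we reach 2022
    printed.append(year)
    print(year)
```

This prints 2018, 2019, and 2021: the covid year is skipped by continue, and the loop ends at break before 2022 is printed.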
Knowledge check#
Tasks:
Modify the following for loop with a continue or break statement to:
only import Month-01 through Month-07
only import Month-08 through Month-10
monthly_data_files = glob.glob("../data/monthly_data/*")
monthly_data = {}
for file in monthly_data_files:
    file_name = os.path.basename(file)[:-4]
    monthly_data[file_name] = pd.read_csv(file)
List comprehensions#
List comprehensions offer a shorthand syntax for for loops and are very common in the Python community. Although a little odd at first, the way to think of list comprehensions is as a backward for loop where we state the expression first, and then the sequence.
# syntax of for loop
for i in sequence:
    expression
# syntax for a list comprehension
[expression for i in sequence]
Often, we’ll see a pattern like the following where we:
create an empty object (list in this example)
loop over an object and perform some computation
save the result to the empty object
squared_values = []
for number in range(5):
    squared = number * number
    squared_values.append(squared)
squared_values
[0, 1, 4, 9, 16]
A list comprehension allows us to condense this pattern to a single line:
squared_values = [number * number for number in range(5)]
squared_values
[0, 1, 4, 9, 16]
List comprehensions even allow us to add conditional statements. For example, here we use a conditional statement to skip even numbers:
squared_odd_values = [number * number for number in range(10) if number % 2 != 0]
squared_odd_values
[1, 9, 25, 49, 81]
For more complex conditional statements, or if the list comprehension gets a bit long, we can use multiple lines to make it easier to digest:
squared_certain_values = [
    number * number for number in range(10)
    if number % 2 != 0 and number != 5
]
squared_certain_values
[1, 9, 49, 81]
There are other forms of comprehensions as well. For example, we can perform a dictionary comprehension where we follow the same pattern; however, we use curly braces ({}) instead of square brackets ([]) and supply a key: value pair as the expression:
squared_values_dict = {number: number*number for number in range(10)}
squared_values_dict
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
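Two further variations are worth a quick sketch; the variable names are made up for illustration. An if/else placed before the for keyword transforms every element instead of filtering, and curly braces without a key: value pair produce a set comprehension, which keeps only unique values:

```python
# if/else before `for` chooses a value for every element (nothing is skipped)
labels = ['even' if number % 2 == 0 else 'odd' for number in range(5)]
print(labels)

# a set comprehension keeps only the unique results
remainders = {number % 3 for number in range(10)}
print(remainders)
```

Note the difference in position: a trailing if filters elements out, while a leading if/else decides what value each element contributes.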
Video 🎥:
Check out this video that provides more discussion and examples of using comprehensions.
Knowledge check#
Tasks:
Re-write the following for loop using a dictionary comprehension:
monthly_data_files = glob.glob("../data/monthly_data/*")
monthly_data = {}
for file in monthly_data_files:
    file_name = os.path.basename(file)[:-4]
    monthly_data[file_name] = pd.read_csv(file)
while loop#
We may not always know how many iterations we need to make. Rather, we simply want to perform some task while a particular condition exists. This is the job of a while loop. A while loop follows the same logic as a for loop, except that rather than specifying a sequence, we specify a condition that determines how many iterations are performed.
# syntax of while loop
while condition_holds:
    <do stuff here>
For example, the probability of flipping 10 coins and getting all heads or all tails is \(2 \times (\frac{1}{2})^{10} = 0.001953125\) (1 in 512 tries), since either of the two outcomes counts. Let’s implement this and see how many tries it takes to accomplish this feat.
The following while statement checks whether the number of unique values in 10 flips equals 1, which implies that we flipped all heads or all tails. If it is not equal to 1, then we repeat the process of flipping 10 coins and increment the number of tries. Once ten_of_a_kind is True, our while loop stops.
# create a coin
coin = ['heads', 'tails']
# we'll use this to track how many tries it takes to get 10 heads or 10 tails
n_tries = 0
# signals if we got 10 heads or 10 tails
ten_of_a_kind = False
while not ten_of_a_kind:
    # flip coin 10 times
    ten_coin_flips = [random.choice(coin) for flip in range(10)]
    # check if all 10 flips landed on the same side
    ten_of_a_kind = len(set(ten_coin_flips)) == 1
    # add iteration to counter
    n_tries += 1
print(f'After {n_tries} tries: {ten_coin_flips}')
After 2008 tries: ['heads', 'heads', 'heads', 'heads', 'heads', 'heads', 'heads', 'heads', 'heads', 'heads']
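The chance that ten fair coin flips all land on the same side can also be checked by brute force. The sketch below is not part of the original lesson; it uses itertools.product from the standard library to enumerate every possible sequence of ten flips and counts the ones with a single unique value:

```python
from itertools import product

coin = ['heads', 'tails']

# every possible sequence of 10 flips (2**10 of them)
outcomes = list(product(coin, repeat=10))

# sequences where all 10 flips match: all heads or all tails
matching = [outcome for outcome in outcomes if len(set(outcome)) == 1]

probability = len(matching) / len(outcomes)
print(len(outcomes), len(matching), probability)
```

Two of the 1024 sequences qualify, so on average we would expect the while loop to need roughly 512 tries, although any single run can take far fewer or far more.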
Knowledge check#
Tasks:
An elementary example of a random walk is the random walk on the integer number line, \(Z\), which starts at 0 and at each step moves +1 or −1 with equal probability.
Fill in the incomplete code chunk below to perform a random walk starting at value 0, with each step either adding or subtracting 1. Have your random walk stop if the value exceeds 100 or if the number of steps taken exceeds 10,000.
value = 0
n_tries = 0
exceeds_100 = False
while not (exceeds_100 or _______):
    # randomly add or subtract 1
    random_value = random.choice([-1, 1])
    value += _____
    # check if value exceeds 100
    exceeds_100 = ______
    # add iteration to counter
    n_tries += _____
print(f'The final value was {value} after {n_tries} iterations.')
Iterables#
Python strongly leverages the concept of iterable objects. An object is considered iterable if it is either a physically stored sequence, or an object that produces one result at a time in the context of an iteration tool like a for loop. Up to this point, our example looping structures have primarily iterated over a DataFrame or a list.
When our for loop iterates over a DataFrame, underneath the hood it is first accessing the iterable object, and then iterating over each item. As the following illustrates, the default iterable components of a DataFrame are the columns:
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [3, 4, 5], 'col3': [6, 6, 6]})
I = df.__iter__() # access iterable object
print(next(I)) # first iteration
print(next(I)) # second iteration
print(next(I)) # third iteration
col1
col2
col3
When our for loop iterates over a list, the same procedure unfolds. Note that when no more items are available to iterate over, a StopIteration exception is raised, which signals to our for loop that no more iterations should be performed.
names = ['Robert', 'Sandy', 'John', 'Patrick']
I = names.__iter__() # access iterable object
print(next(I)) # first iteration
print(next(I)) # second iteration
print(next(I)) # third iteration
print(next(I)) # fourth iteration
print(next(I)) # no more items
Robert
Sandy
John
Patrick
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
Cell In[18], line 8
6 print(next(I)) # third iteration
7 print(next(I)) # fourth iteration
----> 8 print(next(I)) # no more items
StopIteration:
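To make the role of StopIteration concrete, here is a simplified sketch of roughly what a for loop does on our behalf. It is an approximation for illustration, not the interpreter's actual implementation: it requests an iterator with the built-in iter(), calls next() repeatedly, and stops quietly once StopIteration is raised.

```python
names = ['Robert', 'Sandy', 'John', 'Patrick']

iterator = iter(names)      # same as names.__iter__()
seen = []
while True:
    try:
        name = next(iterator)    # ask for the next item
    except StopIteration:
        break                    # no items left, so leave the loop
    seen.append(name)
    print(name)
```

Because the for loop handles the exception internally, we never see the traceback shown above when looping in the usual way.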
Dictionaries and tuples are also iterable objects. Iterating over a dictionary automatically returns one key at a time, which allows us to use the key and look up the value for that key at the same time:
D = {'a':1, 'b':2, 'c':3}
I = D.__iter__() # access iterable object
print(next(I)) # first iteration
print(next(I)) # second iteration
print(next(I)) # third iteration
a
b
c
for key in D:
    print(key, D[key])
a 1
b 2
c 3
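Dictionaries also provide the items() and values() methods for cases where we want the key-value pairs or only the values. A short sketch using the same dictionary (the pairs and total variables are introduced here for illustration):

```python
D = {'a': 1, 'b': 2, 'c': 3}

# items() yields (key, value) tuples that we can unpack in the loop header
pairs = []
for key, value in D.items():
    pairs.append((key, value))
    print(key, value)

# values() yields only the values
total = 0
for value in D.values():
    total += value
print(total)
```

Using items() avoids the extra D[key] lookup inside the loop body.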
Although using these iterables in a for loop is quite common, you will often see two other approaches, which use range() and enumerate(). range is often used to generate indexes in a for loop, but you can use it anywhere you need a series of integers. Note that range is an iterable that generates items on demand rather than storing them all in memory:
values = range(5)
I = values.__iter__()
print(next(I))
print(next(I))
print(next(I))
0
1
2
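Because range produces its values on demand, even a very large range is cheap to create. The sketch below, with a hypothetical big_range variable, shows that a range still supports len(), indexing, and membership tests, and that list() can be used when the actual values are needed:

```python
# creating this does not build a billion integers in memory
big_range = range(1_000_000_000)

print(len(big_range))               # number of values the range represents
print(big_range[5])                 # indexing works without generating all values
print(999_999_999 in big_range)     # so does a membership test

# convert to a list only when the concrete values are required
first_three = list(range(3))
print(first_three)
```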
So if you wanted to iterate over each column in our DataFrame, an alternative is to use range. In this example, range produces the numeric index for each column so we simply use that value to index for the column within the for loop:
unique_values = []
for col in range(len(df.columns)):
    value = df.iloc[:, col].nunique()
    unique_values.append(value)
unique_values
[3, 3, 1]
Another common iterator you will see is enumerate. More precisely, the enumerate function returns an enumerate object, which is itself an iterator. The benefit of enumerate is that it returns an (index, value) tuple each time through the loop:
E = enumerate(df) # access iterable object
print(next(E)) # first iteration
print(next(E)) # second iteration
print(next(E)) # third iteration
(0, 'col1')
(1, 'col2')
(2, 'col3')
The for loop steps through these tuples automatically and allows us to unpack their values with tuple assignment in the header of the for loop. In the following example, we unpack the tuples into the variables index and col, and we can now use both of these values however necessary in a for loop.
for index, col in enumerate(df):
    print(f'{index} - {col}')
0 - col1
1 - col2
2 - col3
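enumerate also accepts a start argument for when counting should begin somewhere other than 0. In this sketch, the columns list is a plain-Python stand-in for the DataFrame's column names so that the example runs without pandas:

```python
# stand-in for the DataFrame's column names
columns = ['col1', 'col2', 'col3']

numbered = []
for position, col in enumerate(columns, start=1):   # count from 1 instead of 0
    numbered.append(f'{position} - {col}')
    print(f'{position} - {col}')
```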
Note
There are additional iterable objects that can be used in looping structures (e.g., zip, map); however, the ones discussed here are the most common you will come across and likely use.
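As a brief illustration of the two built-ins mentioned in the note, zip pairs up elements from several iterables and map applies a function to each element. The ages list below is hypothetical data made up for this sketch:

```python
names = ['Robert', 'Sandy', 'John']
ages = [34, 29, 41]   # hypothetical values for illustration

# zip yields one tuple per position, combining both lists
paired = []
for name, age in zip(names, ages):
    paired.append(f'{name} is {age}')
    print(f'{name} is {age}')

# map applies str.upper to each name; list() collects the results
upper_names = list(map(str.upper, names))
print(upper_names)
```

Like range and enumerate, both zip and map produce their results one at a time as the loop requests them.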
Video 🎥:
Learn more about iterables and a similar, yet different concept – ‘iterators’ with this video.
Exercises#
Questions:
For the following list of names, write a list comprehension that creates a list of only the names that start with a capital letter (hint: str.isupper()). Which names are included in the result?
names = ['Steve Irwin', 'koala', 'kangaroo', 'Australia', 'Sydney', 'desert']
The Fibonacci Sequence is a series of numbers where the next number is found by adding up the two numbers before it. The first two numbers are 0 and 1. For example: 0, 1, 1, 2, 3, 5, 8, 13, 21. The next number in the series above is 13 + 21 = 34. Use a for loop to produce the first 25 numbers in the Fibonacci Sequence (0, 1, 1, 2, 3, 5, 8, 13, 21, …).
Create a for loop that sums the numbers from 0 through 100; however, skip the numbers in the following list:
skip_these_numbers = [8, 29, 43, 68, 98]
Computing environment#
%load_ext watermark
%watermark -v -p pandas,jupyterlab
Python implementation: CPython
Python version : 3.9.12
IPython version : 8.2.0
pandas : 1.4.2
jupyterlab : 3.3.2
completejourney_py: 0.0.3