Lesson 6b: Iteration statement#

Often, we need to execute repetitive code statements a particular number of times. Or, we may need to execute code an undetermined number of times until a certain condition no longer holds. There are multiple ways to achieve this, and in this lesson we will cover several of the more common approaches to iteration.

Learning objectives#

By the end of this lesson you will be able to:

  • Apply for and while loops to execute repetitive code statements.

  • Incorporate break and continue to control looping statements.

  • Explain what a list comprehension is and implement variations of them.

  • Discuss the concept of iterables.

Video 🎥:

First, check out this video for a simple introduction to for and while loops. Then move on to the lesson that follows which will reiterate and build upon these basic concepts.

Prerequisites#

import glob
import os
import random
import pandas as pd

for loop#

The for loop is used to execute repetitive code statements a particular number of times. The general syntax is provided below, where i is the counter; as i takes on each sequential value, the code in the body is executed for that ith value.

# syntax of for loop
for i in sequence:
    <do stuff here with i>

There are three main components of a for loop to consider:

  1. Sequence: The sequence is the object we iterate over, such as each element in a list or tuple, each key in a dictionary, or each column name in a DataFrame.

  2. Body: apply some function(s) to the object we are iterating over.

  3. Output: You must specify what to do with the result. This may include printing out a result or modifying the object in place.

For example, say we want to iterate N times. We can do so with a for loop and the range() function:

for number in range(10):
    print(number)
0
1
2
3
4
5
6
7
8
9

We can add multiple lines to our for loop; we just need to ensure that each line follows the same indentation pattern:

for number in range(10):
    squared = number * number
    print(f'{number} squared = {squared}')
0 squared = 0
1 squared = 1
2 squared = 4
3 squared = 9
4 squared = 16
5 squared = 25
6 squared = 36
7 squared = 49
8 squared = 64
9 squared = 81

Rather than just print out some result, we can also assign the computation to an object. For example, say we wanted to assign the squared result in the previous for loop to a dictionary where the key is the original number and the value is the squared value.

squared_values = {}

for number in range(10):
    squared = number * number
    squared_values[number] = squared

squared_values
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
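
The sequence in a for loop does not have to come from range(). As a minimal sketch (the month_names list below is made up for illustration and is not part of the lesson's data), the same "create an empty object, loop, store the result" pattern works when we iterate directly over the elements of a list:

```python
# hypothetical example data, not taken from the lesson's data sets
month_names = ['January', 'February', 'March']

# empty dictionary that will hold one result per element
name_lengths = {}

# iterate over each element of the list rather than over range()
for name in month_names:
    name_lengths[name] = len(name)

print(name_lengths)
```

Here name takes on each string in the list in turn, so there is no need to index into the list with a counter.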

Knowledge check#

We can see all data sets that we have in the “data/monthly_data” folder with glob.glob:

monthly_data_files = glob.glob("../data/monthly_data/*")
monthly_data_files
['../data/monthly_data/Month-08.csv',
 '../data/monthly_data/Month-09.csv',
 '../data/monthly_data/Month-01.csv',
 '../data/monthly_data/Month-02.csv',
 '../data/monthly_data/Month-03.csv',
 '../data/monthly_data/Month-07.csv',
 '../data/monthly_data/Month-06.csv',
 '../data/monthly_data/Month-10.csv',
 '../data/monthly_data/Month-04.csv',
 '../data/monthly_data/Month-05.csv',
 '../data/monthly_data/Month-11.csv']

If we want to get just the file name from the path string, we can use os.path.basename:

file_name = os.path.basename(monthly_data_files[0])
file_name
'Month-08.csv'

And if we want just the name without the file extension, we can apply some simple string slicing to remove the last four characters (.csv):

file_name[:-4]
'Month-08'
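
Slicing off the last four characters only works when the extension is exactly four characters long, as with .csv. As a hedged alternative, the standard library's os.path.splitext() splits a file name into its root and its extension regardless of the extension's length. The second path below is a made-up example used only to illustrate a longer extension:

```python
import os

path = '../data/monthly_data/Month-08.csv'
other_path = '../data/monthly_data/Month-08.parquet'  # hypothetical file

# splitext returns a (root, extension) tuple
root, extension = os.path.splitext(os.path.basename(path))
other_root, other_extension = os.path.splitext(os.path.basename(other_path))

print(root, extension)
print(other_root, other_extension)
```

Either approach is fine for the monthly data files, since they all end in .csv.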

Tasks:

Use this knowledge to:

  1. Create an empty dictionary called monthly_data.

  2. Loop over monthly_data_files and assign the file name as the dictionary key and assign the file path as the value.

  3. Loop over monthly_data_files and assign the file name as the dictionary key, import the data with pd.read_csv() and assign the imported DataFrame as the value in the dictionary.

Controlling sequences#

There are two ways to control the progression of a loop:

  • continue: terminates the current iteration and advances to the next.

  • break: exits the entire for loop.

Both are used in conjunction with if statements. For example, this for loop will iterate over each element in years; however, when it gets to the element that equals the year of covid (2020) it will break out and end the for loop.

# range will produce numbers starting at 2018 and up to but not including 2023
years = range(2018, 2023)
list(years)
[2018, 2019, 2020, 2021, 2022]
covid = 2020

for year in years:
    if year == covid:
        break
    print(year)
2018
2019

The continue statement is useful when we want to skip the current iteration of a loop without terminating the loop. On encountering continue, the Python interpreter skips the rest of the loop body and starts the next iteration. In this example, the for loop will iterate over each element in years; however, when it gets to the element that equals covid it will skip the rest of the code and simply jump to the next iteration.

for year in years:
    if year == covid:
        continue
    print(year)
2018
2019
2021
2022
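
Both statements can appear in the same loop. The following is a minimal sketch under made-up assumptions: readings is a hypothetical list in which None marks a missing value we want to skip and -1 is a sentinel meaning "stop reading".

```python
# hypothetical readings: None marks a missing value, -1 marks "stop reading"
readings = [4, None, 7, 2, -1, 9]

total = 0
for reading in readings:
    if reading is None:
        continue   # skip missing values and move on to the next reading
    if reading == -1:
        break      # stop the loop entirely once we hit the sentinel
    total += reading

print(total)
```

The 9 at the end of the list is never added because break ends the loop before it is reached.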

Knowledge check#

Tasks:

Modify the following for loop with a continue or break statement to:

  1. only import Month-01 through Month-07

  2. only import Month-08 through Month-10

monthly_data_files = glob.glob("../data/monthly_data/*")
monthly_data = {}

for file in monthly_data_files:
    file_name = os.path.basename(file)[:-4]
    monthly_data[file_name] = pd.read_csv(file)

List comprehensions#

List comprehensions offer a shorthand syntax for for loops and are very common in the Python community. Although a little odd at first, the way to think of list comprehensions is as a backward for loop where we state the expression first, and then the sequence.

# syntax of for loop
for i in sequence:
    expression
  
# syntax for a list comprehension
[expression for i in sequence]

Often, we’ll see a pattern like the following where we:

  1. create an empty object (list in this example)

  2. loop over an object and perform some computation

  3. save the result to the empty object

squared_values = []
for number in range(5):
    squared = number * number
    squared_values.append(squared)

squared_values
[0, 1, 4, 9, 16]

A list comprehension allows us to condense this pattern to a single line:

squared_values = [number * number for number in range(5)]
squared_values
[0, 1, 4, 9, 16]

List comprehensions even allow us to add conditional statements. For example, here we use a conditional statement to skip even numbers:

squared_odd_values = [number * number for number in range(10) if number % 2 != 0]
squared_odd_values
[1, 9, 25, 49, 81]
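
To see how the condition maps back onto a regular loop, the sketch below writes out the equivalent for loop, where the if clause of the comprehension becomes an if statement wrapped around the append. The name loop_result is introduced here only for comparison.

```python
# for loop version: the condition guards the append
loop_result = []
for number in range(10):
    if number % 2 != 0:
        loop_result.append(number * number)

# list comprehension version: the same condition goes at the end
squared_odd_values = [number * number for number in range(10) if number % 2 != 0]

print(loop_result == squared_odd_values)
```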

For more complex conditional statements, or if the list comprehension gets a bit long, we can use multiple lines to make it easier to digest:

squared_certain_values = [
    number * number for number in range(10)
    if number % 2 != 0 and number != 5
    ]

squared_certain_values
[1, 9, 49, 81]

There are other forms of comprehensions as well. For example, we can write a dictionary comprehension that follows the same pattern; however, we use curly braces ({}) instead of square brackets ([]) and supply a key: value pair as the expression:

squared_values_dict = {number: number*number for number in range(10)}
squared_values_dict
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
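
Another variation is the set comprehension, which also uses curly braces but takes a single expression rather than a key: value pair. Because a set only keeps unique values, duplicates are dropped automatically. The words list below is a made-up example:

```python
# hypothetical example data
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# set comprehension: collect the unique first letters
first_letters = {word[0] for word in words}

# sets are unordered, so sort before printing for a stable display
print(sorted(first_letters))
```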

Video 🎥:

Check out this video that provides more discussion and examples of using comprehensions.

Knowledge check#

Tasks:

Re-write the following for loop using a dictionary comprehension:

monthly_data_files = glob.glob("../data/monthly_data/*")
monthly_data = {}

for file in monthly_data_files:
    file_name = os.path.basename(file)[:-4]
    monthly_data[file_name] = pd.read_csv(file)

while loop#

We may not always know how many iterations we need to make. Rather, we simply want to perform some task while a particular condition holds. This is the job of a while loop. A while loop follows the same logic as a for loop except that, rather than specifying a sequence, we specify a condition that determines how long the loop keeps iterating.

# syntax of while loop
while condition_holds:
    <do stuff here>

For example, the probability of flipping 10 coins and getting all heads is \((\frac{1}{2})^{10} = 0.0009765625\) (1 in 1024), so the probability of getting all heads or all tails is twice that: \(2 \times (\frac{1}{2})^{10} = 0.001953125\) (1 in 512). Let’s implement this and see how many tries it takes to accomplish this feat.

The following while loop checks whether the 10 flips contain only one unique value, which implies that we flipped all heads or all tails. If they do not, we repeat the process of flipping 10 coins and increment the number of tries. Once ten_of_a_kind is True, the while loop stops.

# create a coin
coin = ['heads', 'tails']

# we'll use this to track how many tries it takes to get 10 heads or 10 tails
n_tries = 0

# signals if we got 10 heads or 10 tails
ten_of_a_kind = False

while not ten_of_a_kind:
    # flip coin 10 times
    ten_coin_flips = [random.choice(coin) for flip in range(10)]

    # check if all 10 flips landed on the same side
    ten_of_a_kind = len(set(ten_coin_flips)) == 1

    # add iteration to counter
    n_tries += 1


print(f'After {n_tries} tries: {ten_coin_flips}')
After 2008 tries: ['heads', 'heads', 'heads', 'heads', 'heads', 'heads', 'heads', 'heads', 'heads', 'heads']
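
Because the flips are random, the number of tries changes from run to run. Each try of 10 flips succeeds with probability \(2 \times (\frac{1}{2})^{10} = \frac{1}{512}\), so on average we should expect about 512 tries. The sketch below checks this by repeating the experiment many times; the number of experiments (200) and the seed (42) are arbitrary choices made only for illustration.

```python
import random

random.seed(42)          # arbitrary seed so the sketch is repeatable
coin = ['heads', 'tails']
n_experiments = 200      # arbitrary number of repetitions
tries_per_experiment = []

for experiment in range(n_experiments):
    n_tries = 0
    ten_of_a_kind = False

    # same while loop as before, repeated once per experiment
    while not ten_of_a_kind:
        ten_coin_flips = [random.choice(coin) for flip in range(10)]
        ten_of_a_kind = len(set(ten_coin_flips)) == 1
        n_tries += 1

    tries_per_experiment.append(n_tries)

# theoretical average is 1 / p, where p = 2 * (1/2)^10
expected_tries = 1 / (2 * 0.5 ** 10)
average_tries = sum(tries_per_experiment) / n_experiments

print(f'Expected tries: {expected_tries}')
print(f'Average tries over {n_experiments} experiments: {average_tries}')
```

The simulated average will not match 512 exactly, but it should land in the same neighborhood, while any single experiment can be far above or below it.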

Knowledge check#

Tasks:

An elementary example of a random walk is the random walk on the integer number line, \(\mathbb{Z}\), which starts at 0 and at each step moves +1 or −1 with equal probability.

Fill in the incomplete code chunk below to perform a random walk starting at value 0, with each step either adding or subtracting 1. Have your random walk stop if the value exceeds 100 or if the number of steps taken exceeds 10,000.

value = 0
n_tries = 0
exceeds_100 = False

while not exceeds_100 and _______:
    # randomly add or subtract 1
    random_value = random.choice([-1, 1])
    value += _____

    # check if value exceeds 100
    exceeds_100 = ______

    # add iteration to counter
    n_tries += _____

  
print(f'The final value was {value} after {n_tries} iterations.')

Iterables#

Python strongly leverages the concept of iterable objects. An object is considered iterable if it is either a physically stored sequence or an object that produces one result at a time in the context of an iteration tool like a for loop. Up to this point, our example looping structures have primarily iterated over a DataFrame or a list.

When our for loop iterates over a DataFrame, under the hood it first requests an iterator from the DataFrame and then steps through each item that iterator produces. As the following illustrates, iterating over a DataFrame yields its column names by default:

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [3, 4, 5], 'col3': [6, 6, 6]})

I = df.__iter__() # get an iterator from the DataFrame
print(next(I))    # first iteration
print(next(I))    # second iteration
print(next(I))    # third iteration
col1
col2
col3

When our for loop iterates over a list, the same procedure unfolds. Note that when no more items are available to iterate over, a StopIteration exception is raised, which signals to our for loop that no more iterations should be performed.

names = ['Robert', 'Sandy', 'John', 'Patrick']

I = names.__iter__() # get an iterator from the list
print(next(I))       # first iteration
print(next(I))       # second iteration
print(next(I))       # third iteration
print(next(I))       # fourth iteration
print(next(I))       # no more items
Robert
Sandy
John
Patrick
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[18], line 8
      6 print(next(I))       # third iteration
      7 print(next(I))       # fourth iteration
----> 8 print(next(I))       # no more items

StopIteration: 
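
Putting these pieces together, a for loop behaves roughly like the while loop sketched below: it asks for an iterator, calls next() repeatedly, and stops quietly when StopIteration is raised. This is a simplified illustration of the idea rather than what Python literally executes, and the visited list is introduced only to record the results.

```python
names = ['Robert', 'Sandy', 'John', 'Patrick']

visited = []
iterator = iter(names)   # built-in equivalent of names.__iter__()

while True:
    try:
        name = next(iterator)   # ask for the next item
    except StopIteration:
        break                   # no more items, so leave the loop
    visited.append(name)        # this line plays the role of the loop body

print(visited)
```

This is why we never see the StopIteration error when we use a regular for loop: the loop handles it for us.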

Dictionaries and tuples are also iterable objects. Iterating over a dictionary automatically returns one key at a time, which lets us use each key to look up its value inside the loop:

D = {'a': 1, 'b': 2, 'c': 3}

I = D.__iter__()  # get an iterator from the dictionary
print(next(I))    # first iteration
print(next(I))    # second iteration
print(next(I))    # third iteration
a
b
c
for key in D:
    print(key, D[key])
a 1
b 2
c 3
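
If we need both the key and the value, a common alternative is the dictionary's items() method, which yields (key, value) tuples that we can unpack directly in the loop header. Similarly, values() yields only the values. The pairs list and total variable below are introduced only for illustration:

```python
D = {'a': 1, 'b': 2, 'c': 3}

# items() yields (key, value) tuples, so no D[key] lookup is needed
pairs = []
for key, value in D.items():
    pairs.append(f'{key} -> {value}')

# values() yields only the values
total = sum(D.values())

print(pairs)
print(total)
```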

Although using these iterables in a for loop is quite common, you will often see two other approaches, which use range() and enumerate(). range is often used to generate indexes in a for loop, but you can use it anywhere you need a series of integers. Note that range does not store all of its values in memory; it is an iterable that generates items on demand:

values = range(5)

I = values.__iter__()
print(next(I))
print(next(I))
print(next(I))
0
1
2
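
Because the values are produced on demand, creating a very large range is cheap, yet we can still ask for its length, index into it, or test membership. When we actually need all of the values stored at once, we can wrap the range in list(). A small sketch (the variable names are arbitrary):

```python
# creating a large range does not build a million-element list
values = range(1_000_000)

print(len(values))        # number of items the range will produce
print(values[10])         # item at position 10
print(999_999 in values)  # membership test

# materialize a (small) range into a list only when we need one
as_list = list(range(5))
print(as_list)
```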

So if you wanted to iterate over each column in our DataFrame, an alternative is to use range. In this example, range produces the numeric index for each column so we simply use that value to index for the column within the for loop:

unique_values = []
for col in range(len(df.columns)):
    value = df.iloc[:, col].nunique()
    unique_values.append(value)

unique_values
[3, 3, 1]

Another common tool you will see is enumerate. The enumerate function returns an enumerate object, which is itself an iterator. The benefit of enumerate is that it returns an (index, value) tuple each time through the loop:

E = enumerate(df) # create an enumerate iterator
print(next(E))    # first iteration
print(next(E))    # second iteration
print(next(E))    # third iteration
(0, 'col1')
(1, 'col2')
(2, 'col3')

The for loop steps through these tuples automatically and allows us to unpack their values with tuple assignment in the header of the for loop. In the following example, we unpack the tuples into the variables index and col and we can now use both of these values however necessary in a for loop.

for index, col in enumerate(df):
    print(f'{index} - {col}')
0 - col1
1 - col2
2 - col3
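
enumerate works on any iterable, not just a DataFrame, and it accepts an optional start argument when we do not want the count to begin at 0. A minimal sketch using a plain list (the labels list is introduced only to collect the results):

```python
names = ['Robert', 'Sandy', 'John', 'Patrick']

labels = []
# start=1 makes the counter begin at 1 instead of the default 0
for position, name in enumerate(names, start=1):
    labels.append(f'{position}. {name}')

print(labels)
```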

Note

There are additional iterable objects that can be used in looping structures (e.g., zip, map); however, the ones discussed here are the most common you will come across and likely use.
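
As a brief, hedged illustration of those two: zip pairs up items from several iterables position by position, and map applies a function to every item of an iterable. The ages list below is made up for the example.

```python
names = ['Robert', 'Sandy', 'John']
ages = [34, 29, 41]   # hypothetical values

# zip yields one (name, age) tuple per position
paired = []
for name, age in zip(names, ages):
    paired.append(f'{name} is {age}')

# map applies len to each name; list() collects the results
name_lengths = list(map(len, names))

print(paired)
print(name_lengths)
```

Like range, both zip and map produce their items on demand, which is why we wrap map in list() to see all of its results at once.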

Video 🎥:

Learn more about iterables and a similar, yet different concept – ‘iterators’ with this video.

Exercises#

Questions:

  1. For the following list of names, write a list comprehension that creates a list of only words that start with a capital letter (hint: str.isupper()). Which names are included in the result?

    names = ['Steve Irwin', 'koala', 'kangaroo', 'Australia', 'Sydney', 'desert']
    
    
  2. The Fibonacci Sequence is a series of numbers where the next number is found by adding up the two numbers before it. The first two numbers are 0 and 1. For example, 0, 1, 1, 2, 3, 5, 8, 13, 21. The next number in the series above is 13 + 21 = 34. Use a for loop to produce the first 25 numbers in the Fibonacci Sequence (0, 1, 1, 2, 3, 5, 8, 13, 21…)

  3. Create a for loop that sums the numbers from 0 through 100; however, skip the numbers in the following list:

    skip_these_numbers = [8, 29, 43, 68, 98]
    

Computing environment#

%load_ext watermark
%watermark -v -p pandas,jupyterlab
Python implementation: CPython
Python version       : 3.9.12
IPython version      : 8.2.0

pandas            : 1.4.2
jupyterlab        : 3.3.2
completejourney_py: 0.0.3