17 Controlling Repetition with Iteration Statements
Repetition is common in data science tasks, from looping through rows of a dataset to applying transformations across multiple features or files. Fortunately, programming languages like Python provide iteration statements to handle these repetitive tasks efficiently and clearly.
In this chapter, you'll learn how to use for and while loops to perform repetition in your code. You'll also learn how to control loop behavior using break and continue, explore the concept of iterables, and practice using list comprehensions, a powerful and Pythonic way to iterate over and transform data collections.
These tools are foundational in data mining and data science work, where we often need to process large amounts of data, automate repetitive operations, and build reusable code structures.
By the end of this lesson you will be able to:
- Apply for and while loops to execute repetitive code statements.
- Incorporate break and continue to control looping statements.
- Explain what a list comprehension is and implement variations of them.
- Discuss the concept of iterables.
As you read through this chapter, we encourage you to follow along using the companion notebook in Google Colab (or another editor of your choice). This interactive notebook lets you run the code examples covered in the chapter and experiment with your own ideas.
👉 Open the Iteration Statements Notebook in Colab.
17.1 Prerequisites
import glob
import os
import random
import pandas as pd
17.2 for loop
The for loop executes a block of code once for each element of a sequence. The general syntax is shown below, where i is the loop variable: as i takes on each successive value in the sequence, the code in the body is executed for that value.
# syntax of for loop
# !! this code won't run but, rather, gives you an idea of what the syntax looks like !!
for i in sequence:
    <do stuff here with i>
There are three main components of a for loop to consider:
- Sequence: the collection we iterate over, such as each element in a list or tuple, each key in a dictionary, or each column name in a DataFrame.
- Body: the indented code that applies some function(s) to each element we iterate over.
- Output: what to do with the result. This may include printing a result, storing it in a new object, or modifying an object in place.
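To make the three components concrete, here is a minimal sketch that loops over a list and a dictionary. The names scores and prices are hypothetical, chosen only for illustration. The sequence is whatever follows in, the body is the indented block, and the output is either a new object or an in-place modification. Note that looping over a dictionary yields its keys.

```python
# hypothetical data for illustration (prices are in cents to keep the arithmetic exact)
scores = [88, 92, 79]
prices = {'apple': 50, 'bread': 225}

# Sequence: each element of the list; Body: double it; Output: append to a new list
doubled = []
for score in scores:
    doubled.append(score * 2)

# Sequence: each key of the dictionary; Body: add 10 cents
# Output: modify the dictionary's values in place
for item in prices:
    prices[item] = prices[item] + 10

print(doubled)  # [176, 184, 158]
print(prices)   # {'apple': 60, 'bread': 235}
```

Reassigning the value stored under an existing key while looping is safe; adding or removing keys during the loop is not.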
For example, if we want to iterate N times, we can use a for loop with the range() function:
for number in range(10):
    print(number)
0
1
2
3
4
5
6
7
8
9
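range() also accepts a start value and an optional step, which is handy when you don't want to count from 0 by ones. A brief sketch of the three call forms (the variable names are illustrative):

```python
# range(stop): counts from 0 up to, but not including, stop
first_five = list(range(5))
print(first_five)    # [0, 1, 2, 3, 4]

# range(start, stop): counts from start up to, but not including, stop
middle = list(range(2, 6))
print(middle)        # [2, 3, 4, 5]

# range(start, stop, step): takes every step-th value
every_third = []
for number in range(0, 10, 3):
    every_third.append(number)
print(every_third)   # [0, 3, 6, 9]

# a negative step counts down
countdown = list(range(5, 0, -1))
print(countdown)     # [5, 4, 3, 2, 1]
```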
We can add multiple lines to the body of our for loop; we just need to ensure that each line follows the same indentation pattern:
for number in range(10):
    squared = number * number
    print(f'{number} squared = {squared}')
0 squared = 0
1 squared = 1
2 squared = 4
3 squared = 9
4 squared = 16
5 squared = 25
6 squared = 36
7 squared = 49
8 squared = 64
9 squared = 81
Rather than just printing a result, we can also assign the computation to an object. For example, say we want to store the squared results from the previous for loop in a dictionary where the key is the original number and the value is its square:
squared_values = {}

for number in range(10):
    squared = number * number
    squared_values[number] = squared

squared_values
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
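The object we assign results to doesn't have to be a collection. A common variation is an accumulator: start with an initial value and update it on every iteration. A minimal sketch that sums the squares from the previous example instead of storing each one:

```python
# accumulate a single running total instead of a collection
total = 0
for number in range(10):
    total += number * number   # add this iteration's square to the total

print(total)  # 285
```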
Knowledge check
We can list all the data sets in the “data/monthly_data” folder with glob.glob():
monthly_data_files = sorted(glob.glob("../data/monthly_data/*"))
monthly_data_files
['../data/monthly_data/Month-01.csv',
'../data/monthly_data/Month-02.csv',
'../data/monthly_data/Month-03.csv',
'../data/monthly_data/Month-04.csv',
'../data/monthly_data/Month-05.csv',
'../data/monthly_data/Month-06.csv',
'../data/monthly_data/Month-07.csv',
'../data/monthly_data/Month-08.csv',
'../data/monthly_data/Month-09.csv',
'../data/monthly_data/Month-10.csv',
'../data/monthly_data/Month-11.csv']
If we want just the file name from the path string, we can use os.path.basename():
file_name = os.path.basename(monthly_data_files[0])
file_name
'Month-01.csv'
And if we want just the name without the file extension, we can use simple string slicing to remove the last four characters (.csv):
file_name[:-4]
'Month-01'
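Combining these pieces with a for loop lets us process every file in one pass. The sketch below uses a hard-coded list of paths in place of the glob.glob() result, so it runs without the data folder, and swaps the [:-4] slice for os.path.splitext(), which strips the extension regardless of its length. In practice, you might also call pd.read_csv(path) inside the loop body to load each file.

```python
import os

# hypothetical list of paths, mirroring the glob.glob() output above
monthly_data_files = [
    '../data/monthly_data/Month-01.csv',
    '../data/monthly_data/Month-02.csv',
    '../data/monthly_data/Month-03.csv',
]

month_names = []
for path in monthly_data_files:
    file_name = os.path.basename(path)            # e.g. 'Month-01.csv'
    name, extension = os.path.splitext(file_name)  # ('Month-01', '.csv')
    month_names.append(name)

print(month_names)  # ['Month-01', 'Month-02', 'Month-03']
```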
17.3 Controlling sequences
There are two ways to control the progression of a loop:
- continue: terminates the current iteration and advances to the next.
- break: exits the entire loop.
Both are typically used in conjunction with if statements. For example, the following for loop iterates over each element in years; however, when it reaches the element equal to covid (2020), it will break out and end the for loop.
# range will produce numbers starting at 2018 and up to, but not including, 2023
years = range(2018, 2023)
list(years)
[2018, 2019, 2020, 2021, 2022]
covid = 2020

for year in years:
    if year == covid:
        break
    print(year)
2018
2019
The continue statement is useful when we want to skip the current iteration of a loop without terminating the loop. On encountering continue, Python skips the rest of the loop body and starts the next iteration. In this example, the for loop iterates over each element in years; however, when it reaches the element equal to covid, it skips the rest of the body and simply jumps to the next iteration.
for year in years:
    if year == covid:
        continue
    print(year)
2018
2019
2021
2022
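The two statements are often combined in the same loop. In the minimal sketch below, the readings list and its conventions (None marks a missing value, -1 marks the end of the data) are hypothetical: continue skips the missing values, while break stops the loop entirely at the end marker.

```python
# hypothetical readings: None = missing value, -1 = end-of-data marker
readings = [12, None, 15, 9, -1, 20]

valid = []
for value in readings:
    if value is None:
        continue   # skip this reading but keep looping
    if value == -1:
        break      # end marker reached: stop the loop entirely
    valid.append(value)

print(valid)  # [12, 15, 9]
```

Note that 20 is never reached because break ends the loop at -1.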
Knowledge check
17.4 List comprehensions
List comprehensions offer a shorthand syntax for for loops and are very common in the Python community. Although a little odd at first, the way to think of a list comprehension is as a backward for loop, where we state the expression first and then the sequence.
# !! this code won't run but, rather, gives you an idea of what the syntax looks like !!
# syntax of for loop
for i in sequence:
    expression

# syntax for a list comprehension
[expression for i in sequence]
Often, we’ll see a pattern like the following where we:
- create an empty object (a list in this example)
- loop over an object and perform some computation
- save the result to the empty object
squared_values = []
for number in range(5):
    squared = number * number
    squared_values.append(squared)

squared_values
[0, 1, 4, 9, 16]
A list comprehension allows us to condense this pattern to a single line:
squared_values = [number * number for number in range(5)]
squared_values
[0, 1, 4, 9, 16]
List comprehensions even allow us to add conditional statements. For example, here we use a conditional statement to skip even numbers:
squared_odd_values = [number * number for number in range(10) if number % 2 != 0]
squared_odd_values
[1, 9, 25, 49, 81]
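The if clause at the end of the comprehension plays the same role as an if statement inside the loop body. Writing both versions side by side shows the equivalence:

```python
# for loop equivalent of the filtered list comprehension
squared_odd_values_loop = []
for number in range(10):
    if number % 2 != 0:               # keep only odd numbers
        squared_odd_values_loop.append(number * number)

# the same result written as a comprehension
squared_odd_values = [number * number for number in range(10) if number % 2 != 0]

print(squared_odd_values_loop == squared_odd_values)  # True
```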
For more complex conditional statements, or if the list comprehension gets a bit long, we can use multiple lines to make it easier to digest:
squared_certain_values = [
    number * number for number in range(10)
    if number % 2 != 0 and number != 5
]
squared_certain_values
[1, 9, 49, 81]
There are other forms of comprehensions as well. For example, we can write a dictionary comprehension, which follows the same pattern; however, we use curly braces ({}) instead of square brackets ([]) and write each item as a key: value pair:
squared_values_dict = {number: number * number for number in range(10)}
squared_values_dict
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
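Dictionary comprehensions accept the same if filter as list comprehensions, and Python also supports set comprehensions, which use curly braces without a key: value pair and keep only unique values. A short sketch of both:

```python
# dictionary comprehension with a condition: odd numbers only
odd_squares = {number: number * number for number in range(10) if number % 2 != 0}
print(odd_squares)  # {1: 1, 3: 9, 5: 25, 7: 49, 9: 81}

# set comprehension: duplicates are dropped automatically
remainders = {number % 3 for number in range(10)}
print(remainders)   # {0, 1, 2}
```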
Check out this video that provides more discussion and examples of using comprehensions.
Knowledge check
17.5 while loop
We may not always know how many iterations we need to make. Rather, we simply want to perform some task while a particular condition holds. This is the job of a while loop. A while loop follows the same logic as a for loop, except that, rather than specifying a sequence, we specify a condition that determines how many iterations are performed.
# syntax of while loop
# !! this code won't run but, rather, gives you an idea of what the syntax looks like !!
while condition_holds:
    <do stuff here>
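Because a while loop has no sequence to run out of, its body must change something the condition depends on; otherwise the loop never ends. A minimal sketch (the halving task is purely illustrative) that repeatedly halves a value until it drops below 1:

```python
value = 100
steps = 0

# keep looping as long as the condition is True
while value >= 1:
    value = value / 2   # update the variable the condition checks
    steps += 1          # count how many iterations ran

print(steps, value)  # 7 0.78125
```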
For example, the probability of flipping 10 coins and getting all heads or all tails is \(2 \times (\frac{1}{2})^{10} = (\frac{1}{2})^{9} = 0.001953125\) (1 in 512 tries), since two of the \(2^{10} = 1024\) equally likely outcomes qualify. Let's implement this and see how many tries it takes to accomplish this feat.
The following while statement checks whether the 10 flips contain exactly one unique value, which implies that we flipped all heads or all tails. If not, we repeat the process of flipping 10 coins and incrementing the number of tries. Once ten_of_a_kind is True, the condition not ten_of_a_kind becomes False and the while loop stops.
# create a coin
coin = ['heads', 'tails']

# we'll use this to track how many tries it takes to get 10 heads or 10 tails
n_tries = 0

# signals if we got 10 heads or 10 tails
ten_of_a_kind = False

while not ten_of_a_kind:
    # flip coin 10 times
    ten_coin_flips = [random.choice(coin) for flip in range(10)]

    # check if there is only one unique outcome (all heads or all tails)
    ten_of_a_kind = len(set(ten_coin_flips)) == 1

    # add iteration to counter
    n_tries += 1

print(f'After {n_tries} tries: {ten_coin_flips}')
Because the flips are random, the number of tries will differ every time you run this code. Since each try succeeds with probability 1/512, it takes 512 tries on average.
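We can check the probability exactly and estimate the average number of tries by repeating the experiment many times. The sketch below wraps the loop above in a function (tries_until_ten_of_a_kind is a hypothetical name) and uses a seeded random.Random generator so results are reproducible. The average from 100 repetitions will vary with the seed but should land in the neighborhood of 512.

```python
import random
from fractions import Fraction

# exact probability: 2 favorable outcomes (all heads, all tails) out of 2**10
p_ten_of_a_kind = Fraction(2, 2**10)
print(p_ten_of_a_kind)  # 1/512

def tries_until_ten_of_a_kind(rng):
    """Repeat 10-coin experiments until all flips match; return the number of tries."""
    coin = ['heads', 'tails']
    n_tries = 0
    ten_of_a_kind = False
    while not ten_of_a_kind:
        flips = [rng.choice(coin) for _ in range(10)]
        ten_of_a_kind = len(set(flips)) == 1
        n_tries += 1
    return n_tries

# a seeded generator makes the simulation reproducible
rng = random.Random(42)
results = [tries_until_ten_of_a_kind(rng) for _ in range(100)]
average_tries = sum(results) / len(results)
print(average_tries)
```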
Knowledge check
17.6 Iterables
Python strongly leverages the concept of iterable objects. An object is considered iterable if it is either a physically stored sequence or an object that produces one result at a time in the context of an iteration tool like a for loop. Up to this point, our example looping structures have primarily iterated over a DataFrame or a list.
When our for loop iterates over a DataFrame, under the hood it first calls the DataFrame's __iter__() method to get an iterator, and then calls next() on that iterator to retrieve each item. As the following illustrates, the default items a DataFrame yields are its column names:
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [3, 4, 5], 'col3': [6, 6, 6]})

I = df.__iter__()  # get an iterator from the iterable
print(next(I))  # first iteration
print(next(I))  # second iteration
print(next(I))  # third iteration
col1
col2
col3
When our for loop iterates over a list, the same procedure unfolds. Note that when no more items are available, a StopIteration exception is raised, which signals to our for loop that no more iterations should be performed.
names = ['Robert', 'Sandy', 'John', 'Patrick']

I = names.__iter__()  # get an iterator from the iterable
print(next(I))  # first iteration
print(next(I))  # second iteration
print(next(I))  # third iteration
print(next(I))  # fourth iteration
print(next(I))  # no more items
Robert
Sandy
John
Patrick
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[18], line 8
      6 print(next(I))  # third iteration
      7 print(next(I))  # fourth iteration
----> 8 print(next(I))  # no more items

StopIteration:
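Putting these pieces together, the following is a simplified model of what a for loop does (a conceptual sketch, not CPython's actual implementation): it gets an iterator with iter(), which calls __iter__(), calls next() repeatedly, and quietly ends when StopIteration is raised.

```python
names = ['Robert', 'Sandy', 'John', 'Patrick']

# roughly what `for name in names: collected.append(name)` does behind the scenes
collected = []
iterator = iter(names)          # same as names.__iter__()
while True:
    try:
        name = next(iterator)   # fetch the next item
    except StopIteration:       # raised once the items run out
        break                   # ...which ends the loop without an error
    collected.append(name)

print(collected)  # ['Robert', 'Sandy', 'John', 'Patrick']
```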
Dictionaries and tuples are also iterable objects. Iterating over a dictionary returns one key at a time, which allows us to access both the key and the value for that key at the same time:
D = {'a': 1, 'b': 2, 'c': 3}

I = D.__iter__()  # get an iterator from the iterable
print(next(I))  # first iteration
print(next(I))  # second iteration
print(next(I))  # third iteration
a
b
c
for key in D:
    print(key, D[key])
a 1
b 2
c 3
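Dictionaries also provide the .items() and .values() methods. .items() yields (key, value) tuples, so both can be unpacked directly in the loop header instead of looking up D[key]:

```python
D = {'a': 1, 'b': 2, 'c': 3}

# .items() yields (key, value) tuples that we unpack in the loop header
pairs = []
for key, value in D.items():
    print(key, value)
    pairs.append((key, value))

# .values() yields just the values
total = sum(D.values())
print(total)  # 6
```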
Although using these iterables in a for loop is quite common, you will often see two other approaches that use the iterables range() and enumerate(). range is often used to generate indexes in a for loop, but you can use it anywhere you need a series of integers. Note that range does not store all of its values in memory; it is an iterable that generates items on demand:
values = range(5)

I = values.__iter__()
print(next(I))
print(next(I))
print(next(I))
0
1
2
So if you want to iterate over each column in our DataFrame, an alternative is to use range. In this example, range produces the numeric position of each column, so we simply use that value to select the column with .iloc within the for loop:
unique_values = []
for col in range(len(df.columns)):
    value = df.iloc[:, col].nunique()
    unique_values.append(value)

unique_values
[3, 3, 1]
Another common iterable you will see is enumerate(). The enumerate() function returns an enumerate object, which is itself an iterator and therefore supports the same next() protocol. The benefit of enumerate() is that it returns an (index, value) tuple each time through the loop:
E = enumerate(df)  # enumerate object (already an iterator)
print(next(E))  # first iteration
print(next(E))  # second iteration
print(next(E))  # third iteration
(0, 'col1')
(1, 'col2')
(2, 'col3')
The for loop steps through these tuples automatically and allows us to unpack their values with tuple assignment in the header of the for loop. In the following example, we unpack each tuple into the variables index and col, and we can then use both values however necessary in the loop body.
for index, col in enumerate(df):
    print(f'{index} - {col}')
0 - col1
1 - col2
2 - col3
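enumerate() also accepts a start argument, which is useful when you want human-friendly numbering that begins at 1 rather than 0. A short sketch using the names list from earlier:

```python
names = ['Robert', 'Sandy', 'John', 'Patrick']

# start=1 makes the counter begin at 1 instead of 0
labels = []
for position, name in enumerate(names, start=1):
    labels.append(f'{position}. {name}')

print(labels)  # ['1. Robert', '2. Sandy', '3. John', '4. Patrick']
```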
There are additional iterable objects that can be used in looping structures (e.g., zip() and map()); however, the ones discussed here are the ones you will most commonly come across and likely use.
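For a quick taste of those two: zip() pairs up items from several iterables, and map() applies a function to every item. The ages below are hypothetical values for illustration.

```python
names = ['Robert', 'Sandy', 'John']
ages = [34, 28, 45]   # hypothetical ages

# zip() yields one (name, age) tuple per iteration
for name, age in zip(names, ages):
    print(f'{name} is {age}')

# map() applies len() to every name, producing results on demand
name_lengths = list(map(len, names))
print(name_lengths)  # [6, 5, 4]
```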
Learn more about iterables and a similar, yet different concept – ‘iterators’ with this video.
17.7 Summary
In this chapter, you learned how to use iteration statements to write more efficient and powerful Python code. These tools are essential for any data scientist or analyst, especially when working with large datasets or needing to automate repetitive tasks.
You explored how:
- The for loop allows you to iterate over sequences like lists, dictionaries, and DataFrames.
- The while loop executes code repeatedly until a specified condition is no longer true.
- break and continue give you more control over loop execution.
- List and dictionary comprehensions provide a compact and readable way to create new collections.
- Iterables and iterator objects, such as range() and enumerate(), form the foundation of Python looping behavior and data traversal.
Understanding these concepts sets you up for more advanced programming patterns, where repetition, transformation, and control flow are crucial.
But iteration isn’t the only way to make your code more concise and reusable. In the next chapter, you’ll take your skills a step further by learning how to write your own functions. Functions allow you to encapsulate logic into clean, modular blocks of code—another key capability for data scientists who want to write readable, efficient, and maintainable analysis pipelines.
17.8 Exercise: Practicing Looping and Iteration Patterns
In this exercise set, you'll practice using for loops, while loops, conditional logic, and comprehensions. These tasks will help you build fluency with the iteration patterns that show up frequently in data wrangling and automation tasks.
You can run these exercises in your own Python editor or in the companion notebook.
Don’t hesitate to ask for help! Use ChatGPT, GitHub Copilot, or any other AI coding assistant to get guidance or debug your code. It’s a great way to reinforce learning and explore alternate solutions.