# instead of this
= 85
grade_1 = 90
grade_2 = 88
grade_3 = 92
grade_4
# more convenient to group them together like this
= [85, 90, 88, 92] grades
5 Introduction to Data Structures
In the real world, we rarely work with individual pieces of data. Whether we’re analyzing a dataset, building a model, or writing a simulation, we typically work with collections of data. Python provides several built-in ways to store and organize these collections. These tools are called data structures.
In this chapter, you will learn how to:
- Store and manipulate groups of values using lists
- Work with tuples, which are similar to lists but immutable
- Use dictionaries to associate keys with values
- Understand why data structures matter in real-world data science
- Get a preview of NumPy arrays, an advanced data structure used in numerical computing
As you read through this chapter, we encourage you to follow along using the companion notebook in Google Colab (or other editor of choice). This interactive notebook lets you run code examples covered in the chapter—and experiment with your own ideas.
👉 Open the Data Structures Notebook in Colab.
5.1 Why Data Structures?
Imagine you want to store the test scores of a student across multiple exams. Instead of creating separate variables for each score, it makes more sense to group them together:
This approach allows you to access, modify, and analyze data more efficiently. That’s the power of data structures: they help us organize and operate on collections of data.
Python includes different types of data structures that allow us to organize data with specific characteristics suited to the task at hand. For example, some data structures preserve the order of elements, while others do not. Some allow us to modify their contents (mutable), and others do not (immutable). Still others let us associate labels or keys with each item, such as names linked to phone numbers. Choosing the right data structure depends on how we need to access and manage the data.
Building on this, each structure in Python has tradeoffs that make it more suitable for specific situations. If you need to keep track of items in a particular order and expect the data to change, a list might be ideal. If you want to store a fixed grouping of values like GPS coordinates, a tuple ensures the data remains unchanged. And when you need to link one piece of information to another—like a person’s name to their phone number—a dictionary’s key-value format is the perfect tool. Understanding these attributes helps you pick the right structure for the task and write more efficient, readable code.
Think about a recent experience filling out an online form.
- What types of data did you need to input?
- Do you think this data should be stored in a structure that allows for changes (mutability)?
- Does the order of the data matter for how it’s used or displayed? Were there any key-value pairs—like a name linked to an email address—that need to be stored and accessed together?
5.2 Lists: Ordered and Mutable
What is a List?
A list is an ordered, changeable collection of items. You can add, remove, or modify elements in a list. Lists are among the most commonly used data structures in Python due to their flexibility and ease of use. When you need to keep items in a specific order—like daily stock prices, game scores, or survey responses—a list is often the right choice.
Lists are mutable, meaning you can change their contents after they’re created. This makes them ideal for scenarios where your data updates frequently, such as tracking website visits or logging sensor readings.
Creating Lists
To create a list, use square brackets []
and separate items with commas. Lists can hold values of any type: numbers, strings, Booleans, or even other lists. Here are three simple examples:
= ['apple', 'banana', 'cherry']
fruits = [85, 90, 88, 92]
scores = ['blue', 42, True, 3.14] mixed
The order in which you define the elements is preserved. We can see this when we evaluate a list - the output will always be in the same order as it was input:
mixed
['blue', 42, True, 3.14]
This makes lists especially helpful for keeping track of sequences where position matters, such as chronological data or ranked preferences. Because lists are mutable, you can also build them incrementally as your data grows or changes during the course of a program.
Accessing List Items
To access a specific item in a list, you use indexing - which uses brackets ([]
) along with the specified location. So, say we want to get the first item from a list:
1] fruits[
'banana'
Wait a minute! Shouldn’t fruits[1]
give the first item in the list? It seems to give the second. This is because indexing in Python starts at zero.
Python uses zero-based indexing, which means that the first element in a list has an index of 0, the second has an index of 1, and so on. You can also use negative indexing to count from the end of the list, where -1 is the last item, -2 is the second-to-last, etc.
print(fruits[0]) # first item
print(fruits[2]) # second item
print(fruits[-1]) # last item
apple
cherry
cherry
Fun reading: Why Python uses 0-based indexing
Understanding indexing is essential because it affects how you loop through lists, retrieve elements, and manipulate values. For example, if you mistakenly assume that indexing starts at 1, you might accidentally skip or mislabel elements in your data, or you may specify an invalid location which will raise an error like the following:
4] fruits[
--------------------------------------------------------------------------- IndexError Traceback (most recent call last) Cell In[6], line 1 ----> 1 fruits[4] IndexError: list index out of range
Modifying Lists
Once a list is created, you can change its contents easily. This mutability makes lists incredibly useful for situations where your data evolves over time. You can add new items using methods like .append()
or change existing elements by assigning new values using their index.
'orange')
fruits.append(1] = 'blueberry'
fruits[
fruits
['apple', 'blueberry', 'cherry', 'orange']
These operations reflect how dynamic lists can be—whether you’re updating a list of customer names, recording daily measurements, or tracking items in a to-do list, lists let you adjust your data in place without rebuilding the structure from scratch.
Common List Operations
Lists come with many built-in functions and methods that make it easy to analyze and manipulate data. These operations can help you answer questions like: How many elements are in the list? Does it contain a specific item? Can I sort or remove items from it?
len(fruits) # 4 (returns the number of elements in the list)
'apple' in fruits # True (checks if 'apple' is in the list)
'apple') # removes the first occurrence of 'apple'
fruits.remove(# sorts the list in place (alphabetically or numerically)
fruits.sort() # removes and returns the last item fruits.pop()
These methods are especially useful in data wrangling tasks, such as filtering survey responses, cleaning up logs, or ranking scores.
You can find many other list operations here: https://docs.python.org/3/tutorial/datastructures.html#data-structures
Use Cases
- Time series data
- Collections of records (e.g., names, prices, grades)
- Maintaining a dynamic list of inputs that might grow or shrink over time
- Storing results from computations or simulations
- Collecting user inputs or responses in interactive programs
- And others!
Knowledge Check
5.3 Tuples: Ordered and Immutable
What is a Tuple?
A tuple is similar to a list in that it is an ordered collection of items. However, unlike lists, tuples are immutable, meaning their contents cannot be changed after creation. This immutability makes tuples useful for representing fixed sets of data that should not be altered during a program’s execution.
Think of tuples as a read-only list. This may be contentious, as described in this blog post. Tuples do have many other capabilities beyond what you would expect from just being “a read-only list,” but for us just beginning now, we can think of it that way.
Tuples are typically used when you want to ensure data integrity, such as storing constant values or configuration settings. They are also commonly used to return multiple values from a function, or to represent simple groupings like coordinates, RGB color values, or database records.
Because of their immutability, tuples can be used as keys in dictionaries (we’ll learn what these are shortly), unlike lists. Additionally, they offer slightly better performance than lists when it comes to iteration.
Creating Tuples
To create a tuple, use parentheses ()
and separate values with commas. Like lists, tuples can hold values of different data types. However, because tuples are immutable, they are particularly well-suited for storing data that should remain constant throughout your program.
For example, if you’re working with geographic coordinates, a tuple ensures the latitude and longitude values stay paired and unchanged:
= (39.76, -84.19) coordinates
Or you might use a tuple to store a birthdate, where the structure will never need to be modified:
= (7, 14, 1998) birthday
Tuples are also frequently used when functions need to return multiple values, making them both practical and efficient in everyday programming. We’ll see this in action in later sections.
Accessing Tuple Items
Indexing tuples works exactly the same way as indexing lists in Python. You use square brackets []
with a zero-based index to access elements. For example:
0] coordinates[
39.76
Just like lists, you can use negative indices to access elements from the end:
-1] birthday[
1998
While both lists and tuples support indexing, remember that tuples are immutable. This means you can read elements by index, but you cannot change them:
0] = 41.62 coordinates[
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[12], line 1 ----> 1 coordinates[0] = 41.62 TypeError: 'tuple' object does not support item assignment
Tuple Unpacking
Tuple unpacking allows you to assign each item in a tuple to its own variable in a single line. This is especially useful when a function returns multiple values or when you’re working with grouped data like coordinates, dimensions, or ranges. It improves readability and simplifies your code when working with known-length tuples.
= coordinates x, y
print(x)
39.76
print(y)
-84.19
Here, the value of x will be 39.76 and y will be -84.19, corresponding to the first and second elements of the coordinates tuple, respectively. If you try to unpack a tuple into a different number of variables than it has elements, Python will raise an error
.
Why Use Tuples?
Tuples are ideal when you want to group together related values that should not be changed once set. Their immutability provides a built-in safeguard against accidental modifications, which is especially helpful when the data must remain consistent throughout the execution of a program. Because they are more lightweight than lists, tuples also offer faster performance in scenarios that involve iteration or large numbers of data groupings. Furthermore, since tuples can be used as keys in dictionaries (unlike lists), they provide a reliable way to map compound keys to values in advanced data structures.
Knowledge Check
5.4 Dictionaries: Key-Value Pairs
A dictionary is a collection of key-value pairs, where each unique key maps to a specific value. This structure allows you to organize and retrieve data using meaningful identifiers rather than relying on position, as with lists or tuples. Dictionaries are especially helpful when you need to store data that has a clear label or attribute—such as a student’s name, ID, or grade—making it easy to access or update values using their corresponding keys. This makes dictionaries one of the most powerful and flexible tools for working with structured data in Python.
Creating Dictionaries
To create a dictionary, use curly braces {}
with key-value pairs separated by colons. Each key must be unique and is typically a string, though it can also be a number or other immutable type. Dictionaries are ideal when you want to store data that has meaningful labels. For instance, instead of remembering that index 0 in a list corresponds to a name and index 1 corresponds to a score, a dictionary lets you associate 'name'
directly with a value.
= {
student 'name': 'Jordan',
'score': 95,
'major': 'Data Science'
}
Accessing and Modifying Dictionaries
To access a value in a dictionary, you reference its key in square brackets. This allows you to retrieve values without knowing their position, unlike lists or tuples.
# access the value associated with the 'score' key
'score'] student[
95
You can also update the value of an existing key or add a completely new key-value pair to the dictionary. This mutability makes dictionaries ideal for dynamic data, such as user profiles, configuration settings, or database records.
'score'] = 98 # update the score
student['grad_year'] = 2025 # add a new key-value pair
student[
student
{'name': 'Jordan', 'score': 98, 'major': 'Data Science', 'grad_year': 2025}
Dictionary Methods
Dictionaries come with several built-in methods that allow you to efficiently access and interact with their contents. These methods help you retrieve just the keys, just the values, or both together. They are particularly useful when you’re iterating over a dictionary to analyze or transform its contents.
Try the following operations and see what their outputs are:
# dict_keys(['name', 'score', 'major', 'grad_year'])
student.keys() # dict_values(['Jordan', 98, 'Data Science', 2025])
student.values() # dict_items([('name', 'Jordan'), ('score', 98), ...])
student.items() 'name' in student # True (checks if a key exists in the dictionary)
del student['major'] # removes the 'major' key and its associated value
Using these methods allows you to build flexible programs that can dynamically explore or modify structured data—especially when working with JSON data from APIs, reading metadata from files, or updating attributes of users or records.
Use Cases
Dictionaries are incredibly versatile and can be found in many practical programming and data science scenarios:
- Lookup tables: Use dictionaries to map inputs to outputs, such as converting state abbreviations to full names or translating category codes to descriptions.
- Structured data: Store rows of data as dictionaries where keys represent column names—this mirrors how data is stored in JSON format and is common when working with web APIs or data from files.
- Feature storage in machine learning models: Organize features (like ‘age’, ‘income’, or ‘region’) with their corresponding values for each individual observation, allowing for dynamic construction and retrieval of input data for models.
- Configuration settings: Store user preferences or system parameters that can be easily accessed or updated using descriptive keys.
- Data merging and deduplication: Use keys to uniquely identify records, making it easier to combine and clean data from different sources.
Knowledge Check
5.5 Summary
In this chapter, we focused on three of Python’s most commonly used data structures: lists, tuples, and dictionaries. These structures provide the foundation for how data is organized and accessed in most Python programs, especially in data science workflows.
- Lists are ordered and mutable, making them ideal when you need to maintain and modify a sequence of items.
- Tuples are ordered but immutable, which is helpful for fixed sets of values where data integrity is important.
- Dictionaries store labeled data as key-value pairs, offering a flexible way to organize and retrieve data by meaningful identifiers.
Data Structure | Type (via type() ) |
Description & When to Use | Example |
---|---|---|---|
List | list |
Ordered, mutable collection. Use when you need to keep items in sequence and change them later. | scores = [85, 90, 88, 92] |
Tuple | tuple |
Ordered, immutable collection. Ideal for fixed groupings like coordinates or dates. | coordinates = (39.76, -84.19) |
Dictionary | dict |
Unordered, mutable key-value pairs. Great for labeled data or fast lookups. | student = {'name': 'Jordan', 'score': 95} |
As we move forward in this book, we’ll explore more advanced data structures that build on these basics and help you perform data mining tasks more efficiently. For now, having a solid understanding of these core structures will serve as a crucial building block for your continued work in Python and data science.
5.6 Exercise: Student Records Management
You’ve been asked to build a simple data tracking system for a small classroom.