5  Introduction to Data Structures

In the real world, we rarely work with individual pieces of data. Whether we’re analyzing a dataset, building a model, or writing a simulation, we typically work with collections of data. Python provides several built-in ways to store and organize these collections. These tools are called data structures.

In this chapter, you will learn how to:

Note📓 Follow Along in Colab!

As you read through this chapter, we encourage you to follow along using the companion notebook in Google Colab (or other editor of choice). This interactive notebook lets you run code examples covered in the chapter—and experiment with your own ideas.

👉 Open the Data Structures Notebook in Colab.

5.1 Why Data Structures?

Imagine you want to store the test scores of a student across multiple exams. Instead of creating separate variables for each score, it makes more sense to group them together:

# instead of this
grade_1 = 85
grade_2 = 90
grade_3 = 88
grade_4 = 92

# more convenient to group them together like this
grades = [85, 90, 88, 92]

This approach allows you to access, modify, and analyze data more efficiently. That’s the power of data structures: they help us organize and operate on collections of data.

Python includes different types of data structures that allow us to organize data with specific characteristics suited to the task at hand. For example, some data structures preserve the order of elements, while others do not. Some allow us to modify their contents (mutable), and others do not (immutable). Still others let us associate labels or keys with each item, such as names linked to phone numbers. Choosing the right data structure depends on how we need to access and manage the data.

Building on this, each structure in Python has tradeoffs that make it more suitable for specific situations. If you need to keep track of items in a particular order and expect the data to change, a list might be ideal. If you want to store a fixed grouping of values like GPS coordinates, a tuple ensures the data remains unchanged. And when you need to link one piece of information to another—like a person’s name to their phone number—a dictionary’s key-value format is the perfect tool. Understanding these attributes helps you pick the right structure for the task and write more efficient, readable code.

Think about a recent experience filling out an online form.

  • What types of data did you need to input?
  • Do you think this data should be stored in a structure that allows for changes (mutability)?
  • Does the order of the data matter for how it’s used or displayed? Were there any key-value pairs—like a name linked to an email address—that need to be stored and accessed together?

5.2 Lists: Ordered and Mutable

What is a List?

A list is an ordered, changeable collection of items. You can add, remove, or modify elements in a list. Lists are among the most commonly used data structures in Python due to their flexibility and ease of use. When you need to keep items in a specific order—like daily stock prices, game scores, or survey responses—a list is often the right choice.

Lists are mutable, meaning you can change their contents after they’re created. This makes them ideal for scenarios where your data updates frequently, such as tracking website visits or logging sensor readings.

Creating Lists

To create a list, use square brackets [] and separate items with commas. Lists can hold values of any type: numbers, strings, Booleans, or even other lists. Here are three simple examples:

fruits = ['apple', 'banana', 'cherry']
scores = [85, 90, 88, 92]
mixed = ['blue', 42, True, 3.14]

The order in which you define the elements is preserved. We can see this when we evaluate a list - the output will always be in the same order as it was input:

mixed
['blue', 42, True, 3.14]

This makes lists especially helpful for keeping track of sequences where position matters, such as chronological data or ranked preferences. Because lists are mutable, you can also build them incrementally as your data grows or changes during the course of a program.

Accessing List Items

To access a specific item in a list, you use indexing - which uses brackets ([]) along with the specified location. So, say we want to get the first item from a list:

fruits[1]
'banana'

Wait a minute! Shouldn’t fruits[1] give the first item in the list? It seems to give the second. This is because indexing in Python starts at zero.

ImportantZero-based Indexing

Python uses zero-based indexing, which means that the first element in a list has an index of 0, the second has an index of 1, and so on. You can also use negative indexing to count from the end of the list, where -1 is the last item, -2 is the second-to-last, etc.

print(fruits[0])   # first item
print(fruits[2])   # second item
print(fruits[-1])  # last item
apple
cherry
cherry

Fun reading: Why Python uses 0-based indexing

Understanding indexing is essential because it affects how you loop through lists, retrieve elements, and manipulate values. For example, if you mistakenly assume that indexing starts at 1, you might accidentally skip or mislabel elements in your data, or you may specify an invalid location which will raise an error like the following:

fruits[4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 fruits[4]

IndexError: list index out of range

Modifying Lists

Once a list is created, you can change its contents easily. This mutability makes lists incredibly useful for situations where your data evolves over time. You can add new items using methods like .append() or change existing elements by assigning new values using their index.

fruits.append('orange')
fruits[1] = 'blueberry'

fruits
['apple', 'blueberry', 'cherry', 'orange']

These operations reflect how dynamic lists can be—whether you’re updating a list of customer names, recording daily measurements, or tracking items in a to-do list, lists let you adjust your data in place without rebuilding the structure from scratch.

Common List Operations

Lists come with many built-in functions and methods that make it easy to analyze and manipulate data. These operations can help you answer questions like: How many elements are in the list? Does it contain a specific item? Can I sort or remove items from it?

len(fruits)            # 4 (returns the number of elements in the list)
'apple' in fruits      # True (checks if 'apple' is in the list)
fruits.remove('apple') # removes the first occurrence of 'apple'
fruits.sort()          # sorts the list in place (alphabetically or numerically)
fruits.pop()           # removes and returns the last item

These methods are especially useful in data wrangling tasks, such as filtering survey responses, cleaning up logs, or ranking scores.

You can find many other list operations here: https://docs.python.org/3/tutorial/datastructures.html#data-structures

Use Cases

  • Time series data
  • Collections of records (e.g., names, prices, grades)
  • Maintaining a dynamic list of inputs that might grow or shrink over time
  • Storing results from computations or simulations
  • Collecting user inputs or responses in interactive programs
  • And others!

Knowledge Check

NoneTry It!

Say the last 5 days had daily high temperatures of 79, 83, 81, 89, 78.

  1. Store these values in a list.
  2. Now add a new item to this list that represents today’s high temp of 85.
  3. Next, suppose the first day’s reading was inaccurate and the actual high temp was 76 — update that value in the list.
  4. Finally, sort the list to see the temperatures in ascending order.

5.3 Tuples: Ordered and Immutable

What is a Tuple?

A tuple is similar to a list in that it is an ordered collection of items. However, unlike lists, tuples are immutable, meaning their contents cannot be changed after creation. This immutability makes tuples useful for representing fixed sets of data that should not be altered during a program’s execution.

Think of tuples as a read-only list. This may be contentious, as described in this blog post. Tuples do have many other capabilities beyond what you would expect from just being “a read-only list,” but for us just beginning now, we can think of it that way.

Tuples are typically used when you want to ensure data integrity, such as storing constant values or configuration settings. They are also commonly used to return multiple values from a function, or to represent simple groupings like coordinates, RGB color values, or database records.

Because of their immutability, tuples can be used as keys in dictionaries (we’ll learn what these are shortly), unlike lists. Additionally, they offer slightly better performance than lists when it comes to iteration.

Creating Tuples

To create a tuple, use parentheses () and separate values with commas. Like lists, tuples can hold values of different data types. However, because tuples are immutable, they are particularly well-suited for storing data that should remain constant throughout your program.

For example, if you’re working with geographic coordinates, a tuple ensures the latitude and longitude values stay paired and unchanged:

coordinates = (39.76, -84.19)

Or you might use a tuple to store a birthdate, where the structure will never need to be modified:

birthday = (7, 14, 1998)

Tuples are also frequently used when functions need to return multiple values, making them both practical and efficient in everyday programming. We’ll see this in action in later sections.

Accessing Tuple Items

Indexing tuples works exactly the same way as indexing lists in Python. You use square brackets [] with a zero-based index to access elements. For example:

coordinates[0]
39.76

Just like lists, you can use negative indices to access elements from the end:

birthday[-1]
1998

While both lists and tuples support indexing, remember that tuples are immutable. This means you can read elements by index, but you cannot change them:

coordinates[0] = 41.62
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[12], line 1
----> 1 coordinates[0] = 41.62

TypeError: 'tuple' object does not support item assignment

Tuple Unpacking

Tuple unpacking allows you to assign each item in a tuple to its own variable in a single line. This is especially useful when a function returns multiple values or when you’re working with grouped data like coordinates, dimensions, or ranges. It improves readability and simplifies your code when working with known-length tuples.

x, y = coordinates
print(x)
39.76
print(y)
-84.19

Here, the value of x will be 39.76 and y will be -84.19, corresponding to the first and second elements of the coordinates tuple, respectively. If you try to unpack a tuple into a different number of variables than it has elements, Python will raise an error.

Why Use Tuples?

Tuples are ideal when you want to group together related values that should not be changed once set. Their immutability provides a built-in safeguard against accidental modifications, which is especially helpful when the data must remain consistent throughout the execution of a program. Because they are more lightweight than lists, tuples also offer faster performance in scenarios that involve iteration or large numbers of data groupings. Furthermore, since tuples can be used as keys in dictionaries (unlike lists), they provide a reliable way to map compound keys to values in advanced data structures.

Knowledge Check

NoneTry It!

Given the following tuple schooling = ('UC', 'BANA', '4080')

  • Use indexing to grab the word “BANA”.
  • Change the value of “BANA” to “Business Analytics”. What happens?
  • Unpack the schooling tuple into three variables: university, program, class_id.

5.4 Dictionaries: Key-Value Pairs

A dictionary is a collection of key-value pairs, where each unique key maps to a specific value. This structure allows you to organize and retrieve data using meaningful identifiers rather than relying on position, as with lists or tuples. Dictionaries are especially helpful when you need to store data that has a clear label or attribute—such as a student’s name, ID, or grade—making it easy to access or update values using their corresponding keys. This makes dictionaries one of the most powerful and flexible tools for working with structured data in Python.

Creating Dictionaries

To create a dictionary, use curly braces {} with key-value pairs separated by colons. Each key must be unique and is typically a string, though it can also be a number or other immutable type. Dictionaries are ideal when you want to store data that has meaningful labels. For instance, instead of remembering that index 0 in a list corresponds to a name and index 1 corresponds to a score, a dictionary lets you associate 'name' directly with a value.

student = {
    'name': 'Jordan',
    'score': 95,
    'major': 'Data Science'
}

Accessing and Modifying Dictionaries

To access a value in a dictionary, you reference its key in square brackets. This allows you to retrieve values without knowing their position, unlike lists or tuples.

# access the value associated with the 'score' key
student['score']
95

You can also update the value of an existing key or add a completely new key-value pair to the dictionary. This mutability makes dictionaries ideal for dynamic data, such as user profiles, configuration settings, or database records.

student['score'] = 98  # update the score
student['grad_year'] = 2025  # add a new key-value pair

student
{'name': 'Jordan', 'score': 98, 'major': 'Data Science', 'grad_year': 2025}

Dictionary Methods

Dictionaries come with several built-in methods that allow you to efficiently access and interact with their contents. These methods help you retrieve just the keys, just the values, or both together. They are particularly useful when you’re iterating over a dictionary to analyze or transform its contents.

Try the following operations and see what their outputs are:

student.keys()         # dict_keys(['name', 'score', 'major', 'grad_year'])
student.values()       # dict_values(['Jordan', 98, 'Data Science', 2025])
student.items()        # dict_items([('name', 'Jordan'), ('score', 98), ...])
'name' in student      # True (checks if a key exists in the dictionary)
del student['major']   # removes the 'major' key and its associated value

Using these methods allows you to build flexible programs that can dynamically explore or modify structured data—especially when working with JSON data from APIs, reading metadata from files, or updating attributes of users or records.

Use Cases

Dictionaries are incredibly versatile and can be found in many practical programming and data science scenarios:

  • Lookup tables: Use dictionaries to map inputs to outputs, such as converting state abbreviations to full names or translating category codes to descriptions.
  • Structured data: Store rows of data as dictionaries where keys represent column names—this mirrors how data is stored in JSON format and is common when working with web APIs or data from files.
  • Feature storage in machine learning models: Organize features (like ‘age’, ‘income’, or ‘region’) with their corresponding values for each individual observation, allowing for dynamic construction and retrieval of input data for models.
  • Configuration settings: Store user preferences or system parameters that can be easily accessed or updated using descriptive keys.
  • Data merging and deduplication: Use keys to uniquely identify records, making it easier to combine and clean data from different sources.

Knowledge Check

NoneTry It!
  1. Create a dictionary that stores a classmate’s nickname, phone number, and age.

    john_doe = {
        'nickname': _______,
        'phone_number': __,
        'age': __
    }
  2. Create another dictionary called classmate2 with similar information for a different classmate.

  3. Combine these two dictionaries into a new dictionary called contacts, where each key is the classmate’s name and the value is their corresponding dictionary.

  4. Add a new entry to the contacts dictionary for a third classmate, including their name, phone number, and age.

5.5 Summary

In this chapter, we focused on three of Python’s most commonly used data structures: lists, tuples, and dictionaries. These structures provide the foundation for how data is organized and accessed in most Python programs, especially in data science workflows.

  • Lists are ordered and mutable, making them ideal when you need to maintain and modify a sequence of items.
  • Tuples are ordered but immutable, which is helpful for fixed sets of values where data integrity is important.
  • Dictionaries store labeled data as key-value pairs, offering a flexible way to organize and retrieve data by meaningful identifiers.
Data Structure Type (via type()) Description & When to Use Example
List list Ordered, mutable collection. Use when you need to keep items in sequence and change them later. scores = [85, 90, 88, 92]
Tuple tuple Ordered, immutable collection. Ideal for fixed groupings like coordinates or dates. coordinates = (39.76, -84.19)
Dictionary dict Unordered, mutable key-value pairs. Great for labeled data or fast lookups. student = {'name': 'Jordan', 'score': 95}

As we move forward in this book, we’ll explore more advanced data structures that build on these basics and help you perform data mining tasks more efficiently. For now, having a solid understanding of these core structures will serve as a crucial building block for your continued work in Python and data science.

5.6 Exercise: Student Records Management

You’ve been asked to build a simple data tracking system for a small classroom.

  • Make a list called student_names that includes the names of 5 students.
  • Add a new student to the list.
  • Remove the second student in the list.
  • Sort the list alphabetically and print the final list.
  • Create a tuple named classroom_location that stores the building name and room number, such as ("Lindner Hall", 315).
  • Unpack the tuple into two variables: building and room, and print each one with a label.
  • Create a dictionary named student_info for one of the students, including their name, major, and graduation year.
  • Add a new key for GPA with a value of your choice.
  • Print out all the keys, all the values, and the full dictionary.

Think through what you just did:

  • What makes a list a good choice for student_names? Is there an alternative approach you could’ve taken?
  • Why would we use a tuple for classroom_location?
  • What makes a dictionary a good choice for student_info?