Lesson 1d: Data structures#


So far we’ve worked with single values: numbers, strings, and booleans. But Python also supports more complex data types, sometimes called data structures.

There are three very common built-in data structures that we are going to learn about in this lesson: lists, tuples, and dictionaries.

Learning objectives#

By the end of this lesson you will be able to:

  • Explain the difference between lists, tuples, and dictionaries and when to use each.

  • Create and manage these data structures and apply operators to them.

Lists#

Lists and tuples allow us to store multiple things (“elements”) in a single object. The elements are considered ordered, which just means the elements remain in the same position as when created unless they are manually re-ordered. Let’s start with lists.

Creation#

Lists are represented using brackets ([]).

# A list of integers
numbers = [1, 2, 3]
numbers
[1, 2, 3]
type(numbers)
list
# A list of strings
strings = ['abc', 'def']
strings
['abc', 'def']

Lists are highly flexible. They can contain heterogeneous data (i.e. strings, booleans, and numbers can all be in the same list) and lists can even contain other lists!

# Lists containing heterogeneous data
combo = ['a', 'b', 3, 4]
combo_2 = [True, 'True', 1, 1.0]

# Note that the last element of the list is another list!
nested_list = [1, 2, 3, [4, 5]]
nested_list
[1, 2, 3, [4, 5]]

We can also create a list by type conversion with list(). For example, we can convert a string into a list of characters.

my_str = 'A string.'
list(my_str)
['A', ' ', 's', 't', 'r', 'i', 'n', 'g', '.']
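
The same conversion works for other iterables, not just strings. Here is a minimal sketch; the variable names are made up for illustration, and range() is used only as a convenient source of integers.

```python
# list() accepts any iterable, not only strings
from_tuple = list((1, 2, 3))   # tuple -> list
from_range = list(range(5))    # range -> list of the integers 0 through 4

print(from_tuple)   # [1, 2, 3]
print(from_range)   # [0, 1, 2, 3, 4]
```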

Indexing#

Individual elements of a list can be accessed by specifying a location in brackets. This is called indexing. So, say we want to get the first item from a list:

letters = ['a', 'b', 'c']
letters[1]
'b'

Wait a minute! Shouldn’t letters[1] give the first item in the list? It seems to give the second. This is because indexing in Python starts at zero.

Warning

Python uses zero-based indexing, so the first element is element 0! (Historical note: Why Python uses zero-based indexing.)

letters[0]
'a'
letters[2]
'c'

Specifying an invalid location will raise an error.

letters[4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 letters[4]

IndexError: list index out of range

Note

Many programming languages, Python included, are zero indexed, so a list with 3 elements has valid locations [0, 1, 2]. But this means that there is no element #3 in a 3-element list! Trying to access it will cause an out-of-range error. This is a common mistake for those new to programming (and sometimes it bites the veterans too).
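
To stay inside the valid range, it can help to check the length with len(), or to count from the end with negative indices. The sketch below assumes the same three-letter list; the variable names are only for illustration.

```python
letters = ['a', 'b', 'c']

# len() gives the number of elements; the last valid index is len() - 1
n = len(letters)
last_by_length = letters[n - 1]

# Negative indices count from the end: -1 is the last element
last_by_negative = letters[-1]
second_to_last = letters[-2]

print(n, last_by_length, last_by_negative, second_to_last)  # 3 c c b
```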

Slicing#

Now, what if we want to pull out multiple sequential items from a list? This is called slicing, and we use colons (:) to do it.

letters_in_my_name = list('brad boehmke')

# first 3 elements
letters_in_my_name[0:3]
['b', 'r', 'a']

We got elements 0 through 2 even though we stated 0:3. When using the colon indexing, my_list[i:j], we get items i through j-1.

Note

The slice range is inclusive of the first index and exclusive of the last. If the slice’s final index is larger than the length of the sequence, the slice ends at the last element.
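
A quick sketch of that last point: unlike indexing, slicing past the end does not raise an error. The end values 100 and 60 are arbitrary numbers chosen to be out of range.

```python
letters_in_my_name = list('brad boehmke')

# The end index 100 is past the end, so the slice stops at the last element
tail = letters_in_my_name[8:100]
print(tail)   # ['h', 'm', 'k', 'e']

# A slice that starts past the end returns an empty list instead of an error
empty = letters_in_my_name[50:60]
print(empty)  # []
```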

We can even get away with not specifying the first or last number if we wish to get all elements up to, or all elements starting with, a certain element.

# all elements before index 3 (start omitted)
letters_in_my_name[:3]
['b', 'r', 'a']
# all elements from index 2 onward (end omitted)
letters_in_my_name[2:]
['a', 'd', ' ', 'b', 'o', 'e', 'h', 'm', 'k', 'e']

One last thing to note is that we can specify a stride. The stride comes after a second colon. For example, if we only wanted to get every other element:

# stride of 2 will get every other element
letters_in_my_name[::2]
['b', 'a', ' ', 'o', 'h', 'k']

So, in general, the indexing scheme is:

my_list[start:end:stride]

  • If there are no colons, a single element is returned.

  • If there are any colons, we are slicing the list, and a list is returned.

  • If there is one colon, stride is assumed to be 1.

  • If start is not specified, it is assumed to be zero.

  • If end is not specified, the slice runs through the end of the list.

  • If stride is not specified, it is assumed to be 1.

Note

There are a lot of options here and I don’t expect you to remember them after first glance. Just realize that slicing is extremely flexible!
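
To make the rules concrete, here is a small sketch on a list of digits (an example list chosen so each value equals its index). The negative start and negative stride at the end go slightly beyond the rules listed above and are included only as a preview.

```python
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

middle = digits[2:8]            # start and end; stride defaults to 1
every_third = digits[2:8:3]     # start, end, and stride
first_four = digits[:4]         # start omitted
from_six = digits[6:]           # end omitted
last_three = digits[-3:]        # a negative start counts from the end
reversed_digits = digits[::-1]  # a negative stride walks backward

print(middle)           # [2, 3, 4, 5, 6, 7]
print(every_third)      # [2, 5]
print(first_four)       # [0, 1, 2, 3]
print(from_six)         # [6, 7, 8, 9]
print(last_three)       # [7, 8, 9]
print(reversed_digits)  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```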

Operators#

Operators on lists behave much like operators on strings. The + operator on lists means list concatenation.

[1, 2, 3] + [4, 5, 6]
[1, 2, 3, 4, 5, 6]

The * operator on lists means list replication and concatenation.

[1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]

Membership operators are used to determine if an item is in a list. The two membership operators are:

| English | Operator |
|---|---|
| is a member of | in |
| is not a member of | not in |

The result of the operator is True or False. Let’s look at letters again:

'a' in letters
True
'z' in letters
False

Note

Membership operators are case sensitive!

'A' not in letters
True

These membership operators offer a great convenience for conditionals.

first_initial = 'b'

if first_initial in letters:
    print('My first initial is in the list!')
else:
    print('Aw shucks!')
My first initial is in the list!

Mutability#

Lists are mutable. This means we can change their values without creating a new list. (You cannot change the data type or identity.) Let’s see this by example.

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_list[3] = 'four'

my_list
[1, 2, 3, 'four', 5, 6, 7, 8, 9, 10]

The other data types we have encountered so far (integers, floats, and strings) are immutable. You cannot change their values in place; you can only assign a new value to the variable. To see this, we’ll use the id() function, which returns an identifier for the object a variable refers to. (Note: this identity is specific to the Python interpreter, and should not be considered an actual physical address in memory.)

a = 8451
print(id(a))

a = 8452
print(id(a))
4537275440
4537277616

So, we see that the identity of a, an integer, changed when we tried to change its value. We didn’t actually change the original integer; Python created a new integer object and pointed the name a at it. With lists, though, this is not the case.

print(id(my_list))

my_list[0] = 'zero'
print(id(my_list))
4523764416
4523764416

Tip

It is still the same list! This is very important to consider when we do assignments.
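
Here is a minimal sketch of why this matters for assignment. Assigning a list to a second name does not copy it, so a change made through one name shows up under the other. The names original, alias, and shallow_copy are illustrative.

```python
original = [1, 2, 3]
alias = original                 # both names refer to the same list object
shallow_copy = original.copy()   # a new list holding the same elements

alias[0] = 'changed'

print(original)       # ['changed', 2, 3]
print(shallow_copy)   # [1, 2, 3]
print(alias is original, shallow_copy is original)  # True False
```

When we want an independent list, copy() (or a full slice such as original[:]) is one option. Note that both make a shallow copy, so nested lists inside would still be shared.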

Knowledge check#

Questions

Given the following list l = [10, [3, 4], [5, [100, 200, ["BANA"]], 23, 11], 1, 7]

  1. Use indexing to grab the word “BANA”.

  2. Change the value of “BANA” to “BANA 6043”.

  3. Use slicing to get the last 4 elements.

Tuples#

A tuple is just like a list, except it is immutable (basically a read-only list).

Note

What I just said there is explosive, as described in this blog post. Tuples do have many other capabilities beyond what you would expect from just being “a read-only list,” but for us just beginning now, we can think of it that way.

Creation#

A tuple is created just like a list, except we use parentheses () instead of brackets. The only watch-out is that a tuple with a single item needs to include a comma after the item.

my_tuple = ('a', 'b', 'c')
type(my_tuple)
tuple
a_tuple = (0,)   # Create a single element tuple
not_a_tuple = (0) # This is just the number 0 (normal use of parentheses)

type(a_tuple), type(not_a_tuple)
(tuple, int)
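
Put another way, it is the comma that makes a tuple; the parentheses are mostly for readability. A brief sketch with made-up variable names:

```python
point = 3, 4     # parentheses are optional when the meaning is unambiguous
single = 'a',    # a trailing comma alone makes a one-element tuple
empty = ()       # the empty tuple is the one case written with parentheses only

print(type(point), type(single), type(empty))
print(len(point), len(single), len(empty))  # 2 1 0
```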

We can also create a tuple by doing a type conversion. We can convert our list to a tuple.

name_as_string = 'brad boehmke'
name_as_list = list(name_as_string)
name_as_tuple = tuple(name_as_list)

name_as_tuple
('b', 'r', 'a', 'd', ' ', 'b', 'o', 'e', 'h', 'm', 'k', 'e')

Indexing & slicing#

Similar to lists, we can index and slice using [] notation and specifying the elements of interest. The only difference is when slicing, we get a tuple in return rather than a list.

# Last letter
name_as_tuple[-1]
'e'
name_as_tuple[0:3]
('b', 'r', 'a')

Operators#

As with lists we can concatenate tuples with the + operator.

(1, 2, 3) + (4, )
(1, 2, 3, 4)

Membership operators work the same as with lists.

'z' in name_as_tuple
False

Mutability#

As we stated at the beginning of this section, tuples are immutable. This means once we’ve created a tuple we cannot change the existing elements inside it.

name_as_tuple[0] = 'B'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8f/c06lv6q17tjbyjv2nkt0_s4s1sh0tg/T/ipykernel_98564/197687315.py in <module>
----> 1 name_as_tuple[0] = 'B'

TypeError: 'tuple' object does not support item assignment
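
If we need a modified version, one approach is to build a new tuple from pieces of the old one. The sketch below also shows a subtlety: immutability applies to the tuple itself, not to a mutable object stored inside it. The record tuple is a made-up example.

```python
name_as_tuple = tuple('brad boehmke')

# Build a new tuple instead of modifying the existing one
capitalized = ('B',) + name_as_tuple[1:]
print(capitalized[:4])     # ('B', 'r', 'a', 'd')
print(name_as_tuple[:4])   # ('b', 'r', 'a', 'd'), the original is unchanged

# A list stored inside a tuple can still be changed in place
record = ('Brad', [2021, 2022])
record[1].append(2023)
print(record)              # ('Brad', [2021, 2022, 2023])
```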

Tuple unpacking#

Tuples allow for a special assignment process. We call this tuple unpacking and it allows us to assign individual items from a tuple to their own variable.

Note

This is useful when a function returns more than one value and we want to store each value in its own variable. We will make use of this later in this class.

my_name = ('Brad', 'Boehmke')
first, last = my_name
first
'Brad'
last
'Boehmke'
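
As a preview of the function use case, here is a small sketch. The function min_and_max is hypothetical and written only for this example; functions are covered later in the class.

```python
def min_and_max(values):
    """Return the smallest and largest value as a 2-element tuple."""
    return min(values), max(values)

# The returned tuple is unpacked into two variables
lowest, highest = min_and_max([4, 9, 1, 7])
print(lowest, highest)   # 1 9

# Unpacking also gives a compact way to swap two variables
first, last = 'Brad', 'Boehmke'
first, last = last, first
print(first, last)       # Boehmke Brad
```

The number of variables on the left must match the number of items in the tuple; otherwise Python raises a ValueError.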

Knowledge check#

Questions

Given the following tuple schooling = ('UC', 'BANA', '6043')

  1. Use indexing to grab the word “BANA”.

  2. Change the value of “BANA” to “Business Analytics”. What happens?

  3. Unpack the schooling tuple into three variables: university, program, class_id.

Dictionaries#

Dictionaries are collections of key-value pairs. Think of a real dictionary – you look up a word (a key), to find its definition (a value). Any given key can have only one value.

This concept has many names depending on language: map, associative array, dictionary, and more.

Creation#

In Python, dictionaries are represented with curly braces {}. Colons separate a key from its value, and (like lists and tuples) commas delimit elements.

brad = {'first_name': 'Brad',
        'last_name': 'Boehmke',
        'alma_mater': 'NDSU',
        'employer': '84.51˚',
        'zip_code': 45385}
brad
{'first_name': 'Brad',
 'last_name': 'Boehmke',
 'alma_mater': 'NDSU',
 'employer': '84.51˚',
 'zip_code': 45385}

Dictionaries, like lists, are very flexible. Keys are generally strings (though other hashable types, such as numbers and tuples, are allowed), and values can be anything – including lists or other dictionaries!

ethan = {
    'first_name': 'Ethan',
    'last_name': 'Swan',
    'alma_mater': 'Notre Dame',
    'employer': '84.51˚',
    'zip_code': 45208
    }

# A dictionary of dictionaries!
instructors = {'brad': brad, 'ethan': ethan}
instructors
{'brad': {'first_name': 'Brad',
  'last_name': 'Boehmke',
  'alma_mater': 'NDSU',
  'employer': '84.51˚',
  'zip_code': 45385},
 'ethan': {'first_name': 'Ethan',
  'last_name': 'Swan',
  'alma_mater': 'Notre Dame',
  'employer': '84.51˚',
  'zip_code': 45208}}
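
For completeness, here is a short sketch of keys that are not strings. The dictionaries are made up for illustration, and the try/except is used only to show what happens with a list as a key without stopping the example.

```python
squares = {1: 1, 2: 4, 3: 9}                # integer keys
grid = {(0, 0): 'origin', (1, 0): 'east'}   # tuple keys

print(len(squares), len(grid))   # 3 2

# A list is mutable and therefore not hashable, so it cannot be a key
try:
    bad = {[1, 2]: 'value'}
except TypeError as err:
    error_message = str(err)
    print(error_message)
```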

Indexing#

Similar to lists and tuples, we can index using brackets. However, rather than indexing with an element number we index by passing the key in the brackets (my_dict['key']).

brad['employer']
'84.51˚'
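
When a value is itself a dictionary, we can chain brackets to reach the inner values. The sketch below uses shortened versions of the dictionaries above so that it runs on its own.

```python
brad = {'first_name': 'Brad', 'alma_mater': 'NDSU'}
ethan = {'first_name': 'Ethan', 'alma_mater': 'Notre Dame'}
instructors = {'brad': brad, 'ethan': ethan}

# The first bracket selects the inner dictionary, the second selects its value
inner = instructors['ethan']
school = instructors['ethan']['alma_mater']

print(inner)    # {'first_name': 'Ethan', 'alma_mater': 'Notre Dame'}
print(school)   # Notre Dame
```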

You’ll get a KeyError if you try to access a non-existent key:

brad['undergrad']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/8f/c06lv6q17tjbyjv2nkt0_s4s1sh0tg/T/ipykernel_98564/1301033579.py in <module>
----> 1 brad['undergrad']

KeyError: 'undergrad'
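
When a key may be missing, the get() method is one way to avoid the error. A minimal sketch, using a shortened version of the dictionary above; the default value 'unknown' is an arbitrary choice.

```python
brad = {'first_name': 'Brad', 'alma_mater': 'NDSU'}

# get() returns None when the key is missing instead of raising KeyError
missing = brad.get('undergrad')
print(missing)        # None

# A second argument supplies a default to return instead of None
with_default = brad.get('undergrad', 'unknown')
print(with_default)   # unknown

# For keys that exist, get() behaves like bracket indexing
print(brad.get('alma_mater'))   # NDSU
```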

Although it is not indexing, we can also use the keys() and values() methods to extract a dictionary’s keys and values.

Note

Don’t worry about what type of object these outputs are; just realize that we can extract them in this manner.

brad.keys()
dict_keys(['first_name', 'last_name', 'alma_mater', 'employer', 'zip_code'])
brad.values()
dict_values(['Brad', 'Boehmke', 'NDSU', '84.51˚', 45385])
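
If a plain list is easier to work with, these outputs can be converted with list(). There is also an items() method that yields each key-value pair as a tuple, which fits nicely with tuple unpacking. A short sketch on a two-key dictionary:

```python
brad = {'first_name': 'Brad', 'last_name': 'Boehmke'}

key_list = list(brad.keys())
pairs = list(brad.items())

print(key_list)   # ['first_name', 'last_name']
print(pairs)      # [('first_name', 'Brad'), ('last_name', 'Boehmke')]

# Each pair can be unpacked into two variables while looping
for key, value in brad.items():
    print(key, '->', value)
```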

Operators#

Dictionaries do not support the + concatenation operator the way lists and tuples do…

brad + {'number': '800-867-5309'} # not my number so don't actually call it!
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/8f/c06lv6q17tjbyjv2nkt0_s4s1sh0tg/T/ipykernel_98564/2920717387.py in <module>
----> 1 brad + {'number': '800-867-5309'} # not my number so don't actually call it!

TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

But they do support membership operators. Keep in mind, however, that membership operators check the keys, not the values.

'zip_code' in brad
True
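
To test whether something appears among the values instead, one option is to run the membership check against values(). A brief sketch on a shortened dictionary:

```python
brad = {'first_name': 'Brad', 'zip_code': 45385}

key_check = 'Brad' in brad             # looks at keys only
value_check = 'Brad' in brad.values()  # looks at values

print(key_check)     # False
print(value_check)   # True
```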

Mutability#

Dictionaries are mutable. This means that they can be changed in place. For example, to change a value or add a new element, we assign to a key using the same bracket syntax we use for indexing.

brad['first_name'] = 'Bradley'   # Change an existing value
brad['number'] = '800-867-5309'  # Add a new key-value pair
brad
{'first_name': 'Bradley',
 'last_name': 'Boehmke',
 'alma_mater': 'NDSU',
 'employer': '84.51˚',
 'zip_code': 45385,
 'number': '800-867-5309'}
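
Two other in-place changes are worth a quick sketch: update() applies several changes at once (and is one way to combine two dictionaries, since + is not supported), and pop() removes a key and returns its value. The dictionary below is a shortened version of the one above, and same_dict is an illustrative second name for it.

```python
brad = {'first_name': 'Brad', 'last_name': 'Boehmke'}
same_dict = brad   # a second name for the same dictionary object

# update() changes existing keys and adds new ones in place
brad.update({'first_name': 'Bradley', 'employer': '84.51˚'})

# pop() removes a key and returns the value it held
removed = brad.pop('last_name')

print(brad)               # {'first_name': 'Bradley', 'employer': '84.51˚'}
print(removed)            # Boehmke
print(same_dict is brad)  # True
```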

Knowledge check#

Questions:

Imagine you need a way to quickly determine a company’s CEO given the company name. You could use a dictionary such that ceos['company_name'] = 'CEO name'.

  1. Create a dictionary ceos with two company CEOs:

    • Apple: Tim Cook

    • Microsoft: Satya Nadella

  2. Now add Bob Iger as the CEO of Disney.

  3. Now you realize that Bob Iger is no longer the CEO of Disney; rather, it is Bob Chapek. Update the Disney CEO to reflect this change.

Quick Review#

| English name | Type | Type category | Description | Example |
|---|---|---|---|---|
| list | list | Sequence type | a collection of objects - mutable & ordered | ['Brad', 2022, ['another', 'list']] |
| tuple | tuple | Sequence type | a collection of objects - immutable & ordered | ('Brad', 2022, ['embedded', 'list']) |
| dictionary | dict | Mapping type | mapping of key-value pairs - mutable & ordered | {'name': 'BANA', 'code': 6043, 'credits': 2} |

Exercises#

Questions:

  1. Create a string that contains your name. Convert this string to a list. Check if the letter ‘p’ is in this list.

  2. Why does the following cell return an error?

    t = (1, 2, 3, 4, 5)
    t[-1] = 6
    
    
  3. Given this nested dictionary, grab the word “BANA”.

    d = {
        "a_list": [1, 2, 3],
        "a_dict": {"first": ["this", "is", "inception"],
                   "second": [1, 2, 3, "BANA"]}
    }
     
    

Computing environment#

%load_ext watermark
%watermark -v -p jupyterlab,pandas
Python implementation: CPython
Python version       : 3.9.4
IPython version      : 7.26.0

jupyterlab: 3.1.4
pandas    : 1.2.4