5 Data Types for Multiple Values

5.1 Introduction

In Chapter 4 we learned about the int, float, str and bool data types, which all had single values. But we often have to deal with many values. For example, suppose you wanted to analyze the past daily sales of a company in recent years. It would not be very convenient to assign each of the hundreds of values of sales to different variables and work with them. Python has other data types available to deal with multiple values: lists, tuples, dictionaries, sets and frozen sets. These will be the focus of this chapter.

In later chapters we will also see that there are Python modules that have other data types. In this chapter we will focus on the data types that come built in.

5.2 Lists

A very common way to store a sequence of values (which could be integers, floats, strings or Booleans) is in a list. You can create a list by putting the values between square brackets, separated by commas:

a = [2, 4, 6]
a
[2, 4, 6]

Lists can also be composed of floats, Booleans, or strings, or even a combination of them:

a = [1, 1.1, True, 'hello']

Lists can even have other lists as elements:

a = [1, 2, [3, 4]]

Although we see 4 numbers, this list actually has only 3 elements:

len(a)
3

where the len() function returns the length of its argument. The [3, 4] is actually considered one element and is itself a list. This kind of a list is called a nested list.

5.2.1 List Operations

If we use the + operator on lists it just creates a longer list with one appended to the other:

a = [2, 4, 6]
b = [8, 10]
a + b
[2, 4, 6, 8, 10]

If we use the * operator it repeats the list:

a = [2, 4, 6]
a * 3 
[2, 4, 6, 2, 4, 6, 2, 4, 6]

We cannot use the - and / operators on lists. In these ways lists behave like strings.
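
As a quick check of these claims, the sketch below repeats the + and * operations and then tries to subtract one list from another, catching the error that Python raises. The variable names (joined, repeated, subtraction_worked) are only illustrative.

```python
a = [2, 4, 6]
b = [8, 10]

# + concatenates and * repeats, just as with strings
joined = a + b        # [2, 4, 6, 8, 10]
repeated = b * 2      # [8, 10, 8, 10]

# - is not defined for lists, so Python raises a TypeError
try:
    a - b
    subtraction_worked = True
except TypeError:
    subtraction_worked = False

print(joined, repeated, subtraction_worked)
```

The try and except keywords used here simply let the code continue after an error instead of stopping.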

5.2.2 List Indexing

Lists are ordered, so the order in which we place the elements matters. To extract a particular value from a list based on its position in the list we can use a technique called indexing. If we want the first element of the list we can extract it with a[0]:

a[0]
2

The 0 here is the index. It means to take element 0 from the list. In Python and several other programming languages (like C and C++), counting starts at 0 instead of 1. So element 0 is actually the 1st element. This is something that might take some getting used to, so be careful when using indexing.

Even though the list has 3 elements, the last element is extracted with a[2]:

a[2]
6

If you try to do a[3] you get an IndexError saying the list index is out of range.
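
One way to avoid counting mistakes is to compute the last valid index from the length of the list. The following is a minimal sketch of that idea; the names last_index and out_of_range are made up for this illustration.

```python
a = [2, 4, 6]

# The last valid index is always one less than the length
last_index = len(a) - 1      # 2
last_element = a[last_index]  # 6

# Using the length itself as an index goes one step too far
try:
    a[len(a)]
    out_of_range = False
except IndexError:
    out_of_range = True

print(last_index, last_element, out_of_range)
```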

We can also use negative indexing to extract elements from the end of a list. For example, to get the last element of a list a we can use a[-1], to get the 2nd-last element we can use a[-2], and so on:

a = [1, 2, 3, 4, 5, 6]
a[-1]
6
a[-2]
5

Indexing like this also works for strings, and many other objects with multiple values. For example, to get the first character in a string we can get the value at index 0:

a = 'hello'
a[0]
'h'

We can also use indexing to change values in a list:

a = [2, 4, 6]
a[0] = 8
a
[8, 4, 6]

Because lists have this property, we say they are mutable. This is unlike strings, which are immutable. You can’t change a character in a string using indexing (try it out with the commands a = 'hello' and a[0] = 'g').
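
The suggested experiment can be sketched as follows. The code catches the error so that it can also show a common workaround, which is to build a new string rather than change the old one. The names message and b are only for illustration.

```python
a = 'hello'
message = None

# Strings are immutable, so item assignment raises a TypeError
try:
    a[0] = 'g'
except TypeError as err:
    message = str(err)

# Workaround: build a new string from the pieces we want
b = 'g' + a[1:]

print(message)
print(a, b)
```

The original string a is untouched; b is a separate, new string.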

5.2.3 List Slicing

Slicing extracts several elements at once. To get all elements starting from index 1 up to but not including index 3 (the 2nd and 3rd elements) we can do:

a = [1, 2, 3, 4, 5]
a[1:3]
[2, 3]

To get all elements starting from index 2 (the 3rd element onwards) we can do:

a[2:]
[3, 4, 5]

To get all elements up to but not including index 2 (the 1st and 2nd elements) we can do:

a[:2]
[1, 2]

Finally, the following returns a copy of the entire list:

a[:]
[1, 2, 3, 4, 5]
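
Slicing can also be combined with the negative indices from the previous section. The sketch below shows a few such combinations and one difference from plain indexing: a slice that runs past the end of the list does not raise an error. The variable names are illustrative only.

```python
a = [1, 2, 3, 4, 5]

last_two = a[-2:]      # from the 2nd-last element onwards
all_but_last = a[:-1]  # everything up to but not including the last

# Unlike a[100], a slice that overshoots is simply cut short
overshoot = a[1:100]
empty = a[10:]

print(last_two, all_but_last, overshoot, empty)
```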

5.2.4 List Methods

Lists, like many objects in Python, have methods. A method in Python is like a function but instead of using the object as an argument to the function, we apply the function to the object. We’ll see what we mean by this with an example. Suppose we wanted to add another number to our list at the end, like the number 8. Instead of recreating the entire list with a = [2, 4, 6, 8], we can append 8 to the end of the list using the append() method. Methods are invoked by placing them after the object separated with a . like this:

a = [2, 4, 6]
a.append(8)
a
[2, 4, 6, 8]

Notice that we didn’t need to assign the output of append() to an object with =. It altered a in place. Many list methods work this way: they modify the list itself rather than returning a new one.

To remove an element from a list we can use the pop() method. For example, to remove the 2nd element (element with index 1), we can do:

a = [2, 4, 6]
a.pop(1)
a
[2, 6]
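
Unlike append(), pop() also returns the element it removes, which is handy when we still need that value. A small sketch, with removed and last as illustrative names:

```python
a = [2, 4, 6, 8]

# pop(1) removes the element at index 1 and hands it back
removed = a.pop(1)

# With no argument, pop() removes and returns the last element
last = a.pop()

print(removed, last, a)
```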

Another list method is reverse() which reverses the ordering of the list:

a = [2, 6, 4]
a.reverse()
a
[4, 6, 2]

To sort a list ascending we can use sort():

a = [1, 3, 2]
a.sort()
a
[1, 2, 3]

To see the full list of methods available for your list, you can use the command dir(a).

Using the sort() method changes our original list. Sometimes we want to see the sorted version of a list but keep the original ordering as well. In this case you shouldn’t use the sort() method on the list but use the sorted() function instead. The sorted() function returns a new sorted list and leaves its input unchanged:

a = [1, 3, 2]
b = sorted(a)
b
[1, 2, 3]
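
To confirm that the original list is left alone, we can inspect a after calling sorted(). The sketch below also uses the reverse=True argument, which both sorted() and sort() accept, to sort in descending order.

```python
a = [1, 3, 2]

ascending = sorted(a)
descending = sorted(a, reverse=True)

# a itself still has its original ordering
print(a, ascending, descending)
```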

5.2.5 Iterating over Items in a List

A useful feature of a list is that we can iterate over each element, performing the same operation or set of operations on each element one by one. For example, suppose we wanted to see what the square of each element in the list was. We can use what is called a for loop to do this. Here is how to code it:

a = [2, 4, 6, 8]
for i in a:
    print(i ** 2)
4
16
36
64

In words, what is happening is “for all i in the list a, print i^2”. We use i as a sort of temporary variable for each element in a. The next line then prints i ** 2 which squares i. You will notice that the print() command is indented with 4 spaces. This is to tell Python that this command is part of the loop. When there is code under the for loop that is not indented, Python interprets this as not being part of the loop.

To understand this, compare the following two snippets, which are almost the same except the first print('hello') is indented and the second is not:

a = [2, 4, 6]
for i in a:
    print(i ** 2)
    print('hello')
4
hello
16
hello
36
hello

a = [2, 4, 6]
for i in a:
    print(i ** 2)
print('hello')
4
16
36
hello

The first code prints 'hello' 3 times, and the second only once, even though the code looks almost the same except for the indentations. This is because in the first case, the indentation tells Python that that print() call is part of the loop. In each iteration of the loop, we have to print the square of i and print hello. The loop iterates 3 times, so we see 'hello' 3 times.

In the second case, the lack of indentation tells Python that the print('hello') is not in the loop. Python first finishes the loop (squaring each element of a and printing it). It only then gets to the next part of the code and prints 'hello'.

Therefore it is very important to be careful with indentation in Python. The convention is to indent with 4 spaces (not tabs) for content in a loop.

Another thing to note here is that a for loop is a situation where the code is no longer running line-by-line from top to bottom. The code goes to the end of the loop and if there are iterations remaining to be done it goes back to the start of the loop. Only when it has completed all the iterations does it go to the next line after the loop.

5.2.6 List Comprehensions

Suppose we wanted to save the square of each element of a into a new list called b. One way to do that would be to create an empty list called b with b = []. This is a list with no elements. Then we could use the for loop to append the values to b, like this:

a = [2, 4, 6]
b = []
for i in a:
    b.append(i ** 2)
b
[4, 16, 36]

This works just fine, but the code is a bit “clunky”. Moreover, for very large lists it is usually somewhat slower. A cleaner and typically faster way to do this kind of operation is by using list comprehensions.

a = [2, 4, 6]
b = [i ** 2 for i in a]
b
[4, 16, 36]

This is a very neat and compact way to create the new list. It also reads similarly to how we would describe what is happening: “make a list which is i^2 for all elements i in the list a”.
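
A list comprehension can also include a condition at the end to keep only some of the elements. The sketch below squares only the even numbers and checks that the result matches the equivalent for loop. The if statement and the % (remainder) operator are used here on the assumption that they are already familiar.

```python
a = [1, 2, 3, 4, 5, 6]

# Comprehension with a condition: square only the even elements
even_squares = [i ** 2 for i in a if i % 2 == 0]

# The same thing written as a for loop
loop_result = []
for i in a:
    if i % 2 == 0:
        loop_result.append(i ** 2)

print(even_squares)
```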

5.2.7 List Membership

To see if an element is contained somewhere in a list, we can use the in operator:

a = [2, 4, 6]
4 in a
True
5 in a
False

4 is in a so we get True, but 5 is not so we get False.
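
A few related checks are sketched below: not in tests for absence, in also works on strings, and for a nested list only the top-level elements are searched. The variable names are illustrative.

```python
a = [2, 4, 6]
missing = 5 not in a          # True, because 5 is absent

# On strings, in looks for a substring
has_ell = 'ell' in 'hello'

# Only top-level elements count in a nested list
nested = [1, 2, [3, 4]]
three_found = 3 in nested       # 3 sits inside the inner list
inner_found = [3, 4] in nested  # the inner list itself is an element

print(missing, has_ell, three_found, inner_found)
```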

5.2.8 Copying Lists

One thing to note about lists, which may be unexpected, is that if we create a list a and set b = a, we are actually telling Python that a and b refer to the same object, not just that they have the same values. As a consequence, if we change a, then b will also change. For example:

a = [2, 4, 6]
b = a
a[0] = 8
b
[8, 4, 6]

We set b = a but otherwise perform no operations on b. We change the first element of a (element 0) to 8, and the first element of b changes to 8 as well!
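
We can check directly whether two names refer to the same object with the is operator. The sketch below contrasts it with ==, which only compares values. The list c is an extra list introduced for this illustration.

```python
a = [2, 4, 6]
b = a          # b is another name for the same list
c = [2, 4, 6]  # c is a separate list with the same values

same_object_ab = a is b   # True
same_object_ac = a is c   # False
same_values_ac = a == c   # True

a[0] = 8
print(b, c)
```

After the change to a, b shows the new value while c keeps the old one.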

Often when we are programming we don’t want this to happen. We often want to copy a list to a new one to perform some operations and leave the original list unchanged. What we can do instead is set b equal to a[:] instead of a. This way b won’t change when a changes:

a = [2, 4, 6]
b = a[:]
a[0] = 8
b
[2, 4, 6]

Another way is to use the copy() method:

a = [2, 4, 6]
b = a.copy()
a[0] = 8
b
[2, 4, 6]

Because copying can be done in two different ways with different consequences, we have two different terms for them:

  1. Deep copy: This copies a to b and recursively copies all of its elements, resulting in a completely independent object.
  2. Shallow copy: This creates a new list b but does not recursively copy the elements of a. Instead it only copies the references to the elements (like the address for where in the computer’s memory those elements are stored). This means that a change made inside a mutable element of a, such as a nested list, will also show up in b.

The b = a example is not a copy at all: a and b are simply two names for the same list. The b = a[:] and b = a.copy() examples are shallow copies. They behave like a deep copy as long as the list is not nested, because only the outermost list is copied. If we copy a nested list to b with a[:] or a.copy() and then change an element inside one of the nested lists, then the copied object will change as well. For example:

a = [2, 4, [6, 8]]
b = a[:]
a[2][1] = 5
b
[2, 4, [6, 5]]

b changes as well! The same happens with the copy() method:

a = [2, 4, [6, 8]]
b = a.copy()
a[2][1] = 5
b
[2, 4, [6, 5]]

To make a full deep copy which recursively copies the entire object, we can use the deepcopy() function from the copy module:

import copy
a = [2, 4, [6, 8]]
b = copy.deepcopy(a)
a[2][1] = 5
b
[2, 4, [6, 8]]
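
The difference between the approaches in this section can be summarized by checking which objects are shared, using the is operator. This is only a sketch; the names alias, shallow and deep are chosen for illustration.

```python
import copy

a = [2, 4, [6, 8]]
alias = a                # plain assignment: no copy is made
shallow = a.copy()       # new outer list, shared inner list
deep = copy.deepcopy(a)  # new outer list and new inner list

outer_shared_alias = alias is a          # True
outer_shared_shallow = shallow is a      # False
inner_shared_shallow = shallow[2] is a[2]  # True
inner_shared_deep = deep[2] is a[2]        # False

a[2][1] = 5
print(alias, shallow, deep)
```

Only the deep copy still holds the original inner list [6, 8] after the change.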

5.3 Tuples

A tuple is another data type that is quite similar to a list. One important difference, however, is that tuples are immutable. We cannot change individual values of a tuple after it is created, and we cannot append values to a tuple.

We can create a tuple in Python using parentheses instead of square brackets:

a = (2, 4, 6)

Indexing and many other operations that work for lists also work with tuples. We index them the same way as lists (using square brackets like a[0]) and we can iterate over the items with for loops in the same way. However the list of methods for tuples is much shorter. We cannot append or pop values because the tuples are immutable.
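
The sketch below illustrates these points: indexing works as for lists, assignment to an element fails, and the two methods that tuples do have, count() and index(), are available. It also shows a detail not covered above, namely that a tuple with a single element needs a trailing comma.

```python
a = (2, 4, 6)
first = a[0]

# Tuples are immutable, so item assignment raises a TypeError
try:
    a[0] = 8
    assignment_worked = True
except TypeError:
    assignment_worked = False

how_many_fours = a.count(4)  # number of times 4 appears
where_is_six = a.index(6)    # index of the first 6

# Without the comma, (5) is just the integer 5 in parentheses
single = (5,)
not_a_tuple = (5)

print(first, assignment_worked, how_many_fours, where_is_six)
print(type(single), type(not_a_tuple))
```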

5.3.1 Tuple Assignment

One useful thing we can do with tuples is tuple assignment. Suppose we have a list x = ['a', 'b', 'c'] and we wanted to create 3 objects from this: x_0 = 'a', x_1 = 'b' and x_2 = 'c'. One way to do this is:

x = ['a', 'b', 'c']
x_0 = x[0]
x_1 = x[1]
x_2 = x[2]

But a much more elegant way to do this is using tuple assignment:

x = ['a', 'b', 'c']
(x_0, x_1, x_2) = x

This assigns 'a' to x_0, 'b' to x_1 and 'c' to x_2 all in one line.

This is especially useful if we have a function that returns multiple objects and we want to assign each output to a different variable. For example, the function divmod(a, b) gives the quotient and remainder from dividing a by b. It essentially calculates a // b and a % b and returns a tuple with both objects:

divmod(7, 3)
(2, 1)

This means 7 divided by 3 is 2 with remainder 1. We can use tuple assignment with the output to get:

(quotient, remainder) = divmod(7, 3)
quotient
2
remainder
1
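
The same pattern works with functions we write ourselves, and the parentheses on the left-hand side are optional. In the sketch below, min_and_max() is a hypothetical helper written only for this illustration. The last lines show another common use of tuple assignment: swapping two variables in one line.

```python
def min_and_max(values):
    # Returning two values separated by a comma returns a tuple
    return min(values), max(values)

lowest, highest = min_and_max([3, 1, 4, 1, 5])

# Swap two variables without a temporary variable
x = 1
y = 2
x, y = y, x

print(lowest, highest, x, y)
```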

5.4 Dictionaries

Another common built-in data type is a dictionary. A dictionary maps keys to values, where the keys must be of an immutable data type (usually an integer or string) and the values can be any type, for example, single values, lists, or tuples. For example, a company might have supplier IDs for its suppliers and a dictionary mapping those IDs to the actual company name. In this case, the supplier IDs are the keys and the company names are the values.

We could create a simple dictionary like this as follows:

suppliers = {100001 : 'ABC Ltd.', 100002 : 'EFG Ltd.'}

Dictionaries are created within curly brackets with the structure {key1 : value1, key2 : value2, key3 : value3}.

To look up a company name using the supplier ID, we provide the key in the place where we would supply an index for a list or tuple:

suppliers[100001]
'ABC Ltd.'

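Dictionaries are not indexed by position, so we cannot do suppliers[0] to find the first supplier. Python would look for a key equal to 0 and, not finding one, raise a KeyError. Since Python 3.7 dictionaries do remember the order in which keys were inserted, but values are still looked up by key, never by position.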
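
The sketch below shows what happens when a key is missing, along with two ways to guard against it: the in operator, which checks the keys of a dictionary, and the get() method, which returns a default value instead of raising an error. The key 999999 and the default 'unknown' are made up for this example.

```python
suppliers = {100001: 'ABC Ltd.', 100002: 'EFG Ltd.'}

# suppliers[0] looks for the key 0, which does not exist
try:
    suppliers[0]
    lookup_worked = True
except KeyError:
    lookup_worked = False

# in checks whether a key is present
known = 100001 in suppliers

# get() returns None, or a default we choose, for a missing key
missing = suppliers.get(999999)
fallback = suppliers.get(999999, 'unknown')

print(lookup_worked, known, missing, fallback)
```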

We can also add new keys and values to the dictionary:

suppliers[100003] = 'HIJ Ltd.'
suppliers
{100001: 'ABC Ltd.', 100002: 'EFG Ltd.', 100003: 'HIJ Ltd.'}

We can also modify values:

suppliers[100003] = 'KLM Ltd.'
suppliers
{100001: 'ABC Ltd.', 100002: 'EFG Ltd.', 100003: 'KLM Ltd.'}

To get all the keys in a dictionary we can use the keys() method:

suppliers.keys()
dict_keys([100001, 100002, 100003])

And to get all the values in a dictionary we can use the values() method:

suppliers.values()
dict_values(['ABC Ltd.', 'EFG Ltd.', 'KLM Ltd.'])

Using a for loop with a dictionary implicitly iterates over the keys. So we can loop over the keys of a dictionary in the following way:

for key in suppliers:
    print('Supplier with ID ' + str(key) + ' is ' + suppliers[key])
Supplier with ID 100001 is ABC Ltd.
Supplier with ID 100002 is EFG Ltd.
Supplier with ID 100003 is KLM Ltd.
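
When both the key and the value are needed, as here, the items() method can be combined with tuple assignment so that the loop receives each key and value as a pair. A minimal sketch; the list lines is only there to collect the output.

```python
suppliers = {100001: 'ABC Ltd.', 100002: 'EFG Ltd.', 100003: 'KLM Ltd.'}

lines = []
# items() yields (key, value) pairs, unpacked into two variables
for key, name in suppliers.items():
    lines.append('Supplier with ID ' + str(key) + ' is ' + name)

for line in lines:
    print(line)
```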

Finally, to create an empty dictionary we can use {}.

5.5 Sets and Frozen Sets

A set is another way to store multiple items in a single variable. Sets are unordered and unindexed. This means you cannot extract individual elements using their index like a list, nor by their key like a dictionary.

You can create a set by placing items (like integers or strings) inside curly brackets ({}) separated by commas:

myset = {'apple', 'banana', 'cherry'}
myset
{'apple', 'banana', 'cherry'}

Sets cannot have duplicate items. A set only keeps the unique values. For example, suppose we provide 'cherry' twice:

myset = {'apple', 'banana', 'cherry', 'cherry'}
myset
{'apple', 'banana', 'cherry'}

Only one 'cherry' is kept.

Sets are mutable, so you can add elements to a set and remove elements from it:

myset = {'apple', 'banana', 'cherry'}
myset.add('pear')
myset
{'apple', 'banana', 'cherry', 'pear'}

myset = {'apple', 'banana', 'cherry'}
myset.remove('apple')
myset
{'banana', 'cherry'}

Converting a list to a set is useful if you want to get the list of unique elements:

fruits = ['apple', 'apple', 'apple', 'banana', 'cherry', 'banana']
set(fruits)
{'apple', 'banana', 'cherry'}
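
Because sets are unordered, the order in which the unique elements are displayed is not guaranteed. If a list in a predictable order is needed, one option is to pass the set to sorted(), as in this sketch (the name unique_fruits is illustrative):

```python
fruits = ['apple', 'apple', 'apple', 'banana', 'cherry', 'banana']

# set() drops the duplicates; sorted() returns an ordered list
unique_fruits = sorted(set(fruits))
number_unique = len(set(fruits))

print(unique_fruits, number_unique)
```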

We can iterate over sets just like lists (with for i in myset). We can also perform set operations on pairs of sets.

For example, for two sets A and B, we can find A ∩ B (the set of elements contained in both sets) using:

set_a = {1, 2, 4, 6, 8, 9}
set_b = {2, 3, 5, 7, 8}
set_a.intersection(set_b)
{2, 8}

The numbers 2 and 8 are the only numbers in both sets.

To find A ∪ B (the set of elements contained in either set) we can do:

set_a.union(set_b)
{1, 2, 3, 4, 5, 6, 7, 8, 9}

This gets all the numbers appearing in the two sets (dropping duplicates).

To find A \ B (the set of elements in A not contained in B) we can do:

set_a.difference(set_b)
{1, 4, 6, 9}

1, 4, 6 and 9 are in A and not in B. The number 2, for example, is not here because that is also in B.
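
Python also provides operators as shorthand for these three methods: & for intersection, | for union and - for difference. The sketch below checks that they give the same results as the method calls above.

```python
set_a = {1, 2, 4, 6, 8, 9}
set_b = {2, 3, 5, 7, 8}

both = set_a & set_b     # same as set_a.intersection(set_b)
either = set_a | set_b   # same as set_a.union(set_b)
only_a = set_a - set_b   # same as set_a.difference(set_b)

print(both, either, only_a)
```

Note that, unlike with lists, the - operator is defined for sets.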

To create a set that is immutable (so that you cannot add or remove items), you can use the frozenset() function:

myset = frozenset([1, 2, 3])

You can still use the same operations on frozensets as normal sets, except you cannot modify them once they are created.
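
The sketch below illustrates this, and also one practical reason to use a frozen set: because it is immutable, it can serve as a dictionary key, which an ordinary set cannot. The dictionary labels is a made-up example.

```python
frozen = frozenset([1, 2, 3])

# Frozen sets have no add() method, so this raises an AttributeError
try:
    frozen.add(4)
    can_add = True
except AttributeError:
    can_add = False

# Operations that return a new set still work
common = frozen & {2, 3, 5}

# A frozen set can be a dictionary key; a normal set cannot
labels = {frozenset([1, 2]): 'pair'}
try:
    bad = {{1, 2}: 'pair'}
    set_as_key = True
except TypeError:
    set_as_key = False

print(can_add, common, labels[frozenset([2, 1])], set_as_key)
```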