Lesson 5: Python Data Structures

Pragmatic AI Labs

This notebook was produced by Pragmatic AI Labs.

5.1 Use lists and tuples

Sequences

Lists, tuples, and strings are all Python sequences, and share many of the same methods.

Creating an empty list

empty = []
empty
[]

Using square brackets with initial values

numbers = [1, 2, 3]
numbers

[1, 2, 3]

Casting an iterable

Any iterable can be cast to a list

numbers = list(range(10))
numbers
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Creating using multiplication

num_players = 10
scores = [0] * num_players
scores
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
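One caveat with multiplication is worth a short sketch: it repeats references to the same objects rather than copying them. That is harmless for immutable values such as 0, but it can surprise when the repeated item is itself a list. The names shared_rows and independent_rows below are illustrative only.

```python
# Multiplying a list repeats references to the same objects.
shared_rows = [[0] * 3] * 2          # two references to ONE inner list
shared_rows[0][0] = 99               # the change shows up in both rows

# A list comprehension builds a separate inner list for each row.
independent_rows = [[0] * 3 for _ in range(2)]
independent_rows[0][0] = 99          # only the first row changes

print(shared_rows)       # [[99, 0, 0], [99, 0, 0]]
print(independent_rows)  # [[99, 0, 0], [0, 0, 0]]
```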

Mixing data types

Lists can contain multiple data types.

mixed = ['a', 1, 2.0, [13], {}]
mixed
['a', 1, 2.0, [13], {}]

Indexing

Items in lists can be accessed using indices in a similar fashion to strings.

Access first item

numbers[0]

0

Access last item

numbers[-1]
9

Access any item

numbers[4]
4
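As a small illustration of how indices behave at the edges, the sketch below compares a negative index with its positive equivalent and shows what happens when an index is out of range. The variable names are made up for this example.

```python
digits = list(range(10))

# A negative index counts back from the end: -1 is the last item.
last_item = digits[-1]
same_item = digits[len(digits) - 1]
print(last_item, same_item)   # 9 9

# An index past the end raises IndexError instead of returning None.
try:
    digits[10]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)           # True
```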

Adding to a list

Append to the end of a list

letters = ['a']
letters.append('c')
letters
['a', 'c']

Insert at beginning of list

letters.insert(0, 'b')
letters
['b', 'a', 'c']

Insert at arbitrary position

letters.insert(2, 'c')
letters
['b', 'a', 'c', 'c']

Extending with another list

more_letters = ['e', 'f', 'g']
letters.extend(more_letters)
letters
['b', 'a', 'c', 'c', 'e', 'f', 'g']
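The difference between append and extend is easy to miss, so here is a minimal side-by-side sketch. It uses separate, illustrative variable names so it does not disturb the letters list used in this lesson.

```python
# append adds its argument as ONE item, even when that argument is a list.
appended = ['a', 'b']
appended.append(['c', 'd'])
print(appended)    # ['a', 'b', ['c', 'd']]

# extend walks the iterable and adds each item separately.
extended = ['a', 'b']
extended.extend(['c', 'd'])
print(extended)    # ['a', 'b', 'c', 'd']

# The + operator also concatenates, but it builds a new list
# instead of changing either operand in place.
original = ['a', 'b']
combined = original + ['c', 'd']
print(original)    # ['a', 'b']
print(combined)    # ['a', 'b', 'c', 'd']
```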

Change item at some position

letters[3] = 'd'
letters
['b', 'a', 'c', 'd', 'e', 'f', 'g']

Swap two items

letters[0], letters[1] = letters[1], letters[0]
letters
['a', 'b', 'c', 'd', 'e', 'f', 'g']

Removing items from a list

Pop from the end

letters = ['a', 'b', 'c', 'd', 'e', 'f']
letters.pop()
letters
['a', 'b', 'c', 'd', 'e']

Pop by index

letters.pop(2)
letters
['a', 'b', 'd', 'e']

Remove specific item

letters.remove('d')
letters
['a', 'b', 'e']
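Two details of these methods are not visible in the cells above: pop returns the item it removes, and remove deletes only the first match and raises ValueError when nothing matches. A short sketch with an illustrative colors list:

```python
colors = ['red', 'green', 'blue', 'green', 'yellow']

popped = colors.pop()          # removes AND returns the last item
print(popped)                  # yellow

colors.remove('green')         # removes only the first 'green'
print(colors)                  # ['red', 'blue', 'green']

# Removing a value that is not present raises ValueError.
try:
    colors.remove('purple')
    remove_failed = False
except ValueError:
    remove_failed = True
print(remove_failed)           # True
```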

Create tuple using parentheses

tup = (1, 2, 3)
tup
(1, 2, 3)

Create tuple with commas

tup = 1, 2, 3
tup
(1, 2, 3)

Create empty tuple

tup = ()
tup
()

Create tuple with single item

tup = 1,
tup
(1,)
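The comma, not the parentheses, is what makes a tuple, and once created a tuple cannot be changed in place. The sketch below illustrates both points; the names are chosen only for this example.

```python
not_a_tuple = (1)      # just the integer 1 wrapped in parentheses
single = (1,)          # the trailing comma makes it a tuple
print(type(not_a_tuple).__name__)   # int
print(type(single).__name__)        # tuple

# Tuples are immutable: item assignment raises TypeError.
point = (3, 4)
try:
    point[0] = 10
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)            # True
```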

Behaviors shared by lists and tuples

The following sequence behaviors are shared by lists and tuples

Check item in sequence

3 in (1, 2, 3, 4, 5)
True

Check item not in sequence

'a' not in [1, 2, 3, 4, 5]
True

Slicing

Setting start, slice to the end

letters = 'a', 'b', 'c', 'd', 'e', 'f'
letters[3:]
('d', 'e', 'f')

Set end, slice from beginning

letters[:4]
('a', 'b', 'c', 'd')

Index from end of sequence

letters[-4:]
('c', 'd', 'e', 'f')

Setting step (a negative step walks backward from the start index)

letters[1::-2]
('b',)
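Since the step argument is the least obvious part of slice syntax, here is a short sketch of a few common step values, using an illustrative tuple named seq.

```python
seq = ('a', 'b', 'c', 'd', 'e', 'f')

every_other = seq[::2]     # start at index 0, take every second item
from_second = seq[1::2]    # start at index 1, take every second item
reversed_seq = seq[::-1]   # a step of -1 walks the whole sequence backward
backward = seq[1::-2]      # start at index 1 and step backward by 2

print(every_other)    # ('a', 'c', 'e')
print(from_second)    # ('b', 'd', 'f')
print(reversed_seq)   # ('f', 'e', 'd', 'c', 'b', 'a')
print(backward)       # ('b',)
```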

Unpacking

first, middle, last = [1, 2, 3]

f"first = {first},  middle = {middle},  last = {last}"
'first = 1,  middle = 2,  last = 3'

The number of names on the left must match the number of items on the right. With only two names, first, middle = [1, 2, 3] raises:

    ValueError: too many values to unpack (expected 2)


Extended unpacking

first, *middle, last = (1, 2, 3, 4, 5)

f"first = {first},  middle = {middle},  last = {last}"
'first = 1,  middle = [2, 3, 4],  last = 5'
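The starred name can sit in any position and always receives a list, which may be empty. A brief sketch, with variable names that are illustrative only:

```python
head, *rest = [1, 2, 3, 4]        # star at the end
*body, tail = 'abc'               # star at the start; any iterable works
only, *nothing_left = [1]         # the starred name may receive an empty list

print(head, rest)            # 1 [2, 3, 4]
print(body, tail)            # ['a', 'b'] c
print(only, nothing_left)    # 1 []

# The names without a star still each need a value.
try:
    a, *b, c = [1]
    too_few = False
except ValueError:
    too_few = True
print(too_few)               # True
```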

Using list as Stack

A stack is a LIFO (last in, first out) data structure which can be simulated using a list

Push onto the stack using append

stack = []
stack.append('first on')
stack.append('second on')
stack.append('third on')
stack
['first on', 'second on', 'third on']

Retrieve items, last one first using pop

f"Retrieved first: {stack.pop()!r}, retrieved second: {stack.pop()!r}, retrieved last: {stack.pop()!r}"
"Retrieved first: 'third on', retrieved second: 'second on', retrieved last: 'first on'"

5.2 Explore dictionaries

Dictionaries are mappings of key-value pairs.

Create an empty dict using the constructor

dictionary = dict()
dictionary
{}

Create a dictionary based on key/value pairs

key_values = [['key-1','value-1'], ['key-2', 'value-2']]
dictionary = dict(key_values)
dictionary
{'key-1': 'value-1', 'key-2': 'value-2'}

Create an empty dict using curly braces

dictionary = {}
dictionary
{}

Use curly braces to create a dictionary with initial key/values

dictionary = {'key-1': 'value-1',
              'key-2': 'value-2'}

dictionary
{'key-1': 'value-1', 'key-2': 'value-2'}

Access value using key

dictionary['key-1']
'value-1'

Add a key/value pair to an existing dictionary

dictionary['key-3'] = 'value-3'

dictionary
{'key-1': 'value-1', 'key-2': 'value-2', 'key-3': 'value-3'}

Update value for existing key

dictionary['key-2'] = 'new-value-2'
dictionary['key-2']
'new-value-2'

Get keys

list(dictionary.keys())
['key-1', 'key-2', 'key-3']

Get values

dictionary.values()
dict_values(['value-1', 'new-value-2', 'value-3'])

Get an iterable of key/value pairs

dictionary.items()
dict_items([('key-1', 'value-1'), ('key-2', 'new-value-2'), ('key-3', 'value-3')])

Use items in for loop

for key, value in dictionary.items():
  print(f"{key}: {value}")
key-1: value-1
key-2: new-value-2
key-3: value-3

Check if dictionary has key

The 'in' syntax we used with sequences checks the dict's keys for membership.

'key-5' in dictionary
False

Get method

dictionary.get("bad key", "default value")
'default value'
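One common use of get with a default is counting, because it avoids a KeyError the first time a key is seen. The following is a minimal sketch; the counts dictionary is made up for this example.

```python
counts = {}
for char in 'mississippi':
    # get returns 0 the first time a character is seen
    counts[char] = counts.get(char, 0) + 1
print(counts)            # {'m': 1, 'i': 4, 's': 4, 'p': 2}

# Without a default, get returns None for a missing key.
print(counts.get('z'))   # None

# Square-bracket access on a missing key raises KeyError instead.
try:
    counts['z']
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)     # True
```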

Remove item

del dictionary['key-1']
dictionary
{'key-2': 'new-value-2', 'key-3': 'value-3'}

Keys must be immutable

List as key

Lists are mutable and not hashable

items = ['item-1', 'item-2', 'item-3']

map = {}

map[items] = "some-value"

    ---------------------------------------------------------------------------

    TypeError                                 Traceback (most recent call last)

    <ipython-input-66-25faa77a670a> in <module>()
          3 map = {}
          4 
    ----> 5 map[items] = "some-value"
    

    TypeError: unhashable type: 'list'


Tuple as a key

Tuples are immutable and, as long as every item they contain is hashable, are hashable themselves

items = 'item-1', 'item-2', 'item-3'
map = {}
map[items] = "some-value"

map
{('item-1', 'item-2', 'item-3'): 'some-value'}
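Tuple keys are handy for things like grid coordinates. The sketch below uses a hypothetical board dictionary, and also shows that a tuple containing a list still cannot be used as a key.

```python
board = {}
board[(0, 0)] = 'X'
board[(1, 2)] = 'O'

print(board[(1, 2)])                 # O
print(board.get((2, 2), 'empty'))    # empty

# A tuple is only hashable if everything inside it is hashable.
try:
    board[(0, [1, 2])] = '?'
    bad_key_rejected = False
except TypeError:
    bad_key_rejected = True
print(bad_key_rejected)              # True
```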

5.3 Dive into sets

Create set from tuple or list

letters = 'a', 'a', 'a', 'b', 'c'
unique_letters = set(letters)
unique_letters
{'a', 'b', 'c'}

Create set from a string

unique_chars = set('mississippi')
unique_chars
{'i', 'm', 'p', 's'}

Create set using curly braces

unique_num = {1, 1, 2, 3, 4, 5, 5}
unique_num
{1, 2, 3, 4, 5}

Adding to a set

unique_num.add(6)
unique_num
{1, 2, 3, 4, 5, 6}

Popping from a set

The pop method removes and returns an arbitrary element of the set. Which element comes back is an implementation detail, not a random choice.

unique_num.pop()
1

Indexing

Sets have no order, and hence cannot be accessed via indexing

unique_num[4]

    ---------------------------------------------------------------------------

    TypeError                                 Traceback (most recent call last)

    <ipython-input-75-c928415e5703> in <module>()
    ----> 1 unique_num[4]
    

    TypeError: 'set' object does not support indexing


Checking membership

3 in unique_num
True

Set operations

s1 = {1, 2, 3, 4, 5, 6, 7}
s2 = {0, 2, 4, 6, 8}

Items in first set, but not in the second

s1 - s2
{1, 3, 5, 7}

Items in either or both sets

s1 | s2
{0, 1, 2, 3, 4, 5, 6, 7, 8}

Items in both sets

s1 & s2
{2, 4, 6}

Items in either set, but not both

s1 ^ s2
{0, 1, 3, 5, 7, 8}
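Each of the operators above has an equivalent method. A short sketch of the correspondence follows; one practical difference is that the methods accept any iterable, while the operators need a set on both sides.

```python
s1 = {1, 2, 3, 4, 5, 6, 7}
s2 = {0, 2, 4, 6, 8}

# Operator and method forms give the same result.
same_difference = s1.difference(s2) == s1 - s2
same_union = s1.union(s2) == s1 | s2
same_intersection = s1.intersection(s2) == s1 & s2
same_symmetric = s1.symmetric_difference(s2) == s1 ^ s2
print(same_difference, same_union, same_intersection, same_symmetric)
# True True True True

# Methods accept any iterable, such as a list.
from_list = s1.intersection([2, 4, 100])
print(from_list)                 # {2, 4}

# Subset and disjoint checks are also available.
print({2, 4} <= s1)              # True
print(s1.isdisjoint({8, 9}))     # True
```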

5.4 Work with the numpy array

NumPy is an open source numerical computing library for Python. The NumPy array is a data structure representing multidimensional arrays, optimized for both memory use and performance.

Create a numpy array from a list of lists

import numpy as np
list_of_lists = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

np_array = np.array(list_of_lists)

np_array
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

Initialize an array of zeros

zeros_array = np.zeros( (4, 5) )
zeros_array
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

Initialize an array of ones

ones_array = np.ones( (6, 6) )
ones_array
array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

Using arange

nine = np.arange( 9 )
nine
array([0, 1, 2, 3, 4, 5, 6, 7, 8])

Using reshape

nine.reshape(3,3)
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Introspection

Get the data type

np_array.dtype
dtype('int64')

Get the array’s shape

np_array.shape
(4, 4)

Get the number of items in the array

np_array.size
16

Get the size of the array in bytes

np_array.nbytes
128

Setting the data type

dtype parameter

np_array = np.array(list_of_lists, dtype=np.int8)
np_array
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]], dtype=int8)

Size reduction

np_array.nbytes
16

The data type of an array is fixed

Values assigned later are converted to the array's data type, so data may be truncated if the data type is restrictive.

np_array[0][0] = 1.7344567
np_array[0][0]
1
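If fractional values are needed, one option is to convert the array with astype, which returns a new array and leaves the original unchanged. This is a minimal sketch with illustrative variable names.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)
small[0] = 1.7344567                  # stored in an int8 array, so it becomes 1
print(small.dtype, small[0])          # int8 1

# astype returns a NEW array with the requested data type.
as_float = small.astype(np.float64)
as_float[0] = 1.7344567               # the float array keeps the fraction
print(as_float.dtype)                 # float64
print(small[0])                       # 1  (the original is unchanged)
```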

Array Slicing

  • Slicing can be used to get a view representing a sub-array.
  • The slice is a view of the original array; the data is not copied to a new data structure.
  • The slice is taken in the form: array[rows, columns]

np_array
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]], dtype=int8)

np_array[2:, :3]
array([[ 9, 10, 11],
       [13, 14, 15]], dtype=int8)
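Because a slice is a view, writing to it changes the original array. The sketch below demonstrates this with an illustrative array named grid, and shows .copy() as one way to get independent data.

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

view = grid[2:, :3]                    # basic slicing returns a view
view[0, 0] = -1                        # the write goes through to grid
print(grid[2, 0])                      # -1
print(np.shares_memory(grid, view))    # True

snapshot = grid[2:, :3].copy()         # .copy() allocates separate data
snapshot[0, 0] = 100                   # grid is not affected
print(grid[2, 0])                      # -1
print(np.shares_memory(grid, snapshot))  # False
```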

Math operations

  • Unlike nested lists, NumPy arrays apply mathematical operators to the data element by element

Create two 3 x 3 arrays

np_array_1 = np.arange(9).reshape(3,3)
np_array_1

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
np_array_2 = np.arange(10, 19).reshape(3,3)
np_array_2
array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

Multiply the arrays

np_array_1 * np_array_2
array([[  0,  11,  24],
       [ 39,  56,  75],
       [ 96, 119, 144]])

Add the arrays

np_array_1 + np_array_2
array([[10, 12, 14],
       [16, 18, 20],
       [22, 24, 26]])

Matrix operations

Transpose

np_array.T
array([[ 1,  5,  9, 13],
       [ 2,  6, 10, 14],
       [ 3,  7, 11, 15],
       [ 4,  8, 12, 16]], dtype=int8)

Dot product

np_array_1.dot(np_array_2)

array([[ 45,  48,  51],
       [162, 174, 186],
       [279, 300, 321]])
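To make the distinction between the element-wise product and the matrix product concrete, here is a short sketch. It also uses the @ operator, which for two-dimensional arrays gives the same result as dot. The names a and b are illustrative.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = np.arange(10, 19).reshape(3, 3)

elementwise = a * b        # multiplies matching positions
matrix_product = a @ b     # rows of a combined with columns of b

print(elementwise[0])      # [ 0 11 24]
print(matrix_product[0])   # [45 48 51]
print(np.array_equal(matrix_product, a.dot(b)))   # True
```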

5.5 Use the Pandas DataFrame

  • One of the most widely used data structures in data science
  • A table-like, two-dimensional data structure

Create a DataFrame

import pandas as pd
first_names = ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur']
last_names = ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis']
ages = [43, 23, 78, 56, 26, 14, 46, 92]

df = pd.DataFrame({ 'first': first_names, 'last': last_names, 'age': ages})
df
   age   first      last
0   43   henry     smith
1   23   rolly   brocker
2   78   molly     stein
3   56   frank      bach
4   26   david   spencer
5   14  steven  de wilde
6   46    gwen     mason
7   92  arthur     davis

Head - looking at the top

df.head(10)
   age   first      last
0   43   henry     smith
1   23   rolly   brocker
2   78   molly     stein
3   56   frank      bach
4   26   david   spencer
5   14  steven  de wilde
6   46    gwen     mason
7   92  arthur     davis

Setting number of rows returned with head

df.head(3)
   age  first     last
0   43  henry    smith
1   23  rolly  brocker
2   78  molly    stein

Tail - looking at the bottom

df.tail(2)
   age   first   last
6   46    gwen  mason
7   92  arthur  davis

Describe - descriptive statistics

df.describe()
             age
count   8.000000
mean   47.250000
std    27.227874
min    14.000000
25%    25.250000
50%    44.500000
75%    61.500000
max    92.000000

Access one column

df['first']
0     henry
1     rolly
2     molly
3     frank
4     david
5    steven
6      gwen
7    arthur
Name: first, dtype: object

Slice a column

df['first'][4:]
4     david
5    steven
6      gwen
7    arthur
Name: first, dtype: object

Use conditions to filter

df[df['age'] > 50]
   age   first   last
2   78   molly  stein
3   56   frank   bach
7   92  arthur  davis
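Conditions can be combined with & (and) and | (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The sketch below rebuilds a smaller, illustrative DataFrame named people so that it runs on its own.

```python
import pandas as pd

people = pd.DataFrame({
    'first': ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur'],
    'age': [43, 23, 78, 56, 26, 14, 46, 92],
})

# Rows where BOTH conditions hold.
working_age = people[(people['age'] >= 18) & (people['age'] < 50)]
print(list(working_age['first']))   # ['henry', 'rolly', 'david', 'gwen']

# Rows where EITHER condition holds.
youngest_or_oldest = people[(people['age'] < 18) | (people['age'] > 90)]
print(list(youngest_or_oldest['first']))   # ['steven', 'arthur']
```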

5.6 Use the pandas Series

  • A one dimensional labeled array
  • Contains data of only one type
  • Similar to a column in a spreadsheet

Create a series

pd_series = pd.Series( [1, 2, 3 ] )
pd_series
0    1
1    2
2    3
dtype: int64

Series introspection methods

f"This series is made up of {pd_series.size} items whose data type is {pd_series.dtype}"
'This series is made up of 3 items whose data type is int64'

A Pandas DataFrame is composed of Pandas Series.

age = df.age
type( age )
pandas.core.series.Series

Some useful helper methods of a Series

Mean

pd_series = pd.Series([ 1, 2, 3, 5, 6, 6, 6, 7, 8])
pd_series.mean()
4.888888888888889

Unique

pd_series.unique()
array([1, 2, 3, 5, 6, 7, 8])

Max

pd_series.max()
8
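A few more Series helpers follow the same pattern. This sketch uses a separate, illustrative Series named values holding the same numbers as above.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 5, 6, 6, 6, 7, 8])

print(values.min())        # 1
print(values.max())        # 8
print(values.median())     # 6.0
print(values.nunique())    # 7

# value_counts returns a Series mapping each value to how often it occurs.
occurrences = values.value_counts()
print(occurrences.loc[6])  # 3
```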

Notes:

Lists

Tuples and sequences

Dictionaries

Numpy arrays

Pandas DataFrame

Pandas Series