<a href="https://colab.research.google.com/github/paiml/python_for_datascience/blob/master/Lesson5_Python_For_Data_Science_Python_Data_structure.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lesson 5: Python Data Structures


## Pragmatic AI Labs



![alt text](https://paiml.com/images/logo_with_slogan_white_background.png)

This notebook was produced by [Pragmatic AI Labs](https://paiml.com/).  You can continue learning about these topics by:

*   Buying a copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](http://www.informit.com/store/pragmatic-ai-an-introduction-to-cloud-based-machine-9780134863917)
*   Reading an online copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](https://www.safaribooksonline.com/library/view/pragmatic-ai-an/9780134863924/)
*  Watching video [Essential Machine Learning and AI with Python and Jupyter Notebook-Video-SafariOnline](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118) on Safari Books Online.
* Watching video [AWS Certified Machine Learning-Specialty](https://learning.oreilly.com/videos/aws-certified-machine/9780135556597)
* Purchasing video [Essential Machine Learning and AI with Python and Jupyter Notebook- Purchase Video](http://www.informit.com/store/essential-machine-learning-and-ai-with-python-and-jupyter-9780135261095)
*   Viewing more content at [noahgift.com](https://noahgift.com/)


## 5.1 Use lists and tuples

### Sequences
Lists, tuples, and strings are all Python sequences, and share many of the same methods.
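
As a small illustration of that shared behavior, the sketch below applies `len`, indexing, `count`, and `index` to a list, a tuple, and a string. The variable names are made up for this example.

```python
# Three different sequence types holding the same characters
as_list = ['a', 'b', 'b', 'c']
as_tuple = ('a', 'b', 'b', 'c')
as_string = 'abbc'

results = []
for seq in (as_list, as_tuple, as_string):
    # len(), indexing, count() and index() work on all three types
    results.append((len(seq), seq[0], seq.count('b'), seq.index('c')))

results
```

Each sequence yields the same tuple of answers, even though the three types differ in mutability.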

### Creating an empty list

In [0]:
empty = []
empty

[]

### Using square brackets with initial values

In [0]:
numbers = [1, 2, 3]
numbers


[1, 2, 3]

### Casting an iterable
Any iterable can be cast to a list.

In [0]:
numbers = list(range(10))
numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

### Creating using multiplication

In [0]:
num_players = 10
scores = [0] * num_players
scores

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

### Mixing data types
Lists can contain multiple data types.

In [0]:
mixed = ['a', 1, 2.0, [13], {}]
mixed

['a', 1, 2.0, [13], {}]

### Indexing
Items in lists can be accessed using indices in a similar fashion to strings.

#### Access first item

In [0]:
numbers[0]


0

#### Access last item

In [0]:
numbers[-1]

9

#### Access any item

In [0]:
numbers[4]

4

### Adding to a list

#### Append to the end of a list

In [0]:
letters = ['a']
letters.append('c')
letters

['a', 'c']

#### Insert at beginning of list

In [0]:
letters.insert(0, 'b')
letters

['b', 'a', 'c']

#### Insert at arbitrary position

In [0]:
letters.insert(2, 'c')
letters

['b', 'a', 'c', 'c']

#### Extending with another list

In [0]:
more_letters = ['e', 'f', 'g']
letters.extend(more_letters)
letters

['b', 'a', 'c', 'c', 'e', 'f', 'g']

### Change item at some position

In [0]:
letters[3] = 'd'
letters

['b', 'a', 'c', 'd', 'e', 'f', 'g']

### Swap two items

In [0]:
letters[0], letters[1] = letters[1], letters[0]
letters

['a', 'b', 'c', 'd', 'e', 'f', 'g']

### Removing items from a list

#### Pop from the end

In [0]:
letters = ['a', 'b', 'c', 'd', 'e', 'f']
letters.pop()
letters

['a', 'b', 'c', 'd', 'e']

#### Pop by index

In [0]:
letters.pop(2)
letters

['a', 'b', 'd', 'e']

#### Remove specific item

In [0]:
letters.remove('d')
letters

['a', 'b', 'e']

### Create tuple using parentheses

In [0]:
tup = (1, 2, 3)
tup

(1, 2, 3)

### Create tuple with commas

In [0]:
tup = 1, 2, 3
tup

(1, 2, 3)

### Create empty tuple

In [0]:
tup = ()
tup

()

### Create tuple with single item

In [0]:
tup = 1,
tup

(1,)

### Behaviors shared by lists and tuples
The following sequence behaviors are shared by lists and tuples.

### Check item in sequence

In [0]:
3 in (1, 2, 3, 4, 5)

True

### Check item not in sequence

In [0]:
'a' not in [1, 2, 3, 4, 5]

True

### Slicing

#### Setting start, slice to the end

In [0]:
letters = 'a', 'b', 'c', 'd', 'e', 'f'
letters[3:]


('d', 'e', 'f')

#### Set end, slice from beginning

In [0]:
letters[:4]

('a', 'b', 'c', 'd')

#### Index from end of sequence

In [0]:
letters[-4:]

('c', 'd', 'e', 'f')

#### Setting step

In [0]:
letters[1::2]

('b', 'd', 'f')

### Unpacking

In [0]:
first, middle, last = [1, 2, 3]

f"first = {first},  middle = {middle},  last = {last}"

'first = 1,  middle = 2,  last = 3'

### Extended unpacking

In [0]:
first, *middle, last = (1, 2, 3, 4, 5)

f"first = {first},  middle = {middle},  last = {last}"

'first = 1,  middle = [2, 3, 4],  last = 5'
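
A few more cases of extended unpacking, as a sketch with illustrative names: the starred name can appear in any position, it works on any iterable, and it always receives a list, which may be empty.

```python
# The starred name collects whatever is left over, always as a list
head, *tail = (1, 2, 3, 4, 5)

# Unpacking works on any iterable, including strings
*body, final = "abc"

# When nothing is left over, the starred name is an empty list
only, *nothing_left = [42]
```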

### Using list as Stack
A stack is a LIFO (last in, first out) data structure which can be simulated using a list

#### Push onto the stack using append

In [0]:
stack = []
stack.append('first on')
stack.append('second on')
stack.append('third on')
stack

['first on', 'second on', 'third on']

#### Retrieve items, last one first using **pop**

In [0]:
f"Retrieved first: {stack.pop()!r}, retrieved second: {stack.pop()!r}, retrieved last: {stack.pop()!r}"

"Retrieved first: 'third on', retrieved second: 'second on', retrieved last: 'first on'"

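One common use of a list as a stack is checking that brackets are balanced. The function below is a minimal sketch of that idea; `is_balanced` is a name invented for this example, not part of any library.

```python
def is_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for char in text:
        if char in pairs.values():
            # Push each opening bracket onto the stack
            stack.append(char)
        elif char in pairs:
            # A closing bracket must match the most recently opened one
            if not stack or stack.pop() != pairs[char]:
                return False
    # Anything left on the stack was never closed
    return not stack
```

For instance, `is_balanced("(a[b]{c})")` is `True`, while `is_balanced("(]")` is `False` because the most recently pushed bracket does not match.
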
## 5.2 Explore dictionaries 
Dictionaries are mappings of key/value pairs.

### Create an empty dict using constructor

In [0]:
dictionary = dict()
dictionary

{}

### Create a dictionary based on key/value pairs

In [0]:
key_values = [['key-1','value-1'], ['key-2', 'value-2']]
dictionary = dict(key_values)
dictionary

{'key-1': 'value-1', 'key-2': 'value-2'}

### Create an empty dict using curly braces

In [0]:
dictionary = {}
dictionary

{}

### Use curly braces to create a dictionary with initial key/values

In [0]:
dictionary = {'key-1': 'value-1',
              'key-2': 'value-2'}

dictionary

{'key-1': 'value-1', 'key-2': 'value-2'}

### Access value using key

In [0]:
dictionary['key-1']

'value-1'

### Add a key/value pair to an existing dictionary

In [0]:
dictionary['key-3'] = 'value-3'

dictionary

{'key-1': 'value-1', 'key-2': 'value-2', 'key-3': 'value-3'}

### Update value for existing key

In [0]:
dictionary['key-2'] = 'new-value-2'
dictionary['key-2']

'new-value-2'

### Get keys

In [0]:
list(dictionary.keys())

['key-1', 'key-2', 'key-3']

### Get values

In [0]:
dictionary.values()

dict_values(['value-1', 'new-value-2', 'value-3'])

### Get iterable of key/value pairs

In [0]:
dictionary.items()

dict_items([('key-1', 'value-1'), ('key-2', 'new-value-2'), ('key-3', 'value-3')])

### Use items in for loop

In [0]:
for key, value in dictionary.items():
  print(f"{key}: {value}")

key-1: value-1
key-2: new-value-2
key-3: value-3


### Check if dictionary has key
The `in` syntax we used with sequences checks the dict's keys for membership.

In [0]:
'key-5' in dictionary

False

### Get method

In [0]:
dictionary.get("bad key", "default value")

'default value'
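
A typical use of `get` with a default is counting occurrences without first checking whether the key exists. The sentence and variable names below are made up for illustration.

```python
word_counts = {}
for word in "the cat and the hat and the bat".split():
    # get() returns 0 the first time a word is seen, avoiding a KeyError
    word_counts[word] = word_counts.get(word, 0) + 1

# Without a default, get() returns None for a missing key
missing = word_counts.get("dog")
```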

### Remove item

In [0]:
del dictionary['key-1']
dictionary

{'key-2': 'new-value-2', 'key-3': 'value-3'}

### Keys must be immutable

#### List as key
Lists are mutable and therefore not hashable, so using one as a key raises a TypeError.

In [0]:
items = ['item-1', 'item-2', 'item-3']

lookup = {}  # avoid the name `map`, which would shadow the built-in function

lookup[items] = "some-value"

TypeError: ignored

#### Tuple as a key
Tuples are immutable and hence hashable, provided every item they contain is hashable too.

In [0]:
items = 'item-1', 'item-2', 'item-3'
lookup = {}  # avoid the name `map`, which would shadow the built-in function
lookup[items] = "some-value"

lookup

{('item-1', 'item-2', 'item-3'): 'some-value'}
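
Tuple keys are handy when a value is identified by more than one piece of data. The sketch below uses hypothetical `(row, column)` coordinates as keys, and also shows that a tuple containing a list cannot be used as a key.

```python
# Hypothetical example: (row, column) tuples as keys for a sparse grid
grid = {}
grid[(0, 0)] = 'start'
grid[(2, 3)] = 'treasure'

# Look up coordinates, with a fallback for cells that were never set
here = grid.get((2, 3), 'empty')
elsewhere = grid.get((1, 1), 'empty')

# A tuple that contains a list is not hashable, so it cannot be a key
try:
    grid[(0, [1, 2])] = 'oops'
    nested_list_allowed = True
except TypeError:
    nested_list_allowed = False
```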

## 5.3 Dive into sets

### Create set from tuple or list

In [0]:
letters = 'a', 'a', 'a', 'b', 'c'
unique_letters = set(letters)
unique_letters

{'a', 'b', 'c'}

### Create set from a string

In [0]:
unique_chars = set('mississippi')
unique_chars

{'i', 'm', 'p', 's'}

### Create set using curly braces

In [0]:
unique_num = {1, 1, 2, 3, 4, 5, 5}
unique_num

{1, 2, 3, 4, 5}

### Adding to a set

In [0]:
unique_num.add(6)
unique_num

{1, 2, 3, 4, 5, 6}

### Popping from a set
The `pop` method removes and returns an arbitrary element of the set. Which element is returned is an implementation detail and should not be relied on.

In [0]:
unique_num.pop()

1

### Indexing
Sets are unordered, and hence cannot be accessed via indexing.

In [0]:
unique_num[4]

TypeError: ignored
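
When positional access is needed, one option is to convert the set to a list first. The sketch below uses `sorted`, which returns a new list in ascending order; the set contents are illustrative.

```python
unique_num = {5, 3, 1, 4, 2}

# sorted() accepts any iterable and returns a new list
ordered = sorted(unique_num)

# The list supports indexing even though the set does not
third_smallest = ordered[2]
```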

### Checking membership

In [0]:
3 in unique_num

True

### Set operations

In [0]:
s1 = {1, 2, 3, 4, 5, 6, 7}
s2 = {0, 2, 4, 6, 8}

#### Items in first set, but not in the second

In [0]:
s1 - s2

{1, 3, 5, 7}

#### Items in either or both sets

In [0]:
s1 | s2

{0, 1, 2, 3, 4, 5, 6, 7, 8}

#### Items in both sets

In [0]:
s1 & s2

{2, 4, 6}

#### Items in either set, but not both

In [0]:
s1 ^ s2

{0, 1, 3, 5, 7, 8}
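
Each of the operators above also has a named method on the set type. The sketch below repeats the four operations using those methods, on the same two sets.

```python
s1 = {1, 2, 3, 4, 5, 6, 7}
s2 = {0, 2, 4, 6, 8}

# Each operator has a named method that returns the same result
only_in_s1 = s1.difference(s2)              # same as s1 - s2
in_either = s1.union(s2)                    # same as s1 | s2
in_both = s1.intersection(s2)               # same as s1 & s2
in_one_only = s1.symmetric_difference(s2)   # same as s1 ^ s2

# Unlike the operators, the methods accept any iterable, not just sets
from_list = s1.intersection([2, 4, 100])
```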

## 5.4 Work with the numpy array


NumPy is an open source numerical computing library for Python. The NumPy array is a data structure representing multidimensional arrays, optimized for both memory and performance.

### Create a numpy array from a list of lists

In [0]:
import numpy as np
list_of_lists = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

np_array = np.array(list_of_lists)

np_array

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

### Initialize an array of zeros

In [0]:
zeros_array = np.zeros( (4, 5) )
zeros_array

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

### Initialize an array of ones

In [0]:
ones_array = np.ones( (6, 6) )
ones_array

array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

### Using arange

In [0]:
nine = np.arange( 9 )
nine

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

### Using reshape

In [0]:
nine.reshape(3,3)

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

### Introspection

#### Get the data type

In [0]:
np_array.dtype

dtype('int64')

#### Get the array's shape

In [0]:
np_array.shape

(4, 4)

#### Get the number of items in the array

In [0]:
np_array.size

16

#### Get the size of the array in bytes

In [0]:
np_array.nbytes

128

### Setting the data type

#### dtype parameter

In [0]:
np_array = np.array(list_of_lists, dtype=np.int8)
np_array

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]], dtype=int8)

#### Size reduction

In [0]:
np_array.nbytes

16

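#### The data type setting is immutable
Once set, an array's data type does not change when values are assigned. Values that do not fit, such as the fractional part of a float assigned to an integer array, are truncated.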

In [0]:
np_array[0][0] = 1.7344567
np_array[0][0]

1
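
The sketch below repeats the truncation on a small array and shows `astype` as one way to obtain an array with a different data type. The array names are illustrative.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)

# Assigning a float keeps the int8 dtype and drops the fractional part
small[0] = 1.9

# astype() returns a new array; the original keeps its dtype
as_float = small.astype(np.float64)
as_float[0] = 1.9
```

After this runs, `small[0]` is `1` while `as_float[0]` is `1.9`, because only the new array can hold fractional values.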

### Array Slicing


*   Slicing can be used to get a view representing a sub-array.
*   The slice is a view of the original array; the data is not copied to a new data structure.
*   The slice is taken in the form: array[rows, columns]






In [0]:
np_array

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]], dtype=int8)

In [0]:
np_array[2:, :3]

array([[ 9, 10, 11],
       [13, 14, 15]], dtype=int8)
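
Because a slice is a view, writing to it changes the original array. The sketch below demonstrates this on a separate array, and shows `copy()` as a way to get independent data. The names `original`, `view`, and `independent` are chosen for this example.

```python
import numpy as np

original = np.arange(16).reshape(4, 4)

# Basic slicing returns a view onto the same underlying data
view = original[2:, :3]
view[0, 0] = 99                  # also changes original[2, 0]

# copy() produces an independent array
independent = original[2:, :3].copy()
independent[0, 0] = -1           # leaves the original untouched

shares = np.shares_memory(original, view)
```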

### Math operations


*   Unlike nested lists, NumPy arrays support element-wise mathematical operations on their data.



#### Create two 3 x 3 arrays

In [0]:
np_array_1 = np.arange(9).reshape(3,3)
np_array_1


array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [0]:
np_array_2 = np.arange(10, 19).reshape(3,3)
np_array_2

array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

#### Multiply the arrays

In [0]:
np_array_1 * np_array_2

array([[  0,  11,  24],
       [ 39,  56,  75],
       [ 96, 119, 144]])

#### Add the arrays

In [0]:
np_array_1 + np_array_2

array([[10, 12, 14],
       [16, 18, 20],
       [22, 24, 26]])
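
To make the contrast with nested lists concrete, the sketch below applies `*` and `+` to a nested list and to an array built from it. The variable names are illustrative.

```python
import numpy as np

nested = [[1, 2], [3, 4]]
array = np.array(nested)

# On a list, * repeats the list and + concatenates
doubled_list = nested * 2
summed_list = nested + nested

# On an array, * and + operate element by element
doubled_array = array * 2
summed_array = array + array
```

Both list results are the four-row list `[[1, 2], [3, 4], [1, 2], [3, 4]]`, while both array results hold the values `[[2, 4], [6, 8]]`.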

### Matrix operations

#### Transpose

In [0]:
np_array.T

array([[ 1,  5,  9, 13],
       [ 2,  6, 10, 14],
       [ 3,  7, 11, 15],
       [ 4,  8, 12, 16]], dtype=int8)

#### Dot product

In [0]:
np_array_1.dot(np_array_2)


array([[ 45,  48,  51],
       [162, 174, 186],
       [279, 300, 321]])
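
For two-dimensional arrays, the `@` operator computes the same matrix product as `dot`. The sketch below rebuilds the two 3 x 3 arrays under shorter, illustrative names and compares the results.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = np.arange(10, 19).reshape(3, 3)

# @ is the matrix multiplication operator
product = a @ b

# For 2-D arrays it matches the dot method
same_as_dot = np.array_equal(product, a.dot(b))

# The * operator, by contrast, multiplies element by element
elementwise = a * b
```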

## 5.5 Use the Pandas DataFrame
*   One of the most highly leveraged data structures for data science
*   A table-like, two-dimensional data structure.


### Create a DataFrame

In [0]:
import pandas as pd
first_names = ['henry', 'rolly', 'molly', 'frank', 'david', 'steven', 'gwen', 'arthur']
last_names = ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis']
ages = [43, 23, 78, 56, 26, 14, 46, 92]

df = pd.DataFrame({ 'first': first_names, 'last': last_names, 'age': ages})
df

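|   | age | first | last |
|---|-----|-------|------|
| 0 | 43 | henry | smith |
| 1 | 23 | rolly | brocker |
| 2 | 78 | molly | stein |
| 3 | 56 | frank | bach |
| 4 | 26 | david | spencer |
| 5 | 14 | steven | de wilde |
| 6 | 46 | gwen | mason |
| 7 | 92 | arthur | davis |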


### Head - looking at the top

In [0]:
df.head(10)

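|   | age | first | last |
|---|-----|-------|------|
| 0 | 43 | henry | smith |
| 1 | 23 | rolly | brocker |
| 2 | 78 | molly | stein |
| 3 | 56 | frank | bach |
| 4 | 26 | david | spencer |
| 5 | 14 | steven | de wilde |
| 6 | 46 | gwen | mason |
| 7 | 92 | arthur | davis |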


### Setting number of rows returned with head

In [0]:
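df.head(3)

|   | age | first | last |
|---|-----|-------|------|
| 0 | 43 | henry | smith |
| 1 | 23 | rolly | brocker |
| 2 | 78 | molly | stein |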

### Tail - looking at the bottom

In [0]:
df.tail(2)

|   | age | first | last |
|---|-----|-------|------|
| 6 | 46 | gwen | mason |
| 7 | 92 | arthur | davis |


### Describe - descriptive statistics

In [0]:
df.describe()

|       | age |
|-------|-----|
| count | 8.0 |
| mean | 47.25 |
| std | 27.227874 |
| min | 14.0 |
| 25% | 25.25 |
| 50% | 44.5 |
| 75% | 61.5 |
| max | 92.0 |


### Access one column

In [0]:
df['first']

0     henry
1     rolly
2     molly
3     frank
4     david
5    steven
6      gwen
7    arthur
Name: first, dtype: object

### Slice a column

In [0]:
df['first'][4:]

4     david
5    steven
6      gwen
7    arthur
Name: first, dtype: object

### Use conditions to filter

In [0]:
df[df['age'] > 50]

|   | age | first | last |
|---|-----|-------|------|
| 2 | 78 | molly | stein |
| 3 | 56 | frank | bach |
| 7 | 92 | arthur | davis |
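
Conditions can also be combined. The sketch below builds a smaller, self-contained DataFrame and filters it with `&` (and) and `|` (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

people = pd.DataFrame({'first': ['henry', 'rolly', 'molly', 'frank'],
                       'age': [43, 23, 78, 56]})

# Rows where both conditions hold
middle_aged = people[(people['age'] > 40) & (people['age'] < 60)]

# Rows where at least one condition holds
young_or_old = people[(people['age'] < 30) | (people['age'] > 70)]
```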


## 5.6 Use the pandas Series


*   A one-dimensional labeled array
*   Contains data of only one type
*   Similar to a column in a spreadsheet




### Create a series

In [0]:
pd_series = pd.Series( [1, 2, 3 ] )
pd_series

0    1
1    2
2    3
dtype: int64
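
The labels of a Series default to 0, 1, 2, and so on, but they can be set explicitly with the `index` parameter. The sketch below uses names as labels; the data is illustrative.

```python
import pandas as pd

ages = pd.Series([43, 23, 78], index=['henry', 'rolly', 'molly'])

# Look up a value by its label
molly_age = ages['molly']

# iloc looks up a value by position instead
first_age = ages.iloc[0]

# Boolean filtering keeps the labels of the matching items
over_40 = ages[ages > 40]
```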

### Series introspection methods

In [0]:
f"This series is made up of {pd_series.size} items whose data type is {pd_series.dtype}"

'This series is made up of 3 items whose data type is int64'

### A Pandas DataFrame is composed of Pandas Series. 

In [0]:
age = df.age
type( age )

pandas.core.series.Series

### Some useful helper methods of a Series

#### Mean

In [0]:
pd_series = pd.Series([ 1, 2, 3, 5, 6, 6, 6, 7, 8])
pd_series.mean()

4.888888888888889

#### Unique

In [0]:
pd_series.unique()

array([1, 2, 3, 5, 6, 7, 8])

#### Max

In [0]:
pd_series.max()

8

# Notes:
[Lists](https://docs.python.org/3/tutorial/datastructures.html)

[Tuples and sequences](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences)

[Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)

[Numpy arrays](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html)

[Pandas DataFrame](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.html)

[Pandas Series](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.Series.html)

