<a href="https://colab.research.google.com/github/paiml/python_for_datascience/blob/master/Lesson11_Python_For_Data_Science_Lazy_Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lesson 11: Lazy Evaluation 

## Pragmatic AI Labs



![alt text](https://paiml.com/images/logo_with_slogan_white_background.png)

This notebook was produced by [Pragmatic AI Labs](https://paiml.com/).  You can continue learning about these topics by:

* Buying a copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](http://www.informit.com/store/pragmatic-ai-an-introduction-to-cloud-based-machine-9780134863917)
* Reading an online copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](https://www.safaribooksonline.com/library/view/pragmatic-ai-an/9780134863924/)
* Watching the video [Essential Machine Learning and AI with Python and Jupyter Notebook-Video-SafariOnline](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118) on Safari Books Online
* Watching the video [AWS Certified Machine Learning-Speciality](https://learning.oreilly.com/videos/aws-certified-machine/9780135556597)
* Purchasing the video [Essential Machine Learning and AI with Python and Jupyter Notebook- Purchase Video](http://www.informit.com/store/essential-machine-learning-and-ai-with-python-and-jupyter-9780135261095)
* Viewing more content at [noahgift.com](https://noahgift.com/)


## 11.1 Use generators

### Lists and Generators

In [0]:
l_ten = [x for x in range(10)]
g_ten = (x for x in range(10))

print(f"l_ten is a {type(l_ten)} and prints as: {l_ten}")
print(f"g_ten is a {type(g_ten)} and prints as: {g_ten}")

l_ten is a <class 'list'> and prints as: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
g_ten is a <class 'generator'> and prints as: <generator object <genexpr> at 0x7fec69bbfc50>
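
One practical difference between the two objects is that a list can be traversed as often as needed, while a generator is used up after a single pass. The following is a minimal sketch of that behavior; the names `numbers_list` and `numbers_gen` are illustrative and do not appear elsewhere in this notebook.

```python
# A list can be traversed repeatedly; a generator is exhausted after one pass.
numbers_list = [x for x in range(3)]
numbers_gen = (x for x in range(3))

first_list_pass = list(numbers_list)
second_list_pass = list(numbers_list)

first_gen_pass = list(numbers_gen)
second_gen_pass = list(numbers_gen)  # nothing is left to yield

print(first_list_pass, second_list_pass)  # [0, 1, 2] [0, 1, 2]
print(first_gen_pass, second_gen_pass)    # [0, 1, 2] []
```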


### Next

In [0]:
next(g_ten)

0
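
Once a generator has no values left, `next()` raises `StopIteration` unless a default is supplied as the second argument. The sketch below illustrates both cases with a small, separate generator (`g_three` is an illustrative name, not part of the original notebook).

```python
g_three = (x for x in range(3))

values = [next(g_three), next(g_three), next(g_three)]

# The generator is now exhausted, so a bare next() raises StopIteration.
try:
    next(g_three)
    exhausted = False
except StopIteration:
    exhausted = True

# Passing a second argument returns that default instead of raising.
fallback = next(g_three, None)

print(values, exhausted, fallback)  # [0, 1, 2] True None
```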

### Iteration

In [0]:
# A generator can be consumed only once, so start from a fresh one
g_ten = (x for x in range(10))
for x in g_ten:
  print(x)


0
1
2
3
4
5
6
7
8
9


### Indexing

In [0]:

g_ten[3]

TypeError: 'generator' object is not subscriptable
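
Generators do not support indexing, but an item at a given position can still be reached by consuming the generator up to that point. One possible approach uses `itertools.islice`; the helper name `nth` and the generator `g_demo` are illustrative assumptions, not part of the original notebook.

```python
from itertools import islice

def nth(iterable, n, default=None):
    """Return the item at position n by consuming the iterable up to it."""
    return next(islice(iterable, n, None), default)

g_demo = (x for x in range(10))

fourth = nth(g_demo, 3)
print(fourth)  # 3

# Items 0 through 3 have been consumed, so the generator resumes at 4.
following = next(g_demo)
print(following)  # 4
```

Unlike list indexing, this advances the generator, so earlier items cannot be revisited afterwards.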

### Size

In [0]:
import sys
n = 100000000
l_big = [x for x in range(n)]
g_big = (x for x in range(n))

print(f"l_big is {sys.getsizeof(l_big)} bytes")
print(f"g_big is {sys.getsizeof(g_big)} bytes")

l_big is 859724472 bytes
g_big is 88 bytes
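
Note that `sys.getsizeof` reports only the list's own array of references, not the integer objects it points to, so the real footprint of the list is larger still. The sketch below, which assumes a smaller `n` than the cell above so that it runs quickly, shows that an aggregate such as `sum` gives the same answer whether it is fed a list or a generator; only the list version has to hold every item at once.

```python
import sys

n = 1_000_000  # assumption: smaller than the notebook's value, for speed

as_list = [i for i in range(n)]
total_from_list = sum(as_list)             # every item is materialized first
total_from_gen = sum(i for i in range(n))  # items are produced one at a time

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(i for i in range(n))

print(total_from_list == total_from_gen)  # True
print(gen_size < list_size)               # True
```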


## 11.2 Design generator pipelines

### Stringing generators together

In [0]:
evens = (x*2 for x in range(5000000))
three_factors = (x//3 for x in evens if x%3 == 0)
titles = (f"this number is {x}" for x in three_factors)
capped = (x.title() for x in titles)

print(f"The first call to capped: {next(capped)}")
print(f"The second call to capped: {next(capped)}")
print(f"The third call to capped: {next(capped)}")

The first call to capped: This Number Is 0
The second call to capped: This Number Is 2
The third call to capped: This Number Is 4
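
To see that a pipeline only does as much work as each `next()` call requires, the source can be wrapped in a generator that records what it produces. This is a sketch under assumptions: `traced_range` and `pulled` are hypothetical names introduced for illustration, and only the first two stages of the pipeline above are reproduced.

```python
pulled = []

def traced_range(n):
    """Yield 0..n-1, recording each value as it is produced."""
    for x in range(n):
        pulled.append(x)
        yield x

evens = (x * 2 for x in traced_range(5000000))
three_factors = (x // 3 for x in evens if x % 3 == 0)

before = list(pulled)         # nothing has been produced yet
first = next(three_factors)   # pulls 0 from the source
second = next(three_factors)  # pulls 1, 2 and 3 from the source

print(before, first, second, pulled)  # [] 0 2 [0, 1, 2, 3]
```

Only four of the five million source values are ever generated, because each stage requests items from the previous one on demand.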


### Why use lazy evaluation
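Lazy evaluation lets you process a large dataset in small pieces, one item at a time, instead of loading all of it into memory at once.

Example: the salt and protein content of organic foods.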
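
As a rough illustration of processing data in smaller pieces, the sketch below groups a lazy stream into fixed-size batches with `itertools.islice`. The helper `batches` and the `readings` generator are hypothetical stand-ins for a large dataset; they are not part of the original notebook.

```python
from itertools import islice

def batches(iterable, size):
    """Yield lists of up to `size` items, reading the source lazily."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

readings = (x * 0.5 for x in range(10))  # stand-in for a large dataset

batch_sums = [sum(batch) for batch in batches(readings, 4)]
print(batch_sums)  # [3.0, 11.0, 8.5]
```

At most `size` items are held in memory at any time, regardless of how long the source is.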

#### Define generator to read file line by line

In [0]:
def row_reader(file_path):
  # The with block closes the file once the generator is exhausted or closed
  with open(file_path, 'r') as f:
    for line in f:
      yield line

#### Create the generator and read the first row

In [0]:
file_path = './features.en.openfoodfacts.org.products.csv'

rows = row_reader(file_path)
rows

<generator object row_reader at 0x7fd2e9af5f68>

In [0]:
next(rows)

'3,57.14,17.86,3.57,17.86,1.22428,2540,2835.7,92.86,0,Organic Salted Nut Mix\n'

#### Generator pipeline to process one line at a time

In [0]:
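def row_reader(file_path):
  # The file is opened right away, but no line is read until a row is requested
  line_reader = (line for line in open(file_path, 'r'))

  # Split each line once, then keep rows whose last field (the name) starts with 'Organic'
  split_rows = (line.split(',') for line in line_reader)
  organics_only = (row for row in split_rows if row[-1].startswith('Organic'))

  # Select the name (last field), salt (6th from last) and protein (7th from last)
  name_salt_protein = ((row[-1], row[-6], row[-7]) for row in organics_only)

  return name_salt_protein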



rows = row_reader(file_path)

In [0]:
next(rows)

('Organic Salted Nut Mix\n', '1.22428', '17.86')

In [0]:
import pandas
organics = pandas.DataFrame(columns=['Name', 'Salt', 'Protein'])

rows = row_reader(file_path)

for new_row in rows:
  organics.loc[len(organics)] = new_row
  
organics

|   | Name | Salt | Protein |
|---|------|------|---------|
| 0 | Organic Salted Nut Mix\n | 1.22428 | 17.86 |
| 1 | Organic Muesli\n | 0.1397 | 14.06 |
| 2 | Organic Hazelnuts\n | 0.01016 | 14.29 |
| 3 | Organic Oat Groats\n | 0.0254 | 16.67 |
| 4 | Organic Quinoa Coconut Granola With Mango\n | 0.02286 | 10.91 |
| 5 | Organic Unswt Berry Coconut Granola\n | 0.28194 | 12.96 |
| 6 | Organic Red Quinoa\n | 0.01016 | 13.33 |
| 7 | Organic Blueberry Almond Granola\n | 0.04572 | 10.91 |
| 8 | Organic Coconut Chips\n | 0.09398 | 6 |
| 9 | Organic Garbanzo Beans\n | 0.05334 | 17.02 |
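
The rows produced by the pipeline still carry a trailing newline in the name and hold the numbers as strings. One possible extra stage, sketched below, cleans each row as it passes through. The generator function `clean` is a hypothetical addition, and `raw_rows` simply copies three tuples from the output above so the example runs without the CSV file.

```python
# Three rows copied from the pipeline output shown above.
raw_rows = [
    ('Organic Salted Nut Mix\n', '1.22428', '17.86'),
    ('Organic Muesli\n', '0.1397', '14.06'),
    ('Organic Hazelnuts\n', '0.01016', '14.29'),
]

def clean(rows):
    """Strip the trailing newline and convert salt and protein to floats."""
    for name, salt, protein in rows:
        yield name.strip(), float(salt), float(protein)

cleaned = clean(iter(raw_rows))

# max() consumes the cleaned rows one at a time.
highest_protein = max(cleaned, key=lambda row: row[2])
print(highest_protein)  # ('Organic Salted Nut Mix', 1.22428, 17.86)
```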


## 11.3 Implement lazy evaluation functions

### Generator functions

In [0]:
def square_them(numbers):
  for number in numbers:
    yield number * number
    
  
s = square_them(range(10000))

print(next(s))
print(next(s))
print(next(s))
print(next(s))

0
1
4
9
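
Calling a generator function does not run its body; execution starts at the first `next()` and pauses at each `yield`. The sketch below makes this visible by recording events. `square_them_verbose` and `events` are illustrative names, a variation on `square_them` rather than part of the original notebook.

```python
events = []

def square_them_verbose(numbers):
    events.append("started")
    for number in numbers:
        yield number * number
    events.append("finished")

s = square_them_verbose(range(3))
after_call = list(events)   # the body has not started yet

first = next(s)
after_first = list(events)  # the body ran up to the first yield

rest = list(s)              # runs the body to completion

print(after_call, after_first)  # [] ['started']
print(first, rest, events)      # 0 [1, 4] ['started', 'finished']
```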


### Infinite generators

In [0]:
def counter(d):
  
  while True:
    d += 1
    yield d

In [0]:
c = counter(10)

print(next(c))
print(next(c))
print(next(c))

11
12
13
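
An infinite generator must never be passed to something that tries to consume all of it, such as `list()` or a bare `for` loop without a `break`. A minimal sketch of two ways to take a bounded slice, using `itertools.islice` and `itertools.takewhile`, follows; `counter` is repeated here so the block is self-contained.

```python
from itertools import islice, takewhile

def counter(d):
    while True:
        d += 1
        yield d

# Take a fixed number of items.
first_five = list(islice(counter(10), 5))

# Take items until a condition stops holding.
below_fifteen = list(takewhile(lambda n: n < 15, counter(10)))

print(first_five)     # [11, 12, 13, 14, 15]
print(below_fifteen)  # [11, 12, 13, 14]
```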


### Other forms of lazy evaluation

In [0]:
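def some_expensive_connection():
  import time
  time.sleep(10)  # simulate a slow connection setup
  return {}

_DB = None

def DB():
  # Create the connection on first use only, then reuse the cached object
  global _DB
  if _DB is None:
    _DB = some_expensive_connection()
  return _DB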
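
The same create-on-first-use pattern can also be expressed with `functools.lru_cache`, which stores the result of the first call and returns it on later calls. This is one possible alternative, not the notebook's own approach; `get_db` and `calls` are hypothetical names, and the connection function here records a call instead of sleeping.

```python
import functools

calls = []

def some_expensive_connection():
    calls.append("connect")  # stand-in for slow setup work
    return {}

@functools.lru_cache(maxsize=None)
def get_db():
    """Create the connection on the first call and reuse it afterwards."""
    return some_expensive_connection()

first = get_db()
second = get_db()

print(first is second, len(calls))  # True 1
```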
    
    
    
    

  

## File setup

In [0]:
from google.colab import files
# /Users/kbehrman/Google-Drive/projects/pragailabs/python-for-data-science/food/data
files.upload()
!ls

'features.en.openfoodfacts.org.products (1).csv'
'features.en.openfoodfacts.org.products (2).csv'
 features.en.openfoodfacts.org.products.csv
 sample_data
