Lesson 11: Lazy Evaluation
Pragmatic AI Labs
This notebook was produced by Pragmatic AI Labs. You can continue learning about these topics by:
- Buying a copy of Pragmatic AI: An Introduction to Cloud-Based Machine Learning
- Reading an online copy of Pragmatic AI: An Introduction to Cloud-Based Machine Learning
- Watching the video Essential Machine Learning and AI with Python and Jupyter Notebook on Safari Books Online
- Watching the video AWS Certified Machine Learning-Specialty
- Purchasing the video Essential Machine Learning and AI with Python and Jupyter Notebook
- Viewing more content at noahgift.com
11.1 Use generators
Lists and Generators
l_ten = [x for x in range(10)]
g_ten = (x for x in range(10))
print(f"l_ten is a {type(l_ten)} and prints as: {l_ten}")
print(f"g_ten is a {type(g_ten)} and prints as: {g_ten}")
l_ten is a <class 'list'> and prints as: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
g_ten is a <class 'generator'> and prints as: <generator object <genexpr> at 0x7fec69bbfc50>
Next
next(g_ten)
0
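A generator only moves forward, and next() raises StopIteration once the values run out. The sketch below illustrates this with a hypothetical three-item generator, g_three, so that g_ten is left untouched; passing a default to next() is one way to avoid the exception.

```python
# g_three is a hypothetical example generator, not part of the lesson's data.
g_three = (x for x in range(3))

first = next(g_three)
second = next(g_three)
third = next(g_three)

# The generator is now exhausted; a bare next() raises StopIteration.
try:
    next(g_three)
    exhausted = False
except StopIteration:
    exhausted = True

# Supplying a default returns it instead of raising.
fallback = next(g_three, None)

print(first, second, third, exhausted, fallback)  # 0 1 2 True None
```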
Iteration
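# Start from a fresh generator: one that has already been advanced does not rewind
g_ten = (x for x in range(10))
for x in g_ten:
    print(x)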
0
1
2
3
4
5
6
7
8
9
Indexing
g_ten[3]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-95-e7b8f961aa33> in <module>()
1
----> 2 g_ten[3]
TypeError: 'generator' object is not subscriptable
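Generators cannot be indexed, but itertools.islice can step to a given position. A minimal sketch follows; note that, unlike list indexing, reaching the item consumes everything before it.

```python
from itertools import islice

g_ten = (x for x in range(10))

# islice(g, 3, 4) skips three items and yields the one at position 3.
fourth = next(islice(g_ten, 3, 4))
print(fourth)  # 3

# The skipped items are gone; the generator resumes after position 3.
remaining = list(g_ten)
print(remaining)  # [4, 5, 6, 7, 8, 9]
```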
Size
import sys
n = 100000000
l_big = [x for x in range(n)]
g_big = (x for x in range(n))
# sys.getsizeof reports the container only, not the integers it references
print(f"l_big is {sys.getsizeof(l_big)} bytes")
print(f"g_big is {sys.getsizeof(g_big)} bytes")
l_big is 859724472 bytes
g_big is 88 bytes
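The same comparison can be repeated at smaller sizes to see how each object scales. This is a sketch using a hypothetical helper, sizes(); exact byte counts vary by Python version, so none are shown.

```python
import sys

def sizes(n):
    """Return (list_size, generator_size) in bytes for n items (hypothetical helper)."""
    as_list = [i for i in range(n)]
    as_gen = (i for i in range(n))
    return sys.getsizeof(as_list), sys.getsizeof(as_gen)

small_list, small_gen = sizes(1_000)
large_list, large_gen = sizes(1_000_000)

print(f"1,000 items:     list={small_list} bytes, generator={small_gen} bytes")
print(f"1,000,000 items: list={large_list} bytes, generator={large_gen} bytes")
```

The list grows with the number of items, while the generator stays the same size because it stores only the state needed to produce the next value.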
11.2 Design generator pipelines
Stringing generators together
evens = (x*2 for x in range(5000000))
three_factors = (x//3 for x in evens if x%3 == 0)
titles = (f"this number is {x}" for x in three_factors)
capped = (x.title() for x in titles)
print(f"The first call to capped: {next(capped)}")
print(f"The second call to capped: {next(capped)}")
print(f"The third call to capped: {next(capped)}")
The first call to capped: This Number Is 0
The second call to capped: This Number Is 2
The third call to capped: This Number Is 4
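To see that the pipeline does only as much work as each next() call requires, the sketch below replaces range with a hypothetical generator function, numbers(), that records every value it produces.

```python
pulled = []  # every number the first stage actually produced

def numbers(limit):
    """Hypothetical stand-in for range() that records what it yields."""
    for n in range(limit):
        pulled.append(n)
        yield n

evens = (x * 2 for x in numbers(5_000_000))
three_factors = (x // 3 for x in evens if x % 3 == 0)

first_three = [next(three_factors) for _ in range(3)]
print(first_three)  # [0, 2, 4]
print(pulled)       # [0, 1, 2, 3, 4, 5, 6]
```

Only seven of the five million source values were generated to produce three results.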
Why use lazy evaluation
Lazy evaluation lets you process a large dataset in small pieces instead of loading it all into memory. Example: salt and protein content of organic foods.
Define generator to read file line by line
def row_reader(file_path):
    # The with block closes the file once the generator is exhausted or closed
    with open(file_path, 'r') as f:
        for line in f:
            yield line
file_path = './features.en.openfoodfacts.org.products.csv'
rows = row_reader(file_path)
rows
<generator object row_reader at 0x7fd2e9af5f68>
next(rows)
'3,57.14,17.86,3.57,17.86,1.22428,2540,2835.7,92.86,0,Organic Salted Nut Mix\n'
Generator pipeline to process one line at a time
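def row_reader(file_path):
    # Each stage is a generator expression, so no line is read until a row is requested
    line_reader = (x for x in open(file_path, 'r'))
    split_rows = (x.split(',') for x in line_reader)
    organics_only = (x for x in split_rows if x[-1].startswith('Organic'))
    # Columns counted from the end: name (-1), salt (-6), protein (-7)
    name_salt_protein = ((x[-1], x[-6], x[-7]) for x in organics_only)
    return name_salt_protein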
rows = row_reader(file_path)
next(rows)
('Organic Salted Nut Mix\n', '1.22428', '17.86')
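Splitting on commas keeps the trailing newline on each name and would break on a quoted field that contains a comma. One possible variant, sketched here, uses the standard csv module. It reads from an in-memory stand-in rather than the real file: the first row is the one shown earlier, and the second is a hypothetical row added only to show a quoted name. The function name organic_rows is also an assumption.

```python
import csv
import io

# Stand-in for the CSV file. The second row is hypothetical.
sample = io.StringIO(
    '3,57.14,17.86,3.57,17.86,1.22428,2540,2835.7,92.86,0,Organic Salted Nut Mix\n'
    '1,0,0,0,9.5,0.5,100,200,50,0,"Organic Crackers, Salted"\n'
)

def organic_rows(lines):
    """Lazily yield (name, salt, protein) for rows whose name starts with 'Organic'."""
    parsed = csv.reader(lines)  # handles quoted commas and drops the newline
    organics_only = (row for row in parsed if row and row[-1].startswith('Organic'))
    return ((row[-1], row[-6], row[-7]) for row in organics_only)

rows = list(organic_rows(sample))
print(rows)
# [('Organic Salted Nut Mix', '1.22428', '17.86'), ('Organic Crackers, Salted', '0.5', '9.5')]
```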
import pandas
organics = pandas.DataFrame(columns=['Name', 'Salt', 'Protein'])
rows = row_reader(file_path)
# Append one row at a time as the pipeline yields it
for new_row in rows:
    organics.loc[len(organics)] = new_row
organics
 | Name | Salt | Protein |
---|---|---|---|
0 | Organic Salted Nut Mix\n | 1.22428 | 17.86 |
1 | Organic Muesli\n | 0.1397 | 14.06 |
2 | Organic Hazelnuts\n | 0.01016 | 14.29 |
3 | Organic Oat Groats\n | 0.0254 | 16.67 |
4 | Organic Quinoa Coconut Granola With Mango\n | 0.02286 | 10.91 |
5 | Organic Unswt Berry Coconut Granola\n | 0.28194 | 12.96 |
6 | Organic Red Quinoa\n | 0.01016 | 13.33 |
7 | Organic Blueberry Almond Granola\n | 0.04572 | 10.91 |
8 | Organic Coconut Chips\n | 0.09398 | 6 |
9 | Organic Garbanzo Beans\n | 0.05334 | 17.02 |
10 | Organic Yellow Split Peas\n | 0.05588 | 28.89 |
11 | Organic Trail Mix\n | 0.127 | 13.33 |
12 | Organic Raw Pumpkin Seeds\n | 0.04318 | 30 |
13 | Organic Tamari Pumpkin Seed\n | 0.97028 | 26.47 |
14 | Organic Harvest Pilaf\n | 0.02794 | 15.56 |
15 | Organic Salted Pistachios\n | 1.45034 | 21.43 |
16 | Organic Medjool Dates\n | 0.0127 | 2.2 |
17 | Organic Whole Cashews\n | 0.0381 | 14.71 |
18 | Organic Flourless Sprouted 7-Grain Bread\n | 0.6731 | 11.76 |
19 | Organic Sunny Days Snack Bars\n | 0.60198 | 5.26 |
20 | Organic Nine Grain All Natural Bread\n | 1.0033 | 11.63 |
21 | Organic 100% Whole Wheat\n | 0.82804 | 9.3 |
22 | Organic Great Seed\n | 0.88646 | 11.63 |
23 | Organic Tortellini Pasta\n | 0.381 | 10 |
24 | Organic Ravioli\n | 0.5588 | 9 |
25 | Organic Broccoli Florets\n | 0.07366 | 3.53 |
26 | Organic Creamy Tomato Bisque\n | 0.75692 | 1.22 |
27 | Organic Green Peas\n | 0.5715 | 5.62 |
28 | Organic Mixed Vegetable\n | 0.19304 | 2.35 |
29 | Organic Beef Burger\n | 0.14224 | 17.88 |
... | ... | ... | ... |
855 | Organic Gummy Bears & Worms\n | 0 | 2.38 |
856 | Organic Fruit Flavored Snacks\n | 0.10922 | 4.35 |
857 | Organic Gummy Bears\n | 0 | 4.35 |
858 | Organic Lollipops\n | 0 | 0 |
859 | Organic Sour Head\n | 0 | 0 |
860 | Organic Buttermilk Pancake Mix\n | 2.27838 | 7.69 |
861 | Organic Yellow Cake Mix\n | 1.905 | 4.55 |
862 | Organic Double Chocolate Brownie Mix\n | 1.08966 | 3.57 |
863 | Organic Whole Grain Muffin Mix\n | 2.04724 | 5.56 |
864 | Organic 1% Lowfat Milk\n | 0.14478 | 3.39 |
865 | Organic Whole Milk\n | 0.1397 | 3.39 |
866 | Organic 2% Reduced Fat Milk\n | 0.1397 | 3.39 |
867 | Organic Fat Free Skim Milk\n | 0.14478 | 3.81 |
868 | Organic Fruit\n | 0 | 1.43 |
869 | Organic Vegetable Chili\n | 0.63246 | 2.45 |
870 | Organic Salted Butter Made With Organic Sweet ... | 1.63322 | 0 |
871 | Organic Whole Milk\n | 0.127 | 3.33 |
872 | Organic 2% Reduced Fat Milk\n | 0.13208 | 3.33 |
873 | Organic Lowfat Milk\n | 0.13208 | 3.33 |
874 | Organic Milk\n | 0.13208 | 3.33 |
875 | Organic Milk\n | 0.13462 | 3.38 |
876 | Organic Brown Flax Seeds\n | 0.07874 | 15.38 |
877 | Organic Raw Shelled Pumpkin Seed\n | 0.04572 | 25 |
878 | Organic Raw Sunflower Meat\n | 0.02794 | 21.43 |
879 | Organic Maple Syrup\n | 0.02032 | 0 |
880 | Organic Super Sweet Whole Kernel Corn\n | 0.4064 | 1.6 |
881 | Organic Sweet Peas\n | 0.6096 | 3.2 |
882 | Organic Cut Green Beans\n | 0.61468 | 0.83 |
883 | Organic Black Beans\n | 0.254 | 6.15 |
884 | Organic Dark Red Kidney Beans\n | 0.27432 | 6.92 |
885 rows × 3 columns
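When only a summary is needed, the pipeline can be consumed directly without building a DataFrame. A minimal sketch follows, using the first three rows of the table above as a stand-in for the output of row_reader(file_path); mean_protein is a hypothetical helper.

```python
# First three (name, salt, protein) tuples from the table above.
sample_rows = iter([
    ('Organic Salted Nut Mix\n', '1.22428', '17.86'),
    ('Organic Muesli\n', '0.1397', '14.06'),
    ('Organic Hazelnuts\n', '0.01016', '14.29'),
])

def mean_protein(rows):
    """Consume rows one at a time and return the mean protein value."""
    total = 0.0
    count = 0
    for _name, _salt, protein in rows:
        total += float(protein)
        count += 1
    return total / count if count else 0.0

average = mean_protein(sample_rows)
print(f"{average:.2f}")  # 15.40
```

Only one row is held in memory at a time, along with the running total and count.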
11.3 Implement lazy evaluation functions
Generator functions
def square_them(numbers):
    for number in numbers:
        yield number * number
s = square_them(range(10000))
print(next(s))
print(next(s))
print(next(s))
print(next(s))
0
1
4
9
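Calling a generator function does not run its body; execution begins at the first next() and pauses at each yield. The sketch below adds a hypothetical log list to square_them to make that visible.

```python
log = []  # hypothetical trace of when the function body runs

def square_them(numbers):
    log.append("started")      # runs on the first next(), not at call time
    for number in numbers:
        yield number * number
    log.append("finished")     # runs once the input is exhausted

s = square_them(range(3))
log_after_call = list(log)     # still empty: nothing has run yet

values = list(s)               # drives the generator to completion
print(log_after_call, values, log)  # [] [0, 1, 4] ['started', 'finished']
```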
Infinite generators
def counter(d):
    while True:
        d += 1
        yield d
c = counter(10)
print(next(c))
print(next(c))
print(next(c))
11
12
13
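An infinite generator never raises StopIteration, so passing it to list() or an unbounded for loop would not terminate. Two common ways to bound it, sketched with itertools:

```python
from itertools import islice, takewhile

def counter(d):
    while True:
        d += 1
        yield d

# Take a fixed number of items.
first_five = list(islice(counter(10), 5))
print(first_five)  # [11, 12, 13, 14, 15]

# Take items until a condition fails.
below_15 = list(takewhile(lambda n: n < 15, counter(10)))
print(below_15)  # [11, 12, 13, 14]
```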
Other forms of lazy evaluation
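import time

def some_expensive_connection():
    time.sleep(10)  # stand-in for a slow connection setup
    return {}

_DB = None

def DB():
    # Connect on first use only; later calls reuse the cached connection
    global _DB
    if _DB is None:
        _DB = some_expensive_connection()
    return _DB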
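The same connect-on-first-use pattern can be sketched with functools.lru_cache instead of a module-level global. This is an alternative, not the lesson's method. The sleep is replaced by a hypothetical call counter so the example runs instantly, and the lowercase name db is an assumption.

```python
from functools import lru_cache

calls = 0  # hypothetical counter showing how often the connection is created

def some_expensive_connection():
    global calls
    calls += 1
    return {}

@lru_cache(maxsize=None)
def db():
    """Create the connection on first call; later calls return the cached object."""
    return some_expensive_connection()

first = db()
second = db()
print(calls, first is second)  # 1 True
```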
File setup
from google.colab import files
# /Users/kbehrman/Google-Drive/projects/pragailabs/python-for-data-science/food/data
files.upload()
!ls
'features.en.openfoodfacts.org.products (1).csv'
'features.en.openfoodfacts.org.products (2).csv'
features.en.openfoodfacts.org.products.csv
sample_data