Lesson 11: Lazy Evaluation

Pragmatic AI Labs

This notebook was produced by Pragmatic AI Labs.

11.1 Use generators

Lists and Generators

l_ten = [x for x in range(10)]
g_ten = (x for x in range(10))

print(f"l_ten is a {type(l_ten)} and prints as: {l_ten}")
print(f"g_ten is a {type(g_ten)} and prints as: {g_ten}")
l_ten is a <class 'list'> and prints as: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
g_ten is a <class 'generator'> and prints as: <generator object <genexpr> at 0x7fec69bbfc50>

Next

next(g_ten)
0

Iteration

g_ten = (x for x in range(10))  # start from a fresh generator
for x in g_ten:
  print(x)

0
1
2
3
4
5
6
7
8
9
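
A generator can be iterated only once. The following is a minimal sketch of that behavior; the variable names are illustrative and not part of the lesson. It also shows that `next` accepts a default value, which avoids a `StopIteration` error on an exhausted generator.

```python
g_three = (x for x in range(3))

first_pass = list(g_three)    # consumes every item
second_pass = list(g_three)   # the generator is now exhausted

print(first_pass)   # [0, 1, 2]
print(second_pass)  # []

# next() raises StopIteration on an exhausted generator unless a default is given
sentinel = next(g_three, None)
print(sentinel)     # None
```

If the values are needed more than once, either rebuild the generator or store the results in a list.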

Indexing


g_ten[3]

    ---------------------------------------------------------------------------

    TypeError                                 Traceback (most recent call last)

    <ipython-input-95-e7b8f961aa33> in <module>()
          1 
    ----> 2 g_ten[3]
    

    TypeError: 'generator' object is not subscriptable
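
Since generators cannot be indexed, one common workaround is `itertools.islice`, which skips ahead by consuming items. This is a sketch of one possible approach, not the only one; note that the skipped items cannot be recovered afterwards.

```python
from itertools import islice

g_ten = (x for x in range(10))

# Skip the first three items and take the next one (the item at position 3).
fourth = next(islice(g_ten, 3, None))
print(fourth)        # 3

# The skipped items are gone; the generator resumes after position 3.
remaining = list(g_ten)
print(remaining)     # [4, 5, 6, 7, 8, 9]
```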


Size

import sys

n = 100000000
l_big = [x for x in range(n)]
g_big = (x for x in range(n))

# sys.getsizeof measures only the container, not the integers it references
print(f"l_big is {sys.getsizeof(l_big)} bytes")
print(f"g_big is {sys.getsizeof(g_big)} bytes")
l_big is 859724472 bytes
g_big is 88 bytes
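
The size difference matters most when the values are only needed once, for example in an aggregation. The sketch below uses a smaller, arbitrarily chosen `n` so that it runs quickly; the names `l_small` and `g_small` are illustrative.

```python
import sys

n = 1_000_000  # smaller than the lesson's value so the sketch runs quickly

l_small = [x for x in range(n)]
g_small = (x for x in range(n))

# getsizeof reports only the container itself, not the integers it refers to
list_bytes = sys.getsizeof(l_small)
gen_bytes = sys.getsizeof(g_small)
print(list_bytes > gen_bytes)   # True

# Aggregations can consume the generator directly, one item at a time
total = sum(g_small)
print(total)                    # 499999500000
```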

11.2 Design generator pipelines

Stringing generators together

evens = (x*2 for x in range(5000000))
three_factors = (x//3 for x in evens if x%3 == 0)
titles = (f"this number is {x}" for x in three_factors)
capped = (x.title() for x in titles)

print(f"The first call to capped: {next(capped)}")
print(f"The second call to capped: {next(capped)}")
print(f"The third call to capped: {next(capped)}")
The first call to capped: This Number Is 0
The second call to capped: This Number Is 2
The third call to capped: This Number Is 4
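
To see that the pipeline really pulls only the values it needs, the first stage can be instrumented. In this sketch, `traced_range` and `pulled` are hypothetical helpers added for illustration; the two pipeline stages mirror the ones above.

```python
pulled = []  # records every value the first stage actually produces

def traced_range(n):
    for x in range(n):
        pulled.append(x)
        yield x

evens = (x * 2 for x in traced_range(5000000))
three_factors = (x // 3 for x in evens if x % 3 == 0)

first_three = [next(three_factors) for _ in range(3)]
print(first_three)   # [0, 2, 4]
print(pulled)        # [0, 1, 2, 3, 4, 5, 6]
```

Only seven of the five million source values were generated to produce three results.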

Why use lazy evaluation

Lazy evaluation lets you process a large dataset in small pieces instead of loading it all into memory. Example: extracting the salt and protein values of organic foods.

Define generator to read file line by line

def row_reader(file_path):
  # The with block closes the file once the generator is exhausted or closed
  with open(file_path, 'r') as f:
    for line in f:
      yield line

file_path = './features.en.openfoodfacts.org.products.csv'

rows = row_reader(file_path)
rows
<generator object row_reader at 0x7fd2e9af5f68>
next(rows)
'3,57.14,17.86,3.57,17.86,1.22428,2540,2835.7,92.86,0,Organic Salted Nut Mix\n'
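
Reading line by line is one way to work in small pieces; grouping lines into fixed-size batches is another. The sketch below is an assumption about how that could look: `batches` is a hypothetical helper, and an `io.StringIO` object stands in for the open file.

```python
import io
from itertools import islice

def batches(iterable, size):
    """Yield lists of up to `size` items without loading everything at once."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# Stand-in for an open file; a real file object iterates line by line the same way
fake_file = io.StringIO("a\nb\nc\nd\ne\n")
chunks = list(batches(fake_file, 2))
print(chunks)  # [['a\n', 'b\n'], ['c\n', 'd\n'], ['e\n']]
```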

Generator pipeline to process one line at a time

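def row_reader(file_path):
  # Every stage is lazy: no line is read until the caller asks for a row
  with open(file_path, 'r') as f:
    split_rows = (line.split(',') for line in f)

    organics_only = (x for x in split_rows if x[-1].startswith('Organic'))

    name_salt_protein = ((x[-1], x[-6], x[-7]) for x in organics_only)

    # yield from keeps the file open while rows are still being consumed
    yield from name_salt_protein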

rows = row_reader(file_path)
next(rows)
('Organic Salted Nut Mix\n', '1.22428', '17.86')
import pandas
organics = pandas.DataFrame(columns=['Name', 'Salt', 'Protein'])

rows = row_reader(file_path)

for new_row in rows:
  organics.loc[len(organics)] = new_row
  
organics
Name Salt Protein
0 Organic Salted Nut Mix\n 1.22428 17.86
1 Organic Muesli\n 0.1397 14.06
2 Organic Hazelnuts\n 0.01016 14.29
3 Organic Oat Groats\n 0.0254 16.67
4 Organic Quinoa Coconut Granola With Mango\n 0.02286 10.91
5 Organic Unswt Berry Coconut Granola\n 0.28194 12.96
6 Organic Red Quinoa\n 0.01016 13.33
7 Organic Blueberry Almond Granola\n 0.04572 10.91
8 Organic Coconut Chips\n 0.09398 6
9 Organic Garbanzo Beans\n 0.05334 17.02
10 Organic Yellow Split Peas\n 0.05588 28.89
11 Organic Trail Mix\n 0.127 13.33
12 Organic Raw Pumpkin Seeds\n 0.04318 30
13 Organic Tamari Pumpkin Seed\n 0.97028 26.47
14 Organic Harvest Pilaf\n 0.02794 15.56
15 Organic Salted Pistachios\n 1.45034 21.43
16 Organic Medjool Dates\n 0.0127 2.2
17 Organic Whole Cashews\n 0.0381 14.71
18 Organic Flourless Sprouted 7-Grain Bread\n 0.6731 11.76
19 Organic Sunny Days Snack Bars\n 0.60198 5.26
20 Organic Nine Grain All Natural Bread\n 1.0033 11.63
21 Organic 100% Whole Wheat\n 0.82804 9.3
22 Organic Great Seed\n 0.88646 11.63
23 Organic Tortellini Pasta\n 0.381 10
24 Organic Ravioli\n 0.5588 9
25 Organic Broccoli Florets\n 0.07366 3.53
26 Organic Creamy Tomato Bisque\n 0.75692 1.22
27 Organic Green Peas\n 0.5715 5.62
28 Organic Mixed Vegetable\n 0.19304 2.35
29 Organic Beef Burger\n 0.14224 17.88
... ... ... ...
855 Organic Gummy Bears & Worms\n 0 2.38
856 Organic Fruit Flavored Snacks\n 0.10922 4.35
857 Organic Gummy Bears\n 0 4.35
858 Organic Lollipops\n 0 0
859 Organic Sour Head\n 0 0
860 Organic Buttermilk Pancake Mix\n 2.27838 7.69
861 Organic Yellow Cake Mix\n 1.905 4.55
862 Organic Double Chocolate Brownie Mix\n 1.08966 3.57
863 Organic Whole Grain Muffin Mix\n 2.04724 5.56
864 Organic 1% Lowfat Milk\n 0.14478 3.39
865 Organic Whole Milk\n 0.1397 3.39
866 Organic 2% Reduced Fat Milk\n 0.1397 3.39
867 Organic Fat Free Skim Milk\n 0.14478 3.81
868 Organic Fruit\n 0 1.43
869 Organic Vegetable Chili\n 0.63246 2.45
870 Organic Salted Butter Made With Organic Sweet ... 1.63322 0
871 Organic Whole Milk\n 0.127 3.33
872 Organic 2% Reduced Fat Milk\n 0.13208 3.33
873 Organic Lowfat Milk\n 0.13208 3.33
874 Organic Milk\n 0.13208 3.33
875 Organic Milk\n 0.13462 3.38
876 Organic Brown Flax Seeds\n 0.07874 15.38
877 Organic Raw Shelled Pumpkin Seed\n 0.04572 25
878 Organic Raw Sunflower Meat\n 0.02794 21.43
879 Organic Maple Syrup\n 0.02032 0
880 Organic Super Sweet Whole Kernel Corn\n 0.4064 1.6
881 Organic Sweet Peas\n 0.6096 3.2
882 Organic Cut Green Beans\n 0.61468 0.83
883 Organic Black Beans\n 0.254 6.15
884 Organic Dark Red Kidney Beans\n 0.27432 6.92

885 rows × 3 columns
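
The names above still carry a trailing `\n`, and the salt and protein values are strings. One possible cleanup, sketched here under the assumption that the file is plain comma-separated text, adds a parsing stage based on the standard `csv` module. The function `clean_rows` is hypothetical; the first sample row is copied from the output above and the second is made up for illustration.

```python
import csv
import io

# First row is copied from the lesson's output; the second is made up for illustration
sample = io.StringIO(
    "3,57.14,17.86,3.57,17.86,1.22428,2540,2835.7,92.86,0,Organic Salted Nut Mix\n"
    "1,0,0,0,1.0,0.5,0,0,0,0,Plain Crackers\n"
)

def clean_rows(lines):
    parsed = csv.reader(lines)  # handles quoting and strips the line ending
    organics_only = (row for row in parsed if row[-1].startswith('Organic'))
    # Same column positions as the lesson: name, salt, protein
    return ((row[-1], float(row[-6]), float(row[-7])) for row in organics_only)

cleaned = list(clean_rows(sample))
print(cleaned)  # [('Organic Salted Nut Mix', 1.22428, 17.86)]
```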

11.3 Implement lazy evaluation functions

Generator functions

def square_them(numbers):
  for number in numbers:
    yield number * number
    
  
s = square_them(range(10000))

print(next(s))
print(next(s))
print(next(s))
print(next(s))
0
1
4
9
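
A generator function keeps its local variables between calls to `next`, which makes it easy to carry state from one item to the next. The following sketch uses a hypothetical `running_total` function to illustrate this.

```python
def running_total(numbers):
    total = 0
    for number in numbers:
        total += number
        yield total   # execution pauses here until the next value is requested

totals = running_total([1, 2, 3, 4])
first = next(totals)
rest = list(totals)
print(first)  # 1
print(rest)   # [3, 6, 10]
```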

Infinite generators

def counter(d):
  # Never terminates on its own; the caller decides how many values to pull
  while True:
    d += 1
    yield d

c = counter(10)

print(next(c))
print(next(c))
print(next(c))
11
12
13
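
Passing an infinite generator to `list` or a plain `for` loop would never finish, so it needs a bound. The sketch below shows two options from `itertools`, using the same `counter` as above: `islice` limits by count and `takewhile` limits by condition.

```python
from itertools import islice, takewhile

def counter(d):
    while True:
        d += 1
        yield d

# Take a fixed number of values
first_five = list(islice(counter(10), 5))
print(first_five)     # [11, 12, 13, 14, 15]

# Take values until a condition stops holding
below_fifteen = list(takewhile(lambda n: n < 15, counter(10)))
print(below_fifteen)  # [11, 12, 13, 14]
```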

Other forms of lazy evaluation

def some_expensive_connection():
  import time
  time.sleep(10)
  return {}

_DB = None

def DB():
  # Connect only on the first call; later calls reuse the cached connection
  global _DB
  if _DB is None:
    _DB = some_expensive_connection()
  return _DB

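The same connect-on-first-use pattern can be written without a global variable by caching the function's result. This is a sketch of one alternative, not the lesson's method: `get_db` and `calls` are hypothetical names, and the function body is a fast stand-in for the slow connection.

```python
from functools import lru_cache

calls = []  # records how many times the expensive step actually runs

@lru_cache(maxsize=None)
def get_db():
    # Hypothetical stand-in for the lesson's some_expensive_connection()
    calls.append("connected")
    return {}

print(len(calls))       # 0, nothing runs until the first call

first = get_db()
second = get_db()
print(first is second)  # True
print(len(calls))       # 1
```

The cached version returns the same object on every call, so the expensive step runs at most once.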
File setup

from google.colab import files
# /Users/kbehrman/Google-Drive/projects/pragailabs/python-for-data-science/food/data
files.upload()
!ls
'features.en.openfoodfacts.org.products (1).csv'
'features.en.openfoodfacts.org.products (2).csv'
 features.en.openfoodfacts.org.products.csv
 sample_data