<a href="https://colab.research.google.com/github/paiml/python_for_datascience/blob/master/Lesson8_Python_For_Data_Science_Functions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lesson 8: Functions

## Pragmatic AI Labs



![alt text](https://paiml.com/images/logo_with_slogan_white_background.png)

This notebook was produced by [Pragmatic AI Labs](https://paiml.com/).  You can continue learning about these topics by:

*   Buying a copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](http://www.informit.com/store/pragmatic-ai-an-introduction-to-cloud-based-machine-9780134863917)
*   Reading an online copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](https://www.safaribooksonline.com/library/view/pragmatic-ai-an/9780134863924/)
*  Watching video [Essential Machine Learning and AI with Python and Jupyter Notebook-Video-SafariOnline](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118) on Safari Books Online.
* Watching video [AWS Certified Machine Learning-Speciality](https://learning.oreilly.com/videos/aws-certified-machine/9780135556597)
* Purchasing video [Essential Machine Learning and AI with Python and Jupyter Notebook- Purchase Video](http://www.informit.com/store/essential-machine-learning-and-ai-with-python-and-jupyter-9780135261095)
*   Viewing more content at [noahgift.com](https://noahgift.com/)


## 8.1 Write and use functions

### Building blocks of distributed computing

In [0]:
def work(input):
  """Processes input and returns output"""
  
  output = input + 1
  return output


In [0]:
work(1)

2
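
One way to read the heading above: because `work` depends only on its input, it can be mapped over many inputs independently. The following is a minimal sketch that uses the standard library's thread pool as a local stand-in for a distributed scheduler. The pool size and the input list are illustrative assumptions, not part of the original notebook.

```python
from concurrent.futures import ThreadPoolExecutor


def work(input):
    """Processes input and returns output"""
    output = input + 1
    return output


# Each call is independent, so the calls can be handed to separate workers.
# A thread pool is only a local stand-in for a real distributed system.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, [1, 2, 3]))

print(results)
```

`pool.map` returns results in the same order as the inputs, which keeps the sketch easy to compare with a plain `map(work, ...)`.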

### Key Components of Functions

#### Docstrings

In [0]:
def docstring():
  """Triple Quoted documentation!"""
  

In [0]:
docstring?
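
The `?` suffix is IPython/Jupyter syntax. Outside a notebook, the same text is available through the function's `__doc__` attribute, `inspect.getdoc`, or the built-in `help()`. A minimal sketch:

```python
import inspect


def docstring():
    """Triple Quoted documentation!"""


# The docstring is stored on the function object itself.
print(docstring.__doc__)

# inspect.getdoc returns the same text with indentation cleaned up.
print(inspect.getdoc(docstring))
```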

#### Arguments:  Keyword and Positional

* *Positional*:  Order based processing
* *Keyword*:  Key/Value processing


##### Positional

In [0]:
def positional(first,second,third):
  """Processes arguments to function in order"""
  
  print(f"Processed first {first}")
  print(f"Processed second {second}")
  print(f"Processed third {third}")
  
  

In [0]:
positional(1, 2, 3)

Processed first 1
Processed second 2
Processed third 3


In [0]:
positional(2, 3, 1)

Processed first 2
Processed second 3
Processed third 1


##### Keyword

In [0]:
def keyword(first=1, second=2, third=3):
  """Processed in any order"""
  
  print(f"Processed first {first}")
  print(f"Processed second {second}")
  print(f"Processed third {third}")

In [0]:
keyword(1,2,3)

Processed first 1
Processed second 2
Processed third 3


In [0]:
keyword(second=2, third=3, first=1)

Processed first 1
Processed second 2
Processed third 3


In [0]:
keyword(second=2)

Processed first 1
Processed second 2
Processed third 3
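
Positional and keyword styles can be mixed in one call, as long as the positional values come first. A small sketch reusing a `keyword`-style function; it returns a tuple instead of printing, an assumption made here so the results are easy to compare.

```python
def keyword(first=1, second=2, third=3):
    """Returns the arguments in declaration order"""
    return (first, second, third)


# Positional values fill parameters left to right; the rest use keywords or defaults.
print(keyword(10, third=30))

# Supplying the same parameter both positionally and by keyword is an error.
try:
    keyword(10, first=99)
except TypeError as error:
    print(f"TypeError: {error}")
```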


#### Return

A function without an explicit `return` statement returns `None`.

In [0]:
def bridge_to_nowhere():
  pass
  

In [0]:
bridge_to_nowhere() is None

True

In [0]:
type(bridge_to_nowhere())

NoneType

Most useful functions return something

In [0]:
def more_than_zero():
  
  return 1

In [0]:
more_than_zero() == 1

True

Functions can return functions

In [0]:
def inner_peace():
  """A deep function"""
  
  def peace():
    return "piece"
  
  return peace

In [0]:
inner = inner_peace()
print(f"Hey, I need that {inner()}")

Hey, I need that piece


In [0]:
inner2 = inner_peace()

In [0]:
type(inner2)

function

## 8.2 Write and use decorators

### Using Decorators

Decorators are commonly used to wire a function into a framework or tool, for example:


*   Command-line tools
*   Web Routes
*   Speeding up Python code
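
The uses listed above share one idea: the decorator receives the function and either registers it somewhere or replaces it with a wrapped version. Below is a minimal sketch of the registration style, loosely in the spirit of a web route table. The `ROUTES` dict and the `route` decorator are hypothetical names for illustration, not the actual implementation of Flask or click.

```python
ROUTES = {}  # hypothetical registry mapping a path to a handler function


def route(path):
    """Returns a decorator that registers a function under `path`"""

    def register(func):
        ROUTES[path] = func
        return func  # the function itself is returned unchanged

    return register


@route("/")
def index():
    return {"iron_man": -1}


# Dispatch: look up the handler by path and call it.
print(ROUTES["/"]())
```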



#### Command-line Tools

In [0]:
%%python
import click

def less_than_zero():
  
  return {"iron_man": -1}

@click.command()
def run():
  
  rdj = less_than_zero()
  click.echo(f"Robert Downey Junior is versatile {rdj}")
  
if __name__ == "__main__":
  run()

Robert Downey Junior is versatile {'iron_man': -1}


#### Web App

In [0]:
%%writefile run.py
from flask import Flask
app = Flask(__name__)

def less_than_zero():
  
  return {"iron_man": -1}

@app.route('/')
def runit():
  return less_than_zero()
  


Overwriting run.py


With the app running, `curl localhost:5000/` returns:

{'iron_man': -1}



#### Using Numba

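Numba's just-in-time (JIT) compiler can dramatically speed up numerical Python code. The first call to a `@jit` function includes compilation time, so later calls are typically faster still.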

In [0]:
def crunchy_normal():
  count = 0
  num = 10000000
  for i in range(num):
    count += num  
  return count

In [0]:
%%time
crunchy_normal()

CPU times: user 906 ms, sys: 581 µs, total: 907 ms
Wall time: 908 ms


100000000000000

In [0]:
from numba import jit

@jit(nopython=True)
def crunchy():
  count = 0
  num = 10000000
  for i in range(num):
    count += num  
  return count

In [0]:
%%time
crunchy()

CPU times: user 113 ms, sys: 15.9 ms, total: 129 ms
Wall time: 194 ms


100000000000000

### Writing Decorators

#### Instrumentation Decorator

Decorators are commonly used to time, debug, or otherwise instrument code.

In [0]:
from functools import wraps
import time

def instrument(f):
    @wraps(f)
    def wrap(*args, **kw):
        ts = time.time()
        result = f(*args, **kw)
        te = time.time()
        print(f"function: {f.__name__}, args: [{args}, {kw}] took: {te-ts} sec")
        return result
    return wrap

Using the decorator to time the execution of a function:

In [0]:
from time import sleep

@instrument
def simulated_work(count, task):
  """simulates work"""
  
  print("Starting work")
  sleep(count)
  processed = f"one {task} leap"
  return processed
  

In [0]:
simulated_work(3, task="small")  

Starting work
function: simulated_work, args: [(3,), {'task': 'small'}] took: 3.0027008056640625 sec


'one small leap'
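
The `@wraps(f)` line in `instrument` copies metadata such as the name and docstring from the original function onto the wrapper. A small sketch comparing a decorator with and without it; the names `plain` and `preserving` are illustrative only.

```python
from functools import wraps


def plain(f):
    def wrap(*args, **kw):
        return f(*args, **kw)
    return wrap


def preserving(f):
    @wraps(f)
    def wrap(*args, **kw):
        return f(*args, **kw)
    return wrap


@plain
def first():
    """first docstring"""


@preserving
def second():
    """second docstring"""


print(first.__name__, first.__doc__)    # metadata of the inner wrapper
print(second.__name__, second.__doc__)  # metadata of the original function
```

Without `wraps`, tools that rely on `__name__` or `__doc__` (including the `f.__name__` style of logging used above, if decorators are stacked) would report the wrapper instead of the decorated function.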

## 8.3 Compose closure functions

### Functions with state

In [0]:
def calorie_counter():
    """Counts calories"""
    
    protein = 0
    fat = 0
    carbohydrate = 0

    def calorie_counter_inner(food):
        # nonlocal rebinds the enclosing variables instead of creating locals
        nonlocal protein, fat, carbohydrate
        if food == "protein":
            protein += 4
        elif food == "carbohydrate":
            carbohydrate += 4
        elif food == "fat":
            fat += 9
        total = protein + carbohydrate + fat
        print(f"Consumed {total} calories of protein: {protein}, carbohydrate: {carbohydrate}, fat: {fat}")
    return calorie_counter_inner

In [0]:
meal = calorie_counter()
type(meal)


function

In [0]:
meal("carbohydrate")

Consumed 4 calories of protein: 0, carbohydrate: 4, fat: 0


In [0]:
meal("fat")

Consumed 13 calories of protein: 0, carbohydrate: 4, fat: 9


In [0]:
meal("protein")

Consumed 17 calories of protein: 4, carbohydrate: 4, fat: 9
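
Each call to the outer function creates a fresh set of enclosed variables, so separate closures do not share state. A minimal sketch with a simpler counter; `make_counter` is a hypothetical example, not from the notebook.

```python
def make_counter():
    """Returns a function that remembers how many times it was called"""
    count = 0

    def counter():
        nonlocal count  # rebind the enclosing variable instead of creating a local
        count += 1
        return count

    return counter


breakfast = make_counter()
lunch = make_counter()

breakfast()
breakfast()
print(breakfast())  # third call on the first counter
print(lunch())      # first call on the second counter

# The enclosed value lives in the function's closure cells.
print(lunch.__closure__[0].cell_contents)
```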


## 8.4 Use lambda

### YAGNI


**Y**ou **A**in't **G**onna **N**eed **I**t


In [0]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [0]:
func = lambda x: x**x
func(4)

256

In [0]:
def expo(x):
  return x**x

expo(4)

256
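
A lambda and a `def` produce the same kind of object. The main practical difference is that the lambda carries no descriptive name, which is one reason a named function is often the simpler choice. Lambdas tend to fit best as short, throwaway arguments, such as a sort key. A small sketch, where the `foods` list is made-up illustrative data:

```python
func = lambda x: x**x


def expo(x):
    return x**x


# Both are plain function objects, but only the def carries a descriptive name.
print(type(func) is type(expo))
print(func.__name__, expo.__name__)

# A common place for a short lambda: the key argument of sorted().
foods = [("fat", 9), ("protein", 4), ("carbohydrate", 4)]
print(sorted(foods, key=lambda pair: pair[1]))
```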

### Close Encounters with Lambdas

Lambdas are often passed to `apply` on a pandas Series or DataFrame.

In [0]:
import pandas as pd

series = pd.Series([1, 5, 10])
series.apply(lambda x: x**x)

0              1
1           3125
2    10000000000
dtype: int64

In [0]:
def expo(x):
  return x**x

expo(4)

In [0]:
import pandas as pd

series = pd.Series([1, 5, 10])
series.apply(expo)

0              1
1           3125
2    10000000000
dtype: int64

## 8.5 Advanced Use of Functions

### Applying a Function to a Pandas DataFrame

In [0]:
import pandas as pd
df = pd.read_csv(
    "https://raw.githubusercontent.com/noahgift/food/master/data/features.en.openfoodfacts.org.products.csv")
df.drop(["Unnamed: 0", "exceeded", "g_sum", "energy_100g"], axis=1, inplace=True)  # drop four columns we don't need
df.head()

Unnamed: 0,fat_100g,carbohydrates_100g,sugars_100g,proteins_100g,salt_100g,reconstructed_energy,product
0,28.57,64.29,14.29,3.57,0.0,2267.85,Banana Chips Sweetened (Whole)
1,17.86,60.71,17.86,17.86,0.635,2032.23,Peanuts
2,57.14,17.86,3.57,17.86,1.22428,2835.7,Organic Salted Nut Mix
3,18.75,57.81,15.62,14.06,0.1397,1953.04,Organic Muesli
4,36.67,36.67,3.33,16.67,1.60782,2336.91,Zen Party Mix


In [0]:
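def high_protein(protein_grams):
  """Labels a protein value (grams per 100 g) as high or low protein"""
  
  # Series.apply passes one cell value at a time, not a whole row
  if protein_grams > 80:
    return "high_protein"
  return "low_protein"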

In [0]:
df["high_protein"] = df["proteins_100g"].apply(high_protein)
df.head()

Unnamed: 0,fat_100g,carbohydrates_100g,sugars_100g,proteins_100g,salt_100g,reconstructed_energy,product,high_protein
0,28.57,64.29,14.29,3.57,0.0,2267.85,Banana Chips Sweetened (Whole),low_protein
1,17.86,60.71,17.86,17.86,0.635,2032.23,Peanuts,low_protein
2,57.14,17.86,3.57,17.86,1.22428,2835.7,Organic Salted Nut Mix,low_protein
3,18.75,57.81,15.62,14.06,0.1397,1953.04,Organic Muesli,low_protein
4,36.67,36.67,3.33,16.67,1.60782,2336.91,Zen Party Mix,low_protein
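
For a simple threshold like this, a vectorized comparison is a common alternative to `apply`. Here is a minimal sketch on a tiny made-up DataFrame (the three rows are hypothetical, not taken from the dataset), assuming NumPy is available alongside pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the food dataset.
sample = pd.DataFrame({"proteins_100g": [3.57, 85.0, 17.86]})

# np.where evaluates the condition for the whole column at once.
sample["high_protein"] = np.where(
    sample["proteins_100g"] > 80, "high_protein", "low_protein"
)
print(sample)
```

The `apply` version remains the more flexible choice when the labeling logic grows beyond a single comparison.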


In [0]:
df.describe()

Unnamed: 0,fat_100g,carbohydrates_100g,sugars_100g,proteins_100g,salt_100g,reconstructed_energy
count,45028.0,45028.0,45028.0,45028.0,45028.0,45028.0
mean,10.76591,34.054788,16.005614,6.619437,1.469631,1111.332304
std,14.930087,29.557017,21.495512,7.93677,12.794943,791.621634
min,0.0,0.0,-1.2,-3.57,0.0,0.0
25%,0.0,7.44,1.57,0.0,0.0635,334.52
50%,3.17,22.39,5.88,4.0,0.635,1121.54
75%,17.86,61.54,23.08,9.52,1.44018,1678.46
max,100.0,100.0,100.0,100.0,2032.0,4475.0


### Partial Functions

In [0]:
from functools import partial

def multiple_sort(column_one, column_two):
  """Performs multiple sort on a pandas DataFrame"""
  
  sorted_df = df.sort_values(by=[column_one, column_two], 
                 ascending=[False, False])
  return sorted_df
  
multisort = partial(multiple_sort, "sugars_100g")
type(multisort)

functools.partial
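
`partial` stores the original function together with the arguments supplied so far, then appends any later arguments when the result is called. A minimal sketch with a toy function; `describe_sort` is a hypothetical stand-in for `multiple_sort` that needs no DataFrame.

```python
from functools import partial


def describe_sort(column_one, column_two):
    """Returns the sort order as a list instead of sorting a DataFrame"""
    return [column_one, column_two]


by_sugar = partial(describe_sort, "sugars_100g")

# The partial object exposes what it has captured.
print(by_sugar.func.__name__, by_sugar.args)

# The remaining argument is supplied at call time.
print(by_sugar("fat_100g"))
```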

Find sugary and fatty food

In [0]:
df = multisort("fat_100g")
df.head()

Unnamed: 0,fat_100g,carbohydrates_100g,sugars_100g,proteins_100g,salt_100g,reconstructed_energy,product,high_protein
8254,25.0,100.0,100.0,0.0,0.0,2675.0,Princess Mix Decorations,low_protein
8255,25.0,100.0,100.0,0.0,0.0,2675.0,Frosted Mix,low_protein
8253,12.5,100.0,100.0,0.0,0.0,2187.5,Holiday Happiness Mix,low_protein
9371,1.79,85.71,100.0,7.14,0.04572,1648.26,Organic Just Cherries,low_protein
222,0.0,100.0,100.0,0.0,0.0,1700.0,Tnt Exploding Candy,low_protein


Find sugary and salty food

In [0]:
df = multisort("salt_100g")
df.head()

Unnamed: 0,fat_100g,carbohydrates_100g,sugars_100g,proteins_100g,salt_100g,reconstructed_energy,product,high_protein
33151,0.0,0.0,100.0,0.0,71.12,0.0,"Turkey Brine Kit, Garlic & Herb",low_protein
24783,0.0,100.0,100.0,0.0,24.13,1700.0,Seasoning,low_protein
4073,0.0,100.0,100.0,0.0,7.62,1700.0,"Seasoning Rub, Sweet & Spicy Seafood",low_protein
10282,0.0,100.0,100.0,0.0,2.54,1700.0,Instant Pectin,low_protein
17880,0.0,100.0,100.0,0.0,0.635,1700.0,Cranberry Cosmos Cocktail Rimming Sugar,low_protein
