Lesson 2 Introduction to Colab
Pragmatic AI Labs
This notebook was produced by Pragmatic AI Labs. You can continue learning about these topics by:
- Watching Python for Data Science Complete Video Course
- Buying a copy of Pragmatic AI: An Introduction to Cloud-Based Machine Learning
- Reading an online copy of Pragmatic AI: An Introduction to Cloud-Based Machine Learning
- Watching the video Essential Machine Learning and AI with Python and Jupyter Notebook on Safari Books Online
- Watching the video AWS Certified Machine Learning-Speciality
- Purchasing the video Essential Machine Learning and AI with Python and Jupyter Notebook
- Viewing more content at noahgift.com
2.1 First Colab Document
What is Colab?
- Hosted Jupyter Notebooks
- GPU/TPU enabled runtimes
- Google Docs Integration
Creating Colab Notebooks
Three main interfaces:
- New Notebook (Python2 or Python3)
- Upload Notebooks
- Open Notebooks (Github, Drive, Upload)
Key Features
import pandas as pd
df = pd.read_csv("mlb_weight_ht.csv")
df.head()
| | Name | Team | Position | Height(inches) | Weight(pounds) | Age |
---|---|---|---|---|---|---|
0 | Adam_Donachie | BAL | Catcher | 74 | 180.0 | 22.99 |
1 | Paul_Bako | BAL | Catcher | 74 | 215.0 | 34.69 |
2 | Ramon_Hernandez | BAL | Catcher | 72 | 210.0 | 30.78 |
3 | Kevin_Millar | BAL | First_Baseman | 72 | 210.0 | 35.43 |
4 | Chris_Gomez | BAL | First_Baseman | 73 | 188.0 | 35.71 |
- Iron Icon
- Table of Contents
- Code snippets
- Files
Forms in Colab
Use_Python = False #@param ["False", "True"] {type:"raw"}
print(f"You select it is {Use_Python} you use Python")
You select it is False you use Python
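Colab reads the `#@param` comment to draw a form widget beside the cell. Outside Colab the annotation is an ordinary comment, so the cell still runs with the values as written. The sketch below shows a few other field types. The variable names and the extra team codes are hypothetical, and the annotations assume Colab's documented form syntax.

```python
# Each `#@param` comment turns the assignment into a form field in Colab.
# Outside Colab these are plain comments and the literal values are used.
dataset_name = "mlb_weight_ht.csv"  #@param {type:"string"}
sample_rows = 5  #@param {type:"slider", min:1, max:20, step:1}
team = "BAL"  #@param ["BAL", "NYY", "BOS"]
show_summary = True  #@param {type:"boolean"}

summary = f"Showing {sample_rows} rows of {dataset_name} for team {team}"
if show_summary:
    print(summary)
```

Changing a widget in Colab rewrites the literal on the left of the comment, so the notebook stays reproducible when it is exported.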
Upload to Colab
from google.colab import files
uploaded = files.upload()
Python executable
The python executable can run scripts, start a REPL, and execute statements passed with the -c flag, using semicolons to string multiple statements together.
!python -c "import os;print(os.listdir())"
['.config', 'mlb_weight_ht.csv', 'mlb_weight_ht (1).csv', 'sample_data']
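The same `-c` pattern can be driven from Python itself when the output needs to be captured rather than printed. This is a minimal sketch using the standard library; the helper name `run_python_statements` is hypothetical.

```python
import subprocess
import sys

def run_python_statements(statements):
    """Run semicolon-separated statements in a fresh interpreter via -c."""
    completed = subprocess.run(
        [sys.executable, "-c", statements],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.strip()

result = run_python_statements("import math; print(math.factorial(5))")
print(result)
```

Using `sys.executable` keeps the child process on the same interpreter as the notebook kernel.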
!ls -l
total 100
-rw-r--r-- 1 root root 46535 Feb 11 18:11 'mlb_weight_ht (1).csv'
-rw-r--r-- 1 root root 46535 Feb 11 18:09 mlb_weight_ht.csv
drwxr-xr-x 1 root root 4096 Feb 6 17:31 sample_data
!pip install yellowbrick
Requirement already satisfied: yellowbrick in /usr/local/lib/python3.6/dist-packages (0.9)
Requirement already satisfied: cycler>=0.10.0 in /usr/local/lib/python3.6/dist-packages (from yellowbrick) (0.10.0)
Requirement already satisfied: matplotlib<3.0,>=1.5.1 in /usr/local/lib/python3.6/dist-packages (from yellowbrick) (2.1.2)
Requirement already satisfied: numpy>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from yellowbrick) (1.14.6)
Requirement already satisfied: scipy>=1.0.0 in /usr/local/lib/python3.6/dist-packages (from yellowbrick) (1.1.0)
Requirement already satisfied: scikit-learn>=0.20 in /usr/local/lib/python3.6/dist-packages (from yellowbrick) (0.20.2)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from cycler>=0.10.0->yellowbrick) (1.11.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib<3.0,>=1.5.1->yellowbrick) (2.3.0)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib<3.0,>=1.5.1->yellowbrick) (2.5.3)
Requirement already satisfied: pytz in /usr/local/lib/python3.6/dist-packages (from matplotlib<3.0,>=1.5.1->yellowbrick) (2018.7)
#this is how you capture input to a program
import sys;sys.argv
['/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py',
'-f',
'/root/.local/share/jupyter/runtime/kernel-990d7124-1599-4f5e-b6d9-1e66b9359d20.json']
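Inside a notebook `sys.argv` holds the kernel launcher's arguments, as the output above shows, so a script's own argument parser is easier to exercise by passing it an explicit list. A minimal sketch with `argparse`; the `path` and `--rows` arguments are hypothetical.

```python
import argparse

def parse_args(argv):
    """Parse an explicit argument list instead of reading sys.argv."""
    parser = argparse.ArgumentParser(description="Hypothetical CSV summarizer")
    parser.add_argument("path", help="CSV file to read")
    parser.add_argument("--rows", type=int, default=5, help="rows to show")
    return parser.parse_args(argv)

args = parse_args(["mlb_weight_ht.csv", "--rows", "3"])
print(args.path, args.rows)
```

In a real script the call would be `parse_args(sys.argv[1:])`, which skips the program name in position zero.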
GitHub Integration
- Load Public Notebooks from Github
Original URL: https://github.com/paiml/python_for_datascience/blob/master/Lesson2_Python_For_Data_Science_Introduction_to_Colab.ipynb
Colab Load URL: https://colab.research.google.com/github/paiml/python_for_datascience/blob/master/Lesson2_Python_For_Data_Science_Introduction_to_Colab.ipynb
- Browsing Github Repos
All of Github:
http://colab.research.google.com/github
An organization or user
http://colab.research.google.com/github/paiml/
- Open in Colab Badge
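The URL rewrite and the badge can both be generated from a notebook's GitHub address. This sketch assumes the `/github/` path form used in the browsing URLs above and the badge image that Colab hosts at `assets/colab-badge.svg`; the function names are hypothetical.

```python
from urllib.parse import urlparse

COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"
BADGE_IMAGE = "https://colab.research.google.com/assets/colab-badge.svg"

def colab_url(github_url):
    """Map a github.com notebook URL to its Colab load URL."""
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {github_url}")
    return COLAB_GITHUB_PREFIX + parsed.path

def colab_badge_markdown(github_url):
    """Build the Markdown for an 'Open in Colab' badge link."""
    return f"[![Open In Colab]({BADGE_IMAGE})]({colab_url(github_url)})"

notebook = ("https://github.com/paiml/python_for_datascience/blob/master/"
            "Lesson2_Python_For_Data_Science_Introduction_to_Colab.ipynb")
print(colab_url(notebook))
print(colab_badge_markdown(notebook))
```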
Saving to Github
- Github repo
- Gist
2.2 Managing Colab Documents
Mount GDrive Workflow
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
Mounted at /content/gdrive
import os;os.listdir("/content/gdrive/My Drive/awsml")
['kaggle.json', 'credentials', 'config']
Load AWS API Keys (Colab Notebook)
Put keys in local or remote GDrive:
cp ~/.aws/credentials /Users/myname/Google\ Drive/awsml/
Install Boto
!pip -q install boto3
Create API Config
!mkdir -p ~/.aws &&\
cp /content/gdrive/My\ Drive/awsml/credentials ~/.aws/credentials
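The copy step can also be written in Python, which makes it easy to tighten the file permissions at the same time. This is a sketch under assumptions: the helper name is hypothetical, and the demonstration uses a temporary directory in place of the mounted Drive folder and `~/.aws`.

```python
import os
import shutil
import stat
import tempfile

def install_credentials(source_file, target_file):
    """Copy a secrets file and restrict it to owner read/write."""
    os.makedirs(os.path.dirname(target_file), exist_ok=True)
    shutil.copyfile(source_file, target_file)
    os.chmod(target_file, stat.S_IRUSR | stat.S_IWUSR)
    return target_file

# Throwaway paths for the demo. In Colab the source would sit under
# /content/gdrive/My Drive/awsml and the target would be ~/.aws/credentials.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "credentials")
    with open(source, "w") as handle:
        handle.write("[default]\n")
    target = install_credentials(
        source, os.path.join(workdir, ".aws", "credentials"))
    copied_ok = os.path.exists(target)
    print(copied_ok)
```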
Test Comprehend API Call
import boto3
comprehend = boto3.client(service_name='comprehend', region_name="us-east-1")
text = "There is smoke in San Francisco and it makes me angry"
comprehend.detect_sentiment(Text=text, LanguageCode='en')
{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
'content-length': '164',
'content-type': 'application/x-amz-json-1.1',
'date': 'Wed, 20 Feb 2019 01:06:20 GMT',
'x-amzn-requestid': 'bbb0aadb-34ab-11e9-a354-03fa70a71749'},
'HTTPStatusCode': 200,
'RequestId': 'bbb0aadb-34ab-11e9-a354-03fa70a71749',
'RetryAttempts': 0},
'Sentiment': 'NEGATIVE',
'SentimentScore': {'Mixed': 0.010819978080689907,
'Negative': 0.9212133288383484,
'Neutral': 0.06721948087215424,
'Positive': 0.0007472822326235473}}
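Most of the response is HTTP metadata; the useful part is the `SentimentScore` mapping. A small sketch that picks the highest-scoring class from a response shaped like the one above (the helper name is hypothetical, and the sample dictionary is a trimmed copy of that output):

```python
def dominant_sentiment(response):
    """Return (label, score) for the highest-scoring sentiment class."""
    scores = response["SentimentScore"]
    label = max(scores, key=scores.get)
    return label, scores[label]

# Trimmed copy of the response shown above.
sample_response = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {
        "Mixed": 0.010819978080689907,
        "Negative": 0.9212133288383484,
        "Neutral": 0.06721948087215424,
        "Positive": 0.0007472822326235473,
    },
}
label, score = dominant_sentiment(sample_response)
print(label, round(score, 3))
```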
Kaggle Load Recipe
Mount GDrive
from google.colab import drive
drive.mount('/content/gdrive')
List files in Python
import os;os.listdir("/content/gdrive/My Drive/awsml")
['kaggle.json', 'credentials', 'config']
List files in Bash
!ls -l /content/gdrive/My\ Drive/awsml
total 2
-rw------- 1 root root 43 Nov 22 00:05 config
-rw------- 1 root root 117 Nov 22 00:01 credentials
-rw------- 1 root root 64 Nov 21 22:24 kaggle.json
Wire up Kaggle
!pip install -U -q kaggle
!mkdir -p ~/.kaggle
!cp /content/gdrive/My\ Drive/awsml/kaggle.json ~/.kaggle/kaggle.json
Get Kaggle MNIST Data
!kaggle datasets download -d oddrationale/mnist-in-csv
!ls -l /content
!unzip /content/mnist-in-csv.zip
404 - Not Found
total 112
drwx------ 4 root root 4096 Feb 20 01:04 gdrive
drwxr-xr-x 1 root root 4096 Feb 15 17:21 sample_data
-rw-r--r-- 1 root root 106324 Feb 20 01:03 'Screen Shot 2019-02-19 at 3.44.05 PM.png'
unzip: cannot find or open /content/mnist-in-csv.zip, /content/mnist-in-csv.zip.zip or /content/mnist-in-csv.zip.ZIP.
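The output above shows the download returned `404 - Not Found`, so `unzip` had no archive to open. One way to make that failure explicit is to check for the file before extracting. The sketch below uses the standard `zipfile` module; the helper name and the stand-in archive are hypothetical.

```python
import os
import tempfile
import zipfile

def extract_if_present(zip_path, destination):
    """Extract zip_path into destination; return member names, or None if missing."""
    if not os.path.isfile(zip_path):
        return None
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(destination)
        return archive.namelist()

with tempfile.TemporaryDirectory() as workdir:
    # Failure path: the archive was never downloaded.
    missing = extract_if_present(os.path.join(workdir, "mnist-in-csv.zip"), workdir)
    # Success path: build a tiny stand-in archive and extract it.
    archive_path = os.path.join(workdir, "demo.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("mnist_train.csv", "label,1x1\n5,0\n")
    extracted = extract_if_present(archive_path, workdir)
    print(missing, extracted)
```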
Load into Pandas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#train
train_path = "/content/mnist_train.csv"
data_train = pd.read_csv(train_path)
y_train = np.array(data_train.iloc[:, 0])
x_train = np.array(data_train.iloc[:, 1:])
#test
test_path = "/content/mnist_test.csv"
data_test = pd.read_csv(test_path)
x_test = np.array(data_test)
#features
n_features_train = x_train.shape[1]
n_samples_train = x_train.shape[0]
n_features_test = x_test.shape[1]
n_samples_test = x_test.shape[0]
print(n_features_train, n_samples_train, n_features_test, n_samples_test)
print(x_train.shape, y_train.shape, x_test.shape)
784 60000 785 10000
(60000, 784) (60000,) (10000, 785)
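The printed shapes show 785 columns for the test array against 784 for training, because `x_test` was built from the whole table and still carries one extra column. Assuming the test CSV uses the same layout as the training CSV (label in the first column, which is an assumption about the file), the split can be sketched as follows on a tiny stand-in array.

```python
import numpy as np

def split_labels(table):
    """Split a 2-D array whose first column is the label."""
    table = np.asarray(table)
    return table[:, 0], table[:, 1:]

# Stand-in for an MNIST-style table: 1 label column + 4 pixel columns.
demo = np.array([[7, 0, 0, 255, 0],
                 [2, 0, 128, 0, 0]])
y_demo, x_demo = split_labels(demo)
print(y_demo.shape, x_demo.shape)
```

Applied to the real test table, the same split would give a feature array with 784 columns, matching the training data.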
Show Image
def show_img(x):
    size_img = 28
    plt.figure(figsize=(8,7))
    num_images = 16
    n_samples = x.shape[0]
    x = x.reshape(n_samples, size_img, size_img)
    for i in range(num_images):
        plt.subplot(4, 4, i+1)
        plt.imshow(x[i])
    plt.show()
show_img(x_train)
Changing Runtime(s)
- GPU
- TPU
- Python 2
- Python 3
- Local runtime
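Because the same notebook may run on a hosted runtime or elsewhere, it can help to detect the environment before calling Colab-only helpers such as `drive.mount`. A minimal sketch: it assumes that being able to import `google.colab` is a good enough signal, and the function name is hypothetical.

```python
import sys

def running_in_colab():
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (normally present only on Colab runtimes)
    except ImportError:
        return False
    return True

runtime = "Colab" if running_in_colab() else "local or other Jupyter"
print(f"Python {sys.version_info.major}.{sys.version_info.minor} on {runtime}")
```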
Universal Images and Data
- Images (Can be stored using the Github Issue Hack)
- Files and data can be stored using the large file hack
- large file
- workflow: https://git-lfs.github.com/
git lfs install
git lfs track "*.csv"
git add .gitattributes
git add file.psd
git commit -m "Add design file"
git push origin master
Colab to Colab Cell Copy Hack
2.3 Using magic functions
%timeit
import numpy as np
too_many_decimals = 1.912345897
print("built in Python Round")
%timeit round(too_many_decimals, 2)
print("numpy round")
%timeit np.round(too_many_decimals, 2)
built in Python Round
The slowest run took 13.28 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 507 ns per loop
numpy round
The slowest run took 9.87 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.09 µs per loop
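Outside IPython, the standard `timeit` module gives a comparable measurement. The sketch below times the built-in `round` call only; the helper name and the loop counts are arbitrary choices, not what `%timeit` picks automatically.

```python
import timeit

def best_time_ns(statement, setup="pass", repeat=3, number=10000):
    """Best per-call time in nanoseconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(statement, setup=setup, repeat=repeat, number=number)
    return min(runs) / number * 1e9

setup = "too_many_decimals = 1.912345897"
builtin_ns = best_time_ns("round(too_many_decimals, 2)", setup)
print(f"round(): {builtin_ns:.0f} ns per call (best of 3)")
```

Timing `np.round` the same way only needs `import numpy as np` added to the setup string.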
%alias
alias lscsv ls -l sample_data/*.csv
lscsv
-rw-r--r-- 1 root root 301141 Feb 6 17:31 sample_data/california_housing_test.csv
-rw-r--r-- 1 root root 1706430 Feb 6 17:31 sample_data/california_housing_train.csv
-rw-r--r-- 1 root root 18289443 Feb 6 17:31 sample_data/mnist_test.csv
-rw-r--r-- 1 root root 36523880 Feb 6 17:31 sample_data/mnist_train_small.csv
Magic function reference: https://ipython.readthedocs.io/en/stable/interactive/magics.html
%who
Print variables
who_ls
['Use_Python',
'alt',
'boto3',
'cars',
'comprehend',
'data',
'data_test',
'data_train',
'df',
'drive',
'files',
'n_features_test',
'n_features_train',
'n_samples_test',
'n_samples_train',
'np',
'os',
'pd',
'plt',
'show_img',
'sys',
'test_path',
'text',
'too_many_decimals',
'train_path',
'uploaded',
'x_test',
'x_train',
'y_train']
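A rough approximation of that listing can be built from any namespace dictionary, such as `globals()`. This is only a sketch: IPython additionally hides its own helper names, which the underscore filter below does not reproduce, and the function name is hypothetical.

```python
def list_user_names(namespace):
    """Sorted names in a namespace, skipping underscore-prefixed entries."""
    return sorted(name for name in namespace if not name.startswith("_"))

demo_namespace = {"df": object(), "_hidden": 1, "text": "hello", "__name__": "__main__"}
print(list_user_names(demo_namespace))
```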
cars
| | Acceleration | Cylinders | Displacement | Horsepower | Miles_per_Gallon | Name | Origin | Weight_in_lbs | Year |
---|---|---|---|---|---|---|---|---|---|
0 | 12.0 | 8 | 307.0 | 130.0 | 18.0 | chevrolet chevelle malibu | USA | 3504 | 1970-01-01 |
1 | 11.5 | 8 | 350.0 | 165.0 | 15.0 | buick skylark 320 | USA | 3693 | 1970-01-01 |
2 | 11.0 | 8 | 318.0 | 150.0 | 18.0 | plymouth satellite | USA | 3436 | 1970-01-01 |
3 | 12.0 | 8 | 304.0 | 150.0 | 16.0 | amc rebel sst | USA | 3433 | 1970-01-01 |
4 | 10.5 | 8 | 302.0 | 140.0 | 17.0 | ford torino | USA | 3449 | 1970-01-01 |
5 | 10.0 | 8 | 429.0 | 198.0 | 15.0 | ford galaxie 500 | USA | 4341 | 1970-01-01 |
6 | 9.0 | 8 | 454.0 | 220.0 | 14.0 | chevrolet impala | USA | 4354 | 1970-01-01 |
7 | 8.5 | 8 | 440.0 | 215.0 | 14.0 | plymouth fury iii | USA | 4312 | 1970-01-01 |
8 | 10.0 | 8 | 455.0 | 225.0 | 14.0 | pontiac catalina | USA | 4425 | 1970-01-01 |
9 | 8.5 | 8 | 390.0 | 190.0 | 15.0 | amc ambassador dpl | USA | 3850 | 1970-01-01 |
10 | 17.5 | 4 | 133.0 | 115.0 | NaN | citroen ds-21 pallas | Europe | 3090 | 1970-01-01 |
11 | 11.5 | 8 | 350.0 | 165.0 | NaN | chevrolet chevelle concours (sw) | USA | 4142 | 1970-01-01 |
12 | 11.0 | 8 | 351.0 | 153.0 | NaN | ford torino (sw) | USA | 4034 | 1970-01-01 |
13 | 10.5 | 8 | 383.0 | 175.0 | NaN | plymouth satellite (sw) | USA | 4166 | 1970-01-01 |
14 | 11.0 | 8 | 360.0 | 175.0 | NaN | amc rebel sst (sw) | USA | 3850 | 1970-01-01 |
15 | 10.0 | 8 | 383.0 | 170.0 | 15.0 | dodge challenger se | USA | 3563 | 1970-01-01 |
16 | 8.0 | 8 | 340.0 | 160.0 | 14.0 | plymouth 'cuda 340 | USA | 3609 | 1970-01-01 |
17 | 8.0 | 8 | 302.0 | 140.0 | NaN | ford mustang boss 302 | USA | 3353 | 1970-01-01 |
18 | 9.5 | 8 | 400.0 | 150.0 | 15.0 | chevrolet monte carlo | USA | 3761 | 1970-01-01 |
19 | 10.0 | 8 | 455.0 | 225.0 | 14.0 | buick estate wagon (sw) | USA | 3086 | 1970-01-01 |
20 | 15.0 | 4 | 113.0 | 95.0 | 24.0 | toyota corona mark ii | Japan | 2372 | 1970-01-01 |
21 | 15.5 | 6 | 198.0 | 95.0 | 22.0 | plymouth duster | USA | 2833 | 1970-01-01 |
22 | 15.5 | 6 | 199.0 | 97.0 | 18.0 | amc hornet | USA | 2774 | 1970-01-01 |
23 | 16.0 | 6 | 200.0 | 85.0 | 21.0 | ford maverick | USA | 2587 | 1970-01-01 |
24 | 14.5 | 4 | 97.0 | 88.0 | 27.0 | datsun pl510 | Japan | 2130 | 1970-01-01 |
25 | 20.5 | 4 | 97.0 | 46.0 | 26.0 | volkswagen 1131 deluxe sedan | Europe | 1835 | 1970-01-01 |
26 | 17.5 | 4 | 110.0 | 87.0 | 25.0 | peugeot 504 | Europe | 2672 | 1970-01-01 |
27 | 14.5 | 4 | 107.0 | 90.0 | 24.0 | audi 100 ls | Europe | 2430 | 1970-01-01 |
28 | 17.5 | 4 | 104.0 | 95.0 | 25.0 | saab 99e | Europe | 2375 | 1970-01-01 |
29 | 12.5 | 4 | 121.0 | 113.0 | 26.0 | bmw 2002 | Europe | 2234 | 1970-01-01 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
376 | 18.6 | 4 | 112.0 | 88.0 | 27.0 | chevrolet cavalier wagon | USA | 2640 | 1982-01-01 |
377 | 18.0 | 4 | 112.0 | 88.0 | 34.0 | chevrolet cavalier 2-door | USA | 2395 | 1982-01-01 |
378 | 16.2 | 4 | 112.0 | 85.0 | 31.0 | pontiac j2000 se hatchback | USA | 2575 | 1982-01-01 |
379 | 16.0 | 4 | 135.0 | 84.0 | 29.0 | dodge aries se | USA | 2525 | 1982-01-01 |
380 | 18.0 | 4 | 151.0 | 90.0 | 27.0 | pontiac phoenix | USA | 2735 | 1982-01-01 |
381 | 16.4 | 4 | 140.0 | 92.0 | 24.0 | ford fairmont futura | USA | 2865 | 1982-01-01 |
382 | 20.5 | 4 | 151.0 | NaN | 23.0 | amc concord dl | USA | 3035 | 1982-01-01 |
383 | 15.3 | 4 | 105.0 | 74.0 | 36.0 | volkswagen rabbit l | Europe | 1980 | 1982-01-01 |
384 | 18.2 | 4 | 91.0 | 68.0 | 37.0 | mazda glc custom l | Japan | 2025 | 1982-01-01 |
385 | 17.6 | 4 | 91.0 | 68.0 | 31.0 | mazda glc custom | Japan | 1970 | 1982-01-01 |
386 | 14.7 | 4 | 105.0 | 63.0 | 38.0 | plymouth horizon miser | USA | 2125 | 1982-01-01 |
387 | 17.3 | 4 | 98.0 | 70.0 | 36.0 | mercury lynx l | USA | 2125 | 1982-01-01 |
388 | 14.5 | 4 | 120.0 | 88.0 | 36.0 | nissan stanza xe | Japan | 2160 | 1982-01-01 |
389 | 14.5 | 4 | 107.0 | 75.0 | 36.0 | honda Accelerationord | Japan | 2205 | 1982-01-01 |
390 | 16.9 | 4 | 108.0 | 70.0 | 34.0 | toyota corolla | Japan | 2245 | 1982-01-01 |
391 | 15.0 | 4 | 91.0 | 67.0 | 38.0 | honda civic | Japan | 1965 | 1982-01-01 |
392 | 15.7 | 4 | 91.0 | 67.0 | 32.0 | honda civic (auto) | Japan | 1965 | 1982-01-01 |
393 | 16.2 | 4 | 91.0 | 67.0 | 38.0 | datsun 310 gx | Japan | 1995 | 1982-01-01 |
394 | 16.4 | 6 | 181.0 | 110.0 | 25.0 | buick century limited | USA | 2945 | 1982-01-01 |
395 | 17.0 | 6 | 262.0 | 85.0 | 38.0 | oldsmobile cutlass ciera (diesel) | USA | 3015 | 1982-01-01 |
396 | 14.5 | 4 | 156.0 | 92.0 | 26.0 | chrysler lebaron medallion | USA | 2585 | 1982-01-01 |
397 | 14.7 | 6 | 232.0 | 112.0 | 22.0 | ford granada l | USA | 2835 | 1982-01-01 |
398 | 13.9 | 4 | 144.0 | 96.0 | 32.0 | toyota celica gt | Japan | 2665 | 1982-01-01 |
399 | 13.0 | 4 | 135.0 | 84.0 | 36.0 | dodge charger 2.2 | USA | 2370 | 1982-01-01 |
400 | 17.3 | 4 | 151.0 | 90.0 | 27.0 | chevrolet camaro | USA | 2950 | 1982-01-01 |
401 | 15.6 | 4 | 140.0 | 86.0 | 27.0 | ford mustang gl | USA | 2790 | 1982-01-01 |
402 | 24.6 | 4 | 97.0 | 52.0 | 44.0 | vw pickup | Europe | 2130 | 1982-01-01 |
403 | 11.6 | 4 | 135.0 | 84.0 | 32.0 | dodge rampage | USA | 2295 | 1982-01-01 |
404 | 18.6 | 4 | 120.0 | 79.0 | 28.0 | ford ranger | USA | 2625 | 1982-01-01 |
405 | 19.4 | 4 | 119.0 | 82.0 | 31.0 | chevy s-10 | USA | 2720 | 1982-01-01 |
406 rows × 9 columns
%writefile
%%writefile magic_stuff.txt
import pandas as pd
df = pd.read_csv(
"https://raw.githubusercontent.com/noahgift/food/master/data/features.en.openfoodfacts.org.products.csv")
df.drop(["Unnamed: 0", "exceeded", "g_sum", "energy_100g"], axis=1, inplace=True) #drop columns we don't need
df = df.drop(df.index[[1,11877]]) #drop outlier
df.rename(index=str, columns={"reconstructed_energy": "energy_100g"}, inplace=True)
df.head()
Writing magic_stuff.txt
cat magic_stuff.txt
import pandas as pd
df = pd.read_csv(
"https://raw.githubusercontent.com/noahgift/food/master/data/features.en.openfoodfacts.org.products.csv")
df.drop(["Unnamed: 0", "exceeded", "g_sum", "energy_100g"], axis=1, inplace=True) #drop columns we don't need
df = df.drop(df.index[[1,11877]]) #drop outlier
df.rename(index=str, columns={"reconstructed_energy": "energy_100g"}, inplace=True)
df.head()
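The write-then-display round trip above has a plain Python equivalent, which is handy when the file name is computed at run time. A minimal sketch with `pathlib`; the helper name is hypothetical and a temporary directory stands in for the working directory.

```python
import tempfile
from pathlib import Path

def write_and_read(path, text):
    """Write text to path (overwriting any existing file), then read it back."""
    target = Path(path)
    target.write_text(text)
    return target.read_text()

with tempfile.TemporaryDirectory() as workdir:
    contents = write_and_read(Path(workdir) / "magic_stuff.txt",
                              "import pandas as pd\n")
    print(contents, end="")
```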
Bash
%%bash
uname -a
Linux c14846d13496 4.14.79+ #1 SMP Wed Dec 19 21:19:13 PST 2018 x86_64 x86_64 x86_64 GNU/Linux
Python2
%%python2
print "old school"
old school
print "old school"
File "<ipython-input-39-ed16c9002a7c>", line 1
print "old school"
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("old school")?
HTML
%%html
<h1>Only The Best Tags</h1>
Only The Best Tags
## 2.4 Compatibility with Jupyter
[Watch Lesson 2.4](https://learning.oreilly.com/videos/python-for-data/9780135687253/9780135687253-pfds_01_02_04_00)
### Jupyter Import/Export
* Upload Jupyter Notebooks
* Download Jupyter Notebooks
### Using Plotly
#### Install Latest Plotly
{:.input_area}
```
import plotly
plotly.__version__
```
{:.output .output_data_text}
```
'1.12.12'
```
{:.input_area}
```
!pip uninstall -q -y plotly
!pip install plotly==3.6.0
```
{:.output .output_stream}
```
Collecting plotly==3.6.0
  Downloading https://files.pythonhosted.org/packages/4d/59/63a5a05532a67b1c49283e8b7885bbe55454a1eef8443e97a7479bb9964b/plotly-3.6.0.tar.gz (31.1MB)
    100% |████████████████████████████████| 31.1MB 1.2MB/s
Requirement already satisfied: decorator>=4.0.6 in /usr/local/lib/python3.6/dist-packages (from plotly==3.6.0) (4.3.2)
Requirement already satisfied: nbformat>=4.2 in /usr/local/lib/python3.6/dist-packages (from plotly==3.6.0) (4.4.0)
Requirement already satisfied: pytz in /usr/local/lib/python3.6/dist-packages (from plotly==3.6.0) (2018.9)
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from plotly==3.6.0) (2.18.4)
Collecting retrying>=1.3.3 (from plotly==3.6.0)
Downloading https://files.pythonhosted.org/packages/44/ef/beae4b4ef80902f22e3af073397f079c96969c69b2c7d52a57ea9ae61c9d/retrying-1.3.3.tar.gz
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from plotly==3.6.0) (1.11.0)
Requirement already satisfied: jupyter-core in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.2->plotly==3.6.0) (4.4.0)
Requirement already satisfied: traitlets>=4.1 in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.2->plotly==3.6.0) (4.3.2)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.2->plotly==3.6.0) (2.6.0)
Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.2->plotly==3.6.0) (0.2.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->plotly==3.6.0) (2018.11.29)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->plotly==3.6.0) (1.22)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->plotly==3.6.0) (3.0.4)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->plotly==3.6.0) (2.6)
Building wheels for collected packages: plotly, retrying
  Building wheel for plotly (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/67/0b/29/08c7f5caed2d1ac446db982ff607b326d49bfa0bd3a67ef8c7
  Building wheel for retrying (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/d7/a9/33/acc7b709e2a35caa7d4cae442f6fe6fbf2c43f80823d46460c
Successfully built plotly retrying
Installing collected packages: retrying, plotly
Successfully installed plotly-3.6.0 retrying-1.3.3
```
{:.input_area}
```
import plotly
plotly.__version__
```
{:.output .output_data_text}
```
'3.6.0'
```
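The version check can also be done without importing the package, which avoids picking up a stale module that the kernel has already loaded. This is a sketch using `importlib.metadata`, which assumes Python 3.8 or later (newer than the 3.6 runtime shown in the output); the helper name is hypothetical.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("plotly"))
```

After reinstalling a package that was already imported, restarting the runtime is the usual way to make the new version take effect.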
{:.input_area}
```
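def enable_plotly_in_cell():
    import IPython
    from plotly.offline import init_notebook_mode
    # The HTML payload was stripped from this export. The widely shared Colab
    # recipe loads require.js here; treat the script tag as an assumption.
    display(IPython.core.display.HTML('''
        <script src="/static/components/requirejs/require.js"></script>
    '''))
    init_notebook_mode(connected=False)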
```
#### Plot
{:.input_area}
```
import pandas as pd
df = pd.read_csv(
"https://raw.githubusercontent.com/noahgift/food/master/data/features.en.openfoodfacts.org.products.csv")
df.drop(["Unnamed: 0", "exceeded", "g_sum", "energy_100g"], axis=1, inplace=True) #drop columns we don't need
df = df.drop(df.index[[1,11877]]) #drop outlier
df.rename(index=str, columns={"reconstructed_energy": "energy_100g"}, inplace=True)
df.head()
```
| | fat_100g | carbohydrates_100g | sugars_100g | proteins_100g | salt_100g | energy_100g | product |
---|---|---|---|---|---|---|---|
0 | 28.57 | 64.29 | 14.29 | 3.57 | 0.00000 | 2267.85 | Banana Chips Sweetened (Whole) |
2 | 57.14 | 17.86 | 3.57 | 17.86 | 1.22428 | 2835.70 | Organic Salted Nut Mix |
3 | 18.75 | 57.81 | 15.62 | 14.06 | 0.13970 | 1953.04 | Organic Muesli |
4 | 36.67 | 36.67 | 3.33 | 16.67 | 1.60782 | 2336.91 | Zen Party Mix |
5 | 18.18 | 60.00 | 21.82 | 14.55 | 0.02286 | 1976.37 | Cinnamon Nut Granola |
{:.input_area}
```
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
df_cluster_features = df.drop("product", axis=1)
scaler = MinMaxScaler()
scaler.fit(df_cluster_features)
k_means = KMeans(n_clusters=3)
kmeans = k_means.fit(scaler.transform(df_cluster_features))
df['cluster'] = kmeans.labels_
df.head()
```
| | fat_100g | carbohydrates_100g | sugars_100g | proteins_100g | salt_100g | energy_100g | product | cluster |
---|---|---|---|---|---|---|---|---|
0 | 28.57 | 64.29 | 14.29 | 3.57 | 0.00000 | 2267.85 | Banana Chips Sweetened (Whole) | 0 |
2 | 57.14 | 17.86 | 3.57 | 17.86 | 1.22428 | 2835.70 | Organic Salted Nut Mix | 0 |
3 | 18.75 | 57.81 | 15.62 | 14.06 | 0.13970 | 1953.04 | Organic Muesli | 0 |
4 | 36.67 | 36.67 | 3.33 | 16.67 | 1.60782 | 2336.91 | Zen Party Mix | 0 |
5 | 18.18 | 60.00 | 21.82 | 14.55 | 0.02286 | 1976.37 | Cinnamon Nut Granola | 0 |
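The scaling step matters because k-means works on distances, and `energy_100g` would otherwise dominate columns measured in grams. Min-max scaling maps each column linearly so its minimum becomes 0 and its maximum becomes 1. A pure-Python sketch of that formula follows; the helper is hypothetical and is fitted on only the five `fat_100g` values shown above, whereas the notebook fits the scaler on the full dataset.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]

fat_100g = [28.57, 57.14, 18.75, 36.67, 18.18]  # first five rows shown above
scaled = min_max_scale(fat_100g)
print([round(value, 3) for value in scaled])
```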
{:.input_area}
```
import plotly.offline as py
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode
enable_plotly_in_cell()
trace1 = go.Scatter3d(
    x=df["fat_100g"],
    y=df["carbohydrates_100g"],
    z=df["proteins_100g"],
    mode='markers',
    text=df["product"],
    marker=dict(
        size=12,
        color=df["cluster"], # set color to an array/list of desired values
        colorscale='Viridis', # choose a colorscale
        opacity=0.8
    )
)
#print(trace1)
data = [trace1]
layout = go.Layout(
    showlegend=False,
    title="Protein-Fat-Carb: Food Energy Types",
    scene = dict(
        xaxis = dict(title='X: Fat Content-100g'),
        yaxis = dict(title="Y: Carbohydrate Content-100g"),
        zaxis = dict(title="Z: Protein Content-100g"),
    ),
    width=1000,
    height=900,
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='3d-scatter-colorscale')
```
### Installing Software
{:.input_area}
```
!pip install requests
```
{:.output .output_stream}
```
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (2.18.4)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests) (2.6)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests) (3.0.4)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests) (1.22)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests) (2018.11.29)
```
{:.input_area}
```
!pip install -q requests
```