
Lesson 2 Introduction to Colab

Pragmatic AI Labs


This notebook was produced by Pragmatic AI Labs. You can continue learning about these topics by:

2.1 First Colab Document

Watch Lesson 2.1

What is Colab?

  • Hosted Jupyter Notebooks
  • GPU/TPU enabled runtimes
  • Google Docs Integration

Creating Colab Notebooks

Three main interfaces:

  • New Notebook (Python2 or Python3)
  • Upload Notebooks
  • Open Notebooks (Github, Drive, Upload)

Key Features

import pandas as pd
df = pd.read_csv("mlb_weight_ht.csv")
Name Team Position Height(inches) Weight(pounds) Age
0 Adam_Donachie BAL Catcher 74 180.0 22.99
1 Paul_Bako BAL Catcher 74 215.0 34.69
2 Ramon_Hernandez BAL Catcher 72 210.0 30.78
3 Kevin_Millar BAL First_Baseman 72 210.0 35.43
4 Chris_Gomez BAL First_Baseman 73 188.0 35.71
  • Iron Icon
  • Table of Contents
  • Code snippets
  • Files

Forms in Colab

Use_Python = False #@param ["False", "True"] {type:"raw"}
print(f"You select it is {Use_Python} you use Python")
You select it is False you use Python
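
A form field is an ordinary Python assignment followed by a `#@param` comment, so the cell still runs as plain Python outside Colab. A minimal sketch of a few other field types follows; the variable names and values are made up for illustration.

```python
# Each line is a normal assignment; Colab reads the trailing #@param
# comment to render a form widget. Names and values are illustrative.
dataset_name = "mlb_weight_ht.csv"  #@param {type:"string"}
sample_rows = 5  #@param {type:"integer"}
test_fraction = 0.2  #@param {type:"slider", min:0, max:1, step:0.1}
use_python = True  #@param {type:"boolean"}
runtime = "Python3"  #@param ["Python2", "Python3"]

print(f"Loading {sample_rows} rows from {dataset_name} ({runtime})")
```

Changing a widget in the form rewrites the value on the left of the comment, which is why the code and the form stay in sync.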

Upload to Colab

from google.colab import files
uploaded = files.upload()
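
`files.upload()` returns a dictionary that maps each uploaded filename to its raw bytes. The sketch below shows one way to decode such a dictionary into rows; `parse_uploaded_csvs` is a hypothetical helper, and a stand-in dictionary replaces the real upload so the example runs outside Colab.

```python
import csv
import io

def parse_uploaded_csvs(uploaded):
    """Decode each uploaded CSV (filename -> bytes) into a list of row dicts."""
    tables = {}
    for name, payload in uploaded.items():
        text = payload.decode("utf-8")
        tables[name] = list(csv.DictReader(io.StringIO(text)))
    return tables

# In Colab, `uploaded` would come from files.upload(); this stand-in
# dictionary with made-up contents lets the sketch run anywhere.
uploaded = {"example.csv": b"Name,Team\nAdam_Donachie,BAL\n"}
tables = parse_uploaded_csvs(uploaded)
print(tables["example.csv"][0]["Team"])
```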

Python executable

The python executable can run scripts, start a REPL, and execute statements passed with the -c flag; semicolons string multiple statements together.

!python -c "import os;print(os.listdir())"
['.config', 'mlb_weight_ht.csv', 'mlb_weight_ht (1).csv', 'sample_data']
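
The same `-c` invocation can be driven from Python itself, which is convenient when the output needs to be captured rather than printed. This is a minimal sketch using the standard `subprocess` module; the statement passed to `-c` is only an example.

```python
import subprocess
import sys

# Run two statements in a child interpreter, the same way `python -c` does.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(len(os.listdir('.')) >= 0)"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Using `sys.executable` keeps the child process on the same interpreter as the notebook kernel.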

!ls -l
total 100
-rw-r--r-- 1 root root 46535 Feb 11 18:11 'mlb_weight_ht (1).csv'
-rw-r--r-- 1 root root 46535 Feb 11 18:09  mlb_weight_ht.csv
drwxr-xr-x 1 root root  4096 Feb  6 17:31  sample_data

!pip install yellowbrick
Requirement already satisfied: yellowbrick in /usr/local/lib/python3.6/dist-packages (0.9)
Requirement already satisfied: cycler>=0.10.0 in /usr/local/lib/python3.6/dist-packages (from yellowbrick) (0.10.0)
Requirement already satisfied: matplotlib<3.0,>=1.5.1 in /usr/local/lib/python3.6/dist-packages (from yellowbrick) (2.1.2)
Requirement already satisfied: numpy>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from yellowbrick) (1.14.6)
Requirement already satisfied: scipy>=1.0.0 in /usr/local/lib/python3.6/dist-packages (from yellowbrick) (1.1.0)
Requirement already satisfied: scikit-learn>=0.20 in /usr/local/lib/python3.6/dist-packages (from yellowbrick) (0.20.2)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from cycler>=0.10.0->yellowbrick) (1.11.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib<3.0,>=1.5.1->yellowbrick) (2.3.0)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib<3.0,>=1.5.1->yellowbrick) (2.5.3)
Requirement already satisfied: pytz in /usr/local/lib/python3.6/dist-packages (from matplotlib<3.0,>=1.5.1->yellowbrick) (2018.7)

#this is how you capture input to a program
import sys;sys.argv
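
In a script, `sys.argv[0]` is the script name and the remaining items are its arguments. A minimal sketch of parsing them with `argparse` follows; the `--csv` option and the `parse_args` helper are hypothetical names chosen for illustration.

```python
import argparse

def parse_args(argv):
    """Parse a hypothetical --csv option from an argv-style list."""
    parser = argparse.ArgumentParser(description="Illustrative argument parser")
    parser.add_argument("--csv", default="mlb_weight_ht.csv", help="input CSV path")
    return parser.parse_args(argv)

# In a script this would be parse_args(sys.argv[1:]). Inside a notebook,
# sys.argv holds the kernel's own launch arguments, so an explicit list
# is passed here instead.
args = parse_args(["--csv", "sample_data/mnist_test.csv"])
print(args.csv)
```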

GitHub Integration

  • Load Public Notebooks from GitHub

Original URL:

Colab Load URL:

  • Browsing GitHub Repos

All of GitHub:

An organization or user

  • Open in Colab Badge

Open In Colab

  • Saving to GitHub

  • GitHub repo
  • Gist
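
Loading a public notebook works by swapping the `github.com` host for Colab's `/github` path prefix. The sketch below illustrates that mapping; `github_to_colab` is a hypothetical helper, and the repository and notebook path are placeholders rather than a real project.

```python
from urllib.parse import urlparse

COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"

def github_to_colab(url):
    """Map a github.com notebook URL to the matching Colab load URL."""
    parsed = urlparse(url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    return COLAB_GITHUB_PREFIX + parsed.path

# Hypothetical repository and notebook path, for illustration only.
colab_url = github_to_colab(
    "https://github.com/example-user/example-repo/blob/master/notebook.ipynb"
)
print(colab_url)
```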

2.2 Managing Colab Documents

Watch Lesson 2.2

Mount GDrive Workflow

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
Mounted at /content/gdrive

import os;os.listdir("/content/gdrive/My Drive/awsml")
['kaggle.json', 'credentials', 'config']

Load AWS API Keys (Colab Notebook)

Put keys in local or remote GDrive:

cp ~/.aws/credentials /Users/myname/Google\ Drive/awsml/

Install Boto

!pip -q install boto3

Create API Config

!mkdir -p ~/.aws &&\
  cp /content/gdrive/My\ Drive/awsml/credentials ~/.aws/credentials 
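
Before calling an AWS API it can help to confirm that the copied credentials file has the expected shape. The file uses INI syntax, so a rough check is possible with `configparser`; `has_default_keys` is a hypothetical helper and the key values below are placeholders, not real keys.

```python
import configparser

def has_default_keys(credentials_text):
    """Return True if the [default] profile defines both AWS key fields."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    if not parser.has_section("default"):
        return False
    profile = parser["default"]
    return "aws_access_key_id" in profile and "aws_secret_access_key" in profile

# Placeholder values, not real keys.
sample = (
    "[default]\n"
    "aws_access_key_id = EXAMPLEKEYID\n"
    "aws_secret_access_key = EXAMPLESECRET\n"
)
print(has_default_keys(sample))
```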

Test Comprehend API Call

import boto3
comprehend = boto3.client(service_name='comprehend', region_name="us-east-1")
text = "There is smoke in San Francisco and it makes me angry"
comprehend.detect_sentiment(Text=text, LanguageCode='en')
{'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
   'content-length': '164',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Wed, 20 Feb 2019 01:06:20 GMT',
   'x-amzn-requestid': 'bbb0aadb-34ab-11e9-a354-03fa70a71749'},
  'HTTPStatusCode': 200,
  'RequestId': 'bbb0aadb-34ab-11e9-a354-03fa70a71749',
  'RetryAttempts': 0},
 'Sentiment': 'NEGATIVE',
 'SentimentScore': {'Mixed': 0.010819978080689907,
  'Negative': 0.9212133288383484,
  'Neutral': 0.06721948087215424,
  'Positive': 0.0007472822326235473}}
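
The response is a plain dictionary, so the dominant sentiment can be pulled out without another API call. A minimal sketch, using a trimmed copy of the response above; `top_sentiment` is a hypothetical helper name.

```python
def top_sentiment(response):
    """Return (label, score) for the highest-scoring sentiment class."""
    scores = response["SentimentScore"]
    label = max(scores, key=scores.get)
    return label, scores[label]

# Trimmed copy of the response shown above.
response = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {
        "Mixed": 0.010819978080689907,
        "Negative": 0.9212133288383484,
        "Neutral": 0.06721948087215424,
        "Positive": 0.0007472822326235473,
    },
}
label, score = top_sentiment(response)
print(label, round(score, 3))
```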

Kaggle Load Recipe

Mount GDrive

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

List in Python

import os;os.listdir("/content/gdrive/My Drive/awsml")
['kaggle.json', 'credentials', 'config']

List in Bash

!ls -l /content/gdrive/My\ Drive/awsml
total 2
-rw------- 1 root root  43 Nov 22 00:05 config
-rw------- 1 root root 117 Nov 22 00:01 credentials
-rw------- 1 root root  64 Nov 21 22:24 kaggle.json

Wire up Kaggle

!pip install -U -q kaggle
!mkdir -p ~/.kaggle
!cp /content/gdrive/My\ Drive/awsml/kaggle.json ~/.kaggle/kaggle.json
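
The same wiring can be done in Python, with a sanity check on the token before it is copied. This sketch assumes the usual `kaggle.json` layout with `username` and `key` fields; `install_kaggle_token` is a hypothetical helper, and it writes to a temporary directory with a placeholder token rather than touching `~/.kaggle`.

```python
import json
import os
import tempfile

def install_kaggle_token(token_json, kaggle_dir):
    """Validate an API token and write it to kaggle_dir/kaggle.json (mode 600)."""
    token = json.loads(token_json)
    missing = {"username", "key"} - set(token)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")
    os.makedirs(kaggle_dir, exist_ok=True)
    target = os.path.join(kaggle_dir, "kaggle.json")
    with open(target, "w") as handle:
        json.dump(token, handle)
    os.chmod(target, 0o600)  # owner read/write only
    return target

# Placeholder token written to a temporary directory instead of ~/.kaggle.
with tempfile.TemporaryDirectory() as tmp:
    path = install_kaggle_token('{"username": "example", "key": "not-a-real-key"}', tmp)
    print(os.path.basename(path))
```

Restricting the file to its owner matches the permissions shown in the `ls -l` listing above.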

Get Kaggle MNIST Data

!kaggle datasets download -d oddrationale/mnist-in-csv
!ls -l /content
!unzip /content/

404 - Not Found
total 112
drwx------ 4 root root   4096 Feb 20 01:04  gdrive
drwxr-xr-x 1 root root   4096 Feb 15 17:21  sample_data
-rw-r--r-- 1 root root 106324 Feb 20 01:03 'Screen Shot 2019-02-19 at 3.44.05 PM.png'
unzip:  cannot find or open /content/, /content/ or /content/

Load into Pandas

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

train_path = "/content/mnist_train.csv"
data_train = pd.read_csv(train_path)
y_train = np.array(data_train.iloc[:, 0])
x_train = np.array(data_train.iloc[:, 1:])

test_path = "/content/mnist_test.csv"
data_test = pd.read_csv(test_path)
x_test = np.array(data_test)  # note: keeps the label column, hence 785 columns below

n_features_train = x_train.shape[1]
n_samples_train = x_train.shape[0]
n_features_test = x_test.shape[1]
n_samples_test = x_test.shape[0]
print(n_features_train, n_samples_train, n_features_test, n_samples_test)
print(x_train.shape, y_train.shape, x_test.shape)
784 60000 785 10000
(60000, 784) (60000,) (10000, 785)
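
The shapes above differ because the label column was split off for the training set but not for the test set. The sketch below shows the split on a tiny made-up sample, using only the standard library so it runs without the Kaggle download; `split_labels` is a hypothetical helper.

```python
import csv
import io

def split_labels(csv_text):
    """Split rows into (labels, features), treating column 0 as the label."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    labels, features = [], []
    for row in reader:
        labels.append(int(row[0]))
        features.append([int(value) for value in row[1:]])
    return labels, features

# Tiny made-up sample with 3 "pixels" per row instead of 784.
sample = "label,p1,p2,p3\n7,0,128,255\n2,12,0,64\n"
labels, features = split_labels(sample)
print(labels, len(features[0]))
```

Applying the same split to both files would give matching feature counts.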

Show Image

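def show_img(x):
    """Display the first 16 rows of x as 28x28 grayscale images in a 4x4 grid."""
    size_img = 28
    num_images = 16
    n_samples = x.shape[0]
    x = x.reshape(n_samples, size_img, size_img)
    for i in range(num_images):
        plt.subplot(4, 4, i+1)
        plt.imshow(x[i], cmap="gray")  # assumed completion: draw the i-th digit
        plt.axis("off")
    plt.show()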


Changing Runtime(s)

  • GPU
  • TPU
  • Python 2
  • Python 3
  • Local runtime
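
After switching runtimes it is worth confirming what the notebook is actually running on. A rough sketch using only the standard library; `describe_runtime` is a hypothetical helper, and looking for `nvidia-smi` on the PATH is a heuristic, not a guarantee that a GPU is usable.

```python
import platform
import shutil
import sys

def describe_runtime():
    """Summarize the interpreter version and whether nvidia-smi is on PATH."""
    return {
        "python": platform.python_version(),
        "major": sys.version_info.major,
        # Heuristic only: the presence of nvidia-smi suggests an NVIDIA GPU
        # runtime, but it does not prove a GPU is usable.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }

runtime_info = describe_runtime()
print(runtime_info)
```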

Universal Images and Data

  • Images (Can be stored using the Github Issue Hack)

Stored in github issue

  • Files and data can be stored using the large file hack

  • large file
  • workflow:
git lfs install
git lfs track "*.csv"
git add .gitattributes
git add file.csv
git commit -m "Add data file"
git push origin master

Colab to Colab Cell Copy Hack

cell copy

2.3 Using magic functions

Watch Lesson 2.3


import numpy as np
too_many_decimals = 1.912345897

print("built in Python Round")
%timeit round(too_many_decimals, 2)

print("numpy round")
%timeit np.round(too_many_decimals, 2)

built in Python Round
The slowest run took 13.28 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 507 ns per loop
numpy round
The slowest run took 9.87 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.09 µs per loop
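
Outside IPython, the standard `timeit` module gives a comparable measurement. This is a minimal sketch for the built-in `round` only; the repeat and loop counts are arbitrary choices, and the timing will vary by machine.

```python
import timeit

too_many_decimals = 1.912345897
loops = 100_000

# Time the built-in round() over several repeats and keep the fastest,
# similar to the "best of 3" figure in the output above.
runs = timeit.repeat(
    "round(too_many_decimals, 2)",
    globals={"too_many_decimals": too_many_decimals},
    repeat=3,
    number=loops,
)
best_ns = min(runs) / loops * 1e9
print(f"best of 3: {best_ns:.0f} ns per loop")
```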


alias lscsv ls -l sample_data/*.csv
lscsv
-rw-r--r-- 1 root root   301141 Feb  6 17:31 sample_data/california_housing_test.csv
-rw-r--r-- 1 root root  1706430 Feb  6 17:31 sample_data/california_housing_train.csv
-rw-r--r-- 1 root root 18289443 Feb  6 17:31 sample_data/mnist_test.csv
-rw-r--r-- 1 root root 36523880 Feb  6 17:31 sample_data/mnist_train_small.csv

[Reference These]


Print variables

Acceleration Cylinders Displacement Horsepower Miles_per_Gallon Name Origin Weight_in_lbs Year
0 12.0 8 307.0 130.0 18.0 chevrolet chevelle malibu USA 3504 1970-01-01
1 11.5 8 350.0 165.0 15.0 buick skylark 320 USA 3693 1970-01-01
2 11.0 8 318.0 150.0 18.0 plymouth satellite USA 3436 1970-01-01
3 12.0 8 304.0 150.0 16.0 amc rebel sst USA 3433 1970-01-01
4 10.5 8 302.0 140.0 17.0 ford torino USA 3449 1970-01-01
5 10.0 8 429.0 198.0 15.0 ford galaxie 500 USA 4341 1970-01-01
6 9.0 8 454.0 220.0 14.0 chevrolet impala USA 4354 1970-01-01
7 8.5 8 440.0 215.0 14.0 plymouth fury iii USA 4312 1970-01-01
8 10.0 8 455.0 225.0 14.0 pontiac catalina USA 4425 1970-01-01
9 8.5 8 390.0 190.0 15.0 amc ambassador dpl USA 3850 1970-01-01
10 17.5 4 133.0 115.0 NaN citroen ds-21 pallas Europe 3090 1970-01-01
11 11.5 8 350.0 165.0 NaN chevrolet chevelle concours (sw) USA 4142 1970-01-01
12 11.0 8 351.0 153.0 NaN ford torino (sw) USA 4034 1970-01-01
13 10.5 8 383.0 175.0 NaN plymouth satellite (sw) USA 4166 1970-01-01
14 11.0 8 360.0 175.0 NaN amc rebel sst (sw) USA 3850 1970-01-01
15 10.0 8 383.0 170.0 15.0 dodge challenger se USA 3563 1970-01-01
16 8.0 8 340.0 160.0 14.0 plymouth 'cuda 340 USA 3609 1970-01-01
17 8.0 8 302.0 140.0 NaN ford mustang boss 302 USA 3353 1970-01-01
18 9.5 8 400.0 150.0 15.0 chevrolet monte carlo USA 3761 1970-01-01
19 10.0 8 455.0 225.0 14.0 buick estate wagon (sw) USA 3086 1970-01-01
20 15.0 4 113.0 95.0 24.0 toyota corona mark ii Japan 2372 1970-01-01
21 15.5 6 198.0 95.0 22.0 plymouth duster USA 2833 1970-01-01
22 15.5 6 199.0 97.0 18.0 amc hornet USA 2774 1970-01-01
23 16.0 6 200.0 85.0 21.0 ford maverick USA 2587 1970-01-01
24 14.5 4 97.0 88.0 27.0 datsun pl510 Japan 2130 1970-01-01
25 20.5 4 97.0 46.0 26.0 volkswagen 1131 deluxe sedan Europe 1835 1970-01-01
26 17.5 4 110.0 87.0 25.0 peugeot 504 Europe 2672 1970-01-01
27 14.5 4 107.0 90.0 24.0 audi 100 ls Europe 2430 1970-01-01
28 17.5 4 104.0 95.0 25.0 saab 99e Europe 2375 1970-01-01
29 12.5 4 121.0 113.0 26.0 bmw 2002 Europe 2234 1970-01-01
... ... ... ... ... ... ... ... ... ...
376 18.6 4 112.0 88.0 27.0 chevrolet cavalier wagon USA 2640 1982-01-01
377 18.0 4 112.0 88.0 34.0 chevrolet cavalier 2-door USA 2395 1982-01-01
378 16.2 4 112.0 85.0 31.0 pontiac j2000 se hatchback USA 2575 1982-01-01
379 16.0 4 135.0 84.0 29.0 dodge aries se USA 2525 1982-01-01
380 18.0 4 151.0 90.0 27.0 pontiac phoenix USA 2735 1982-01-01
381 16.4 4 140.0 92.0 24.0 ford fairmont futura USA 2865 1982-01-01
382 20.5 4 151.0 NaN 23.0 amc concord dl USA 3035 1982-01-01
383 15.3 4 105.0 74.0 36.0 volkswagen rabbit l Europe 1980 1982-01-01
384 18.2 4 91.0 68.0 37.0 mazda glc custom l Japan 2025 1982-01-01
385 17.6 4 91.0 68.0 31.0 mazda glc custom Japan 1970 1982-01-01
386 14.7 4 105.0 63.0 38.0 plymouth horizon miser USA 2125 1982-01-01
387 17.3 4 98.0 70.0 36.0 mercury lynx l USA 2125 1982-01-01
388 14.5 4 120.0 88.0 36.0 nissan stanza xe Japan 2160 1982-01-01
389 14.5 4 107.0 75.0 36.0 honda accord Japan 2205 1982-01-01
390 16.9 4 108.0 70.0 34.0 toyota corolla Japan 2245 1982-01-01
391 15.0 4 91.0 67.0 38.0 honda civic Japan 1965 1982-01-01
392 15.7 4 91.0 67.0 32.0 honda civic (auto) Japan 1965 1982-01-01
393 16.2 4 91.0 67.0 38.0 datsun 310 gx Japan 1995 1982-01-01
394 16.4 6 181.0 110.0 25.0 buick century limited USA 2945 1982-01-01
395 17.0 6 262.0 85.0 38.0 oldsmobile cutlass ciera (diesel) USA 3015 1982-01-01
396 14.5 4 156.0 92.0 26.0 chrysler lebaron medallion USA 2585 1982-01-01
397 14.7 6 232.0 112.0 22.0 ford granada l USA 2835 1982-01-01
398 13.9 4 144.0 96.0 32.0 toyota celica gt Japan 2665 1982-01-01
399 13.0 4 135.0 84.0 36.0 dodge charger 2.2 USA 2370 1982-01-01
400 17.3 4 151.0 90.0 27.0 chevrolet camaro USA 2950 1982-01-01
401 15.6 4 140.0 86.0 27.0 ford mustang gl USA 2790 1982-01-01
402 24.6 4 97.0 52.0 44.0 vw pickup Europe 2130 1982-01-01
403 11.6 4 135.0 84.0 32.0 dodge rampage USA 2295 1982-01-01
404 18.6 4 120.0 79.0 28.0 ford ranger USA 2625 1982-01-01
405 19.4 4 119.0 82.0 31.0 chevy s-10 USA 2720 1982-01-01

406 rows × 9 columns


%%writefile magic_stuff.txt
import pandas as pd
df = pd.read_csv(
df.drop(["Unnamed: 0", "exceeded", "g_sum", "energy_100g"], axis=1, inplace=True) # drop columns we don't need
df = df.drop(df.index[[1,11877]]) # drop outlier rows
df.rename(index=str, columns={"reconstructed_energy": "energy_100g"}, inplace=True)
Writing magic_stuff.txt

cat magic_stuff.txt
import pandas as pd
df = pd.read_csv(
df.drop(["Unnamed: 0", "exceeded", "g_sum", "energy_100g"], axis=1, inplace=True) # drop columns we don't need
df = df.drop(df.index[[1,11877]]) # drop outlier rows
df.rename(index=str, columns={"reconstructed_energy": "energy_100g"}, inplace=True)
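
The write-then-read round trip above can also be done in plain Python when the code has to run outside a notebook. A minimal sketch with `pathlib`; `write_and_read` is a hypothetical helper, and a temporary directory is used so the working directory is left alone.

```python
import tempfile
from pathlib import Path

def write_and_read(path, text):
    """Write text to path, then read it back, like %%writefile followed by cat."""
    target = Path(path)
    target.write_text(text)
    return target.read_text()

# A temporary directory keeps the sketch from touching the working directory.
with tempfile.TemporaryDirectory() as tmp:
    contents = write_and_read(Path(tmp) / "magic_stuff.txt", "import pandas as pd\n")
    print(contents, end="")
```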


%%bash
uname -a
Linux c14846d13496 4.14.79+ #1 SMP Wed Dec 19 21:19:13 PST 2018 x86_64 x86_64 x86_64 GNU/Linux


%%python2
print "old school"

old school

%%python3
print "old school"

      File "<ipython-input-39-ed16c9002a7c>", line 1
        print "old school"
    SyntaxError: Missing parentheses in call to 'print'. Did you mean print("old school")?
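
The error goes away once `print` is written as a call. With a single argument, the parenthesized form is accepted by both interpreters, as in this short sketch:

```python
import sys

# With a single argument, print("...") is valid in both Python 2 (where the
# parentheses just group the expression) and Python 3 (a function call).
message = "old school"
print(message)
print("running on Python %d" % sys.version_info[0])
```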


%%html
<h1>Only The Best Tags</h1>

Only The Best Tags

2.4 Compatibility with Jupyter

Watch Lesson 2.4

Jupyter Import/Export

  • Upload Jupyter Notebooks
  • Download Jupyter Notebooks

Using Plotly

Install Latest Plotly

```
import plotly
plotly.__version__
```

```
'1.12.12'
```

```
!pip uninstall -q -y plotly
!pip install plotly==3.6.0
```

```
Collecting plotly==3.6.0
  Downloading (31.1MB)
    100% |████████████████████████████████| 31.1MB 1.2MB/s
Requirement already satisfied: decorator>=4.0.6 in /usr/local/lib/python3.6/dist-packages (from plotly==3.6.0) (4.3.2)
Requirement already satisfied: nbformat>=4.2 in /usr/local/lib/python3.6/dist-packages (from plotly==3.6.0) (4.4.0)
Requirement already satisfied: pytz in /usr/local/lib/python3.6/dist-packages (from plotly==3.6.0) (2018.9)
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from plotly==3.6.0) (2.18.4)
Collecting retrying>=1.3.3 (from plotly==3.6.0)
  Downloading
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from plotly==3.6.0) (1.11.0)
Requirement already satisfied: jupyter-core in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.2->plotly==3.6.0) (4.4.0)
Requirement already satisfied: traitlets>=4.1 in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.2->plotly==3.6.0) (4.3.2)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.2->plotly==3.6.0) (2.6.0)
Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.2->plotly==3.6.0) (0.2.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->plotly==3.6.0) (2018.11.29)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->plotly==3.6.0) (1.22)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->plotly==3.6.0) (3.0.4)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->plotly==3.6.0) (2.6)
Building wheels for collected packages: plotly, retrying
  Building wheel for plotly ( ... done
  Stored in directory: /root/.cache/pip/wheels/67/0b/29/08c7f5caed2d1ac446db982ff607b326d49bfa0bd3a67ef8c7
  Building wheel for retrying ( ... done
  Stored in directory: /root/.cache/pip/wheels/d7/a9/33/acc7b709e2a35caa7d4cae442f6fe6fbf2c43f80823d46460c
Successfully built plotly retrying
Installing collected packages: retrying, plotly
Successfully installed plotly-3.6.0 retrying-1.3.3
```

```
import plotly
plotly.__version__
```

```
'3.6.0'
```

```
def enable_plotly_in_cell():
    import IPython
    from plotly.offline import init_notebook_mode
    display(IPython.core.display.HTML(''' '''))
    init_notebook_mode(connected=False)
```

Plot

```
import pandas as pd
df = pd.read_csv( "")
df.drop(["Unnamed: 0", "exceeded", "g_sum", "energy_100g"], axis=1, inplace=True) # drop columns we don't need
df = df.drop(df.index[[1,11877]]) # drop outlier rows
df.rename(index=str, columns={"reconstructed_energy": "energy_100g"}, inplace=True)
df.head()
```
fat_100g carbohydrates_100g sugars_100g proteins_100g salt_100g energy_100g product
0 28.57 64.29 14.29 3.57 0.00000 2267.85 Banana Chips Sweetened (Whole)
2 57.14 17.86 3.57 17.86 1.22428 2835.70 Organic Salted Nut Mix
3 18.75 57.81 15.62 14.06 0.13970 1953.04 Organic Muesli
4 36.67 36.67 3.33 16.67 1.60782 2336.91 Zen Party Mix
5 18.18 60.00 21.82 14.55 0.02286 1976.37 Cinnamon Nut Granola
```
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

df_cluster_features = df.drop("product", axis=1)
scaler = MinMaxScaler()
k_means = KMeans(n_clusters=3)
# Reconstructed line (assumption): fit on the min-max scaled features
kmeans = k_means.fit(scaler.fit_transform(df_cluster_features))
df['cluster'] = kmeans.labels_
df.head()
```
fat_100g carbohydrates_100g sugars_100g proteins_100g salt_100g energy_100g product cluster
0 28.57 64.29 14.29 3.57 0.00000 2267.85 Banana Chips Sweetened (Whole) 0
2 57.14 17.86 3.57 17.86 1.22428 2835.70 Organic Salted Nut Mix 0
3 18.75 57.81 15.62 14.06 0.13970 1953.04 Organic Muesli 0
4 36.67 36.67 3.33 16.67 1.60782 2336.91 Zen Party Mix 0
5 18.18 60.00 21.82 14.55 0.02286 1976.37 Cinnamon Nut Granola 0
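
Min-max scaling maps each column linearly onto the range 0 to 1, so columns with large units (such as energy) do not dominate the distance computation during clustering. The sketch below applies the formula `(x - min) / (max - min)` to one column in plain Python; `min_max_scale` is a hypothetical helper, and it scales over only the five rows shown, whereas the scaler in the notebook works over the whole dataset.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(value - low) / span for value in values]

# fat_100g values from the five rows shown above.
fat = [28.57, 57.14, 18.75, 36.67, 18.18]
scaled = min_max_scale(fat)
print([round(value, 3) for value in scaled])
```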
```
import plotly.offline as py
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode

enable_plotly_in_cell()

trace1 = go.Scatter3d(
    x=df["fat_100g"],
    y=df["carbohydrates_100g"],
    z=df["proteins_100g"],
    mode='markers',
    text=df["product"],
    marker=dict(
        size=12,
        color=df["cluster"],   # set color to an array/list of desired values
        colorscale='Viridis',  # choose a colorscale
        opacity=0.8
    )
)
#print(trace1)
data = [trace1]
layout = go.Layout(
    showlegend=False,
    title="Protein-Fat-Carb: Food Energy Types",
    scene = dict(
        xaxis = dict(title='X: Fat Content-100g'),
        yaxis = dict(title="Y: Carbohydrate Content-100g"),
        zaxis = dict(title="Z: Protein Content-100g"),
    ),
    width=1000,
    height=900,
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='3d-scatter-colorscale')
```
Installing Software

```
!pip install requests
```

```
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (2.18.4)
Requirement already satisfied: idna<2.7,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests) (2.6)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests) (3.0.4)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests) (1.22)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests) (2018.11.29)
```

```
!pip install -q requests
```
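
Before reinstalling a package, it can be useful to check from Python whether it is already present and at which version. A minimal sketch using the standard library (Python 3.8 or later is assumed for `importlib.metadata`); both helper names are hypothetical.

```python
import importlib.util
from importlib import metadata

def installed_version(package):
    """Return the installed distribution version, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def is_importable(module):
    """Return True if the module can be found without importing it."""
    return importlib.util.find_spec(module) is not None

print(installed_version("requests"), is_importable("json"))
```

A `None` result from `installed_version` is the cue to run `!pip install` for that package.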