{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Lesson15-Python For Data Science-CaseStudies.ipynb",
"version": "0.3.2",
"provenance": [],
"collapsed_sections": [
"NvoiEwiAWrWy",
"wR_L2OPkuqH4",
"JJNSA3n0u3Zf",
"qTL1K5SSXD1U",
"SEIH6ESVXKOb",
"ANeUczzxXOf2",
"bRKFiJVvPyZi",
"3b1-VTl8jubf",
"wQ8dueD5jubm",
"nxPSuxe3jubp",
"wv5b_Nhljubx"
],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
""
]
},
{
"metadata": {
"id": "spdivf2TMnGC",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Lesson 16: Case Studies"
]
},
{
"metadata": {
"id": "c_Id55m6Jsbu",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Pragmatic AI Labs\n",
"\n"
]
},
{
"metadata": {
"id": "e5p96AqpSDZa",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"\n",
"\n",
"This notebook was produced by [Pragmatic AI Labs](https://paiml.com/). You can continue learning about these topics by:\n",
"\n",
"* Buying a copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](http://www.informit.com/store/pragmatic-ai-an-introduction-to-cloud-based-machine-9780134863917)\n",
"* Reading an online copy of [Pragmatic AI:Pragmatic AI: An Introduction to Cloud-Based Machine Learning](https://www.safaribooksonline.com/library/view/pragmatic-ai-an/9780134863924/)\n",
"* Watching video [Essential Machine Learning and AI with Python and Jupyter Notebook-Video-SafariOnline](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118) on Safari Books Online.\n",
"* Watching video [AWS Certified Machine Learning-Speciality](https://learning.oreilly.com/videos/aws-certified-machine/9780135556597)\n",
"* Purchasing video [Essential Machine Learning and AI with Python and Jupyter Notebook- Purchase Video](http://www.informit.com/store/essential-machine-learning-and-ai-with-python-and-jupyter-9780135261095)\n",
"* Viewing more content at [noahgift.com](https://noahgift.com/)\n"
]
},
{
"metadata": {
"id": "NvoiEwiAWrWy",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## 16.4 Ludwig (Open Source AutoML)"
]
},
{
"metadata": {
"id": "jbnbFKcTXNOn",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"**Github Project URL**: https://uber.github.io/ludwig/\n",
"\n",
""
]
},
{
"metadata": {
"id": "Aa-KnZKcfvkV",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Install Ludwig"
]
},
{
"metadata": {
"id": "Q3FrtesdfyV9",
"colab_type": "code",
"outputId": "1db2fd0d-8904-489e-ac1f-70bc70c9704a",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 925
}
},
"cell_type": "code",
"source": [
"!pip install --upgrade numpy #must restart colab runtime\n",
"!pip install --upgrade scikit-image\n",
"!pip install -q ludwig\n",
"!python -m spacy download en "
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Requirement already up-to-date: numpy in /usr/local/lib/python3.6/dist-packages (1.16.1)\n",
"Collecting scikit-image\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/24/06/d560630eb9e36d90d69fe57d9ff762d8f501664ce478b8a0ae132b3c3008/scikit_image-0.14.2-cp36-cp36m-manylinux1_x86_64.whl (25.3MB)\n",
"\u001b[K 100% |████████████████████████████████| 25.3MB 1.9MB/s \n",
"\u001b[?25hCollecting pillow>=4.3.0 (from scikit-image)\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/85/5e/e91792f198bbc5a0d7d3055ad552bc4062942d27eaf75c3e2783cf64eae5/Pillow-5.4.1-cp36-cp36m-manylinux1_x86_64.whl (2.0MB)\n",
"\u001b[K 100% |████████████████████████████████| 2.0MB 18.3MB/s \n",
"\u001b[?25hRequirement already satisfied, skipping upgrade: scipy>=0.17.0 in /usr/local/lib/python3.6/dist-packages (from scikit-image) (1.1.0)\n",
"Requirement already satisfied, skipping upgrade: matplotlib>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from scikit-image) (3.0.2)\n",
"Requirement already satisfied, skipping upgrade: six>=1.10.0 in /usr/local/lib/python3.6/dist-packages (from scikit-image) (1.11.0)\n",
"Requirement already satisfied, skipping upgrade: cloudpickle>=0.2.1 in /usr/local/lib/python3.6/dist-packages (from scikit-image) (0.6.1)\n",
"Requirement already satisfied, skipping upgrade: PyWavelets>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from scikit-image) (1.0.1)\n",
"Requirement already satisfied, skipping upgrade: networkx>=1.8 in /usr/local/lib/python3.6/dist-packages (from scikit-image) (2.2)\n",
"Collecting dask[array]>=1.0.0 (from scikit-image)\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/7c/2b/cf9e5477bec3bd3b4687719876ea38e9d8c9dc9d3526365c74e836e6a650/dask-1.1.1-py2.py3-none-any.whl (701kB)\n",
"\u001b[K 100% |████████████████████████████████| 706kB 25.2MB/s \n",
"\u001b[?25hRequirement already satisfied, skipping upgrade: numpy>=1.8.2 in /usr/local/lib/python3.6/dist-packages (from scipy>=0.17.0->scikit-image) (1.16.1)\n",
"Requirement already satisfied, skipping upgrade: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=2.0.0->scikit-image) (0.10.0)\n",
"Requirement already satisfied, skipping upgrade: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=2.0.0->scikit-image) (2.3.1)\n",
"Requirement already satisfied, skipping upgrade: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=2.0.0->scikit-image) (1.0.1)\n",
"Requirement already satisfied, skipping upgrade: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=2.0.0->scikit-image) (2.5.3)\n",
"Requirement already satisfied, skipping upgrade: decorator>=4.3.0 in /usr/local/lib/python3.6/dist-packages (from networkx>=1.8->scikit-image) (4.3.2)\n",
"Requirement already satisfied, skipping upgrade: toolz>=0.7.3; extra == \"array\" in /usr/local/lib/python3.6/dist-packages (from dask[array]>=1.0.0->scikit-image) (0.9.0)\n",
"Requirement already satisfied, skipping upgrade: setuptools in /usr/local/lib/python3.6/dist-packages (from kiwisolver>=1.0.1->matplotlib>=2.0.0->scikit-image) (40.8.0)\n",
"\u001b[31mfeaturetools 0.4.1 has requirement pandas>=0.23.0, but you'll have pandas 0.22.0 which is incompatible.\u001b[0m\n",
"\u001b[31malbumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.8 which is incompatible.\u001b[0m\n",
"Installing collected packages: pillow, dask, scikit-image\n",
" Found existing installation: Pillow 4.0.0\n",
" Uninstalling Pillow-4.0.0:\n",
" Successfully uninstalled Pillow-4.0.0\n",
" Found existing installation: dask 0.20.2\n",
" Uninstalling dask-0.20.2:\n",
" Successfully uninstalled dask-0.20.2\n",
" Found existing installation: scikit-image 0.13.1\n",
" Uninstalling scikit-image-0.13.1:\n",
" Successfully uninstalled scikit-image-0.13.1\n",
"Successfully installed dask-1.1.1 pillow-5.4.1 scikit-image-0.14.2\n"
],
"name": "stdout"
},
{
"output_type": "display_data",
"data": {
"application/vnd.colab-display-data+json": {
"pip_warning": {
"packages": [
"PIL"
]
}
}
},
"metadata": {
"tags": []
}
},
{
"output_type": "stream",
"text": [
"Requirement already satisfied: en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0 in /usr/local/lib/python3.6/dist-packages (2.0.0)\n",
"\n",
"\u001b[93m Linking successful\u001b[0m\n",
" /usr/local/lib/python3.6/dist-packages/en_core_web_sm -->\n",
" /usr/local/lib/python3.6/dist-packages/spacy/data/en\n",
"\n",
" You can now load the model via spacy.load('en')\n",
"\n"
],
"name": "stdout"
}
]
},
{
"metadata": {
"id": "yVLSSAUViyiX",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Basic Ideas"
]
},
{
"metadata": {
"id": "qHaRqAN5i1iV",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"* **Training Models**\n",
"* **Prediction (Inference)**\n",
"* **Datatypes**\n",
" - binary\n",
" - numerical\n",
" - category\n",
" - set\n",
" - bag\n",
" - sequence\n",
" - text\n",
" - timeseries\n",
" - image\n",
"\n"
]
},
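{
"metadata": {
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"As a rough sketch of how these datatypes are used, a Ludwig model definition pairs each column name with a type under `input_features` and `output_features`. The column names `TITLE` and `CATEGORY`, and the choice of the `text` and `category` types, are assumptions for illustration; they are not taken from the `model_definition.yaml` used in this notebook. The cell below only builds and checks a plain dictionary and does not call Ludwig."
]
},
{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import json\n",
"\n",
"# Datatypes from the list above\n",
"LUDWIG_DATATYPES = {\n",
"    'binary', 'numerical', 'category', 'set', 'bag',\n",
"    'sequence', 'text', 'timeseries', 'image',\n",
"}\n",
"\n",
"# Hypothetical model definition: column names and types are assumptions\n",
"model_definition = {\n",
"    'input_features': [{'name': 'TITLE', 'type': 'text'}],\n",
"    'output_features': [{'name': 'CATEGORY', 'type': 'category'}],\n",
"}\n",
"\n",
"# Check that every feature uses one of the listed datatypes\n",
"for section in ('input_features', 'output_features'):\n",
"    for feature in model_definition[section]:\n",
"        if feature['type'] not in LUDWIG_DATATYPES:\n",
"            raise ValueError('unknown datatype: ' + feature['type'])\n",
"\n",
"# Show the definition in a readable form\n",
"print(json.dumps(model_definition, indent=2))"
],
"execution_count": 0,
"outputs": []
},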
{
"metadata": {
"id": "W3G5sZ3yo-yK",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Topic Modeling Example"
]
},
{
"metadata": {
"id": "aIbXYrxU8ySd",
"colab_type": "code",
"outputId": "241c61f9-ad81-4c4d-82dd-42bef0502fdf",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 407
}
},
"cell_type": "code",
"source": [
"!wget https://raw.githubusercontent.com/uchidalab/book-dataset/master/Task1/book30-listing-train.csv\n",
"!wget https://raw.githubusercontent.com/noahgift/recommendations/master/model_definition.yaml"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"--2019-02-18 02:44:21-- https://raw.githubusercontent.com/uchidalab/book-dataset/master/Task1/book30-listing-train.csv\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 9728786 (9.3M) [text/plain]\n",
"Saving to: ‘book30-listing-train.csv.3’\n",
"\n",
"book30-listing-trai 100%[===================>] 9.28M --.-KB/s in 0.1s \n",
"\n",
"2019-02-18 02:44:23 (64.4 MB/s) - ‘book30-listing-train.csv.3’ saved [9728786/9728786]\n",
"\n",
"--2019-02-18 02:44:24-- https://raw.githubusercontent.com/noahgift/recommendations/master/model_definition.yaml\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 180 [text/plain]\n",
"Saving to: ‘model_definition.yaml.2’\n",
"\n",
"model_definition.ya 100%[===================>] 180 --.-KB/s in 0s \n",
"\n",
"2019-02-18 02:44:25 (34.7 MB/s) - ‘model_definition.yaml.2’ saved [180/180]\n",
"\n"
],
"name": "stdout"
}
]
},
{
"metadata": {
"id": "v-w5Zkzcumoi",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Ingest"
]
},
{
"metadata": {
"id": "Ef8dbaV4tHrz",
"colab_type": "code",
"outputId": "e7bbaff9-edcf-43df-f142-f8e5e916338f",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 197
}
},
"cell_type": "code",
"source": [
"import pandas as pd\n",
"df = pd.read_csv(\"https://media.githubusercontent.com/media/noahgift/recommendations/master/data/book30-listing-train-with-headers.csv\")\n",
"df = df.drop(\"Unnamed: 0\", axis=1)\n",
"df.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n", " | ASIN | \n", "FILENAME | \n", "IMAGE URL | \n", "TITLE | \n", "AUTHOR | \n", "CATEGORYID | \n", "CATEGORY | \n", "
---|---|---|---|---|---|---|---|
0 | \n", "1404803335 | \n", "1404803335.jpg | \n", "http://ecx.images-amazon.com/images/I/51UJnL3T... | \n", "Magnets: Pulling Together, Pushing Apart (Amaz... | \n", "Natalie M. Rosinsky | \n", "4 | \n", "Children's Books | \n", "
1 | \n", "1446276082 | \n", "1446276082.jpg | \n", "http://ecx.images-amazon.com/images/I/51MGUKhk... | \n", "Energy Security (SAGE Library of International... | \n", "NaN | \n", "10 | \n", "Engineering & Transportation | \n", "
2 | \n", "1491522666 | \n", "1491522666.jpg | \n", "http://ecx.images-amazon.com/images/I/51qKvjsi... | \n", "An Amish Gathering: Life in Lancaster County | \n", "Beth Wiseman | \n", "9 | \n", "Christian Books & Bibles | \n", "
3 | \n", "970096410 | \n", "0970096410.jpg | \n", "http://ecx.images-amazon.com/images/I/51qoUENb... | \n", "City of Rocks Idaho: A Climber's Guide (Region... | \n", "Dave Bingham | \n", "26 | \n", "Sports & Outdoors | \n", "
4 | \n", "8436808053 | \n", "8436808053.jpg | \n", "http://ecx.images-amazon.com/images/I/41aDW5pz... | \n", "Como vencer el insomnio. Tecnicas, reglas y co... | \n", "Choliz Montanes | \n", "11 | \n", "Health, Fitness & Dieting | \n", "