{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Lesson6-Python For Data Science Data Conversion Recipes",
"version": "0.3.2",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"metadata": {
"id": "4yLBz-G0ACvF",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Lesson 6: Data Conversion Recipes"
]
},
{
"metadata": {
"id": "c_Id55m6Jsbu",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Pragmatic AI Labs\n",
"\n"
]
},
{
"metadata": {
"id": "e5p96AqpSDZa",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"\n",
"\n",
"This notebook was produced by [Pragmatic AI Labs](https://paiml.com/). You can continue learning about these topics by:\n",
"\n",
"* Buying a copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](http://www.informit.com/store/pragmatic-ai-an-introduction-to-cloud-based-machine-9780134863917)\n",
"* Reading an online copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](https://www.safaribooksonline.com/library/view/pragmatic-ai-an/9780134863924/)\n",
"* Watching video [Essential Machine Learning and AI with Python and Jupyter Notebook-Video-SafariOnline](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118) on Safari Books Online.\n",
"* Watching video [AWS Certified Machine Learning-Specialty](https://learning.oreilly.com/videos/aws-certified-machine/9780135556597)\n",
"* Purchasing video [Essential Machine Learning and AI with Python and Jupyter Notebook- Purchase Video](http://www.informit.com/store/essential-machine-learning-and-ai-with-python-and-jupyter-9780135261095)\n",
"* Viewing more content at [noahgift.com](https://noahgift.com/)\n"
]
},
{
"metadata": {
"id": "m1oI49wGAJj2",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## 6.1 Convert lists to dicts, and dicts to lists"
]
},
{
"metadata": {
"id": "ODNn01QhGy4q",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Converting Lists to Dictionaries"
]
},
{
"metadata": {
"id": "kC0DL7jccwTz",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Create basic dictionary"
]
},
{
"metadata": {
"id": "XKmoizujJ3BM",
"colab_type": "code",
"outputId": "e362192a-cf8d-42c7-a9f0-c7004c1f8cf9",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"key_values = [('one', 1), ('two', 2), ('three', 3)]\n",
"d = dict(key_values)\n",
"d"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'one': 1, 'three': 3, 'two': 2}"
]
},
"metadata": {
"tags": []
},
"execution_count": 1
}
]
},
{
"metadata": {
"id": "db_EeMOXcsMl",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Zip two lists"
]
},
{
"metadata": {
"id": "xLT9Vq-ScC3u",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"instruments = ['violin', 'lute', 'banjo', 'accordion']\n",
"players = ['Anne-Sophie Mutter', 'Julian Bream', 'Noam Pikelny', 'Astor Pantaleón Piazzolla']\n",
"\n",
"d = dict(zip(instruments, players))\n",
"d"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "hnrbuW1OeoaP",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### From keys"
]
},
{
"metadata": {
"id": "oOHL-nQBcEiM",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"racers = ['Tom', 'Bill', 'Will', 'Jill']\n",
"start_distance = 0\n",
"d = dict.fromkeys(racers, start_distance)\n",
"d"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "zDeFh7DeHV_k",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Converting Dictionaries to Lists"
]
},
{
"metadata": {
"id": "s13Km8SmH0da",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"d = {'name': 'toby', 'id': 14}"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "ZcA63r-bKUGZ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Get a list of keys"
]
},
{
"metadata": {
"id": "HdldSNWZKaKQ",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"list(d)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "TSDFCcDjKhun",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Get keys in sorted order"
]
},
{
"metadata": {
"id": "Z-HCbayYKkDi",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"sorted(d)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "DvTN0dDKKux-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Get list of values"
]
},
{
"metadata": {
"id": "Hd2aiFbTKxE2",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"list(d.values())"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "iA3xuV1YOo_M",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## 6.2 Convert dicts to pandas DataFrame"
]
},
{
"metadata": {
"id": "xBThaF6Lf6-l",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Create DataFrame using data parameter"
]
},
{
"metadata": {
"id": "1k7y_fGvca73",
"colab_type": "code",
"outputId": "689c74db-3b50-4535-fb12-4e5c03d596c6",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 142
}
},
"cell_type": "code",
"source": [
"from pandas import DataFrame\n",
"\n",
"d = {'first': ['Jill', 'Solma', 'Elizabeth'], 'last': ['Stein', 'Smith', 'Tudor']}\n",
"DataFrame(data=d)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
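"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>first</th>\n",
"      <th>last</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>Jill</td>\n",
"      <td>Stein</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>Solma</td>\n",
"      <td>Smith</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>Elizabeth</td>\n",
"      <td>Tudor</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"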
],
"text/plain": [
" first last\n",
"0 Jill Stein\n",
"1 Solma Smith\n",
"2 Elizabeth Tudor"
]
},
"metadata": {
"tags": []
},
"execution_count": 13
}
]
},
{
"metadata": {
"id": "R14wQwcOLMwI",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Use the `from_dict` class method"
]
},
{
"metadata": {
"id": "jkWlDXTncVM-",
"colab_type": "code",
"outputId": "99e08af3-916d-4153-a94d-9cb5dee2a5f8",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 142
}
},
"cell_type": "code",
"source": [
"DataFrame.from_dict(d)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
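"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>first</th>\n",
"      <th>last</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>Jill</td>\n",
"      <td>Stein</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>Solma</td>\n",
"      <td>Smith</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>Elizabeth</td>\n",
"      <td>Tudor</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"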
],
"text/plain": [
" first last\n",
"0 Jill Stein\n",
"1 Solma Smith\n",
"2 Elizabeth Tudor"
]
},
"metadata": {
"tags": []
},
"execution_count": 14
}
]
},
{
"metadata": {
"id": "ygpgBXkifzj5",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Create DataFrame with index orientation"
]
},
{
"metadata": {
"id": "jKTTCaxafbqp",
"colab_type": "code",
"outputId": "93e5020d-cae4-4246-b0ba-e6c033a1687a",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 142
}
},
"cell_type": "code",
"source": [
"d = {0: ['Edward', 'Tudor'], 1: ['Robert', 'Redford'], 3: ['Earl', 'Scruggs']}\n",
"df = DataFrame.from_dict(d, orient='index')\n",
"df.columns = ['first', 'last']\n",
"df"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
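"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>first</th>\n",
"      <th>last</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>Edward</td>\n",
"      <td>Tudor</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>Robert</td>\n",
"      <td>Redford</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>Earl</td>\n",
"      <td>Scruggs</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"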
],
"text/plain": [
" first last\n",
"0 Edward Tudor\n",
"1 Robert Redford\n",
"3 Earl Scruggs"
]
},
"metadata": {
"tags": []
},
"execution_count": 15
}
]
},
{
"metadata": {
"id": "vO5pY89WLz7k",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Assign column names"
]
},
{
"metadata": {
"id": "A6eFFjrifVmb",
"colab_type": "code",
"outputId": "67db4892-c0b3-40dc-8df3-43a1d9a8aeba",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 142
}
},
"cell_type": "code",
"source": [
"d = {'a': 'A', 'b': 'B', 'c': 'C'}\n",
"df = DataFrame(list(d.items()), columns=['lower', 'upper'])\n",
"df"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
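"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>lower</th>\n",
"      <th>upper</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>a</td>\n",
"      <td>A</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>b</td>\n",
"      <td>B</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>c</td>\n",
"      <td>C</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"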
],
"text/plain": [
" lower upper\n",
"0 a A\n",
"1 b B\n",
"2 c C"
]
},
"metadata": {
"tags": []
},
"execution_count": 16
}
]
},
{
"metadata": {
"id": "tPHx60a2O-A_",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## 6.3 Convert characters to integers and back"
]
},
{
"metadata": {
"id": "Nv2oPnYcMlQ2",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Cast str to int"
]
},
{
"metadata": {
"id": "IK3H12LAMnlh",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Base 10"
]
},
{
"metadata": {
"id": "Cz744KkLMpbR",
"colab_type": "code",
"outputId": "04e23bbf-cb44-44c0-be7c-f7c26b2e2f2d",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"int('011')"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"11"
]
},
"metadata": {
"tags": []
},
"execution_count": 17
}
]
},
{
"metadata": {
"id": "c-oF4oJoMsdD",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Base 2"
]
},
{
"metadata": {
"id": "VQ-rlW-UMuKd",
"colab_type": "code",
"outputId": "4eab19a8-e931-4d29-c6f8-be0c92944448",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"int('011', 2)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"3"
]
},
"metadata": {
"tags": []
},
"execution_count": 18
}
]
},
{
"metadata": {
"id": "qTxUMxokM0VD",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Base 6"
]
},
{
"metadata": {
"id": "f4RrKcKaM231",
"colab_type": "code",
"outputId": "3ee7c155-2fbc-4e0f-b7a8-51da3991f6c3",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"int('011', 6)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"7"
]
},
"metadata": {
"tags": []
},
"execution_count": 19
}
]
},
{
"metadata": {
"id": "3KfoL3upM_So",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Base 8"
]
},
{
"metadata": {
"id": "r2XsGuB2NBda",
"colab_type": "code",
"outputId": "4ec14aaf-22b1-40de-9aa1-44682587fc4f",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"int('011', 8)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"9"
]
},
"metadata": {
"tags": []
},
"execution_count": 20
}
]
},
{
"metadata": {
"id": "SY90ktC3NDcg",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Base 16"
]
},
{
"metadata": {
"id": "oR-XdBNXNFIK",
"colab_type": "code",
"outputId": "1e5589e7-e3a1-41c3-9b57-4e625031750e",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"int('011', 16)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"17"
]
},
"metadata": {
"tags": []
},
"execution_count": 21
}
]
},
{
"metadata": {
"id": "3lbw66gCNPTz",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Cast int to str"
]
},
{
"metadata": {
"id": "VihfMVYSk1GB",
"colab_type": "code",
"outputId": "0a33a3e1-e39b-44ba-88fa-eb10a56c56a1",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"one = str(1)\n",
"type(one)\n"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"str"
]
},
"metadata": {
"tags": []
},
"execution_count": 22
}
]
},
{
"metadata": {
"id": "qLNbYaeLPCMG",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## 6.4 Convert between hexadecimal, binary, and floats"
]
},
{
"metadata": {
"id": "p7AjiUsq7Nov",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Cast to str from float\n"
]
},
{
"metadata": {
"id": "vx2E_TYy9Nt-",
"colab_type": "code",
"outputId": "26893e0e-83e1-4c55-cc9c-0f90a17a5948",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"a_str = str(12.4)\n",
"f\" {a_str!r} is a {type(a_str)}\""
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"\" '12.4' is a <class 'str'>\""
]
},
"metadata": {
"tags": []
},
"execution_count": 2
}
]
},
{
"metadata": {
"id": "ip3NuTGqNwAv",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Cast to float from str"
]
},
{
"metadata": {
"id": "EMpCWo-tN2pg",
"colab_type": "code",
"outputId": "adf70288-2ad1-46f8-b55e-4e99aca204a8",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"a_str = \"12.3\"\n",
"a_float = float(a_str)\n",
"f\" {a_float!r} is a {type(a_float)}\""
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"\" 12.3 is a <class 'float'>\""
]
},
"metadata": {
"tags": []
},
"execution_count": 3
}
]
},
{
"metadata": {
"id": "fjwdh1uO7U58",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Hexadecimal"
]
},
{
"metadata": {
"id": "JRLzqsSMOkQ1",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Int to hex str"
]
},
{
"metadata": {
"id": "35D8N6zR-aAA",
"colab_type": "code",
"outputId": "dd911195-1556-4d63-cc44-15b4f523fe0c",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"int_hex = hex(18)\n",
"f\" hex(18) returns the {type(int_hex)}: {int_hex!r}\"\n"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"\" hex(18) returns the <class 'str'>: '0x12'\""
]
},
"metadata": {
"tags": []
},
"execution_count": 4
}
]
},
{
"metadata": {
"id": "J9yOmbk7OrWj",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Float to hex str"
]
},
{
"metadata": {
"id": "EdXjN9MSOtbv",
"colab_type": "code",
"outputId": "e24b2243-3c0a-4c60-b0ea-0f82cfa63156",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"float_hex = 12.0.hex()\n",
"f\" 12.0.hex() returns the {type(float_hex)}: {float_hex!r}\""
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"\" 12.0.hex() returns the <class 'str'>: '0x1.8000000000000p+3'\""
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"metadata": {
"id": "UaLcFYpTom-I",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"[hex function](https://docs.python.org/3/library/functions.html)"
]
},
{
"metadata": {
"id": "fMGfHTiR7YQg",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Conversion to and from binary"
]
},
{
"metadata": {
"id": "Gl0_nausl0kh",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
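"#### Bytes literal\n",
"A bytes literal is written like a string literal with a `b` prefix, but it may contain only ASCII characters; any other byte value must be written as an escape such as `\\xff`."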
]
},
{
"metadata": {
"id": "ENwjzQW2mDmY",
"colab_type": "code",
"outputId": "db651fd9-b84f-4947-b02b-676c9e394e98",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"bytes_str = b\"some bytes literal\"\n",
"type(bytes_str)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"bytes"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"metadata": {
"id": "kYGgEuxnPFBI",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Encode bytes as Base64"
]
},
{
"metadata": {
"id": "-t-2J5Da7tDB",
"colab_type": "code",
"outputId": "faa9dff1-3581-4309-c107-13a2bd412769",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"import base64\n",
"bytes_str = b\"Encode this string\"\n",
"encoded_str = base64.b64encode(bytes_str)\n",
"f\"The encoded string {encoded_str!r} is of type {type(encoded_str)}\"\n",
"\n"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"\"The encoded string b'RW5jb2RlIHRoaXMgc3RyaW5n' is of type <class 'bytes'>\""
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"metadata": {
"id": "CHhv72ZlPKNp",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Decode Base64 back to bytes"
]
},
{
"metadata": {
"id": "haBNDHcIPOZH",
"colab_type": "code",
"outputId": "e72a2282-27b8-4f4f-b4f9-89df2b47801b",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"cell_type": "code",
"source": [
"base64.b64decode(encoded_str)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"b'Encode this string'"
]
},
"metadata": {
"tags": []
},
"execution_count": 8
}
]
},
{
"metadata": {
"id": "RHqNCx9u-QR2",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"[base64-module](https://docs.python.org/3/library/base64.html#module-base64)"
]
},
{
"metadata": {
"id": "1u9f2snfCQ2D",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"# Notes\n",
"- [Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)\n",
"- [More on lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)\n",
"- [DataFrame.from_dict](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html)\n",
"- https://stackoverflow.com/questions/4576115/convert-a-list-to-a-dictionary-in-python\n",
"- https://thispointer.com/python-how-to-convert-a-list-to-dictionary/\n"
]
}
]
}