{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Lesson6-Python For Data Science Data Conversion Recipes", "version": "0.3.2", "provenance": [], "collapsed_sections": [], "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "metadata": { "id": "4yLBz-G0ACvF", "colab_type": "text" }, "cell_type": "markdown", "source": [ "# Lesson 6: Data Conversion Recipes" ] }, { "metadata": { "id": "c_Id55m6Jsbu", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## Pragmatic AI Labs\n", "\n" ] }, { "metadata": { "id": "e5p96AqpSDZa", "colab_type": "text" }, "cell_type": "markdown", "source": [ "![Pragmatic AI Labs logo](https://paiml.com/images/logo_with_slogan_white_background.png)\n", "\n", "This notebook was produced by [Pragmatic AI Labs](https://paiml.com/). You can continue learning about these topics by:\n", "\n", "* Buying a copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](http://www.informit.com/store/pragmatic-ai-an-introduction-to-cloud-based-machine-9780134863917)\n", "* Reading an online copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](https://www.safaribooksonline.com/library/view/pragmatic-ai-an/9780134863924/)\n", "* Watching the video [Essential Machine Learning and AI with Python and Jupyter Notebook-Video-SafariOnline](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118) on Safari Books Online.\n", "* Watching the video [AWS Certified Machine Learning-Specialty](https://learning.oreilly.com/videos/aws-certified-machine/9780135556597)\n", "* Purchasing the video [Essential Machine Learning and AI with Python and Jupyter Notebook- Purchase Video](http://www.informit.com/store/essential-machine-learning-and-ai-with-python-and-jupyter-9780135261095)\n", "* Viewing more content at 
[noahgift.com](https://noahgift.com/)\n" ] }, { "metadata": { "id": "pBTeTbnRKG_k", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "m1oI49wGAJj2", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## 6.1 Convert lists to dicts, and dicts to lists " ] }, { "metadata": { "id": "ODNn01QhGy4q", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Converting Lists to Dictionaries" ] }, { "metadata": { "id": "kC0DL7jccwTz", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Create basic dictionary" ] }, { "metadata": { "id": "XKmoizujJ3BM", "colab_type": "code", "outputId": "e362192a-cf8d-42c7-a9f0-c7004c1f8cf9", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "# dict() accepts any iterable of (key, value) pairs\n", "key_values = [('one', 1), ('two', 2), ('three', 3)]\n", "d = dict(key_values)\n", "d" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "{'one': 1, 'three': 3, 'two': 2}" ] }, "metadata": { "tags": [] }, "execution_count": 1 } ] }, { "metadata": { "id": "db_EeMOXcsMl", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Zip two lists" ] }, { "metadata": { "id": "xLT9Vq-ScC3u", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "instruments = ['violin', 'lute', 'banjo', 'accordion']\n", "players = ['Anne-Sophie Mutter', 'Julian Bream', 'Noam Pikelny', 'Astor Pantaleón Piazzolla']\n", "\n", "# zip pairs the two lists by position; dict turns each pair into key: value\n", "d = dict(zip(instruments, players))\n", "d" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "hnrbuW1OeoaP", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### From keys" ] }, { "metadata": { "id": "oOHL-nQBcEiM", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "racers = ['Tom', 'Bill', 'Will', 'Jill']\n", "start_distance = 0\n", "# Every key gets the same starting value\n", "d = dict.fromkeys(racers, start_distance)\n", "d" ], "execution_count": 0, 
"outputs": [] }, { "metadata": { "id": "zDeFh7DeHV_k", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Converting Dictionaries to Lists" ] }, { "metadata": { "id": "s13Km8SmH0da", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "d = {'name': 'toby', 'id' : 14}" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "ZcA63r-bKUGZ", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Get a list of keys" ] }, { "metadata": { "id": "HdldSNWZKaKQ", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "list(d)" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "TSDFCcDjKhun", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Get keys in sorted order" ] }, { "metadata": { "id": "Z-HCbayYKkDi", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "sorted(d)" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "DvTN0dDKKux-", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Get list of values" ] }, { "metadata": { "id": "Hd2aiFbTKxE2", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "list(d.values())" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "iA3xuV1YOo_M", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## 6.2 Convert dicts to pandas Dataframe" ] }, { "metadata": { "id": "xBThaF6Lf6-l", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Create DataFrame using data parameter" ] }, { "metadata": { "id": "1k7y_fGvca73", "colab_type": "code", "outputId": "689c74db-3b50-4535-fb12-4e5c03d596c6", "colab": { "base_uri": "https://localhost:8080/", "height": 142 } }, "cell_type": "code", "source": [ "from pandas import DataFrame\n", "\n", "d = {'first': ['Jill', 'Solma', 'Elizabeth'], 'last': ['Stein', 'Smith', 'Tudor']}\n", "DataFrame(data=d)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
firstlast
0JillStein
1SolmaSmith
2ElizabethTudor
\n", "
" ], "text/plain": [ " first last\n", "0 Jill Stein\n", "1 Solma Smith\n", "2 Elizabeth Tudor" ] }, "metadata": { "tags": [] }, "execution_count": 13 } ] }, { "metadata": { "id": "R14wQwcOLMwI", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Use class from_dict method" ] }, { "metadata": { "id": "jkWlDXTncVM-", "colab_type": "code", "outputId": "99e08af3-916d-4153-a94d-9cb5dee2a5f8", "colab": { "base_uri": "https://localhost:8080/", "height": 142 } }, "cell_type": "code", "source": [ "DataFrame.from_dict(d)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
firstlast
0JillStein
1SolmaSmith
2ElizabethTudor
\n", "
" ], "text/plain": [ " first last\n", "0 Jill Stein\n", "1 Solma Smith\n", "2 Elizabeth Tudor" ] }, "metadata": { "tags": [] }, "execution_count": 14 } ] }, { "metadata": { "id": "ygpgBXkifzj5", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Create DataFrame with index orientation" ] }, { "metadata": { "id": "jKTTCaxafbqp", "colab_type": "code", "outputId": "93e5020d-cae4-4246-b0ba-e6c033a1687a", "colab": { "base_uri": "https://localhost:8080/", "height": 142 } }, "cell_type": "code", "source": [ "d = {0: ['Edward', 'Tudor'], 1: ['Robert', 'Redford'], 3: ['Earl', 'Scruggs']}\n", "df = DataFrame.from_dict(d, orient='index')\n", "df.columns=['first', 'last']\n", "df" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
firstlast
0EdwardTudor
1RobertRedford
3EarlScruggs
\n", "
" ], "text/plain": [ " first last\n", "0 Edward Tudor\n", "1 Robert Redford\n", "3 Earl Scruggs" ] }, "metadata": { "tags": [] }, "execution_count": 15 } ] }, { "metadata": { "id": "vO5pY89WLz7k", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Assign column names" ] }, { "metadata": { "id": "A6eFFjrifVmb", "colab_type": "code", "outputId": "67db4892-c0b3-40dc-8df3-43a1d9a8aeba", "colab": { "base_uri": "https://localhost:8080/", "height": 142 } }, "cell_type": "code", "source": [ "d = {'a': 'A', 'b': 'B', 'c': 'C'}\n", "df = DataFrame(list(d.items()), columns=['lower', 'upper'])\n", "df" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lowerupper
0aA
1bB
2cC
\n", "
" ], "text/plain": [ " lower upper\n", "0 a A\n", "1 b B\n", "2 c C" ] }, "metadata": { "tags": [] }, "execution_count": 16 } ] }, { "metadata": { "id": "tPHx60a2O-A_", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## 6.3 Convert characters to integers and back " ] }, { "metadata": { "id": "Nv2oPnYcMlQ2", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Cast str to int" ] }, { "metadata": { "id": "IK3H12LAMnlh", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Base 10" ] }, { "metadata": { "id": "Cz744KkLMpbR", "colab_type": "code", "outputId": "04e23bbf-cb44-44c0-be7c-f7c26b2e2f2d", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "int('011')" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "11" ] }, "metadata": { "tags": [] }, "execution_count": 17 } ] }, { "metadata": { "id": "c-oF4oJoMsdD", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Base 2" ] }, { "metadata": { "id": "VQ-rlW-UMuKd", "colab_type": "code", "outputId": "4eab19a8-e931-4d29-c6f8-be0c92944448", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "int('011', 2)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "3" ] }, "metadata": { "tags": [] }, "execution_count": 18 } ] }, { "metadata": { "id": "qTxUMxokM0VD", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Base 6" ] }, { "metadata": { "id": "f4RrKcKaM231", "colab_type": "code", "outputId": "3ee7c155-2fbc-4e0f-b7a8-51da3991f6c3", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "int('011', 6)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "7" ] }, "metadata": { "tags": [] }, "execution_count": 19 } ] }, { "metadata": { "id": "3KfoL3upM_So", "colab_type": 
"text" }, "cell_type": "markdown", "source": [ "#### Base 8" ] }, { "metadata": { "id": "r2XsGuB2NBda", "colab_type": "code", "outputId": "4ec14aaf-22b1-40de-9aa1-44682587fc4f", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "int('011', 8)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "9" ] }, "metadata": { "tags": [] }, "execution_count": 20 } ] }, { "metadata": { "id": "SY90ktC3NDcg", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Base 16" ] }, { "metadata": { "id": "oR-XdBNXNFIK", "colab_type": "code", "outputId": "1e5589e7-e3a1-41c3-9b57-4e625031750e", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "int('011', 16)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "17" ] }, "metadata": { "tags": [] }, "execution_count": 21 } ] }, { "metadata": { "id": "3lbw66gCNPTz", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Cast int to string" ] }, { "metadata": { "id": "VihfMVYSk1GB", "colab_type": "code", "outputId": "0a33a3e1-e39b-44ba-88fa-eb10a56c56a1", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "one = str(1)\n", "type(one)\n" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "str" ] }, "metadata": { "tags": [] }, "execution_count": 22 } ] }, { "metadata": { "id": "qLNbYaeLPCMG", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## 6.4 Convert between hexadecimal, binary, and floats" ] }, { "metadata": { "id": "p7AjiUsq7Nov", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Cast to str from float\n" ] }, { "metadata": { "id": "vx2E_TYy9Nt-", "colab_type": "code", "outputId": "26893e0e-83e1-4c55-cc9c-0f90a17a5948", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, 
"cell_type": "code", "source": [ "a_str = str(12.4)\n", "f\" {a_str!r} is a {type(a_str)}\"" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "\" '12.4' is a \"" ] }, "metadata": { "tags": [] }, "execution_count": 2 } ] }, { "metadata": { "id": "ip3NuTGqNwAv", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Cast to float from str" ] }, { "metadata": { "id": "EMpCWo-tN2pg", "colab_type": "code", "outputId": "adf70288-2ad1-46f8-b55e-4e99aca204a8", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "a_str = \"12.3\"\n", "a_float = float(a_str)\n", "f\" {a_float!r} is a {type(a_float)}\"" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "\" 12.3 is a \"" ] }, "metadata": { "tags": [] }, "execution_count": 3 } ] }, { "metadata": { "id": "fjwdh1uO7U58", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Hexadecimal" ] }, { "metadata": { "id": "JRLzqsSMOkQ1", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Int to hex str" ] }, { "metadata": { "id": "35D8N6zR-aAA", "colab_type": "code", "outputId": "dd911195-1556-4d63-cc44-15b4f523fe0c", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "int_hex = hex(18)\n", "f\" hex(18) returns the {type(int_hex)}: {int_hex!r}\"\n" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "\" hex(18) returns the : '0x12'\"" ] }, "metadata": { "tags": [] }, "execution_count": 4 } ] }, { "metadata": { "id": "J9yOmbk7OrWj", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Float to hex str" ] }, { "metadata": { "id": "EdXjN9MSOtbv", "colab_type": "code", "outputId": "e24b2243-3c0a-4c60-b0ea-0f82cfa63156", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "float_hex = 12.0.hex()\n", 
"f\" 12.4.hex() returns the {type(float_hex)}: {float_hex!r}\"" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "\" 12.4.hex() returns the : '0x1.8000000000000p+3'\"" ] }, "metadata": { "tags": [] }, "execution_count": 5 } ] }, { "metadata": { "id": "UaLcFYpTom-I", "colab_type": "text" }, "cell_type": "markdown", "source": [ "[hex function](https://docs.python.org/3/library/functions.html)" ] }, { "metadata": { "id": "fMGfHTiR7YQg", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Conversion to and from binary" ] }, { "metadata": { "id": "Gl0_nausl0kh", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Bytes literal\n", "Similar to strings, but limited to ASCII characters." ] }, { "metadata": { "id": "ENwjzQW2mDmY", "colab_type": "code", "outputId": "db651fd9-b84f-4947-b02b-676c9e394e98", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "bytes_str = b\"some bytes literal\"\n", "type(bytes_str)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "bytes" ] }, "metadata": { "tags": [] }, "execution_count": 6 } ] }, { "metadata": { "id": "kYGgEuxnPFBI", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Encode to bytes" ] }, { "metadata": { "id": "-t-2J5Da7tDB", "colab_type": "code", "outputId": "faa9dff1-3581-4309-c107-13a2bd412769", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "import base64\n", "bytes_str = b\"Encode this string\"\n", "encoded_str = base64.b64encode(bytes_str)\n", "f\"The encoded string {encoded_str!r} is of type {type(encoded_str)}\"\n", "\n" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "\"The encoded string b'RW5jb2RlIHRoaXMgc3RyaW5n' is of type \"" ] }, "metadata": { "tags": [] }, "execution_count": 7 } ] }, { "metadata": { "id": 
"CHhv72ZlPKNp", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Decode from bytes" ] }, { "metadata": { "id": "haBNDHcIPOZH", "colab_type": "code", "outputId": "e72a2282-27b8-4f4f-b4f9-89df2b47801b", "colab": { "base_uri": "https://localhost:8080/", "height": 34 } }, "cell_type": "code", "source": [ "base64.b64decode(encoded_str)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "b'Encode this string'" ] }, "metadata": { "tags": [] }, "execution_count": 8 } ] }, { "metadata": { "id": "RHqNCx9u-QR2", "colab_type": "text" }, "cell_type": "markdown", "source": [ "[base64-module](https://docs.python.org/3/library/base64.html#module-base64)" ] }, { "metadata": { "id": "1u9f2snfCQ2D", "colab_type": "text" }, "cell_type": "markdown", "source": [ "# Notes\n", "- [Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)\n", "- [More on lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)\n", "- [Dataframe.from_dict](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html)\n", "- https://stackoverflow.com/questions/4576115/convert-a-list-to-a-dictionary-in-python\n", "- https://thispointer.com/python-how-to-convert-a-list-to-dictionary/\n" ] } ] }