{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Lesson13-Python For Data Science-Sorting.ipynb", "version": "0.3.2", "provenance": [], "collapsed_sections": [ "qqAIvhP5N4iu", "quAhiuBw-sq9", "4z2e2QHV4EvD", "7PISBoZr4xgp", "1VErz5Z7g0v7", "esYhvVKHg-TM", "iQawn9MQJIph" ], "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "metadata": { "id": "spdivf2TMnGC", "colab_type": "text" }, "cell_type": "markdown", "source": [ "# Lesson 13 Sorting\n" ] }, { "metadata": { "id": "c_Id55m6Jsbu", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## Pragmatic AI Labs\n", "\n" ] }, { "metadata": { "id": "e5p96AqpSDZa", "colab_type": "text" }, "cell_type": "markdown", "source": [ "![alt text](https://paiml.com/images/logo_with_slogan_white_background.png)\n", "\n", "This notebook was produced by [Pragmatic AI Labs](https://paiml.com/). 
You can continue learning about these topics by:\n", "\n", "* Buying a copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](http://www.informit.com/store/pragmatic-ai-an-introduction-to-cloud-based-machine-9780134863917)\n", "* Reading an online copy of [Pragmatic AI: An Introduction to Cloud-Based Machine Learning](https://www.safaribooksonline.com/library/view/pragmatic-ai-an/9780134863924/)\n", "* Watching the video [Essential Machine Learning and AI with Python and Jupyter Notebook-Video-SafariOnline](https://www.safaribooksonline.com/videos/essential-machine-learning/9780135261118) on Safari Books Online.\n", "* Watching the video [AWS Certified Machine Learning-Specialty](https://learning.oreilly.com/videos/aws-certified-machine/9780135556597)\n", "* Purchasing the video [Essential Machine Learning and AI with Python and Jupyter Notebook- Purchase Video](http://www.informit.com/store/essential-machine-learning-and-ai-with-python-and-jupyter-9780135261095)\n", "* Viewing more content at [noahgift.com](https://noahgift.com/)\n" ] }, { "metadata": { "id": "pBTeTbnRKG_k", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "qqAIvhP5N4iu", "colab_type": "text" }, "cell_type": "markdown", "source": [ "## 13.1 Sort in Python" ] }, { "metadata": { "id": "quAhiuBw-sq9", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Understanding Sorting\n", "\n", "Python has powerful built-in sorting: the `sorted()` function returns a new sorted list from any iterable, and the `list.sort()` method sorts a list in place.\n" ] }, { "metadata": { "id": "wwHjj7Br-sUk", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### World Food Facts Dataset" ] }, { "metadata": { "id": "yY9gKLFD-03J", "colab_type": "text" }, "cell_type": "markdown", "source": [ "* Original Data Source: https://www.kaggle.com/openfoodfacts/world-food-facts\n", "* Modified Source: https://www.kaggle.com/lwodarzek/nutrition-table-clustering/output" ] }, { "metadata": { "id": "Y8a6auMZTzoa", "colab_type": "text" }, "cell_type": "markdown", "source": [ "##### Ingest" ] }, { "metadata": { "id": "co7HfeLmvvZV", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "import pandas as pd" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "77Ih-vxcv-BR", "colab_type": "code", "outputId": "2032ecd4-f6ce-47ce-ac72-8658573261da", "colab": { "base_uri": "https://localhost:8080/", "height": 198 } }, "cell_type": "code", "source": [ "df = pd.read_csv(\n", "    \"https://raw.githubusercontent.com/noahgift/food/master/data/features.en.openfoodfacts.org.products.csv\")\n", "df.drop([\"Unnamed: 0\", \"exceeded\", \"g_sum\", \"energy_100g\"], axis=1, inplace=True)  # drop four columns we don't need\n", "df = df.drop(df.index[[1,11877]])  # drop two outlier rows\n", "df.rename(index=str, columns={\"reconstructed_energy\": \"energy_100g\"}, inplace=True)  # use the reconstructed energy as energy_100g\n", "df.head()" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>fat_100g</th><th>carbohydrates_100g</th><th>sugars_100g</th><th>proteins_100g</th><th>salt_100g</th><th>energy_100g</th><th>product</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>28.57</td><td>64.29</td><td>14.29</td><td>3.57</td><td>0.00000</td><td>2267.85</td><td>Banana Chips Sweetened (Whole)</td></tr>\n",
"    <tr><th>2</th><td>57.14</td><td>17.86</td><td>3.57</td><td>17.86</td><td>1.22428</td><td>2835.70</td><td>Organic Salted Nut Mix</td></tr>\n",
"    <tr><th>3</th><td>18.75</td><td>57.81</td><td>15.62</td><td>14.06</td><td>0.13970</td><td>1953.04</td><td>Organic Muesli</td></tr>\n",
"    <tr><th>4</th><td>36.67</td><td>36.67</td><td>3.33</td><td>16.67</td><td>1.60782</td><td>2336.91</td><td>Zen Party Mix</td></tr>\n",
"    <tr><th>5</th><td>18.18</td><td>60.00</td><td>21.82</td><td>14.55</td><td>0.02286</td><td>1976.37</td><td>Cinnamon Nut Granola</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " fat_100g carbohydrates_100g sugars_100g proteins_100g salt_100g \\\n", "0 28.57 64.29 14.29 3.57 0.00000 \n", "2 57.14 17.86 3.57 17.86 1.22428 \n", "3 18.75 57.81 15.62 14.06 0.13970 \n", "4 36.67 36.67 3.33 16.67 1.60782 \n", "5 18.18 60.00 21.82 14.55 0.02286 \n", "\n", " energy_100g product \n", "0 2267.85 Banana Chips Sweetened (Whole) \n", "2 2835.70 Organic Salted Nut Mix \n", "3 1953.04 Organic Muesli \n", "4 2336.91 Zen Party Mix \n", "5 1976.37 Cinnamon Nut Granola " ] }, "metadata": { "tags": [] }, "execution_count": 43 } ] }, { "metadata": { "id": "GC88vFkPQwLO", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Using built-in sorting\n", "\n", "Convert Pandas DataFrame Columns into a list" ] }, { "metadata": { "id": "-7lMsgk3Qz0m", "colab_type": "code", "outputId": "f2ed2a81-305e-4e3f-de68-83aa98b2d3cc", "colab": { "base_uri": "https://localhost:8080/", "height": 138 } }, "cell_type": "code", "source": [ "food_facts = list(df.columns.values)\n", "food_facts" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['fat_100g',\n", " 'carbohydrates_100g',\n", " 'sugars_100g',\n", " 'proteins_100g',\n", " 'salt_100g',\n", " 'energy_100g',\n", " 'product']" ] }, "metadata": { "tags": [] }, "execution_count": 44 } ] }, { "metadata": { "id": "4z2e2QHV4EvD", "colab_type": "text" }, "cell_type": "markdown", "source": [ "##### Alphabetical Sort" ] }, { "metadata": { "id": "4_PU9Buv3wiT", "colab_type": "code", "outputId": "c3f08d56-ffe6-4687-9602-ed9c2385c2c7", "colab": { "base_uri": "https://localhost:8080/", "height": 138 } }, "cell_type": "code", "source": [ "sorted(food_facts)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['carbohydrates_100g',\n", " 'energy_100g',\n", " 'fat_100g',\n", " 'product',\n", " 'proteins_100g',\n", " 'salt_100g',\n", " 'sugars_100g']" ] }, "metadata": { "tags": [] }, "execution_count": 45 } ] 
}, { "metadata": { "id": "7PISBoZr4xgp", "colab_type": "text" }, "cell_type": "markdown", "source": [ "##### Reverse Alphabetical Sort" ] },
{ "metadata": { "id": "kn27GHO640j7", "colab_type": "code", "outputId": "67d7d8f3-e731-47d5-ffe7-cead4d6914bf", "colab": { "base_uri": "https://localhost:8080/", "height": 138 } }, "cell_type": "code", "source": [ "sorted(food_facts, reverse=True)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['sugars_100g',\n", " 'salt_100g',\n", " 'proteins_100g',\n", " 'product',\n", " 'fat_100g',\n", " 'energy_100g',\n", " 'carbohydrates_100g']" ] }, "metadata": { "tags": [] }, "execution_count": 46 } ] },
{ "metadata": { "id": "OqDh2Tdz5xoX", "colab_type": "text" }, "cell_type": "markdown", "source": [ "##### Using the built-in list sort method" ] },
{ "metadata": { "id": "mL_jSyb16ANi", "colab_type": "text" }, "cell_type": "markdown", "source": [ "The `sort()` method is defined only on lists. It sorts the list in place and returns `None`, whereas `sorted()` accepts any iterable and returns a new list." ] },
{ "metadata": { "id": "IYVo0SlR6FFF", "colab_type": "code", "outputId": "d8ded999-a582-45ce-96f7-7547f218fc8e", "colab": { "base_uri": "https://localhost:8080/", "height": 52 } }, "cell_type": "code", "source": [ "food_facts = list(df.columns.values)\n", "print(f\"Before sort: {food_facts}\")\n", "food_facts.sort()\n", "print(f\"After sort: {food_facts}\")\n" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "Before sort: ['fat_100g', 'carbohydrates_100g', 'sugars_100g', 'proteins_100g', 'salt_100g', 'energy_100g', 'product']\n", "After sort: ['carbohydrates_100g', 'energy_100g', 'fat_100g', 'product', 'proteins_100g', 'salt_100g', 'sugars_100g']\n" ], "name": "stdout" } ] },
{ "metadata": { "id": "9re2e5r8OghX", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "" ], "execution_count": 0, "outputs": [] },
{ "metadata": { "id": "LRZ_0TGH6oB4", "colab_type": "text" }, "cell_type": "markdown", "source": [ "##### Timing the built-in `sorted()` function vs. the list `sort()` method" ] },
{ "metadata": { "id": "ytsl-5Mj68XH", "colab_type": "text" }, "cell_type": "markdown", "source": [ "List `sort()` method. It sorts in place, so after the first loop `timeit` is re-sorting an already sorted list." ] },
{ "metadata": { "id": "V2nnwt9s7DWL", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "food_facts = list(df.columns.values)" ], "execution_count": 0, "outputs": [] },
{ "metadata": { "id": "vcQ36Evq6z5h", "colab_type": "code", "outputId": "a1113bd3-7899-4365-c1e5-a26a956ef9c6", "colab": { "base_uri": "https://localhost:8080/", "height": 35 } }, "cell_type": "code", "source": [ "%%timeit -n 3 -r 3\n", "food_facts.sort()\n", "\n" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "3 loops, best of 3: 489 ns per loop\n" ], "name": "stdout" } ] },
{ "metadata": { "id": "QhPpYNaM7JJb", "colab_type": "text" }, "cell_type": "markdown", "source": [ "Built-in `sorted()` function. It builds a new list on every loop and leaves `food_facts` unchanged." ] },
{ "metadata": { "id": "ue1XDqAc7LDJ", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "food_facts = list(df.columns.values)" ], "execution_count": 0, "outputs": [] },
{ "metadata": { "id": "ZaIestd37Mrt", "colab_type": "code", "outputId": "b1aea201-5e5e-4673-d55b-ab46eaa40de9", "colab": { "base_uri": "https://localhost:8080/", "height": 35 } }, "cell_type": "code", "source": [ "%%timeit -n 3 -r 3\n", "sorted(food_facts)" ], "execution_count": 0, "outputs": [ { "output_type": "stream", "text": [ "3 loops, best of 3: 656 ns per loop\n" ], "name": "stdout" } ] },
{ "metadata": { "id": "RSggCymO8nny", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Sorting a Dictionary" ] },
{ "metadata": { "id": "2Q3ukiyB9jLG", "colab_type": "text" }, "cell_type": "markdown", "source": [ "Convert the first row of the DataFrame into a dictionary to sort" ] },
{ "metadata": { "id": "GSBRp6_q9Omq", "colab_type": "code", "outputId": "49280032-3ecf-4f3e-f8e2-10536d8c6613", "colab": { "base_uri": "https://localhost:8080/", "height": 138 } }, "cell_type": "code", "source": [ "food_facts_row = df.head(1).to_dict()\n", "food_facts_row" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "{'carbohydrates_100g': {'0': 64.29},\n", " 'energy_100g': {'0': 2267.85},\n", " 'fat_100g': {'0': 28.57},\n", " 'product': {'0': 'Banana Chips Sweetened (Whole)'},\n", " 'proteins_100g': {'0': 3.57},\n", " 'salt_100g': {'0': 0.0},\n", " 'sugars_100g': {'0': 14.29}}" ] }, "metadata": { "tags": [] }, "execution_count": 55 } ] },
{ "metadata": { "id": "SatiHTXJ_Teq", "colab_type": "text" }, "cell_type": "markdown", "source": [ "Reverse-sort the dictionary. Iterating over a dictionary yields its keys, so `sorted()` returns a list of the keys." ] },
{ "metadata": { "id": "qhUojdlf9oSI", "colab_type": "code", "outputId": "620c2a75-4c71-4e5d-f138-a88c99ee81eb", "colab": { "base_uri": "https://localhost:8080/", "height": 138 } }, "cell_type": "code", "source": [ "sorted(food_facts_row, reverse=True)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['sugars_100g',\n", " 'salt_100g',\n", " 'proteins_100g',\n", " 'product',\n", " 'fat_100g',\n", " 'energy_100g',\n", " 'carbohydrates_100g']" ] }, "metadata": { "tags": [] }, "execution_count": 56 } ] },
{ "metadata": { "id": "4-TPMSwWADEf", "colab_type": "code", "outputId": "28141a9a-8be0-40be-c4e2-a98ad0dc10fc", "colab": { "base_uri": "https://localhost:8080/", "height": 69 } }, "cell_type": "code", "source": [ "df[\"product\"].head().values" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "array(['Banana Chips Sweetened (Whole)', 'Organic Salted Nut Mix',\n", " 'Organic Muesli', 'Zen Party Mix', 'Cinnamon Nut Granola'],\n", " dtype=object)" ] }, "metadata": { "tags": [] }, "execution_count": 57 } ] },
{ "metadata": { "id": "Em50pLNG_cYo", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Sorting a Generator Pipeline" ] },
{ "metadata": { "id": "uYo2NlNh_sHR", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "def dataframe_rows(df=df, column=\"product\", chunks=10):\n", "    \"\"\"Yield successive lists of `chunks` values from one column\"\"\"\n", "    count_row = df.shape[0]\n", "    rows = list(df[column].values)\n", "    for i in range(0, count_row, chunks):\n", "        yield rows[i:i + chunks]\n" ], "execution_count": 0, "outputs": [] },
{ "metadata": { "id": "6Po9GpYABmxD", "colab_type": "code", "outputId": "39aad0d3-917b-4ff4-dba5-1fe7d46b6c37", "colab": { "base_uri": "https://localhost:8080/", "height": 190 } }, "cell_type": "code", "source": [ "rows = dataframe_rows()\n", "next(rows)\n" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['Banana Chips Sweetened (Whole)',\n", " 'Organic Salted Nut Mix',\n", " 'Organic Muesli',\n", " 'Zen Party Mix',\n", " 'Cinnamon Nut Granola',\n", " 'Organic Hazelnuts',\n", " 'Organic Oat Groats',\n", " 'Energy Power Mix',\n", " 'Antioxidant Mix - Berries & Chocolate',\n", " 'Organic Quinoa Coconut Granola With Mango']" ] }, "metadata": { "tags": [] }, "execution_count": 59 } ] },
{ "metadata": { "id": "1Ka1vYOzHgSn", "colab_type": "code", "outputId": "8f710a40-094f-43e9-e29e-2baab3f6d3f2", "colab": { "base_uri": "https://localhost:8080/", "height": 190 } }, "cell_type": "code", "source": [ "next(rows)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['Fire Roasted Hatch Green Chile Almonds',\n", " 'Peanut Butter Power Chews',\n", " 'Organic Unswt Berry Coconut Granola',\n", " 'Roasted Salted Black Pepper Cashews',\n", " 'Thai Curry Roasted Cashews',\n", " 'Wasabi Tamari Almonds',\n", " 'Organic Red Quinoa',\n", " 'Dark Chocolate Coconut Chews',\n", " 'Organic Unsweetened Granola, Cinnamon Almond',\n", " 'Organic Blueberry Almond Granola']" ] }, "metadata": { "tags": [] }, "execution_count": 60 } ] },
{ "metadata": { "id": "YbEsKMqYHpuR", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "# lazily sort each remaining chunk as it is pulled through the pipeline\n", "sorted_row = (sorted(row) for row in rows)\n", "print(next(sorted_row))" ], "execution_count": 0, "outputs": [] },
{ "metadata": { "colab_type": "text", "id": "1VErz5Z7g0v7" }, "cell_type": "markdown", "source": [ "## 13.2 Create custom sorting functions" ] },
{ "metadata": { "id": "Sa-OqhQgIwl4", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Building a Shuffle Function" ] },
{ "metadata": { "id": "UstkrDyVI1gM", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "food_items = ['Chocolate Nut Crunch', 'Cranberries', 'Curry Lentil Soup Mix',\n", "              'Milk Chocolate Peanut Butter Malt Balls', 'Organic Harvest Pilaf',\n", "              'Organic Tamari Pumpkin Seed', 'Split Pea Soup Mix',\n", "              'Swiss-Style Muesli', \"Whole Wheat 'N Honey Fig Bars\",\n", "              'Yogurt Pretzels']\n" ], "execution_count": 0, "outputs": [] },
{ "metadata": { "id": "3NCAIctKI4ZT", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "from random import sample\n", "\n", "def shuffle_list(items):\n", "    \"\"\"Return a randomly shuffled copy of the list\"\"\"\n", "\n", "    shuffled = sample(items, len(items))\n", "    return shuffled\n" ], "execution_count": 0, "outputs": [] },
{ "metadata": { "id": "0NjLGWrSKB9y", "colab_type": "code", "outputId": "2738b1a7-04da-4169-c7f7-0e9ff6bbbf0c", "colab": { "base_uri": "https://localhost:8080/", "height": 190 } }, "cell_type": "code", "source": [ "shuffled_food_items = shuffle_list(food_items)\n", "shuffled_food_items" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['Milk Chocolate Peanut Butter Malt Balls',\n", " 'Organic Harvest Pilaf',\n", " 'Curry Lentil Soup Mix',\n", " 'Yogurt Pretzels',\n", " 'Organic Tamari Pumpkin Seed',\n", " 'Chocolate Nut Crunch',\n", " \"Whole Wheat 'N Honey Fig Bars\",\n", " 'Split Pea Soup Mix',\n", " 'Cranberries',\n", " 'Swiss-Style Muesli']" ] }, "metadata": { "tags": [] }, "execution_count": 76 } ] },
{ "metadata": { "id": "i_Skh8l8S6re", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Custom Sort Functions" ] },
{ "metadata": { "id": "OrWriGn-S_S6", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Highly Customized Sort" ] },
{ "metadata": { "id": "IRoNvB9RS_fM", "colab_type": "code", "outputId": "46c9839a-78db-4ee2-ff64-7a9a19303bae", "colab": { "base_uri": "https://localhost:8080/", "height": 190 } }, "cell_type": "code", "source": [ "def best_snack(item):\n", "    \"\"\"Sort key: the favorite snack ranks first, the rest by name length\"\"\"\n", "    if item == \"Chocolate Nut Crunch\":\n", "        return 1\n", "    return len(item)\n", "\n", "sorted(shuffled_food_items, key=best_snack)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['Chocolate Nut Crunch',\n", " 'Cranberries',\n", " 'Yogurt Pretzels',\n", " 'Split Pea Soup Mix',\n", " 'Swiss-Style Muesli',\n", " 'Organic Harvest Pilaf',\n", " 'Curry Lentil Soup Mix',\n", " 'Organic Tamari Pumpkin Seed',\n", " \"Whole Wheat 'N Honey Fig Bars\",\n", " 'Milk Chocolate Peanut Butter Malt Balls']" ] }, "metadata": { "tags": [] }, "execution_count": 77 } ] },
{ "metadata": { "id": "31myQds7Ns_6", "colab_type": "text" }, "cell_type": "markdown", "source": [ "#### Sorting Objects" ] },
{ "metadata": { "id": "_h35dY358Iln", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "class Food:\n", "    def __init__(self, product, protein):\n", "        self.product = product\n", "        self.protein = protein\n", "    def __repr__(self):\n", "        return f\"Food: {self.product}, Protein: {self.protein}\"" ], "execution_count": 0, "outputs": [] },
{ "metadata": { "id": "KQ4vP3E-9Cfi", "colab_type": "code", "outputId": "c6c9321a-4bd9-4e96-d05c-f9faf50799b9", "colab": { "base_uri": "https://localhost:8080/", "height": 104 } }, "cell_type": "code", "source": [ "pairs = df[[\"product\", \"proteins_100g\"]].head().values.tolist()\n", "pairs" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[['Banana Chips Sweetened (Whole)', 3.57],\n", " ['Organic Salted Nut Mix', 17.86],\n", " ['Organic Muesli', 14.06],\n", " ['Zen Party Mix', 16.67],\n", " ['Cinnamon Nut Granola', 14.55]]" ] }, "metadata": { "tags": [] }, "execution_count": 79 }
] }, { "metadata": { "id": "CGIVqRgaPbEE", "colab_type": "code", "outputId": "d87f50f1-e268-45f5-9956-ef52f98cf214", "colab": { "base_uri": "https://localhost:8080/", "height": 104 } }, "cell_type": "code", "source": [ "pairs = df[[\"product\", \"proteins_100g\"]].head().values.tolist()\n", "foods = [Food(item[0], item[1]) for item in pairs]\n", "foods" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[Food: Banana Chips Sweetened (Whole), Protein: 3.57,\n", " Food: Organic Salted Nut Mix, Protein: 17.86,\n", " Food: Organic Muesli, Protein: 14.06,\n", " Food: Zen Party Mix, Protein: 16.67,\n", " Food: Cinnamon Nut Granola, Protein: 14.55]" ] }, "metadata": { "tags": [] }, "execution_count": 80 } ] },
{ "metadata": { "id": "mQ-Pb-AHUZQz", "colab_type": "code", "outputId": "965add5f-4bc5-4c1f-8bce-3342151336af", "colab": { "base_uri": "https://localhost:8080/", "height": 104 } }, "cell_type": "code", "source": [ "sorted(foods, key=lambda food: food.protein)\n" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[Food: Banana Chips Sweetened (Whole), Protein: 3.57,\n", " Food: Organic Muesli, Protein: 14.06,\n", " Food: Cinnamon Nut Granola, Protein: 14.55,\n", " Food: Zen Party Mix, Protein: 16.67,\n", " Food: Organic Salted Nut Mix, Protein: 17.86]" ] }, "metadata": { "tags": [] }, "execution_count": 81 } ] },
{ "metadata": { "id": "lCljI0HYRM-k", "colab_type": "code", "outputId": "07430fac-1d6b-41a9-e2b1-4c1990e81f11", "colab": { "base_uri": "https://localhost:8080/", "height": 35 } }, "cell_type": "code", "source": [ "foods[0].__dict__\n", "type(foods[0])" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "__main__.Food" ] }, "metadata": { "tags": [] }, "execution_count": 85 } ] },
{ "metadata": { "colab_type": "text", "id": "esYhvVKHg-TM" }, "cell_type": "markdown", "source": [ "## 13.3 Sort in pandas" ] },
{ "metadata": { "id": "iQawn9MQJIph", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Sort by One Column: Carbohydrates" ] },
{ "metadata": { "id": "wvmwiE7N5b2J", "colab_type": "code", "outputId": "3735e5c7-cd8b-41cf-f14a-5f0d00001cdd", "colab": { "base_uri": "https://localhost:8080/", "height": 233 } }, "cell_type": "code", "source": [ "df.sort_values(by=[\"carbohydrates_100g\"], ascending=False).head(5)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>fat_100g</th><th>carbohydrates_100g</th><th>sugars_100g</th><th>proteins_100g</th><th>salt_100g</th><th>energy_100g</th><th>product</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>42012</th><td>0.0</td><td>100.0</td><td>85.71</td><td>0.0</td><td>0.000</td><td>1700.0</td><td>Spongebob Squarepants Valentine Candy Card Kit</td></tr>\n",
"    <tr><th>31827</th><td>0.0</td><td>100.0</td><td>80.00</td><td>0.0</td><td>0.000</td><td>1700.0</td><td>Marvel Avengers Assemble, Classroom Candy Mail...</td></tr>\n",
"    <tr><th>31661</th><td>0.0</td><td>100.0</td><td>100.00</td><td>0.0</td><td>0.000</td><td>1700.0</td><td>White Crystal Sugar</td></tr>\n",
"    <tr><th>31665</th><td>0.0</td><td>100.0</td><td>0.00</td><td>0.0</td><td>0.254</td><td>1700.0</td><td>Dried Habanero Chiles</td></tr>\n",
"    <tr><th>42366</th><td>0.0</td><td>100.0</td><td>88.89</td><td>0.0</td><td>0.000</td><td>1700.0</td><td>Iced Tea Mix, Lemon</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " fat_100g carbohydrates_100g sugars_100g proteins_100g salt_100g \\\n", "42012 0.0 100.0 85.71 0.0 0.000 \n", "31827 0.0 100.0 80.00 0.0 0.000 \n", "31661 0.0 100.0 100.00 0.0 0.000 \n", "31665 0.0 100.0 0.00 0.0 0.254 \n", "42366 0.0 100.0 88.89 0.0 0.000 \n", "\n", " energy_100g product \n", "42012 1700.0 Spongebob Squarepants Valentine Candy Card Kit \n", "31827 1700.0 Marvel Avengers Assemble, Classroom Candy Mail... \n", "31661 1700.0 White Crystal Sugar \n", "31665 1700.0 Dried Habanero Chiles \n", "42366 1700.0 Iced Tea Mix, Lemon " ] }, "metadata": { "tags": [] }, "execution_count": 89 } ] },
{ "metadata": { "id": "bq_K-nnPWQcA", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Sort by Two Columns: Fat, Salt" ] },
{ "metadata": { "id": "iI2ogrIvKYIm", "colab_type": "code", "outputId": "c789bc74-8b9e-4f5c-c9b3-98690ab6deec", "colab": { "base_uri": "https://localhost:8080/", "height": 348 } }, "cell_type": "code", "source": [ "df.sort_values(by=[\"fat_100g\", \"salt_100g\"], ascending=[False, False]).head(10)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>fat_100g</th><th>carbohydrates_100g</th><th>sugars_100g</th><th>proteins_100g</th><th>salt_100g</th><th>energy_100g</th><th>product</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>8390</th><td>100.0</td><td>20.00</td><td>0.00</td><td>0.00</td><td>1.524</td><td>4240.00</td><td>Horseradish Sauce</td></tr>\n",
"    <tr><th>44709</th><td>100.0</td><td>17.86</td><td>3.57</td><td>10.71</td><td>0.381</td><td>4385.69</td><td>Roasted Pecans</td></tr>\n",
"    <tr><th>295</th><td>100.0</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.000</td><td>3900.00</td><td>Ventura, Soybean - Peanut Frying Oil Blend</td></tr>\n",
"    <tr><th>5122</th><td>100.0</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.000</td><td>3900.00</td><td>Corn Oil</td></tr>\n",
"    <tr><th>5123</th><td>100.0</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.000</td><td>3900.00</td><td>Canola Oil</td></tr>\n",
"    <tr><th>5124</th><td>100.0</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.000</td><td>3900.00</td><td>Vegetable Oil</td></tr>\n",
"    <tr><th>5125</th><td>100.0</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.000</td><td>3900.00</td><td>Vegetable Shortening</td></tr>\n",
"    <tr><th>5671</th><td>100.0</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.000</td><td>3900.00</td><td>Organic Coconut Oil</td></tr>\n",
"    <tr><th>5797</th><td>100.0</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.000</td><td>3900.00</td><td>Premium Sesame Oil (100% Pure)</td></tr>\n",
"    <tr><th>5798</th><td>100.0</td><td>0.00</td><td>0.00</td><td>0.00</td><td>0.000</td><td>3900.00</td><td>Sesame Oil</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " fat_100g carbohydrates_100g sugars_100g proteins_100g salt_100g \\\n", "8390 100.0 20.00 0.00 0.00 1.524 \n", "44709 100.0 17.86 3.57 10.71 0.381 \n", "295 100.0 0.00 0.00 0.00 0.000 \n", "5122 100.0 0.00 0.00 0.00 0.000 \n", "5123 100.0 0.00 0.00 0.00 0.000 \n", "5124 100.0 0.00 0.00 0.00 0.000 \n", "5125 100.0 0.00 0.00 0.00 0.000 \n", "5671 100.0 0.00 0.00 0.00 0.000 \n", "5797 100.0 0.00 0.00 0.00 0.000 \n", "5798 100.0 0.00 0.00 0.00 0.000 \n", "\n", " energy_100g product \n", "8390 4240.00 Horseradish Sauce \n", "44709 4385.69 Roasted Pecans \n", "295 3900.00 Ventura, Soybean - Peanut Frying Oil Blend \n", "5122 3900.00 Corn Oil \n", "5123 3900.00 Canola Oil \n", "5124 3900.00 Vegetable Oil \n", "5125 3900.00 Vegetable Shortening \n", "5671 3900.00 Organic Coconut Oil \n", "5797 3900.00 Premium Sesame Oil (100% Pure) \n", "5798 3900.00 Sesame Oil " ] }, "metadata": { "tags": [] }, "execution_count": 92 } ] },
{ "metadata": { "id": "sxVe1CtO_FKr", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Groupby" ] },
{ "metadata": { "id": "Ljcivk3_AW1v", "colab_type": "code", "outputId": "fd234a0f-0a04-4e04-ff40-7d84ce861626", "colab": { "base_uri": "https://localhost:8080/", "height": 215 } }, "cell_type": "code", "source": [ "def high_protein(protein):\n", "    \"\"\"Label a proteins_100g value as high or low protein\"\"\"\n", "\n", "    if protein > 80:\n", "        return \"high_protein\"\n", "    return \"low_protein\"\n", "\n", "df[\"high_protein\"] = df[\"proteins_100g\"].apply(high_protein)\n", "df.head()" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>fat_100g</th><th>carbohydrates_100g</th><th>sugars_100g</th><th>proteins_100g</th><th>salt_100g</th><th>energy_100g</th><th>product</th><th>high_protein</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>28.57</td><td>64.29</td><td>14.29</td><td>3.57</td><td>0.00000</td><td>2267.85</td><td>Banana Chips Sweetened (Whole)</td><td>low_protein</td></tr>\n",
"    <tr><th>2</th><td>57.14</td><td>17.86</td><td>3.57</td><td>17.86</td><td>1.22428</td><td>2835.70</td><td>Organic Salted Nut Mix</td><td>low_protein</td></tr>\n",
"    <tr><th>3</th><td>18.75</td><td>57.81</td><td>15.62</td><td>14.06</td><td>0.13970</td><td>1953.04</td><td>Organic Muesli</td><td>low_protein</td></tr>\n",
"    <tr><th>4</th><td>36.67</td><td>36.67</td><td>3.33</td><td>16.67</td><td>1.60782</td><td>2336.91</td><td>Zen Party Mix</td><td>low_protein</td></tr>\n",
"    <tr><th>5</th><td>18.18</td><td>60.00</td><td>21.82</td><td>14.55</td><td>0.02286</td><td>1976.37</td><td>Cinnamon Nut Granola</td><td>low_protein</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " fat_100g carbohydrates_100g sugars_100g proteins_100g salt_100g \\\n", "0 28.57 64.29 14.29 3.57 0.00000 \n", "2 57.14 17.86 3.57 17.86 1.22428 \n", "3 18.75 57.81 15.62 14.06 0.13970 \n", "4 36.67 36.67 3.33 16.67 1.60782 \n", "5 18.18 60.00 21.82 14.55 0.02286 \n", "\n", " energy_100g product high_protein \n", "0 2267.85 Banana Chips Sweetened (Whole) low_protein \n", "2 2835.70 Organic Salted Nut Mix low_protein \n", "3 1953.04 Organic Muesli low_protein \n", "4 2336.91 Zen Party Mix low_protein \n", "5 1976.37 Cinnamon Nut Granola low_protein " ] }, "metadata": { "tags": [] }, "execution_count": 93 } ] },
{ "metadata": { "id": "bNwREeat_HLX", "colab_type": "code", "outputId": "dcf29a31-f78e-4945-c984-10b3bf877711", "colab": { "base_uri": "https://localhost:8080/", "height": 138 } }, "cell_type": "code", "source": [ "# numeric_only=True skips the text \"product\" column, which has no median\n", "df.groupby(\"high_protein\").median(numeric_only=True)" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>fat_100g</th><th>carbohydrates_100g</th><th>sugars_100g</th><th>proteins_100g</th><th>salt_100g</th><th>energy_100g</th></tr>\n",
"    <tr><th>high_protein</th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>high_protein</th><td>1.665</td><td>3.335</td><td>1.665</td><td>93.18</td><td>0.5207</td><td>1700.00</td></tr>\n",
"    <tr><th>low_protein</th><td>3.170</td><td>22.390</td><td>5.880</td><td>4.00</td><td>0.6350</td><td>1121.54</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " fat_100g carbohydrates_100g sugars_100g proteins_100g \\\n", "high_protein \n", "high_protein 1.665 3.335 1.665 93.18 \n", "low_protein 3.170 22.390 5.880 4.00 \n", "\n", " salt_100g energy_100g \n", "high_protein \n", "high_protein 0.5207 1700.00 \n", "low_protein 0.6350 1121.54 " ] }, "metadata": { "tags": [] }, "execution_count": 94 } ] }, { "metadata": { "id": "dpEhkJnxTtUb", "colab_type": "code", "outputId": "e96bb739-6fc3-45b7-db55-d09cef972c96", "colab": { "base_uri": "https://localhost:8080/", "height": 217 } }, "cell_type": "code", "source": [ "df.groupby(\"high_protein\").describe()" ], "execution_count": 0, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th colspan=\"8\">carbohydrates_100g</th><th colspan=\"2\">energy_100g</th><th>...</th><th colspan=\"2\">salt_100g</th><th colspan=\"8\">sugars_100g</th></tr>\n",
"    <tr><th></th><th>count</th><th>mean</th><th>std</th><th>min</th><th>25%</th><th>50%</th><th>75%</th><th>max</th><th>count</th><th>mean</th><th>...</th><th>75%</th><th>max</th><th>count</th><th>mean</th><th>std</th><th>min</th><th>25%</th><th>50%</th><th>75%</th><th>max</th></tr>\n",
"    <tr><th>high_protein</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>high_protein</th><td>4.0</td><td>7.350000</td><td>10.724610</td><td>0.0</td><td>0.00</td><td>3.335</td><td>10.685</td><td>22.73</td><td>4.0</td><td>1795.09500</td><td>...</td><td>4.203065</td><td>14.77772</td><td>4.0</td><td>4.242500</td><td>6.458671</td><td>0.0</td><td>0.00</td><td>1.665</td><td>5.9075</td><td>13.64</td></tr>\n",
"    <tr><th>low_protein</th><td>45022.0</td><td>34.056436</td><td>29.557504</td><td>0.0</td><td>7.44</td><td>22.390</td><td>61.540</td><td>100.00</td><td>45022.0</td><td>1111.22544</td><td>...</td><td>1.440180</td><td>2032.00000</td><td>45022.0</td><td>16.006122</td><td>21.496335</td><td>-1.2</td><td>1.57</td><td>5.880</td><td>23.0800</td><td>100.00</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>2 rows × 48 columns</p>\n",
"</div>
" ], "text/plain": [ " carbohydrates_100g \\\n", " count mean std min 25% 50% \n", "high_protein \n", "high_protein 4.0 7.350000 10.724610 0.0 0.00 3.335 \n", "low_protein 45022.0 34.056436 29.557504 0.0 7.44 22.390 \n", "\n", " energy_100g ... salt_100g \\\n", " 75% max count mean ... 75% \n", "high_protein ... \n", "high_protein 10.685 22.73 4.0 1795.09500 ... 4.203065 \n", "low_protein 61.540 100.00 45022.0 1111.22544 ... 1.440180 \n", "\n", " sugars_100g \\\n", " max count mean std min 25% 50% \n", "high_protein \n", "high_protein 14.77772 4.0 4.242500 6.458671 0.0 0.00 1.665 \n", "low_protein 2032.00000 45022.0 16.006122 21.496335 -1.2 1.57 5.880 \n", "\n", " \n", " 75% max \n", "high_protein \n", "high_protein 5.9075 13.64 \n", "low_protein 23.0800 100.00 \n", "\n", "[2 rows x 48 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 95 } ] } ] }