diff --git a/1-Introduction/01-defining-data-science/notebook.ipynb b/1-Introduction/01-defining-data-science/notebook.ipynb index cf3988e85..75ca03058 100644 --- a/1-Introduction/01-defining-data-science/notebook.ipynb +++ b/1-Introduction/01-defining-data-science/notebook.ipynb @@ -1,419 +1,1020 @@ { - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# Challenge: Analyzing Text about Data Science\r\n", - "\r\n", - "In this example, let's do a simple exercise that covers all steps of a traditional data science process. You do not have to write any code, you can just click on the cells below to execute them and observe the result. As a challenge, you are encouraged to try this code out with different data. \r\n", - "\r\n", - "## Goal\r\n", - "\r\n", - "In this lesson, we have been discussing different concepts related to Data Science. Let's try to discover more related concepts by doing some **text mining**. We will start with a text about Data Science, extract keywords from it, and then try to visualize the result.\r\n", - "\r\n", - "As a text, I will use the page on Data Science from Wikipedia:" - ], - "metadata": {} - }, - { - "cell_type": "markdown", - "source": [], - "metadata": {} - }, - { - "cell_type": "code", - "execution_count": 62, - "source": [ - "url = 'https://en.wikipedia.org/wiki/Data_science'" - ], - "outputs": [], - "metadata": {} - }, - { - "cell_type": "markdown", - "source": [ - "## Step 1: Getting the Data\r\n", - "\r\n", - "First step in every data science process is getting the data. We will use `requests` library to do that:" - ], - "metadata": {} - }, - { - "cell_type": "code", - "execution_count": 63, - "source": [ - "import requests\r\n", - "\r\n", - "text = requests.get(url).content.decode('utf-8')\r\n", - "print(text[:1000])" - ], - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "\n", - "\n", - "\n", - "\n", - "Data science - Wikipedia\n", - "\n", + " \n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
 \n", + " \n", + " \n" + ] + }, + "metadata": {}, + "execution_count": 37 + } + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 363 + }, + "id": "HJs0xKfP4qCt", + "outputId": "4d1b94d6-05d9-4fb0-b43e-c32923c5ddb2" + } + }, + { + "cell_type": "markdown", + "source": [ + "\n", + "In this dataset, the columns are as follows:\n", + "* Age and sex are self-explanatory\n", + "* BMI is body mass index\n", + "* BP is average blood pressure\n", + "* S1 through S6 are different blood measurements\n", + "* Y is the quantitative measure of disease progression over one year\n", + "\n", + "Let's study this dataset using methods of probability and statistics.\n", + "\n", + "### Task 1: Compute mean values and variance for all values" + ], + "metadata": { + "id": "NXyoxwjU4qC1" + } + }, + { + "cell_type": "code", + "execution_count": 7, + "source": [ + "# Calculate the mean and variance of all numeric columns\n", + "\n", + "mean = df.mean()\n", + "variance = df.var()\n", + "\n", + "# Create a new DataFrame with the results\n", + "results = pd.DataFrame({'Mean': mean, 'Variance': variance})\n", + "\n", + "# Display the result\n", + "print(results.round(3))" + ], + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " Mean Variance\n", + "AGE 48.518 171.847\n", + "SEX 1.468 0.250\n", + "BMI 26.376 19.520\n", + "BP 94.647 191.304\n", + "S1 189.140 1197.717\n", + "S2 115.439 924.955\n", + "S3 49.788 167.294\n", + "S4 4.070 1.665\n", + "S5 4.641 0.273\n", + "S6 91.260 132.166\n", + "Y 152.133 5943.331\n" + ] + } + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ZxVqeOXV4qC5", + "outputId": "4bf7a7c3-c22f-48f3-bbde-49f4c10f7005" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Task 2: Plot boxplots for BMI, BP and Y depending on gender" + ], + "metadata": { + "id": "l9jvD6Ou4qC7" + } + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "colab": { + 
"base_uri": "https://localhost:8080/", + "height": 447 + }, + "id": "TFRfrvyKIUW8", + "outputId": "440ae9a1-dbb6-4074-f0c3-1078da2c12bb" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "# Create a figure with three subplots\n", + "fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n", + "\n", + "# Plot the boxplots for BMI, BP, and Y depending on gender\n", + "sns.boxplot(x='SEX', y='BMI', data=df, ax=axes[0])\n", + "sns.boxplot(x='SEX', y='BP', data=df, ax=axes[1])\n", + "sns.boxplot(x='SEX', y='Y', data=df, ax=axes[2])\n", + "\n", + "# Set the titles for the subplots\n", + "axes[0].set_title('BMI')\n", + "axes[1].set_title('BP')\n", + "axes[2].set_title('Y')\n", + "\n", + "# Show the plot\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Task 3: What is the distribution of Age, Sex, BMI and Y variables?" + ], + "metadata": { + "id": "I7N3sUbQ4qDB" + } + }, + { + "cell_type": "code", + "execution_count": 19, + "source": [ + "# Create a figure with four subplots\n", + "fig, axes = plt.subplots(2, 2, figsize=(10, 10))\n", + "\n", + "# Create histograms for the variables Age, Sex, BMI, and Y.\n", + "axes[0, 0].hist(df['AGE'])\n", + "axes[0, 1].hist(df['SEX'])\n", + "axes[1, 0].hist(df['BMI'])\n", + "axes[1, 1].hist(df['Y'])\n", + "\n", + "# Set the titles for the subplots\n", + "axes[0, 0].set_title('Age')\n", + "axes[0, 1].set_title('Sex')\n", + "axes[1, 0].set_title('BMI')\n", + "axes[1, 1].set_title('Y')\n", + "\n", + "# Show the plot\n", + "plt.show()" + ], + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 853 + }, + "id": "yKaqwzQb4qDD", + "outputId": "3027725f-b6ea-4802-a925-679ff84e6c8d" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Task 4: Test the correlation between different variables and disease progression (Y)\n", + "\n", + "> **Hint** A correlation matrix would give you the most useful information on which values are dependent." + ], + "metadata": { + "id": "d7M0nVE54qDG" + } + }, + { + "cell_type": "code", + "source": [ + "# Calculate the correlation between columns\n", + "corr = df[['AGE','BMI','BP','Y']].corr()\n", + "\n", + "# Create a heatmap to show the correlation\n", + "sns.heatmap(corr, annot=True)\n", + "\n", + "# Show the plot\n", + "plt.show()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 435 + }, + "id": "-y55KMmPBtqx", + "outputId": "05353b4b-bdca-4019-beb0-68c962c7aafe" + }, + "execution_count": 31, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [ + "# Create a figure with two subplots\n", + "fig, axes = plt.subplots(1, 2, figsize=(15, 5))\n", + "\n", + "# Scatter plots of BMI and BP against disease progression (Y)\n", + "sns.scatterplot(x='Y', y='BMI', data=df, ax=axes[0])\n", + "sns.scatterplot(x='Y', y='BP', data=df, ax=axes[1])\n", + "\n", + "# Show the plot\n", + "plt.show()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 427 + }, + "id": "gC5Fqp19FZo0", + "outputId": "4cc0e3ee-eb91-4fcc-81d5-3abebf656e39" + }, + "execution_count": 35, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [], + "metadata": { + "id": "dh-DxaqH4qDK" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Task 5: Test the hypothesis that the degree of diabetes progression is different between men and women" + ], + "metadata": { + "id": "lHAZDxcY4qDL" + } + }, + { + "cell_type": "markdown", + "source": [], + "metadata": { + "id": "sHpUiNUU4qDN" + } + }, + { + "cell_type": "code", + "source": [ + "df.groupby('SEX').agg({ 'Y' : 'mean'})" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 143 + }, + "id": "9el94npsGWFw", + "outputId": "ec34d5d5-7ffb-4506-c279-54b01d456344" + }, + "execution_count": 36, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Y\n", + "SEX \n", + "1 149.021277\n", + "2 155.666667" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Y
SEX
1149.021277
2155.666667
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 36 + } + ] + }, + { + "cell_type": "code", + "source": [ + "import scipy.stats\n", + "\n", + "def mean_confidence_interval(data, confidence=0.95):\n", + " a = 1.0 * np.array(data)\n", + " n = len(a)\n", + " m, se = np.mean(a), scipy.stats.sem(a)\n", + " h = se * scipy.stats.t.ppf((1 + confidence) / 2., n-1)\n", + " return m, h\n", + "\n", + "for p in [0.85, 0.9, 0.95]:\n", + " m, h = mean_confidence_interval(df['Y'].fillna(method='pad'),p)\n", + " print(f\"p={p:.2f}, mean = {m:.2f} ± {h:.2f}\")" + ], + "metadata": { + "id": "tZTMnAbkIkx-", + "outputId": "5616ad69-dfa0-4eb4-cdad-be89925b0c24", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 40, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "p=0.85, mean = 152.13 ± 5.29\n", + "p=0.90, mean = 152.13 ± 6.04\n", + "p=0.95, mean = 152.13 ± 7.21\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "for p in [0.85,0.9,0.95]:\n", + " m1, h1 = mean_confidence_interval(df.loc[df['SEX']=='1',['Y']].fillna(method='pad'),p)\n", + " m2, h2 = mean_confidence_interval(df.loc[df['SEX']=='2',['Y']].fillna(method='pad'),p)\n", + " print(f'Conf={p:.2f}, 1: {m1-h1[0]:.2f}..{m1+h1[0]:.2f}, 2 : {m2-h2[0]:.2f}..{m2+h2[0]:.2f}')" + ], + "metadata": { + "id": "aOoz2G37HmaM", + "outputId": "4fbb7e19-442f-46ba-bead-9321838e3c50", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 46, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Conf=0.85, 1: nan..nan, 2 : nan..nan\n", + "Conf=0.90, 1: nan..nan, 2 : nan..nan\n", + "Conf=0.95, 1: nan..nan, 2 : nan..nan\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.\n", + " return _methods._mean(a, axis=axis, dtype=dtype,\n", + 
"/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars\n", + " ret = ret.dtype.type(ret / rcount)\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:265: RuntimeWarning: Degrees of freedom <= 0 for slice\n", + " ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:223: RuntimeWarning: invalid value encountered in divide\n", + " arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in divide\n", + " ret = um.true_divide(\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.\n", + " return _methods._mean(a, axis=axis, dtype=dtype,\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars\n", + " ret = ret.dtype.type(ret / rcount)\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:265: RuntimeWarning: Degrees of freedom <= 0 for slice\n", + " ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:223: RuntimeWarning: invalid value encountered in divide\n", + " arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in divide\n", + " ret = um.true_divide(\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.\n", + " return _methods._mean(a, axis=axis, dtype=dtype,\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars\n", + " ret = ret.dtype.type(ret / rcount)\n", + 
"/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:265: RuntimeWarning: Degrees of freedom <= 0 for slice\n", + " ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:223: RuntimeWarning: invalid value encountered in divide\n", + " arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in divide\n", + " ret = um.true_divide(\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.\n", + " return _methods._mean(a, axis=axis, dtype=dtype,\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars\n", + " ret = ret.dtype.type(ret / rcount)\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:265: RuntimeWarning: Degrees of freedom <= 0 for slice\n", + " ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:223: RuntimeWarning: invalid value encountered in divide\n", + " arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in divide\n", + " ret = um.true_divide(\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.\n", + " return _methods._mean(a, axis=axis, dtype=dtype,\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars\n", + " ret = ret.dtype.type(ret / rcount)\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:265: RuntimeWarning: Degrees of freedom <= 0 for slice\n", + " ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,\n", + 
"/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:223: RuntimeWarning: invalid value encountered in divide\n", + " arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in divide\n", + " ret = um.true_divide(\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.\n", + " return _methods._mean(a, axis=axis, dtype=dtype,\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars\n", + " ret = ret.dtype.type(ret / rcount)\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:265: RuntimeWarning: Degrees of freedom <= 0 for slice\n", + " ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:223: RuntimeWarning: invalid value encountered in divide\n", + " arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',\n", + "/usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in divide\n", + " ret = um.true_divide(\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "from scipy.stats import ttest_ind\n", + "\n", + "tval, pval = ttest_ind(df.loc[df['SEX']=='1',['Y']], df.loc[df['SEX']=='2',['Y']],equal_var=False)\n", + "print(f\"T-value = {tval[0]:.2f}\\nP-value: {pval[0]}\")" + ], + "metadata": { + "id": "nUBcO-GJJKhQ", + "outputId": "765d6a56-cc37-4bff-e6ec-e5c2a9f009c8", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 44, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "T-value = nan\n", + "P-value: nan\n" + ] + } ] - }, - "metadata": {}, - "execution_count": 13 } - ], - "metadata": {} - }, - { - "cell_type": "markdown", - "source": [ - "\r\n", - "In this 
dataset, columns as the following:\r\n", - "* Age and sex are self-explanatory\r\n", - "* BMI is body mass index\r\n", - "* BP is average blood pressure\r\n", - "* S1 through S6 are different blood measurements\r\n", - "* Y is the qualitative measure of disease progression over one year\r\n", - "\r\n", - "Let's study this dataset using methods of probability and statistics.\r\n", - "\r\n", - "### Task 1: Compute mean values and variance for all values" - ], - "metadata": {} - }, - { - "cell_type": "code", - "execution_count": null, - "source": [], - "outputs": [], - "metadata": {} - }, - { - "cell_type": "markdown", - "source": [ - "### Task 2: Plot boxplots for BMI, BP and Y depending on gender" - ], - "metadata": {} - }, - { - "cell_type": "code", - "execution_count": null, - "source": [], - "outputs": [], - "metadata": {} - }, - { - "cell_type": "markdown", - "source": [ - "### Task 3: What is the the distribution of Age, Sex, BMI and Y variables?" - ], - "metadata": {} - }, - { - "cell_type": "code", - "execution_count": null, - "source": [], - "outputs": [], - "metadata": {} - }, - { - "cell_type": "markdown", - "source": [ - "### Task 4: Test the correlation between different variables and disease progression (Y)\r\n", - "\r\n", - "> **Hint** Correlation matrix would give you the most useful information on which values are dependent." 
- ], - "metadata": {} - }, - { - "cell_type": "markdown", - "source": [], - "metadata": {} - }, - { - "cell_type": "markdown", - "source": [ - "### Task 5: Test the hypothesis that the degree of diabetes progression is different between men and women" - ], - "metadata": {} - }, - { - "cell_type": "markdown", - "source": [], - "metadata": {} - } - ], - "metadata": { - "orig_nbformat": 4, - "language_info": { - "name": "python", - "version": "3.8.8", - "mimetype": "text/x-python", - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "pygments_lexer": "ipython3", - "nbconvert_exporter": "python", - "file_extension": ".py" - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3.8.8 64-bit (conda)" + ], + "metadata": { + "orig_nbformat": 4, + "language_info": { + "name": "python", + "version": "3.8.8", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3.8.8 64-bit (conda)" + }, + "interpreter": { + "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5" + }, + "colab": { + "provenance": [], + "toc_visible": true + } }, - "interpreter": { - "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5" - } - }, - "nbformat": 4, - "nbformat_minor": 2 + "nbformat": 4, + "nbformat_minor": 0 } \ No newline at end of file diff --git a/1-Introduction/04-stats-and-probability/notebook.ipynb b/1-Introduction/04-stats-and-probability/notebook.ipynb index 208eee50c..188fc2de0 100644 --- a/1-Introduction/04-stats-and-probability/notebook.ipynb +++ b/1-Introduction/04-stats-and-probability/notebook.ipynb @@ -1,1122 +1,2076 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Introduction to Probability and Statistics\n", - "In this notebook, we will play around with some of the concepts 
we have previously discussed. Many concepts from probability and statistics are well-represented in major libraries for data processing in Python, such as `numpy` and `pandas`." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import pandas as pd\n", - "import random\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Random Variables and Distributions\n", - "Let's start with drawing a sample of 30 values from a uniform distribution from 0 to 9. We will also compute mean and variance." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Sample: [4, 8, 5, 10, 5, 1, 1, 1, 7, 9, 7, 0, 2, 7, 3, 5, 9, 8, 3, 10, 2, 9, 2, 9, 9, 8, 1, 8, 7, 3]\n", - "Mean = 5.433333333333334\n", - "Variance = 10.178888888888887\n" - ] - } - ], - "source": [ - "sample = [ random.randint(0,10) for _ in range(30) ]\n", - "print(f\"Sample: {sample}\")\n", - "print(f\"Mean = {np.mean(sample)}\")\n", - "print(f\"Variance = {np.var(sample)}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To visually estimate how many different values are there in the sample, we can plot the **histogram**:" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAL4UlEQVR4nO3db4xlBXnH8e/PXYiCGNpyayzLdDQ1tMZEIROqJSEt2AaKAV+0CSQaa0zmjbXQmJi1b5q+o0lj9IUx2SBKIsVYhNRASzUqMSbttrtAW2AhtXQrq+gOMRawSSn26Yu5C+ty1znL3nPvw8z3k0zm/jmc+xxm9svZc8/hpqqQJPX1qmUPIEn62Qy1JDVnqCWpOUMtSc0ZaklqbvcYKz3vvPNqdXV1jFVL0rZ08ODBp6pqMuu5UUK9urrKgQMHxli1JG1LSf7zZM956EOSmjPUktScoZak5gy1JDVnqCWpOUMtSc1tGeokFyZ58Livp5PcuIDZJEkMOI+6qh4D3g6QZBfwXeCucceSJB1zqoc+rgD+vapOemK2JGm+TvXKxOuA22c9kWQdWAdYWVk5zbEk6eVb3XvPUl738E1Xj7LewXvUSc4ErgH+atbzVbWvqtaqam0ymXm5uiTpZTiVQx9XAfdX1Q/GGkaS9FKnEurrOclhD0nSeAaFOslZwG8Dd447jiTpRIPeTKyq/wZ+YeRZJEkzeGWiJDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJak5Qy1JzRlqSWrOUEtSc4Zakpoz1JLUnKGWpOYMtSQ1Z6glqTlDLUnNGWpJam7op5Cfm+SOJI8mOZTknWMPJknaNOhTyIFPAvdW1e8lORM4a8SZJEnH2TLUSV4HXAb8AUBVPQc8N+5YkqRjhhz6eBOwAXw2yQNJbk5y9okLJVlPciDJgY2NjbkPKkk71ZBQ7wYuBj5dVRcBPwb2nrhQVe2rqrWqWptMJnMeU5J2riGhPgIcqar90/t3sBluSdICbBnqqvo+8ESSC6cPXQE8MupUkqQXDD3r48PAbdMzPh4HPjDeSJKk4w0KdVU9CKyNO4okaRavTJSk5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJak5Qy1JzRlqSWrOUEtSc4Zakpoz1JLUnKGWpOYMtSQ1Z6glqTlDLUnNGWpJam7Qp5AnOQw8A/wEeL6q/ERySVqQQaGe+q2qemq0SSRJM3noQ5KaGxrqAr6S5GCS9VkLJFlPciDJgY2NjflNKEk73NBQX1pVFwNXAR9KctmJC1TVvqpaq6q1yWQy1yElaScbFOqq+t70+1HgLuCSMYeSJL1oy1AnOTvJOcduA78DPDT2YJKkTUPO+ng9cFeSY8v/ZVXdO+pUkqQXbBnqqnoceNsCZpEkzeDpeZLUnKGWpOYMtSQ1Z6glqTlDLUnNGWpJas5QS1JzhlqSmjPUktScoZak5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJam5waFOsivJA0nuHnMgSdJPO5U96huAQ2MNIkmabVCok+wBrgZuHnccSdKJdg9c7hPAR4FzTrZAknVgHWBlZeW0B1u01b33LO21D9909dJeW9vfMn+3NR9b7lEneTdwtKoO/qzlqmpfVa1V1dpkMpnbgJK00w059HEpcE2Sw8AXgMuTfH7UqSRJL9gy1FX1saraU1WrwHXA16vqvaNPJkkCPI9aktob+mYiAFV1H3DfKJNIkmZyj1qSmjPUktScoZak5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJak5Qy1JzRlqSWrOUEtSc4Zakpoz1JLUnKGWpOYMtSQ1Z6glqbktQ53k1Un+Mck/J3k4yZ8tYjBJ0qbdA5b5H+Dyqno2yRnAt5L8bVX9w8izSZIYEOqqKuDZ6d0zpl8
15lCSpBcN2aMmyS7gIPArwKeqav+MZdaBdYCVlZV5zrjtre69Z9kjLNzhm65eyusu69/1srZX28OgNxOr6idV9XZgD3BJkrfOWGZfVa1V1dpkMpnzmJK0c53SWR9V9SPgPuDKMYaRJL3UkLM+JknOnd5+DfAu4NGR55IkTQ05Rv0G4NbpcepXAV+sqrvHHUuSdMyQsz7+BbhoAbNIkmbwykRJas5QS1JzhlqSmjPUktScoZak5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJak5Qy1JzRlqSWrOUEtSc4Zakpoz1JLU3JahTnJBkm8kOZTk4SQ3LGIwSdKmLT+FHHge+EhV3Z/kHOBgkq9W1SMjzyZJYsAedVU9WVX3T28/AxwCzh97MEnSplM6Rp1kFbgI2D/KNJKklxgc6iSvBb4E3FhVT894fj3JgSQHNjY25jmjJO1og0Kd5Aw2I31bVd05a5mq2ldVa1W1NplM5jmjJO1oQ876CPAZ4FBVfXz8kSRJxxuyR30p8D7g8iQPTr9+d+S5JElTW56eV1XfArKAWSRJM3hloiQ1Z6glqTlDLUnNGWpJas5QS1JzhlqSmjPUktScoZak5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJak5Qy1JzRlqSWrOUEtSc1uGOsktSY4meWgRA0mSftqQPerPAVeOPIck6SS2DHVVfRP44QJmkSTNsHteK0qyDqwDrKysvOz1rO69Z14jqTF/ztJwc3szsar2VdVaVa1NJpN5rVaSdjzP+pCk5gy1JDU35PS824G/By5MciTJB8cfS5J0zJZvJlbV9YsYRJI0m4c+JKk5Qy1JzRlqSWrOUEtSc4Zakpoz1JLUnKGWpOYMtSQ1Z6glqTlDLUnNGWpJas5QS1JzhlqSmjPUktScoZak5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaGxTqJFcmeSzJt5PsHXsoSdKLtgx1kl3Ap4CrgLcA1yd5y9iDSZI2DdmjvgT4dlU9XlXPAV8Arh13LEnSMbsHLHM+8MRx948Av37iQknWgfXp3WeTPPYyZzoPeOpl/rOvVG7zNpc/31nbO7Xjtvk0f86/fLInhoQ6Mx6rlzxQtQ/YdwpDzX6x5EBVrZ3uel5J3Obtb6dtL7jN8zTk0McR4ILj7u8BvjfvQSRJsw0J9T8Bb07yxiRnAtcBXx53LEnSMVse+qiq55P8IfB3wC7glqp6eMSZTvvwySuQ27z97bTtBbd5blL1ksPNkqRGvDJRkpoz1JLUXJtQ77TL1JNckOQbSQ4leTjJDcueaVGS7EryQJK7lz3LIiQ5N8kdSR6d/rzfueyZxpbkj6e/1w8luT3Jq5c907wluSXJ0SQPHffYzyf5apJ/m37/uXm8VotQ79DL1J8HPlJVvwa8A/jQDtjmY24ADi17iAX6JHBvVf0q8Da2+bYnOR/4I2Ctqt7K5kkI1y13qlF8DrjyhMf2Al+rqjcDX5veP20tQs0OvEy9qp6sqvunt59h8w/v+cudanxJ9gBXAzcve5ZFSPI64DLgMwBV9VxV/WipQy3GbuA1SXYDZ7ENr72oqm8CPzzh4WuBW6e3bwXeM4/X6hLqWZepb/toHZNkFbgI2L/kURbhE8BHgf9b8hyL8iZgA/js9HDPzUnOXvZQY6qq7wJ/AXwHeBL4r6r6ynKnWpjXV9WTsLkzBvziPFbaJdSDLlPfjpK8FvgScGNVPb3secaU5N3A0ao6uOxZFmg3cDHw6aq6CPgxc/rrcFfT47LXAm8Efgk4O8l7lzvVK1uXUO/Iy9STnMFmpG+rqjuXPc8CXApck+Qwm4e3Lk/y+eWONLojwJGqOva3pTvYDPd29i7gP6pqo6r+F7gT+I0lz7QoP0jyBoDp96PzWGmXUO+4y9SThM3jloeq6uPLnmcRqupjVbWnqlbZ/Bl/vaq29Z5
WVX0feCLJhdOHrgAeWeJIi/Ad4B1Jzpr+nl/BNn8D9ThfBt4/vf1+4K/nsdIh//e80S3hMvUOLgXeB/xrkgenj/1JVf3N8kbSSD4M3DbdCXkc+MCS5xlVVe1PcgdwP5tnNz3ANrycPMntwG8C5yU5AvwpcBPwxSQfZPM/WL8/l9fyEnJJ6q3LoQ9J0kkYaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNff/C2KbzOLSKWIAAAAASUVORK5CYII=\n", - "text/plain": [ - "
" + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "plt.hist(sample)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Analyzing Real Data\n", - "\n", - "Mean and variance are very important when analyzing real-world data. Let's load the data about baseball players from [SOCR MLB Height/Weight Data](http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights)" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
NameTeamRoleHeightWeightAge
0Adam_DonachieBALCatcher74180.022.99
1Paul_BakoBALCatcher74215.034.69
2Ramon_HernandezBALCatcher72210.030.78
3Kevin_MillarBALFirst_Baseman72210.035.43
4Chris_GomezBALFirst_Baseman73188.035.71
.....................
1029Brad_ThompsonSTLRelief_Pitcher73190.025.08
1030Tyler_JohnsonSTLRelief_Pitcher74180.025.73
1031Chris_NarvesonSTLRelief_Pitcher75205.025.19
1032Randy_KeislerSTLRelief_Pitcher75190.031.01
1033Josh_KinneySTLRelief_Pitcher73195.027.92
\n", - "

1034 rows × 6 columns

\n", - "
" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_Y4qMLCvIUWs" + }, + "source": [ + "# Introduction to Probability and Statistics\n", + "In this notebook, we will play around with some of the concepts we have previously discussed. Many concepts from probability and statistics are well-represented in major libraries for data processing in Python, such as `numpy` and `pandas`." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "0bQAocM4IUWu" + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import random\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f4Fqa0ZYIUWw" + }, + "source": [ + "## Random Variables and Distributions\n", + "Let's start with drawing a sample of 30 values from a uniform distribution from 0 to 9. We will also compute mean and variance." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Bk0zaHH9IUWx", + "outputId": "b93f7f1e-37b3-43c7-88eb-c15849873765" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Sample: [6, 9, 1, 4, 7, 8, 10, 6, 1, 5, 9, 5, 2, 7, 6, 3, 7, 6, 7, 2, 2, 4, 3, 0, 4, 1, 2, 2, 7, 5]\n", + "Mean = 4.7\n", + "Variance = 7.21\n" + ] + } ], - "text/plain": [ - " Name Team Role Height Weight Age\n", - "0 Adam_Donachie BAL Catcher 74 180.0 22.99\n", - "1 Paul_Bako BAL Catcher 74 215.0 34.69\n", - "2 Ramon_Hernandez BAL Catcher 72 210.0 30.78\n", - "3 Kevin_Millar BAL First_Baseman 72 210.0 35.43\n", - "4 Chris_Gomez BAL First_Baseman 73 188.0 35.71\n", - "... ... ... ... ... ... 
...\n", - "1029 Brad_Thompson STL Relief_Pitcher 73 190.0 25.08\n", - "1030 Tyler_Johnson STL Relief_Pitcher 74 180.0 25.73\n", - "1031 Chris_Narveson STL Relief_Pitcher 75 205.0 25.19\n", - "1032 Randy_Keisler STL Relief_Pitcher 75 190.0 31.01\n", - "1033 Josh_Kinney STL Relief_Pitcher 73 195.0 27.92\n", - "\n", - "[1034 rows x 6 columns]" + "source": [ + "sample = [ random.randint(0,10) for _ in range(30) ]\n", + "print(f\"Sample: {sample}\")\n", + "print(f\"Mean = {np.mean(sample)}\")\n", + "print(f\"Variance = {np.var(sample)}\")" ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = pd.read_csv(\"../../data/SOCR_MLB.tsv\",sep='\\t', header=None, names=['Name','Team','Role','Height','Weight','Age'])\n", - "df" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "> We are using a package called [**Pandas**](https://pandas.pydata.org/) here for data analysis. We will talk more about Pandas and working with data in Python later in this course.\n", - "\n", - "Let's compute average values for age, height and weight:" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Age 28.736712\n", - "Height 73.697292\n", - "Weight 201.689255\n", - "dtype: float64" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fA1CKbxDIUWz" + }, + "source": [ + "To visually estimate how many different values are there in the sample, we can plot the **histogram**:" ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df[['Age','Height','Weight']].mean()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's focus on height, and compute standard deviation and variance: " - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[74, 74, 72, 
72, 73, 69, 69, 71, 76, 71, 73, 73, 74, 74, 69, 70, 72, 73, 75, 78]\n" - ] - } - ], - "source": [ - "print(list(df['Height'])[:20])" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Mean = 73.6972920696325\n", - "Variance = 5.316798081118074\n", - "Standard Deviation = 2.3058183105175645\n" - ] - } - ], - "source": [ - "mean = df['Height'].mean()\n", - "var = df['Height'].var()\n", - "std = df['Height'].std()\n", - "print(f\"Mean = {mean}\\nVariance = {var}\\nStandard Deviation = {std}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In addition to mean, it makes sense to look at the median value and quartiles. They can be visualized using a **box plot**:" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAsgAAACICAYAAAD6bB0zAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAATqUlEQVR4nO3dbWxW533H8d8/CYaV5cEJzcJmmNehhhSiZCXZMmcP1bIX3Rale9Fpi7aqzTImtslSK3Whq6U+vCjq1iXVxIuhpe0aVZOlNDIMWauVRSaIBZXxUCfQASpsEKCMAGEucopN5WsvfENunNsP55f4XOfE3490y8kdsP7+5hyfy5fvh0gpCQAAAMCE63IPAAAAAFQJC2QAAACgCQtkAAAAoAkLZAAAAKAJC2QAAACgyQ1z8UmXLFmSOjs75+JTAwAAAO+IvXv3nkspvXfy/XOyQO7s7NSePXvm4lPX2vnz53XbbbflHqNWaOahm4duHrp56Oahm4durUXE8Vb38xCLEu3fvz/3CLVDMw/dPHTz0M1DNw/dPHQrJubijULuu+++xA7yW42NjamtrS33GLVCMw/dPHTz0M1DNw/dPHRrLSL2ppTum3w/O8glev7553OPUDs089DNQzcP3Tx089DNQ7di2EEGAADAvMQOcgX09fXlHqF2aOahm4duHrp56Oahm4duxbCDDAAAgHmJHeQK4Ke34mjmoZuHbh66eejmoZuHbsWwgwwAAIB5iR3kChgYGMg9Qu3QzEM3D908dPPQzUM3D92KYQe5RCMjI1q8eHHuMWqFZh66eejmoZuHbh66eejWGjvIFTA0NJR7hNqhmYduHrp56Oahm4duHroVwwK5RCtWrMg9Qu3QzEM3D908dPPQzUM3D92KYYFcotOnT+ceoXZo5qGbh24eunno5qGbh27FsEAu0Y033ph7hNqhmYduHrp56Oahm4duHroVwwIZAAAAaMICuUQXL17MPULt0MxDNw/dPHTz0M1DNw/dimGBXKKlS5fmHqF2aOahm4duHrp56Oahm4duxbBALtGRI0dyj1A7NPPQzUM3D9
08dPPQzUO3YnijkBLxIt3F0cxDNw/dPHTz0M1DNw/dWuONQipgx44duUeoHZp56Oahm4duHrp56OahWzHsIAMAAGBeYge5Avr6+nKPUDs089DNQzcP3Tx089DNQ7di2EEGAADAvMQOcgXw01txNPPQzUM3D908dPPQzUO3YthBBgAAwLzEDnIF9Pf35x6hdmjmoZuHbh66eejmoZuHbsWwg1yisbExtbW15R6jVmjmoZuHbh66eejmoZuHbq2xg1wBO3fuzD1C7dDMQzcP3Tx089DNQzcP3YphgVyiu+++O/cItUMzD908dPPQzUM3D908dCuGBXKJjh07lnuE2qGZh24eunno5qGbh24euhXDArlES5YsyT1C7dDMQzcP3Tx089DNQzcP3YphgVyiS5cu5R6hdmjmoZuHbh66eejmoZuHbsWwQC7R5cuXc49QOzTz0M1DNw/dPHTz0M1Dt2JYIJeovb099wi1QzMP3Tx089DNQzcP3Tx0K4YFcolOnjyZe4TaoZmHbh66eejmoZuHbh66FcMCuUQrV67MPULt0MxDNw/dPHTz0M1DNw/dimGBXKLdu3fnHqF2aOahm4duHrp56Oahm4duxfBW0yUaHx/XddfxM0kRNPPQzUM3D908dPPQzUO31nir6QrYunVr7hFqh2Yeunno5qGbh24eunnoVgw7yAAAAJiX2EGugM2bN+ceoXZo5qGbh24eunno5qGbh27FsIMMAACAeYkd5ArYsmVL7hFqh2Yeunno5qGbh24eunnoVgw7yCXiGaTF0cxz66236sKFC7nHqJ30+ZsUX/xR7jFaam9v1+uvv557jJY4Tz1089DNQ7fW2EGugMHBwdwj1A7NPBcuXFBKiVvBm6TsM0x1q/IPPJynHrp56OahWzEskEt0//335x6hdmgGVB/nqYduHrp56FYMC+QSHTp0KPcItUMzoPo4Tz1089DNQ7diWCCX6IEHHsg9Qu10dHTkHgHADDhPPVXuFhG5R5hSlbtVGd2KmXGBHBHfiIjXIuJAGQO5uru7tWjRIkWEFi1apO7u7twj4R1Q5cddotrOvnFWnxj4hM79+FzuUd71OE89dCtm+fLligh1dHQoIrR8+fLcI11V5TXIldk6OjoqNVtvb69Wr16t66+/XqtXr1Zvb2/uka4xmx3kb0r68BzP8bZ0d3dr06ZN2rBhg0ZGRrRhwwZt2rSpMgcBfAsWLMg9Ampq0yubtO/MPm16eVPuUd71OE89dJu95cuX68SJE+rq6tL27dvV1dWlEydOVGKRXOU1SPNs+/btq8xsvb296unp0caNG3Xp0iVt3LhRPT091Vokz/KZ3Z2SDsz22dZr1qxJZVq4cGF68sknr7nvySefTAsXLix1jplM5EYRx44dyz1CLc33Y+21kdfSmm+tSau/uTqt+daadPaNs7P7i5+/aW4Hexuq/P+U89RT5W5VO94kpa6urpTSm926uroqMWeV1yDNs13pVoXZVq1alQYHB6+5b3BwMK1atar0WSTtSS3Wsu/YY5Aj4s8jYk9E7Dl16pSOHz+uw4cP68CBAzp16pR27dql4eFhvfDCCxofH7/6gtVX3vpwy5YtGh8f1wsvvKDh4WHt2rVLp06d0oEDB3T48GEdP35ce/fu1fnz5/Xiiy9qbGxM/f39kqTR0VGtW7dOfX19kqSBgQF97GMf0+joqM6cOaOhoSEdPXpUR48e1dDQkM6cOaOXXnpJIyMjGhgYkKSrf/fKx/7+fo2NjenFF1/U+fPntXfv3rf9NTU6cStw6+zszD5DHW+S7PNp8rkwMDCgkZERvfTSS5U6n6b7mj73nc9pPI1PdEjjemLzE7P6miRV9muq8vcPztN3XzdJlfoeIUmf/exnNTw8rB07dmh8fFyPPfbYO7aOeDtf0+Q1SF9fn9atW6fR0dFSv0e0+ppGR0d1xx13SJK2b9+ukZER3XPPPRodHc36vfzgwYMaHR295mu65ZZbdPDgwd
KvT1NqtWqefBM7yO8IVeAn3bo5d+5c7hFqaT4fa827x1dus95FZgfZwnnqqXK3qh1vatpBvtKNHeSZNc92pVsVZptXO8g5rV27VuvXr9dTTz2lN954Q0899ZTWr1+vtWvX5h4Nb9P+/ftzj4Ca2fTKpqu7x1eMp3EeizyHOE89dJu9ZcuWaefOnXrwwQe1bds2Pfjgg9q5c6eWLVuWe7RKr0GaZ9u9e3dlZuvp6dHjjz+ubdu26fLly9q2bZsef/xx9fT0ZJ2r2azeajoiOiX1p5RWz+aT5nir6e7ubj399NMaHR3VwoULtXbtWm3cuLHUGWYSEZpNb7xpbGxMbW1tuceonfl8rH1060d1+MLht9x/Z/udeu6R56b/y1+4WfrC8BxN9vZU+f8p56mnyt2qeLxdeaLeFcuWLdOrr76acaI3VXkNUtXZent79aUvfUkHDx7UXXfdpZ6eHj366KOlzxFTvNX0jAvkiOiV9CFJSySdkfT5lNLXp/s7ORbIdVDFbzhV19/fr4cffjj3GLXDsWZigWzhPPVUuRvH27sP3VqzF8gOFshAXlW+uFUaC2QAmFemWiC/Kx6DXBczPmMSb0EzoPo4Tz1089DNQ7di2EEG3oXYbTSxgwwA8wo7yBXAT2/F0cyX+3VU63ircrf29vbMR9TUOE89dPPQzUO3YthBBgAAwLzEDnIFXHkXF8wezTx089DNQzcP3Tx089CtGHaQSzQyMqLFixfnHqNWaOahm4duHrp56Oahm4durbGDXAFDQ0O5R6gdmnno5qGbh24eunno5qFbMSyQS7RixYrcI9QOzTx089DNQzcP3Tx089CtGBbIJTp9+nTuEWqHZh66eejmoZuHbh66eehWDAvkEt144425R6gdmnno5qGbh24eunno5qFbMSyQAQAAgCYskEt08eLF3CPUDs08dPPQzUM3D908dPPQrRgWyCVaunRp7hFqh2Yeunno5qGbh24eunnoVgwL5BIdOXIk9wi1QzMP3Tx089DNQzcP3Tx0K4Y3CikRL9JdHM08dPPQzUM3D908dPPQrTXeKKQCduzYkXuE2qGZh24eunno5qGbh24euhXDDjIAAADmJXaQK6Cvry/3CLVDMw/dPHTz0M1DNw/dPHQrhh1kAAAAzEvsIFcAP70VRzMP3Tx089DNQzcP3Tx0K4YdZAAAAMxL7CBXQH9/f+4RaodmHrp56Oahm4duHrp56FYMO8glGhsbU1tbW+4xaoVmHrp56Oahm4duHrp56NYaO8gVsHPnztwj1A7NPHTz0M1DNw/dPHTz0K0YFsgluvvuu3OPUDs089DNQzcP3Tx089DNQ7diWCCX6NixY7lHqB2aeejmoZuHbh66eejmoVsxLJBLtGTJktwj1A7NPHTz0M1DNw/dPHTz0K0YFsglunTpUu4RaodmHrp56Oahm4duHrp56FYMC+QSXb58OfcItUMzD908dPPQzUM3D908dCuGBXKJ2tvbc49QOzTz0M1DNw/dPHTz0M1Dt2JYIJfo5MmTuUeoHZp56Oahm4duHrp56OahWzEskEu0cuXK3CPUDs08dPPQzUM3D908dPPQrRgWyCXavXt37hFqh2Yeunno5qGbh24eunnoVgxvNV2i8fFxXXcdP5MUQTMP3Tx089DNQzcP3Tx0a423mq6ArVu35h6hdmjmoZuHbh66eejmoZuHbsWwgwwAAIB5iR3kCti8eXPuEWqHZh66eejmoZuHbh66eehWDDvIAAAAmJfYQa6ALVu25B6hdmjmoZuHbh66eejmoZuHbsWwg1winkFaHM08dPPQzUM3D908dPPQrTV2kCtgcHAw9wi1QzMP3Tx089DNQzcP3Tx0K4Yd5BINDw/r5ptvzj1GrdDMQzcP3Tx089DNQzcP3VpjB7kCDh06lHuE2qGZh24eunno5qGbh24euhXDArlEHR0duUeoHZp56Oahm4duHrp56OahWzEskEt04cKF3CPUDs08dPPQzU
M3D908dPPQrRgWyCVasGBB7hFqh2Yeunno5qGbh24eunnoVgwL5BItWrQo9wi1QzMP3Tx089DNQzcP3Tx0K2ZOXsUiIs5KOv6Of+L6WyLpXO4haoZmHrp56Oahm4duHrp56Nbaz6eU3jv5zjlZIKO1iNjT6qVEMDWaeejmoZuHbh66eejmoVsxPMQCAAAAaMICGQAAAGjCArlc/5R7gBqimYduHrp56Oahm4duHroVwGOQAQAAgCbsIAMAAABNWCADAAAATVggz5GIuCUinouIQxFxMCJ+NSLujYjvRsRQROyJiF/OPWeVRMSdjTZXbj+KiE9GxK0R8e8R8YPGx/bcs1bJNN2+0jj+XomIzRFxS+5Zq2Sqbk3//dMRkSJiScYxK2W6ZhHRHRGHI+L7EfF3mUetlGnOUa4JM4iITzWOqQMR0RsRi7gmzGyKblwTCuAxyHMkIp6RtCOl9LWIaJP0HknPSvpqSuk7EfG7kp5IKX0o55xVFRHXSzol6Vck/ZWk11NKX46Iz0hqTymtzzpgRU3qdqekwZTSTyLibyWJbq01d0spHY+IZZK+JmmlpDUpJV5cf5JJx9r7JPVI+r2U0mhE3J5Sei3rgBU1qdvT4powpYj4OUn/IekDKaUfR8Szkv5N0gfENWFK03T7obgmzBo7yHMgIm6S9BuSvi5JKaWxlNL/SUqSbmr8sZs1cbCitYckHU0pHZf0EUnPNO5/RtLv5xqqBq52Syk9n1L6SeP+70rqyDhX1TUfb5L0VUlPaOKcRWvNzf5C0pdTSqOSxOJ4Ws3duCbM7AZJPxURN2hio+mH4powG2/pxjWhGBbIc+N9ks5K+ueI+F5EfC0iFkv6pKSvRMQJSX8v6W8yzlh1fySpt/HPP5NSOi1JjY+3Z5uq+pq7NftTSd8peZY6udotIh6RdCql9HLekSqv+Vh7v6Rfj4hdEbE9Iu7POFfVNXf7pLgmTCmldEoTXV6VdFrScErpeXFNmNY03ZpxTZgBC+S5cYOkD0r6x5TSL0kakfQZTeyyfCqltEzSp9TYYca1Gg9JeUTSt3PPUidTdYuIHkk/kfQvOeaquuZuEfEeTTxU4HN5p6q2FsfaDZLaJT0g6a8lPRsRkWm8ymrRjWvCNBqPLf6IpF+Q9LOSFkfEn+Sdqvpm6sY1YXZYIM+Nk5JOppR2Nf79OU0smD8uqa9x37cl8YSM1n5H0r6U0pnGv5+JiKWS1PjIr29bm9xNEfFxSQ9L+uPEEw6m0tztFzVxUXk5Io5p4leQ+yLijozzVdHkY+2kpL404T8ljUviyY1vNbkb14Tp/bak/0kpnU0pXdZEqy5xTZjJVN24JhTAAnkOpJT+V9KJiLizcddDkv5LE4+d+s3Gfb8l6QcZxquDR3XtwwS2auJCosbHfy19onq4pltEfFjSekmPpJTeyDZV9V3tllLan1K6PaXUmVLq1MTC74ONcxpvmnyObtHE9zRFxPsltUniiY1vNbkb14TpvSrpgYh4T+M3Eg9JOiiuCTNp2Y1rQjG8isUciYh7NfEs+DZJ/y3pMUmrJP2DJn4deUnSX6aU9uaasYoav+I+Iel9KaXhxn23aeIVQJZr4sT/g5TS6/mmrJ4puh2RtFDS+cYf+25KaV2mESupVbdJ//2YpPt4FYs3TXGstUn6hqR7JY1J+nRKaTDbkBU0RbdfE9eEaUXEFyX9oSYeEvA9SX8m6afFNWFaU3T7vrgmzBoLZAAAAKAJD7EAAAAAmrBABgAAAJqwQAYAAACasEAGAAAAmrBABgAAAJqwQAYAAACasEAGAAAAmvw/tSpycIADqyoAAAAASUVORK5CYII=\n", - "text/plain": [ - "
" + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 430 + }, + "id": "GdDQqV5SIUWz", + "outputId": "641584af-900d-459e-aba5-9bc6a9441a30" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhYAAAGdCAYAAABO2DpVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAUwklEQVR4nO3da4xcBdnA8WfZtdOK24VWSrtpK0WR2pYiUEpKiaKtkKZpRBNUUrWBxA9m0ZZGQ1fDZaOwBSNRLimUqHyhFvxQUBBIrbQN0ZZeXENFuWgJK1DqjZ22hpHszvvBsO+7LxSY7TM7neH3S+bDOXtmz5OT3Z1/zpzZ01Qul8sBAJDgmFoPAAA0DmEBAKQRFgBAGmEBAKQRFgBAGmEBAKQRFgBAGmEBAKRpGekdDgwMxIsvvhitra3R1NQ00rsHAIahXC7HgQMHor29PY455vDnJUY8LF588cWYMmXKSO8WAEjQ29sbkydPPuzXRzwsWltbI+K/g40dO3akdw8ADEOxWIwpU6YMvo4fzoiHxetvf4wdO1ZYAECdebvLGFy8CQCkERYAQBphAQCkERYAQBphAQCkERYAQBphAQCkERYAQBphAQCkERYAQJqKwuLaa6+NpqamIY/p06dXazYAoM5UfK+QmTNnxq9+9av//QYtI367EQDgKFVxFbS0tMTEiROrMQsAUOcqvsbimWeeifb29jj55JNj6dKl8fzzz7/l9qVSKYrF4pAHANCYmsrlcvmdbvzQQw/FwYMH49RTT42XXnopurq64oUXXog9e/Yc9v7s1157bXR1db1hfV9fn9um16mTVj1Y6xEq9tzqxbUeAVL5PWSkFYvFaGtre9vX74rOWCxatCguvvjimD17dlx44YXxy1/+Ml555ZW49957D/uczs7O6OvrG3z09vZWsksAoI4c0ZWXxx13XHz4wx+OZ5999rDbFAqFKBQKR7IbAKBOHNH/sTh48GD8+c9/jkmTJmXNAwDUsYrC4hvf+EZs2bIlnnvuufjNb34Tn/nMZ6K5uTkuueSSas0HANSRit4K+etf/xqXXHJJ/OMf/4gTTjghzjvvvNi2bVuccMIJ1ZoPAKgjFYXF+vXrqzUHANAA3CsEAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANEcUFqtXr46mpqZYsWJF0jgAQD0bdljs2LEj7rjjjpg9e3bmPABAHRtWWBw8eDCWLl0ad955Zxx//PHZMwEAdWpYYdHR0RGLFy+OhQsXvu22pVIpisXikAcA0JhaKn3C+vXrY/fu3bFjx453tH13d3d0dXVVPBjASDlp1YO1HgEaRkVnLHp7e2P58uVx9913x+jRo9/Rczo7O6Ovr2/w0dvbO6xBAYCjX0VnLHbt2hX79++PM888c3Bdf39/bN26NW699dYolUrR3Nw85DmFQiEKhULOtADAUa2isFiwYEE88cQTQ9ZdeumlMX369LjyyivfEBUAwLtLRWHR2toas2bNGrLu2GOPjfHjx79hPQDw7uM/bwIAaSr+VMj/t3nz5oQxAIBG4IwFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAa
YQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJCmorBYs2ZNzJ49O8aOHRtjx46NefPmxUMPPVSt2QCAOlNRWEyePDlWr14du3btip07d8YnP/nJ+PSnPx1/+MMfqjUfAFBHWirZeMmSJUOWr7vuulizZk1s27YtZs6cmToYAFB/KgqL/6u/vz9+9rOfxaFDh2LevHmH3a5UKkWpVBpcLhaLw90lAHCUqzgsnnjiiZg3b168+uqr8b73vS82bNgQM2bMOOz23d3d0dXVdURDNrKTVj1Y6xE4ivn5gNqqx9/B51Yvrun+K/5UyKmnnho9PT2xffv2+OpXvxrLli2LJ5988rDbd3Z2Rl9f3+Cjt7f3iAYGAI5eFZ+xGDVqVHzoQx+KiIizzjorduzYET/84Q/jjjvueNPtC4VCFAqFI5sSAKgLR/x/LAYGBoZcQwEAvHtVdMais7MzFi1aFFOnTo0DBw7EunXrYvPmzfHII49Uaz4AoI5UFBb79++PL3/5y/HSSy9FW1tbzJ49Ox555JH41Kc+Va35AIA6UlFY/OhHP6rWHABAA3CvEAAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgjbAAANIICwAgTUVh0d3dHWeffXa0trbGhAkT4qKLLoqnnnqqWrMBAHWmorDYsmVLdHR0xLZt22Ljxo3x2muvxQUXXBCHDh2q1nwAQB1pqWTjhx9+eMjyXXfdFRMmTIhdu3bFxz72sdTBAID6U1FY/H99fX0RETFu3LjDblMqlaJUKg0uF4vFI9klAHAUG3ZYDAwMxIoVK2L+/Pkxa9asw27X3d0dXV1dw90NpDhp1YO1HgHgXWHYnwrp6OiIPXv2xPr1699yu87Ozujr6xt89Pb2DneXAMBRblhnLC6//PJ44IEHYuvWrTF58uS33LZQKEShUBjWcABAfakoLMrlcnzta1+LDRs2xObNm2PatGnVmgsAqEMVhUVHR0esW7cu7r///mhtbY19+/ZFRERbW1uMGTOmKgMCAPWjomss1qxZE319fXH++efHpEmTBh/33HNPteYDAOpIxW+FAAAcjnuFAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkKbisNi6dWssWbIk2tvbo6mpKe67774qjAUA1KOKw+LQoUNx+umnx2233
VaNeQCAOtZS6RMWLVoUixYtqsYsAECdqzgsKlUqlaJUKg0uF4vFau8SAKiRqodFd3d3dHV1VXs3ERFx0qoHR2Q/AFTO3+h3h6p/KqSzszP6+voGH729vdXeJQBQI1U/Y1EoFKJQKFR7NwDAUcD/sQAA0lR8xuLgwYPx7LPPDi7v3bs3enp6Yty4cTF16tTU4QCA+lJxWOzcuTM+8YlPDC6vXLkyIiKWLVsWd911V9pgAED9qTgszj///CiXy9WYBQCoc66xAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSCAsAII2wAADSDCssbrvttjjppJNi9OjRcc4558Tjjz+ePRcAUIcqDot77rknVq5cGddcc03s3r07Tj/99Ljwwgtj//791ZgPAKgjFYfFTTfdFF/5ylfi0ksvjRkzZsTtt98e733ve+PHP/5xNeYDAOpISyUb/+c//4ldu3ZFZ2fn4LpjjjkmFi5cGL/97W/f9DmlUilKpdLgcl9fX0REFIvF4cz7lgZK/07/ngBQT6rx+vp/v2+5XH7L7SoKi7///e/R398fJ5544pD1J554YvzpT3960+d0d3dHV1fXG9ZPmTKlkl0DAO9A2w+q+/0PHDgQbW1th/16RWExHJ2dnbFy5crB5YGBgfjnP/8Z48ePj6amprT9FIvFmDJlSvT29sbYsWPTvi9DOc4jx7EeGY7zyHCcR0Y1j3O5XI4DBw5Ee3v7W25XUVi8//3vj+bm5nj55ZeHrH/55Zdj4sSJb/qcQqEQhUJhyLrjjjuukt1WZOzYsX5oR4DjPHIc65HhOI8Mx3lkVOs4v9WZitdVdPHmqFGj4qyzzopNmzYNrhsYGIhNmzbFvHnzKp8QAGgoFb8VsnLlyli2bFnMmTMn5s6dGz/4wQ/i0KFDcemll1ZjPgCgjlQcFp///Ofjb3/7W1x99dWxb9+++OhHPxoPP/zwGy7oHGmFQiGuueaaN7ztQi7HeeQ41iPDcR4ZjvPIOBqOc1P57T43AgDwDrlXCACQRlgAAGmEBQCQRlgAAGkaJizcyr26uru74+yzz47W1taYMGFCXHTRRfHUU0/VeqyGt3r16mhqaooVK1bUepSG88ILL8QXv/jFGD9+fIwZMyZOO+202LlzZ63Haij9/f1x1VVXxbRp02LMmDHxwQ9+ML7zne+87b0meHtbt26NJUuWRHt7ezQ1NcV999035OvlcjmuvvrqmDRpUowZMyYWLlwYzzzzzIjM1hBh4Vbu1bdly5bo6OiIbdu2xcaNG+O1116LCy64IA4dOlTr0RrWjh074o477ojZs2fXepSG869//Svmz58f73nPe+Khhx6KJ598Mr7//e/H8ccfX+vRGsoNN9wQa9asiVtvvTX++Mc/xg033BA33nhj3HLLLbUere4dOnQoTj/99Ljtttve9Os33nhj3HzzzXH77bfH9u3b49hjj40LL7wwXn311eoPV24Ac+fOLXd0dAwu9/f3l9vb28vd3d01nKqx7d+/vxwR5S1bttR6lIZ04MCB8imnnFLeuHFj+eMf/3h5+fLltR6poVx55ZXl8847r9ZjNLzFixeXL7vssiHrPvvZz5aXLl1ao4kaU0SUN2zYMLg8MDBQnjhxYvl73/ve4LpXXnmlXCgUyj/96U+rPk/dn7F4/VbuCxcuHFz3drdy58j19fVFRMS4c
eNqPElj6ujoiMWLFw/5uSbPz3/+85gzZ05cfPHFMWHChDjjjDPizjvvrPVYDefcc8+NTZs2xdNPPx0REb///e/jsccei0WLFtV4ssa2d+/e2Ldv35C/H21tbXHOOeeMyOti1e9uWm3DuZU7R2ZgYCBWrFgR8+fPj1mzZtV6nIazfv362L17d+zYsaPWozSsv/zlL7FmzZpYuXJlfOtb34odO3bE17/+9Rg1alQsW7as1uM1jFWrVkWxWIzp06dHc3Nz9Pf3x3XXXRdLly6t9WgNbd++fRERb/q6+PrXqqnuw4KR19HREXv27InHHnus1qM0nN7e3li+fHls3LgxRo8eXetxGtbAwEDMmTMnrr/++oiIOOOMM2LPnj1x++23C4tE9957b9x9992xbt26mDlzZvT09MSKFSuivb3dcW5gdf9WyHBu5c7wXX755fHAAw/Eo48+GpMnT671OA1n165dsX///jjzzDOjpaUlWlpaYsuWLXHzzTdHS0tL9Pf313rEhjBp0qSYMWPGkHUf+chH4vnnn6/RRI3pm9/8ZqxatSq+8IUvxGmnnRZf+tKX4oorroju7u5aj9bQXn/tq9XrYt2HhVu5j4xyuRyXX355bNiwIX7961/HtGnTaj1SQ1qwYEE88cQT0dPTM/iYM2dOLF26NHp6eqK5ubnWIzaE+fPnv+Hj0k8//XR84AMfqNFEjenf//53HHPM0JeZ5ubmGBgYqNFE7w7Tpk2LiRMnDnldLBaLsX379hF5XWyIt0Lcyr36Ojo6Yt26dXH//fdHa2vr4Pt0bW1tMWbMmBpP1zhaW1vfcN3KscceG+PHj3c9S6Irrrgizj333Lj++uvjc5/7XDz++OOxdu3aWLt2ba1HayhLliyJ6667LqZOnRozZ86M3/3ud3HTTTfFZZddVuvR6t7Bgwfj2WefHVzeu3dv9PT0xLhx42Lq1KmxYsWK+O53vxunnHJKTJs2La666qpob2+Piy66qPrDVf1zJyPklltuKU+dOrU8atSo8ty5c8vbtm2r9UgNJSLe9PGTn/yk1qM1PB83rY5f/OIX5VmzZpULhUJ5+vTp5bVr19Z6pIZTLBbLy5cvL0+dOrU8evTo8sknn1z+9re/XS6VSrUere49+uijb/o3edmyZeVy+b8fOb3qqqvKJ554YrlQKJQXLFhQfuqpp0ZkNrdNBwDS1P01FgDA0UNYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABphAUAkEZYAABp/gdmqmbcIS3QFgAAAABJRU5ErkJggg==\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.hist(sample)\n", + "plt.show()" ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "plt.figure(figsize=(10,2))\n", - "plt.boxplot(df['Height'], vert=False, showmeans=True)\n", - "plt.grid(color='gray', linestyle='dotted')\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also make box plots of subsets of our dataset, for example, grouped by player role." 
- ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dTc8xEMzIUW0" + }, + "source": [ + "## Analyzing Real Data\n", + "\n", + "Mean and variance are very important when analyzing real-world data. Let's load the data about baseball players from [SOCR MLB Height/Weight Data](http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights)" ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "df.boxplot(column='Height', by='Role', figsize=(10,8))\n", - "plt.xticks(rotation='vertical')\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "> **Note**: This diagram suggests, that on average, the heights of first basemen are higher than heights of second basemen. Later we will learn how we can test this hypothesis more formally, and how to demonstrate that our data is statistically significant to show that. \n", - "\n", - "Age, height and weight are all continuous random variables. What do you think their distribution is? A good way to find out is to plot the histogram of values: " - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 424 + }, + "id": "NgKHBG1-IUW1", + "outputId": "decd310d-8511-487e-ed1e-3ea46069474a" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Name Team Role Height Weight Age\n", + "0 Adam_Donachie BAL Catcher 74 180.0 22.99\n", + "1 Paul_Bako BAL Catcher 74 215.0 34.69\n", + "2 Ramon_Hernandez BAL Catcher 72 210.0 30.78\n", + "3 Kevin_Millar BAL First_Baseman 72 210.0 35.43\n", + "4 Chris_Gomez BAL First_Baseman 73 188.0 35.71\n", + "... ... ... ... ... ... ...\n", + "1029 Brad_Thompson STL Relief_Pitcher 73 190.0 25.08\n", + "1030 Tyler_Johnson STL Relief_Pitcher 74 180.0 25.73\n", + "1031 Chris_Narveson STL Relief_Pitcher 75 205.0 25.19\n", + "1032 Randy_Keisler STL Relief_Pitcher 75 190.0 31.01\n", + "1033 Josh_Kinney STL Relief_Pitcher 73 195.0 27.92\n", + "\n", + "[1034 rows x 6 columns]" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameTeamRoleHeightWeightAge
0Adam_DonachieBALCatcher74180.022.99
1Paul_BakoBALCatcher74215.034.69
2Ramon_HernandezBALCatcher72210.030.78
3Kevin_MillarBALFirst_Baseman72210.035.43
4Chris_GomezBALFirst_Baseman73188.035.71
.....................
1029Brad_ThompsonSTLRelief_Pitcher73190.025.08
1030Tyler_JohnsonSTLRelief_Pitcher74180.025.73
1031Chris_NarvesonSTLRelief_Pitcher75205.025.19
1032Randy_KeislerSTLRelief_Pitcher75190.031.01
1033Josh_KinneySTLRelief_Pitcher73195.027.92
\n", + "

1034 rows × 6 columns

\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 5 + } + ], + "source": [ + "df = pd.read_csv(\"https://raw.githubusercontent.com/Jmackalister/Data-Science-For-Beginners/main/data/SOCR_MLB.tsv\",sep='\\t', header=None, names=['Name','Team','Role','Height','Weight','Age'])\n", + "df" ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "df['Weight'].hist(bins=15, figsize=(10,6))\n", - "plt.suptitle('Weight distribution of MLB Players')\n", - "plt.xlabel('Weight')\n", - "plt.ylabel('Count')\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Normal Distribution\n", - "\n", - "Let's create an artificial sample of weights that follows a normal distribution with the same mean and variance as our real data:" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([73.46072234, 70.40678311, 70.23689776, 73.81190675, 72.41091792,\n", - " 76.00127651, 71.91641414, 77.18162239, 76.7173353 , 73.93996587,\n", - " 74.2862748 , 76.88034696, 72.15184905, 74.43537605, 76.37723417,\n", - " 65.66976051, 74.3200533 , 77.3235274 , 72.8840488 , 77.50300255])" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yGT7sgCCIUW1" + }, + "source": [ + "> We are using a package called [**Pandas**](https://pandas.pydata.org/) here for data analysis. We will talk more about Pandas and working with data in Python later in this course.\n", + "\n", + "Let's compute average values for age, height and weight:" ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "generated = np.random.normal(mean, std, 1000)\n", - "generated[:20]" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "TqTXzahpIUW3", + "outputId": "46b5cbf3-3b5a-4532-ce77-3eca8f44afad" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Age 28.736712\n", + "Height 73.697292\n", + "Weight 201.689255\n", + "dtype: float64" + ] + }, + "metadata": {}, + "execution_count": 6 + } + ], + "source": [ + "df[['Age','Height','Weight']].mean()" ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "plt.figure(figsize=(10,6))\n", - "plt.hist(generated, bins=15)\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f39ELsmyIUW4" + }, + "source": [ + "Now let's focus on height, and compute standard deviation and variance:" ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "plt.figure(figsize=(10,6))\n", - "plt.hist(np.random.normal(0,1,50000), bins=300)\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Since most values in real life are normally distributed, we should not use a uniform random number generator to generate sample data. Here is what happens if we try to generate weights with a uniform distribution (generated by `np.random.rand`):" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAsgAAAGoCAYAAABbtxOxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAATQElEQVR4nO3db6ykd3nf4e9db4FCFGHLx+7GNl1TbUgMapv0hKaNWkV10zoxst1WREakWgVLWyoSSNUorItUV4qQnCbqnxdNpS1xs2opxCKktorSYC35o7wAugaSYAy1G4y99sZekhSSRjI13H1xJs7tk13WPnPOzK73uiRrZn4zc+Z+8dPZj57zeJ7q7gAAAFv+zLoHAACA84lABgCAQSADAMAgkAEAYBDIAAAw7Fv3AEly+eWX94EDB9Y9BgAAF5H777//i929sX39vAjkAwcO5MSJE+seAwCAi0hVfeFM606xAACAQSADAMAgkAEAYBDIAAAwnDOQq+quqnqqqj491n6yqj5bVb9ZVb9QVa8cz91eVQ9X1eeq6u/t0dwAALAnns8R5J9NcsO2tfuSvK67/1KS/5Xk9iSpquuS3JrktYv3/HRVXbJr0wIAwB47ZyB3968l+b1tax/u7mcWDz+a5OrF/ZuTvL+7n+7uzyd5OMnrd3FeAADYU7txDvJbkvzi4v5VSR4bz51crAEAwAVhqUCuqncleSbJe/946Qwv67O893BVnaiqE6dPn15mDAAA2DU7DuSqOpTkDUne3N1/HMEnk1wzXnZ1kifO9P7uPtrdm929ubHxp67wBwAAa7GjQK6qG5K8M8lN3f1H46l7k9xaVS+tqmuTHEzy8eXHBACA1dh3rhdU1fuSfHeSy6vqZJI7svWtFS9Ncl9VJclHu/ut3f1AVd2d5DPZOvXibd391b0aHgAAdlv9ydkR67O5udknTpxY9xgAAFxEqur+7t7cvu5KegAAMAhkAAAYBDIAAAwCGQAAhnN+iwW8GBw48qF1j7Byj9x547pHAIALkiPIAAAwCGQAABgEMgAADAIZAAAGgQwAAINABgCAQSADAMAgkAEAYBDIAAAwuJLeRehivKocAMDz5QgyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwHDRf
4uFb3QAAGByBBkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAw75zvaCq7kryhiRPdffrFmuXJfm5JAeSPJLk+7v79xfP3Z7ktiRfTfL27v6lPZkc+LoOHPnQukdYuUfuvHHdIwDwIvB8jiD/bJIbtq0dSXK8uw8mOb54nKq6LsmtSV67eM9PV9UluzYtAADssXMGcnf/WpLf27Z8c5Jji/vHktwy1t/f3U939+eTPJzk9bszKgAA7L2dnoN8ZXefSpLF7RWL9auSPDZed3KxBgAAF4RznoP8AtUZ1vqML6w6nORwkrzqVa/a5TEAeDFzjj0vVvb2+WGnR5CfrKr9SbK4fWqxfjLJNeN1Vyd54kw/oLuPdvdmd29ubGzscAwAANhdOw3ke5McWtw/lOSesX5rVb20qq5NcjDJx5cbEQAAVuf5fM3b+5J8d5LLq+pkkjuS3Jnk7qq6LcmjSd6YJN39QFXdneQzSZ5J8rbu/uoezQ4AALvunIHc3W86y1PXn+X1707y7mWGAgCAdXElPQAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwLBv3QMA7JYDRz607hFW7pE7b1z3CAAvOo4gAwDAIJABAGAQyAAAMAhkAAAYBDIAAAwCGQAABoEMAACDQAYAgMGFQgDgAuBCOLA6jiADAMAgkAEAYBDIAAAwCGQAABgEMgAADAIZAAAGgQwAAINABgCAQSADAMAgkAEAYBDIAAAwCGQAABgEMgAADAIZAAAGgQwAAMO+dQ8AwM4dOPKhdY8A8KLjCDIAAAwCGQAABoEMAACDc5ABgPOSc+xZF0eQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGJYK5Kr6p1X1QFV9uqreV1Uvq6rLquq+qnpocXvpbg0LAAB7bceBXFVXJXl7ks3ufl2SS5LcmuRIkuPdfTDJ8cVjAAC4ICx7isW+JH+uqvYleXmSJ5LcnOTY4vljSW5Z8jMAAGBldhzI3f14kp9K8miSU0m+1N0fTnJld59avOZUkit2Y1AAAFiFZU6xuDRbR4uvTfJNSV5RVT/wAt5/uKpOVNWJ06dP73QMAADYVcucYvF3kny+u0939/9L8sEkfyPJk1W1P0kWt0+d6c3dfbS7N7t7c2NjY4kxAABg9ywTyI8m+c6qenlVVZLrkzyY5N4khxavOZTknuVGBACA1dm30zd298eq6gNJPpHkmSSfTHI0yTckubuqbstWRL9xNwYFAIBV2HEgJ0l335Hkjm3LT2fraDIAAFxwXEkPAAAGgQwAAINABgCAQSADAMAgkAEAYBDIAAAwCGQAABgEMgAADAIZAAAGgQwAAINABgCAQSADAMAgkAEAYBDIAAAwCGQAABgEMgAADAIZAAAGgQwAAINABgCAQSADAMAgkAEAYBDIAAAwCGQAABgEMgAADAIZAAAGgQwAAINABgCAQSADAMAgkAEAYBDIAAAwCGQAABgEMgAADAIZAAAGgQwAAINABgCAQSADAMAgkAEAYBDIAAAwCGQAABgEMgAADAIZAAAGgQwAAINABgCAQSADAMAgkAEAYBDIAAAwCGQAABgEMgAADAIZAAAGgQwAAINABgCAQSADAMAgkAEAYBDIAAAwCGQAABiWCuSqemVVfaCqPltVD1bVX6+qy6rqvqp6aHF76W4NCwAAe23ZI8j/Lsn/6O5vSfKXkzyY5EiS4919MMnxxWMAALgg7DiQq
+obk/ytJD+TJN39le7+P0luTnJs8bJjSW5ZbkQAAFidZY4gvzrJ6ST/qao+WVXvqapXJLmyu08lyeL2il2YEwAAVmKZQN6X5NuT/Ifu/rYk/zcv4HSKqjpcVSeq6sTp06eXGAMAAHbPMoF8MsnJ7v7Y4vEHshXMT1bV/iRZ3D51pjd399Hu3uzuzY2NjSXGAACA3bPjQO7u30nyWFW9ZrF0fZLPJLk3yaHF2qEk9yw1IQAArNC+Jd//w0neW1UvSfLbSX4wW9F9d1XdluTRJG9c8jMAAGBllgrk7v5Uks0zPHX9Mj8XAADWxZX0AABgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwLB0IFfVJVX1yar674vHl1XVfVX10OL20uXHBACA1diNI8jvSPLgeHwkyfHuPpjk+OIxAABcEJYK5Kq6OsmNSd4zlm9Ocmxx/1iSW5b5DAAAWKVljyD/2yQ/luRrY+3K7j6VJIvbK870xqo6XFUnqurE6dOnlxwDAAB2x44DuarekOSp7r5/J+/v7qPdvdndmxsbGzsdAwAAdtW+Jd77XUluqqrvS/KyJN9YVf8lyZNVtb+7T1XV/iRP7cagAACwCjs+gtzdt3f31d19IMmtST7S3T+Q5N4khxYvO5TknqWnBACAFdmL70G+M8n3VNVDSb5n8RgAAC4Iy5xi8azu/pUkv7K4/7tJrt+NnwsAAKvmSnoAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwLDjQK6qa6rql6vqwap6oKresVi/rKruq6qHFreX7t64AACwt5Y5gvxMkn/W3d+a5DuTvK2qrktyJMnx7j6Y5PjiMQAAXBB2HMjdfaq7P7G4/wdJHkxyVZKbkxxbvOxYkluWnBEAAFZmV85BrqoDSb4tyceSXNndp5KtiE5yxVnec7iqTlTVidOnT+/GGAAAsLSlA7mqviHJzyf5ke7+8vN9X3cf7e7N7t7c2NhYdgwAANgVSwVyVf3ZbMXxe7v7g4vlJ6tq/+L5/UmeWm5EAABYnWW+xaKS/EySB7v7X4+n7k1yaHH/UJJ7dj4eAACs1r4l3vtdSf5Rkt+qqk8t1v55kjuT3F1VtyV5NMkbl5oQAABWaMeB3N2/nqTO8vT1O/25AACwTq6kBwAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBD
AAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMAhkAAAaBDAAAg0AGAIBBIAMAwCCQAQBgEMgAADAIZAAAGAQyAAAMexbIVXVDVX2uqh6uqiN79TkAALCb9iSQq+qSJP8+yfcmuS7Jm6rqur34LAAA2E17dQT59Uke7u7f7u6vJHl/kpv36LMAAGDX7Nujn3tVksfG45NJ/tp8QVUdTnJ48fAPq+pzezQLe+/yJF9c9xCcN+wHtrMn2M6e4Fn1E0nWtyf+wpkW9yqQ6wxr/ZwH3UeTHN2jz2eFqupEd2+uew7OD/YD29kTbGdPsN35tif26hSLk0muGY+vTvLEHn0WAADsmr0K5P+Z5GBVXVtVL0lya5J79+izAABg1+zJKRbd/UxV/VCSX0pySZK7uvuBvfgszgtOlWGyH9jOnmA7e4Ltzqs9Ud197lcBAMBFwpX0AABgEMgAADAIZJ63qnpNVX1q/PflqvqRqvrJqvpsVf1mVf1CVb1y3bOyGl9nT/z4Yj98qqo+XFXftO5ZWY2z7Ynx/I9WVVfV5WsckxX5Or8j/mVVPT7Wv2/ds7IaX+93RFX9cFV9rqoeqKp/tdY5nYPMTiwuJ/54ti4A85okH1n8z5k/kSTd/c51zsfqbdsTv9/dX16svz3Jdd391nXOx+rNPdHdX6iqa5K8J8m3JPmr3e1CEReRbb8jfjDJH3b3T613KtZp2554dZJ3Jbmxu5+uqiu6+6l1zeYIMjt1fZL/3d1f6O4Pd/czi/WPZut7r7n4zD3x5bH+imy7UBAXjWf3xOLxv0nyY7EfLlbb9wPMPfFPktzZ3U8nyTrjOBHI7NytSd53hvW3JPnFFc/C+eE5e6Kq3l1VjyV5c5J/sbapWKdn90RV3ZTk8e7+jfWOxBpt/3fjhxanYt1VVZeuayjWau6Jb07yN6vqY1X1q1X1HWucyykWvHCLi788keS13f3kWH9Xks0k/6BtrIvK2fbE4rnbk7ysu+9Yy3CsxdwTSf4gyS8n+bvd/aWqeiTJplMsLh7bf0dU1ZVJvpitvyb8eJL93f2Wdc7Iap1hT3w6yUeSvCPJdyT5uSSvXldPOILMTnxvkk9si+NDSd6Q5M3i+KL0p/bE8F+T/MMVz8P6zT3xF5Ncm+Q3FnF8dZJPVNWfX+N8rNZzfkd095Pd/dXu/lqS/5jk9WudjnXY/u/GySQf7C0fT/K1JGv7n3kFMjvxpjz3T+k3JHlnkpu6+4/WNhXrtH1PHBzP3ZTksyufiHV7dk9092919xXdfaC7D2TrH8Jv7+7fWeeArNT23xH7x3N/P8mnVz4R6/acPZHkvyX520lSVd+c5CXZ+ivDWjjFghekql6e5LFs/dnjS4u1h5O8NMnvLl72Ud9YcPE4y574+Wx9u8nXknwhyVu7+/H1TckqnWlPbHv+kTjF4qJxlt8R/znJX8nWKRaPJPnH3X1qXTOyWmfZEy9Jcle29sVXkvxod39kbTMKZAAA+BNOsQAAgEEgAwDAIJABAGAQyAAAMAhkAAAYBDIAAAwCGQAAhv8PCCPnhqb/Rl0AAAAASUVORK5CYII=\n", - "text/plain": [ - "
" + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "WqyT_O1ZIUW4", + "outputId": "b4a8f094-be78-4c10-8561-e4ae845a2e92" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "[74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73, 74, 74, 69, 70, 72, 73, 75, 78]\n" + ] + } + ], + "source": [ + "print(list(df['Height'])[:20])" ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "wrong_sample = np.random.rand(1000)*2*std+mean-std\n", - "plt.figure(figsize=(10,6))\n", - "plt.hist(wrong_sample)\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Confidence Intervals\n", - "\n", - "Let's now calculate confidence intervals for the weights and heights of baseball players. We will use the code [from this stackoverflow discussion](https://stackoverflow.com/questions/15033511/compute-a-confidence-interval-from-sample-data):" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "p=0.85, mean = 201.73 ± 0.94\n", - "p=0.90, mean = 201.73 ± 1.08\n", - "p=0.95, mean = 201.73 ± 1.28\n" - ] - } - ], - "source": [ - "import scipy.stats\n", - "\n", - "def mean_confidence_interval(data, confidence=0.95):\n", - " a = 1.0 * np.array(data)\n", - " n = len(a)\n", - " m, se = np.mean(a), scipy.stats.sem(a)\n", - " h = se * scipy.stats.t.ppf((1 + confidence) / 2., n-1)\n", - " return m, h\n", - "\n", - "for p in [0.85, 0.9, 0.95]:\n", - " m, h = mean_confidence_interval(df['Weight'].fillna(method='pad'),p)\n", - " print(f\"p={p:.2f}, mean = {m:.2f} ± {h:.2f}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Hypothesis Testing\n", - "\n", - "Let's explore different roles in our baseball players dataset:" - ] - }, 
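- { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "<div>\n", - "<style scoped>\n", - " .dataframe tbody tr th:only-of-type {\n", - " vertical-align: middle;\n", - " }\n", - "\n", - " .dataframe tbody tr th {\n", - " vertical-align: top;\n", - " }\n", - "\n", - " .dataframe thead th {\n", - " text-align: right;\n", - " }\n", - "</style>\n", - "<table border=\"1\" class=\"dataframe\">\n", - " <thead>\n", - " <tr style=\"text-align: right;\">\n", - " <th></th>\n", - " <th>Height</th>\n", - " <th>Weight</th>\n", - " <th>Count</th>\n", - " </tr>\n", - " <tr>\n", - " <th>Role</th>\n", - " <th></th>\n", - " <th></th>\n", - " <th></th>\n", - " </tr>\n", - " </thead>\n", - " <tbody>\n", - " <tr>\n", - " <th>Catcher</th>\n", - " <td>72.723684</td>\n", - " <td>204.328947</td>\n", - " <td>76</td>\n", - " </tr>\n", - " <tr>\n", - " <th>Designated_Hitter</th>\n", - " <td>74.222222</td>\n", - " <td>220.888889</td>\n", - " <td>18</td>\n", - " </tr>\n", - " <tr>\n", - " <th>First_Baseman</th>\n", - " <td>74.000000</td>\n", - " <td>213.109091</td>\n", - " <td>55</td>\n", - " </tr>\n", - " <tr>\n", - " <th>Outfielder</th>\n", - " <td>73.010309</td>\n", - " <td>199.113402</td>\n", - " <td>194</td>\n", - " </tr>\n", - " <tr>\n", - " <th>Relief_Pitcher</th>\n", - " <td>74.374603</td>\n", - " <td>203.517460</td>\n", - " <td>315</td>\n", - " </tr>\n", - " <tr>\n", - " <th>Second_Baseman</th>\n", - " <td>71.362069</td>\n", - " <td>184.344828</td>\n", - " <td>58</td>\n", - " </tr>\n", - " <tr>\n", - " <th>Shortstop</th>\n", - " <td>71.903846</td>\n", - " <td>182.923077</td>\n", - " <td>52</td>\n", - " </tr>\n", - " <tr>\n", - " <th>Starting_Pitcher</th>\n", - " <td>74.719457</td>\n", - " <td>205.163636</td>\n", - " <td>221</td>\n", - " </tr>\n", - " <tr>\n", - " <th>Third_Baseman</th>\n", - " <td>73.044444</td>\n", - " <td>200.955556</td>\n", - " <td>45</td>\n", - " </tr>\n", - " </tbody>\n", - "</table>\n", - "</div>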
" + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hIjGQv-0IUW6", + "outputId": "e26ca71a-2697-46a0-f599-986f71b961e0" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Mean = 73.6972920696325\n", + "Variance = 5.316798081118074\n", + "Standard Deviation = 2.3058183105175645\n" + ] + } ], - "text/plain": [ - " Height Weight Count\n", - "Role \n", - "Catcher 72.723684 204.328947 76\n", - "Designated_Hitter 74.222222 220.888889 18\n", - "First_Baseman 74.000000 213.109091 55\n", - "Outfielder 73.010309 199.113402 194\n", - "Relief_Pitcher 74.374603 203.517460 315\n", - "Second_Baseman 71.362069 184.344828 58\n", - "Shortstop 71.903846 182.923077 52\n", - "Starting_Pitcher 74.719457 205.163636 221\n", - "Third_Baseman 73.044444 200.955556 45" + "source": [ + "mean = df['Height'].mean()\n", + "var = df['Height'].var()\n", + "std = df['Height'].std()\n", + "print(f\"Mean = {mean}\\nVariance = {var}\\nStandard Deviation = {std}\")" ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.groupby('Role').agg({ 'Height' : 'mean', 'Weight' : 'mean', 'Age' : 'count'}).rename(columns={ 'Age' : 'Count'})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's test the hypothesis that First Basemen are taller than Second Basemen. 
The simplest way to do this is to compare the confidence intervals:" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Conf=0.85, 1st basemen height: 73.62..74.38, 2nd basemen height: 71.04..71.69\n", - "Conf=0.90, 1st basemen height: 73.56..74.44, 2nd basemen height: 70.99..71.73\n", - "Conf=0.95, 1st basemen height: 73.47..74.53, 2nd basemen height: 70.92..71.81\n" - ] - } - ], - "source": [ - "for p in [0.85,0.9,0.95]:\n", - " m1, h1 = mean_confidence_interval(df.loc[df['Role']=='First_Baseman',['Height']],p)\n", - " m2, h2 = mean_confidence_interval(df.loc[df['Role']=='Second_Baseman',['Height']],p)\n", - " print(f'Conf={p:.2f}, 1st basemen height: {m1-h1[0]:.2f}..{m1+h1[0]:.2f}, 2nd basemen height: {m2-h2[0]:.2f}..{m2+h2[0]:.2f}')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can see that the intervals do not overlap.\n", - "\n", - "A statistically more rigorous way to test the hypothesis is to use a **Student t-test**:" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "T-value = 7.65\n", - "P-value: 9.137321189738925e-12\n" - ] - } - ], - "source": [ - "from scipy.stats import ttest_ind\n", - "\n", - "tval, pval = ttest_ind(df.loc[df['Role']=='First_Baseman',['Height']], df.loc[df['Role']=='Second_Baseman',['Height']],equal_var=False)\n", - "print(f\"T-value = {tval[0]:.2f}\\nP-value: {pval[0]}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The two values returned by the `ttest_ind` function are:\n", - "* The p-value is the probability of observing a difference in means at least this large if the two distributions actually had the same mean. 
In our case, it is very low, which is strong evidence that first basemen are taller.\n", - "* The t-value is the normalized difference between the means that the t-test computes; it is compared against a threshold value for a given confidence level." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Simulating a Normal Distribution with the Central Limit Theorem\n", - "\n", - "The pseudo-random generator in Python is designed to give us a uniform distribution. If we want to create a generator for a normal distribution, we can use the central limit theorem. To get an approximately normally distributed value, we just compute the mean of a uniformly generated sample." - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_CtgdwXxIUW7" + }, + "source": [ + "In addition to mean, it makes sense to look at the median value and quartiles. They can be visualized using a **box plot**:" ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "def normal_random(sample_size=100):\n", - " sample = [random.uniform(0,1) for _ in range(sample_size) ]\n", - " return sum(sample)/sample_size\n", - "\n", - "sample = [normal_random() for _ in range(100)]\n", - "plt.figure(figsize=(10,6))\n", - "plt.hist(sample)\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Correlation and Evil Baseball Corp\n", - "\n", - "Correlation allows us to find relations between data sequences. In our toy example, let's pretend there is an evil baseball corporation that pays its players according to their height - the taller the player is, the more money he/she gets. Suppose there is a base salary of $1000, and an additional bonus from $0 to $100, depending on height. We will take the real players from MLB, and compute their imaginary salaries:" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[(74, 1075.2469071629068), (74, 1075.2469071629068), (72, 1053.7477908306478), (72, 1053.7477908306478), (73, 1064.4973489967772), (69, 1021.4991163322591), (69, 1021.4991163322591), (71, 1042.9982326645181), (76, 1096.746023495166), (71, 1042.9982326645181)]\n" - ] - } - ], - "source": [ - "heights = df['Height']\n", - "salaries = 1000+(heights-heights.min())/(heights.max()-heights.mean())*100\n", - "print(list(zip(heights, salaries))[:10])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's now compute covariance and correlation of those sequences. 
`np.cov` will give us a so-called **covariance matrix**, which is an extension of covariance to multiple variables. The element $M_{ij}$ of the covariance matrix $M$ is the covariance between input variables $X_i$ and $X_j$, and each diagonal value $M_{ii}$ is the variance of $X_{i}$. Similarly, `np.corrcoef` will give us the **correlation matrix**." - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Covariance matrix:\n", - "[[ 5.31679808 57.15323023]\n", - " [ 57.15323023 614.37197275]]\n", - "Covariance = 57.153230230544736\n", - "Correlation = 1.0\n" - ] - } - ], - "source": [ - "print(f\"Covariance matrix:\\n{np.cov(heights, salaries)}\")\n", - "print(f\"Covariance = {np.cov(heights, salaries)[0,1]}\")\n", - "print(f\"Correlation = {np.corrcoef(heights, salaries)[0,1]}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A correlation equal to 1 means that there is a perfect **linear relation** between two variables. We can visually see the linear relation by plotting one value against the other:" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 207 + }, + "id": "GYjvNaCmIUW8", + "outputId": "aabf89b5-3442-4ba8-f365-18a87bd8ab52" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.figure(figsize=(10,2))\n", + "plt.boxplot(df['Height'], vert=False, showmeans=True)\n", + "plt.grid(color='gray', linestyle='dotted')\n", + "plt.tight_layout()\n", + "plt.show()" ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "plt.figure(figsize=(10,6))\n", - "plt.scatter(heights,salaries)\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's see what happens if the relation is not linear. Suppose that our corporation decided to hide the obvious linear dependency between heights and salaries, and introduced some non-linearity into the formula, such as `sin`:" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Correlation = 0.9835304456670837\n" - ] - } - ], - "source": [ - "salaries = 1000+np.sin((heights-heights.min())/(heights.max()-heights.mean()))*100\n", - "print(f\"Correlation = {np.corrcoef(heights, salaries)[0,1]}\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this case, the correlation is slightly smaller, but it is still quite high. Now, to make the relation even less obvious, we might want to add some extra randomness by adding some random variable to the salary. 
Let's see what happens:" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Correlation = 0.9363097848296155\n" - ] - } - ], - "source": [ - "salaries = 1000+np.sin((heights-heights.min())/(heights.max()-heights.mean()))*100+np.random.random(size=len(heights))*20-10\n", - "print(f\"Correlation = {np.corrcoef(heights, salaries)[0,1]}\")" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RqV-Mw1SIUW8" + }, + "source": [ + "We can also make box plots of subsets of our dataset, for example, grouped by player role." ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "plt.figure(figsize=(10,6))\n", - "plt.scatter(heights, salaries)\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "> Can you guess why the dots line up into vertical lines like this?\n", - "\n", - "We have observed the correlation between an artificially engineered concept like salary and the observed variable *height*. Let's also see if the two observed variables, such as height and weight, correlate too:" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([[ 1., nan],\n", - " [nan, nan]])" + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 806 + }, + "id": "TFRfrvyKIUW8", + "outputId": "3c8c28e4-1788-4df4-9468-834e2e51e6e4" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "df.boxplot(column='Height', by='Role', figsize=(10,8))\n", + "plt.xticks(rotation='vertical')\n", + "plt.tight_layout()\n", + "plt.show()" ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "np.corrcoef(df['Height'],df['Weight'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Unfortunately, we did not get any results - only some strange `nan` values. This is due to the fact that some of the values in our series are undefined, represented as `nan`, which causes the result of the operation to be undefined as well. By looking at the matrix we can see that `Weight` is the problematic column, because self-correlation between `Height` values has been computed.\n", - "\n", - "> This example shows the importance of **data preparation** and **cleaning**. Without proper data we cannot compute anything.\n", - "\n", - "Let's use the `fillna` method to fill the missing values, and compute the correlation: " - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([[1. , 0.52959196],\n", - " [0.52959196, 1. ]])" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6XHFyajmIUW8" + }, + "source": [ + "> **Note**: This diagram suggests that, on average, first basemen are taller than second basemen. Later we will learn how to test this hypothesis more formally, and how to demonstrate that the difference is statistically significant.\n", + "\n", + "Age, height and weight are all continuous random variables. What do you think their distribution is? 
A good way to find out is to plot the histogram of values:" ] - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "np.corrcoef(df['Height'],df['Weight'].fillna(method='pad'))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "There is indeed a correlation, but not as strong as in our artificial example. If we look at the scatter plot of one value against the other, the relation is much less obvious:" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 609 + }, + "id": "CT_dXGvOIUW-", + "outputId": "908dd0d8-6c4b-4375-a50b-b42175db5ef2" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "df['Weight'].hist(bins=15, figsize=(10,6))\n", + "plt.suptitle('Weight distribution of MLB Players')\n", + "plt.xlabel('Weight')\n", + "plt.ylabel('Count')\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i5Ml9GbUIUW-" + }, + "source": [ + "## Normal Distribution\n", + "\n", + "Let's create an artificial sample of heights that follows a normal distribution with the same mean and variance as our real data:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "eQ2vZ_EDIUW-", + "outputId": "5ba7d226-cdf5-45f4-dbbb-24ecb391fe79" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([71.28129096, 78.82173271, 71.81954911, 72.27705699, 68.09917695,\n", + " 74.42089598, 72.63965744, 67.52063996, 73.40784061, 75.4298678 ,\n", + " 76.53090645, 75.62114278, 75.93431985, 74.11904376, 76.31097626,\n", + " 76.68590245, 73.03769741, 72.48497214, 75.89118176, 74.65380817])" + ] + }, + "metadata": {}, + "execution_count": 12 + } + ], + "source": [ + "generated = np.random.normal(mean, std, 1000)\n", + "generated[:20]" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 607 + }, + "id": "d-RNVGrJIUW_", + "outputId": "d1b62c49-6705-4409-ef44-39497de7e714" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAA94AAAJOCAYAAABBfN/cAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAw0klEQVR4nO3de3DV9Z34/1cwEBBNMLgkpBJhrZVLreKlNErrLSsi46WwWnaptchIuwWtsKNCK7b2YtC1yuqiVMciTqEXt0JRplgLKrZGFNB2rS6gRWHFhO5QEkGJSD6/P/rzfDdC1eB5ewg8HjNnxvP5fM6H15m30Tz5nEtRlmVZAAAAAEl0KvQAAAAAsC8T3gAAAJCQ8AYAAICEhDcAAAAkJLwBAAAgIeENAAAACQlvAAAASEh4AwAAQELFhR5gT7S2tsbGjRvj4IMPjqKiokKPAwAAwH4my7J4/fXXo6qqKjp1eu9r2h0yvDdu3Bh9+vQp9BgAAADs5zZs2BCHHXbYex7TIcP74IMPjoi/PsHS0tICTwMAAMD+prm5Ofr06ZPr0/fSIcP7nZeXl5aWCm8AAAAK5oO8/dmHqwEAAEBCwhsAAAASEt4AAACQkPAGAACAhIQ3AAAAJCS8AQAAICHhDQAAAAkJbwAAAEhIeAMAAEBCwhsAAAASEt4AAACQkPAGAACAhIQ3AAAAJCS8AQAAICHhDQAAAAkJbwAAAEhIeAMAAEBCwhsAAAASEt4AAACQkPAGAACAhIQ3AAAAJCS8AQAAICHhDQAAAAkVF3oAAOCv+k5ZVOgRknp5+ohCjwAABeGKNwAAACQkvAEAACAh4Q0AAAAJCW8AAABISHgDAABAQsIbAAAAEhLeAAAAkJDwBgAAgISENwAAACQkvAEAACAh4Q0AAAAJCW8AAABISHgDAABAQsIbAAAAEhLeAAAAkJDwBgAAgISENwAAACQkvAEAACAh4Q0AAAAJtTu8ly1bFuecc05UVVVFUVFRLFiwYJdjXnjhhTj33HOjrKwsunfvHieeeGKsX78+t3/79u0xYcKE6NmzZxx00EExatSoaGxs/FBPBAAAAPZG7Q7vbdu2xTHHHBMzZ87c7f6XXnophg4dGv37949HH300/vCHP8S0adOia9euuWMmTZoUDzzwQNx3333x2GOPxcaNG2PkyJF7/iwAAABgL1Xc3gcMHz48hg8f/jf3f/Ob34yzzz47brzxxty2I444IvfPTU1Ncffdd8e8efPi9NNPj4iI2bNnx4ABA+LJJ5+Mz3zmM+0dCQAAAPZaeX2Pd2trayxatCg+8YlPxLBhw6JXr14xZMiQNi9HX7lyZezYsSNqa2tz2/r37x/V1dVRX1+fz3EAAACg4PIa3ps2bYqtW7fG9OnT46yzzopf//rX8fnPfz5GjhwZjz32WERENDQ0RJcuXaJHjx5tHltRURENDQ27PW9LS0s0Nze3uQEAAEBH0O6Xmr+X1tbWiIg477zzYtKkSRERceyxx8YTTzwRs2bNilNOOWWPzltXVxfXXXdd3uYEAACAj0per3gfeuihUVxcHAMHDmyzfcCAAblPNa+srIy33nortmzZ0uaYxsbGqKys3O15p06dGk1NTbnbhg0b8jk2AAAAJJPX8O7SpUuceOKJsXr16jbb16xZE4cffnhERBx//PHRuXPnWLJkSW7/6tWrY/369VFTU7Pb85aUlERpaWmbGwAAAHQE7X6p+datW+PFF1/M3V+3bl08++yzUV5eHtXV1XHllVfGF77whfjc5z4Xp512WixevDgeeOCBePTRRyMioqysLMaNGxeTJ0+O8vLyKC0tjcsuuyxqamp8ojkAAAD7nHaH94oVK+K0007L3Z88eXJERFx88cVxzz33xOc///mYNWtW1NXVxeWXXx5HHXVU/OIXv4ihQ4fmHnPLLbdEp06dYtSoUdHS0hLDhg2L22+/PQ9PBwAAAPYuRVmWZYUeo
EBc8QYAAICEhDcAAAAkJLwBAAAgIeENAAAACQlvAAAASEh4AwAAQELCGwAAABIS3gAAAJBQcaEHAAAA4IPrO2VRoUdI7uXpIwo9Ql654g0AAAAJFTS8Z86cGX379o2uXbvGkCFD4qmnnirkOAAAAJB3BQvvn/3sZzF58uT41re+FatWrYpjjjkmhg0bFps2bSrUSAAAAJB3BQvvm2++OS699NIYO3ZsDBw4MGbNmhUHHnhg/OhHPyrUSAAAAJB3BflwtbfeeitWrlwZU6dOzW3r1KlT1NbWRn19/S7Ht7S0REtLS+5+U1NTREQ0NzenHzYPWlveKPQIAAAAHUZHaL13Zsyy7H2PLUh4/+///m/s3LkzKioq2myvqKiI//7v/97l+Lq6urjuuut22d6nT59kMwIAAFAYZTMKPcEH9/rrr0dZWdl7HtMhvk5s6tSpMXny5Nz91tbW2Lx5c/Ts2TOKiopy25ubm6NPnz6xYcOGKC0tLcSofASs877PGu8frPO+zxrv+6zx/sE67/us8Z7Jsixef/31qKqqet9jCxLehx56aBxwwAHR2NjYZntjY2NUVlbucnxJSUmUlJS02dajR4+/ef7S0lL/wuwHrPO+zxrvH6zzvs8a7/us8f7BOu/7rHH7vd+V7ncU5MPVunTpEscff3wsWbIkt621tTWWLFkSNTU1hRgJAAAAkijYS80nT54cF198cZxwwgnx6U9/OmbMmBHbtm2LsWPHFmokAAAAyLuChfcXvvCF+POf/xzXXnttNDQ0xLHHHhuLFy/e5QPX2qOkpCS+9a1v7fKydPYt1nnfZ433D9Z532eN933WeP9gnfd91ji9ouyDfPY5AAAAsEcK8h5vAAAA2F8IbwAAAEhIeAMAAEBCwhsAAAAS6pDh/eqrr8YXv/jF6NmzZ3Tr1i2OPvroWLFiRZtjXnjhhTj33HOjrKwsunfvHieeeGKsX7++QBOzJ95vnbdu3RoTJ06Mww47LLp16xYDBw6MWbNmFXBi2qNv375RVFS0y23ChAkREbF9+/aYMGFC9OzZMw466KAYNWpUNDY2Fnhq2uu91nnz5s1x2WWXxVFHHRXdunWL6urquPzyy6OpqanQY9MO7/ez/I4sy2L48OFRVFQUCxYsKMyw7LEPss719fVx+umnR/fu3aO0tDQ+97nPxZtvvlnAqWmP91vjhoaGuOiii6KysjK6d+8exx13XPziF78o8NS0186dO2PatGnRr1+/6NatWxxxxBHx3e9+N/7v521nWRbXXntt9O7dO7p16xa1tbWxdu3aAk69byjY14ntqb/85S9x8sknx2mnnRa/+tWv4u/+7u9i7dq1ccghh+SOeemll2Lo0KExbty4uO6666K0tDT++Mc/RteuXQs4Oe3xQdZ58uTJsXTp0vjxj38cffv2jV//+tfxta99LaqqquLcc88t4PR8EE8//XTs3Lkzd/+5556Lf/iHf4gLLrggIiImTZoUixYtivvuuy/Kyspi4sSJMXLkyPjd735XqJHZA++1zhs3boyNGzfGTTfdFAMHDoxXXnklvvrVr8bGjRvjP//zPws4Ne3xfj/L75gxY0YUFRV91OORJ++3zvX19XHWWWfF1KlT47bbbovi4uL4/e9/H506dchrPPul91vjL33pS7Fly5ZYuHBhHHrooTFv3ry48MILY8WKFTF48OBCjU073XDDDXHHHXfEnDlzYtCgQbFixYoYO3ZslJWVxeWXXx4RETfeeGPceuutMWfOnOjXr19MmzYthg0bFs8//7ye+jCyDubqq6/Ohg4d+p7HfOELX8i++MUvfkQTkcIHWedBgwZl3/nOd9psO+6447JvfvObKUcjka9//evZEUcckbW2tmZbtmzJOnfunN133325/S+88EIWEVl9fX0Bp+TD+r/rvDs///nPsy5dumQ7duz4iCcjX3a3xs8880z2sY99LHvttdeyiMjmz59fuAHJi3ev85AhQ7JrrrmmwFORT
+9e4+7du2f33ntvm2PKy8uzu+66qxDjsYdGjBiRXXLJJW22jRw5MhszZkyWZVnW2tqaVVZWZv/2b/+W279ly5aspKQk+8lPfvKRzrqv6XB/Dblw4cI44YQT4oILLohevXrF4MGD46677srtb21tjUWLFsUnPvGJGDZsWPTq1SuGDBniZW0dzPutc0TESSedFAsXLoxXX301siyLRx55JNasWRNnnnlmgaZmT7311lvx4x//OC655JIoKiqKlStXxo4dO6K2tjZ3TP/+/aO6ujrq6+sLOCkfxrvXeXeampqitLQ0ios73AuyiN2v8RtvvBH//M//HDNnzozKysoCT0g+vHudN23aFMuXL49evXrFSSedFBUVFXHKKafEb3/720KPyh7a3c/ySSedFD/72c9i8+bN0draGj/96U9j+/btceqppxZ2WNrlpJNOiiVLlsSaNWsiIuL3v/99/Pa3v43hw4dHRMS6deuioaGhze9gZWVlMWTIEL+DfViFLv/2KikpyUpKSrKpU6dmq1atyn74wx9mXbt2ze65554sy7Lc36YfeOCB2c0335w988wzWV1dXVZUVJQ9+uijBZ6eD+r91jnLsmz79u3Zl770pSwisuLi4qxLly7ZnDlzCjg1e+pnP/tZdsABB2SvvvpqlmVZNnfu3KxLly67HHfiiSdmV1111Uc9Hnny7nV+tz//+c9ZdXV19o1vfOMjnox82d0ajx8/Phs3blzufrji3eG9e53r6+uziMjKy8uzH/3oR9mqVauyK664IuvSpUu2Zs2aAk/Lntjdz/Jf/vKX7Mwzz8z93lVaWpo99NBDBZySPbFz587s6quvzoqKirLi4uKsqKgou/7663P7f/e732URkW3cuLHN4y644ILswgsv/KjH3ad0uEsKra2tccIJJ8T1118fERGDBw+O5557LmbNmhUXX3xxtLa2RkTEeeedF5MmTYqIiGOPPTaeeOKJmDVrVpxyyikFm50P7v3WOSLitttuiyeffDIWLlwYhx9+eCxbtiwmTJgQVVVVbf6Wjr3f3XffHcOHD4+qqqpCj0JC77XOzc3NMWLEiBg4cGB8+9vf/uiHIy/evcYLFy6MpUuXxjPPPFPgycind6/zO797feUrX4mxY8dGxF//v71kyZL40Y9+FHV1dQWblT2zu/9eT5s2LbZs2RK/+c1v4tBDD40FCxbEhRdeGI8//ngcffTRBZyW9vj5z38ec+fOjXnz5sWgQYPi2WefjSuuuCKqqqpyv2OTRocL7969e8fAgQPbbBswYEDuUxUPPfTQKC4u3u0xXvLUcbzfOr/55pvxjW98I+bPnx8jRoyIiIhPfepT8eyzz8ZNN90kvDuQV155JX7zm9/E/fffn9tWWVkZb731VmzZsiV69OiR297Y2Oilqh3U7tb5Ha+//nqcddZZcfDBB8f8+fOjc+fOBZiQD2t3a7x06dJ46aWX2vwcR0SMGjUqPvvZz8ajjz760Q7Jh7a7de7du3dExG7/v+0bZTqe3a3xSy+9FP/xH/8Rzz33XAwaNCgiIo455ph4/PHHY+bMmb5VpgO58sorY8qUKTF69OiIiDj66KPjlVdeibq6urj44otzv2c1NjbmfrbfuX/ssccWYuR9Rod7j/fJJ58cq1evbrNtzZo1cfjhh0dERJcuXeLEE098z2PY+73fOu/YsSN27Nixy6elHnDAAbm/eadjmD17dvTq1Sv3FygREccff3x07tw5lixZktu2evXqWL9+fdTU1BRiTD6k3a1zxF+vdJ955pnRpUuXWLhwoU9L7cB2t8ZTpkyJP/zhD/Hss8/mbhERt9xyS8yePbtAk/Jh7G6d+/btG1VVVX732kfsbo3feOONiAi/d+0D3njjjfdcx379+kVlZWWb38Gam5tj+fLlfgf7sAr9Wvf2euqpp7Li4uLs+9//frZ27dps7ty52YEHHpj9+Mc/zh1z//33Z507d87uvPPObO3atdltt92WHXDAA
dnjjz9ewMlpjw+yzqeccko2aNCg7JFHHsn+9Kc/ZbNnz866du2a3X777QWcnPbYuXNnVl1dnV199dW77PvqV7+aVVdXZ0uXLs1WrFiR1dTUZDU1NQWYkg/rb61zU1NTNmTIkOzoo4/OXnzxxey1117L3d5+++0CTcueeK+f5XcL7/HusN5rnW+55ZastLQ0u++++7K1a9dm11xzTda1a9fsxRdfLMCk7Km/tcZvvfVW9vGPfzz77Gc/my1fvjx78cUXs5tuuikrKirKFi1aVKBp2RMXX3xx9rGPfSx78MEHs3Xr1mX3339/duihh7b5DJ3p06dnPXr0yH75y19mf/jDH7Lzzjsv69evX/bmm28WcPKOr8OFd5Zl2QMPPJB98pOfzEpKSrL+/ftnd9555y7H3H333dnHP/7xrGvXrtkxxxyTLViwoACT8mG83zq/9tpr2Ze//OWsqqoq69q1a3bUUUdlP/jBD/7m1xSx93nooYeyiMhWr169y74333wz+9rXvpYdcsgh2YEHHph9/vOfz1577bUCTMmH9bfW+ZFHHskiYre3devWFWZY9sh7/Sy/m/DuuN5vnevq6rLDDjssO/DAA7OamhoXPDqg91rjNWvWZCNHjsx69eqVHXjggdmnPvWpXb5ejL1fc3Nz9vWvfz2rrq7Ounbtmv393/999s1vfjNraWnJHdPa2ppNmzYtq6ioyEpKSrIzzjjjA/33nfdWlGVZVpBL7QAAALAf6HDv8QYAAICORHgDAABAQsIbAAAAEhLeAAAAkJDwBgAAgISENwAAACQkvAEAACAh4Q0AAAAJCW8AAABISHgDAABAQsIbAAAAEhLeAAAAkND/BxTlJ5sThwyCAAAAAElFTkSuQmCC\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.figure(figsize=(10,6))\n", + "plt.hist(generated, bins=15)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 607 + }, + "id": "GVs0ZaObIUW_", + "outputId": "a4782bf2-20de-4c82-e78b-b31457ecd3dd" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.figure(figsize=(10,6))\n", + "plt.hist(np.random.normal(0,1,50000), bins=300)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zHP_chikIUXA" + }, + "source": [ + "Since most values in real life are normally distributed, we should not use a uniform random number generator to generate sample data. Here is what happens if we try to generate weights with a uniform distribution (generated by `np.random.rand`):" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 607 + }, + "id": "gMGaiMfAIUXA", + "outputId": "a0fe0623-07db-49f2-dc36-c36e7cc3e913" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "wrong_sample = np.random.rand(1000)*2*std+mean-std\n", + "plt.figure(figsize=(10,6))\n", + "plt.hist(wrong_sample)\n", + "plt.tight_layout()\n", + "plt.show()" ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uNEzv4urIUXA" + }, + "source": [ + "## Confidence Intervals\n", + "\n", + "Let's now calculate confidence intervals for the weights and heights of baseball players. We will use the code [from this stackoverflow discussion](https://stackoverflow.com/questions/15033511/compute-a-confidence-interval-from-sample-data):" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "P7h0LFCsIUXA", + "outputId": "7346caf5-0f58-4130-e0de-7dc9c67ab99e" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "p=0.85, mean = 201.73 ± 0.94\n", + "p=0.90, mean = 201.73 ± 1.08\n", + "p=0.95, mean = 201.73 ± 1.28\n" + ] + } + ], + "source": [ + "import scipy.stats\n", + "\n", + "def mean_confidence_interval(data, confidence=0.95):\n", + " a = 1.0 * np.array(data)\n", + " n = len(a)\n", + " m, se = np.mean(a), scipy.stats.sem(a)\n", + " h = se * scipy.stats.t.ppf((1 + confidence) / 2., n-1)\n", + " return m, h\n", + "\n", + "for p in [0.85, 0.9, 0.95]:\n", + " m, h = mean_confidence_interval(df['Weight'].fillna(method='pad'),p)\n", + " print(f\"p={p:.2f}, mean = {m:.2f} ± {h:.2f}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zMAi2cz6IUXB" + }, + "source": [ + "## Hypothesis Testing\n", + "\n", + "Let's explore different roles in our baseball players dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 363 + }, + "id": "1fsnjUkFIUXC", + "outputId": 
"9359d40f-c575-461d-e678-0f6ea1ef2f63" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Height Weight Count\n", + "Role \n", + "Catcher 72.723684 204.328947 76\n", + "Designated_Hitter 74.222222 220.888889 18\n", + "First_Baseman 74.000000 213.109091 55\n", + "Outfielder 73.010309 199.113402 194\n", + "Relief_Pitcher 74.374603 203.517460 315\n", + "Second_Baseman 71.362069 184.344828 58\n", + "Shortstop 71.903846 182.923077 52\n", + "Starting_Pitcher 74.719457 205.163636 221\n", + "Third_Baseman 73.044444 200.955556 45" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
HeightWeightCount
Role
Catcher72.723684204.32894776
Designated_Hitter74.222222220.88888918
First_Baseman74.000000213.10909155
Outfielder73.010309199.113402194
Relief_Pitcher74.374603203.517460315
Second_Baseman71.362069184.34482858
Shortstop71.903846182.92307752
Starting_Pitcher74.719457205.163636221
Third_Baseman73.044444200.95555645
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 17 + } + ], + "source": [ + "df.groupby('Role').agg({ 'Height' : 'mean', 'Weight' : 'mean', 'Age' : 'count'}).rename(columns={ 'Age' : 'Count'})" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "O3H--SzfIUXC" + }, + "source": [ + "Let's test the hypothesis that First Basemen are taller than Second Basemen. The simplest way to do this is to compare the confidence intervals for the mean height of each group:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "XsO80SsfIUXC", + "outputId": "424f53ff-729b-4f21-addf-ecca08d4fb04" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Conf=0.85, 1st basemen height: 73.62..74.38, 2nd basemen height: 71.04..71.69\n", + "Conf=0.90, 1st basemen height: 73.56..74.44, 2nd basemen height: 70.99..71.73\n", + "Conf=0.95, 1st basemen height: 73.47..74.53, 2nd basemen height: 70.92..71.81\n" + ] + } + ], + "source": [ + "for p in [0.85,0.9,0.95]:\n", + " m1, h1 = mean_confidence_interval(df.loc[df['Role']=='First_Baseman',['Height']],p)\n", + " m2, h2 = mean_confidence_interval(df.loc[df['Role']=='Second_Baseman',['Height']],p)\n", + " print(f'Conf={p:.2f}, 1st basemen height: {m1-h1[0]:.2f}..{m1+h1[0]:.2f}, 2nd basemen height: {m2-h2[0]:.2f}..{m2+h2[0]:.2f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b8DU4ZUGIUXD" + }, + "source": [ + "We can see that the intervals do not overlap at any of the three confidence levels, which suggests that the difference in mean height is not due to chance.\n", + "\n", + "A statistically more rigorous way to test the hypothesis is to use a **Student's t-test**:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Qwra5Iz7IUXD", + "outputId": "2d5f359b-5707-4ce0-9c95-ef8d7ca8b791" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "T-value = 7.65\n", + "P-value: 9.137321189738959e-12\n" + ] + } + ], + "source": 
[ + "from scipy.stats import ttest_ind\n", + "\n", + "tval, pval = ttest_ind(df.loc[df['Role']=='First_Baseman',['Height']], df.loc[df['Role']=='Second_Baseman',['Height']],equal_var=False)\n", + "print(f\"T-value = {tval[0]:.2f}\\nP-value: {pval[0]}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "brelF-6LIUXE" + }, + "source": [ + "The two values returned by the `ttest_ind` function are:\n", + "* The p-value is the probability of observing a difference in means at least as large as the one in our data if the two groups actually had the same mean. In our case, it is very low, meaning that there is strong evidence that first basemen are taller.\n", + "* The t-value is the normalized difference between the two means that the t-test is based on. It is compared against a threshold value for a given confidence level." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CtPClLpDIUXE" + }, + "source": [ + "## Simulating a Normal Distribution with the Central Limit Theorem\n", + "\n", + "The basic pseudo-random generator in Python gives us a uniform distribution. If we want to create a generator for a normal distribution, we can use the central limit theorem: the mean of a large enough sample of independent uniform values is approximately normally distributed. So to get an approximately normally distributed value, we will just compute the mean of a uniformly generated sample." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 606 + }, + "id": "0vSxU_l1IUXF", + "outputId": "f4ae9bbd-95da-4f70-ca9f-740e8e382bbd" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "def normal_random(sample_size=100):\n", + " sample = [random.uniform(0,1) for _ in range(sample_size) ]\n", + " return sum(sample)/sample_size\n", + "\n", + "sample = [normal_random() for _ in range(100)]\n", + "plt.figure(figsize=(10,6))\n", + "plt.hist(sample)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rdUV9KxXIUXF" + }, + "source": [ + "## Correlation and Evil Baseball Corp\n", + "\n", + "Correlation allows us to find relations between data sequences. In our toy example, let's pretend there is an evil baseball corporation that pays its players according to their height - the taller the player is, the more money he/she gets. Suppose there is a base salary of $1000, and an additional bonus from $0 to $100, depending on height. We will take the real players from MLB, and compute their imaginary salaries:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "oNlWGq_VIUXF", + "outputId": "4bc2c9a7-cf4b-4d0b-e87c-dfe7a7fd08e9" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "[(74, 1075.2469071629068), (74, 1075.2469071629068), (72, 1053.7477908306478), (72, 1053.7477908306478), (73, 1064.4973489967772), (69, 1021.4991163322591), (69, 1021.4991163322591), (71, 1042.9982326645181), (76, 1096.746023495166), (71, 1042.9982326645181)]\n" + ] + } + ], + "source": [ + "heights = df['Height']\n", + "salaries = 1000+(heights-heights.min())/(heights.max()-heights.mean())*100\n", + "print(list(zip(heights, salaries))[:10])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F5FZaXFAIUXG" + }, + "source": [ + "Let's now compute covariance and correlation of those sequences. `np.cov` will give us a so-called **covariance matrix**, which is an extension of covariance to multiple variables. 
The element $M_{ij}$ of the covariance matrix $M$ is the covariance between input variables $X_i$ and $X_j$, and each diagonal value $M_{ii}$ is the variance of $X_{i}$. Similarly, `np.corrcoef` will give us the **correlation matrix**." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Oux7B-KfIUXG", + "outputId": "f115362b-63d9-48b8-caee-f348c86bb40a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Covariance matrix:\n", + "[[ 5.31679808 57.15323023]\n", + " [ 57.15323023 614.37197275]]\n", + "Covariance = 57.1532302305447\n", + "Correlation = 1.0\n" + ] + } + ], + "source": [ + "print(f\"Covariance matrix:\\n{np.cov(heights, salaries)}\")\n", + "print(f\"Covariance = {np.cov(heights, salaries)[0,1]}\")\n", + "print(f\"Correlation = {np.corrcoef(heights, salaries)[0,1]}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JV8ZTH1rIUXG" + }, + "source": [ + "A correlation equal to 1 means that there is a perfect positive **linear relation** between the two variables. We can visually see the linear relation by plotting one value against the other:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 607 + }, + "id": "Sz9DY39RIUXH", + "outputId": "ef9fbd38-1b0f-41d0-f7c6-d385fcffe94f" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAA90AAAJOCAYAAACqS2TfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA8n0lEQVR4nO3de5zVdZ348ffBkeGiMwrIDBO3WWs1TNGscAgtkgWR1ejm+jOVjBVL1NQuSoliViDYZW1R01WxzLXclMU085aLbSNe1llTCyVn8DIMVsgcAbk5398fLCcPM8AMzHfODPN8Ph7n8eh8v585vM9+l5oX3+8530ySJEkAAAAA7a5HoQcAAACAPZXoBgAAgJSIbgAAAEiJ6AYAAICUiG4AAABIiegGAACAlIhuAAAASInoBgAAgJQUFXqAtDQ1NUV9fX3su+++kclkCj0OAAAAe5AkSeLNN9+MioqK6NFj++ez99jorq+vjyFDhhR6DAAAAPZgr7zySgwePHi7+/fY6N53330jYsv/AUpKSgo8DQAAAHuSbDYbQ4YMybXn9uyx0b31kvKSkhLRDQAAQCp29nFmX6QGAAAAKRHdAAAAkBLRDQAAACkR3QAAAJAS0Q0AAAApEd0AAACQEtENAAAAKRHdAAAAkBLRDQAAAClpc3QvXrw4TjjhhKioqIhMJhMLFy7M23/nnXfG+PHjo3///pHJZKKmpiZvf11dXWQymRYfd9xxR25dS/tvv/32XXqTAAAAUAhtju61a9fGyJEjY/78+dvdP2bMmLjyyitb3D9kyJBYsWJF3uPyyy+PffbZJyZOnJi39uabb85bN3ny5LaOCwAAAAVT1NYfmDhxYrM4fqfTTjstIrac0W7JXnvtFeXl5Xnb7rrrrjjppJNin332ydu+3377NVsLAAAAXUXBP9P91FNPRU1NTUydOrXZvunTp8eAAQPiQx/6UNx0002RJMl2X2fDhg2RzWbzHgAAAFBIbT7T3d5uvPHGeO973xujR4/O2/7Nb34zPvaxj0WfPn3i/vvvj7PPPjvWrFkT5513XouvM3v27Lj88ss7YmQAAABolYJG91tvvRW33XZbzJw5s9m+d2474ogjYu3atTFv3rztRveMGTPiwgsvzD3PZrMxZMiQ9h8aAAAAWqmgl5f/x3/8R6xbty5OP/30na4dNWpUvPrqq7Fhw4YW9xcXF0dJSUneAwAAAAqpoNF94403xoknnhgHHHDATtfW1NTE/vvvH8XFxR0wGQAAAOy+Nl9evmbNmli2bFnueW1tbdTU1ES/fv1i6NChsWrVqnj55Zejvr4+IiKWLl0aERHl5eV530S+bNmyWLx4cdx7773N/oy77747Vq5cGUcddVT06tUrHnjggfjOd74TX/nKV9r8BgEAAOga3m5K4vHaVfH6m+tj4L694kOV/WKvHplCj7VbMsmOvhK8BY888kiMHTu22fYpU6bEggULYsGCBXHGGWc023/ZZZfFrFmzcs+//vWvx6233hp1dXXRo0f+Cff77rsvZsyYEcuWLYskSeLd7353fPGLX4wzzzyz2drtyWazUVpaGo2NjS41BwAA6OTue3ZFXH7387GicX1u26DSXnHZCSPiuPcNKuBkLWttc7Y5ursK0Q0AANA13Pfsivjirf8T28bp1nPc1576/k4X3q1tzoLfpxsAAIDu6+2mJC6/+/lmwR0RuW2X3/18vN3UNc8Xi24AAAAK5vHaVXmXlG8riYgVjevj8dpVHTdUOxLdAAAAFMzrb24/uHdlXWcjugEAACiYgfv2atd1nY3oBgAAoGA+VNkvBpX2iu3dGCwTW77F/EOV/TpyrHYjugEAACiYvXpk4rITRkRENAvvrc8vO2FEl71ft+gGAACgoI5736C49tT3R3lp/iXk5aW9OuXtwtqiqNADAAAAwHHvGxT/MKI8Hq9dFa+/uT4G7rvlkvKueoZ7K9ENAABAp7BXj
0xUHdi/0GO0K5eXAwAAQEpENwAAAKREdAMAAEBKRDcAAACkRHQDAABASkQ3AAAApER0AwAAQEpENwAAAKREdAMAAEBKRDcAAACkRHQDAABASkQ3AAAApER0AwAAQEpENwAAAKREdAMAAEBKRDcAAACkRHQDAABASkQ3AAAApER0AwAAQEqKCj0AAAAAbbdxc1P8pLoulq9aF8P69YnTqoZHzyLnVTsb0Q0AANDFzL73+bjh0dpoSv627dv3/iHOPLoyZhw/onCD0YzoBgAA6EJm3/t8/GhxbbPtTUnktgvvzsO1BwAAAF3Exs1NccOjzYP7nW54tDY2bm7qoInYGdENAADQRfykui7vkvKWNCVb1tE5iG4AAIAuYvmqde26jvSJbgAAgC5iWL8+7bqO9IluAACALuK0quHRI7PjNT0yW9bROYhuAACALqJnUY848+jKHa458+hK9+vuRNwyDAAAoAvZejuwbe/T3SMT7tPdCWWSJNnJd991TdlsNkpLS6OxsTFKSkoKPQ4AAEC72ri5KX5SXRfLV62LYf36xGlVw53h7kCtbU5nugEAALqgnkU9YurRf1foMdgJ/wwCAAAAKRHdAAAAkBLRDQAAACkR3QAAAJAS0Q0AAAApEd0AAACQEtENAAAAKRHdAAAAkBLRDQAAACkR3QAAAJAS0Q0AAAApEd0AAACQEtENAAAAKRHdAAAAkBLRDQAAACkR3QAAAJAS0Q0AAAApEd0AAACQEtENAAAAKSkq9AAAAAAdYePmpvhJdV0sX7UuhvXrE6dVDY+eRc5Dki7RDQAA7PFm3/t83PBobTQlf9v27Xv/EGceXRkzjh9RuMHY47X5n3UWL14cJ5xwQlRUVEQmk4mFCxfm7b/zzjtj/Pjx0b9//8hkMlFTU9PsNT760Y9GJpPJe3zhC1/IW/Pyyy/HpEmTok+fPjFw4MD46le/Gps3b27ruAAAQDc3+97n40eL84M7IqIpifjR4tqYfe/zhRmMbqHN0b127doYOXJkzJ8/f7v7x4wZE1deeeUOX+fMM8+MFStW5B5z587N7Xv77bdj0qRJsXHjxvjd734Xt9xySyxYsCAuvfTSto4LAAB0Yxs3N8UNj9bucM0Nj9bGxs1NHTQR3U2bLy+fOHFiTJw4cbv7TzvttIiIqKur2+Hr9OnTJ8rLy1vcd//998fzzz8fDz74YJSVlcXhhx8eV1xxRVx00UUxa9as6NmzZ1vHBgAAuqGfVNc1O8O9raZky7qpR/9dxwxFt1Kwbw346U9/GgMGDIj3ve99MWPGjFi3bl1uX3V1dRx66KFRVlaW2zZhwoTIZrPx3HPPtfh6GzZsiGw2m/cAAAC6t+Wr1u18URvWQVsV5IvUTjnllBg2bFhUVFTEM888ExdddFEsXbo07rzzzoiIaGhoyAvuiMg9b2hoaPE1Z8+eHZdffnm6gwMAAF3KsH592nUdtFVBonvatGm5/3zooYfGoEGD4thjj40//elPceCBB+7Sa86YMSMuvPDC3PNsNhtDhgzZ7VkBAICu67Sq4fHte/+ww0vMe2S2rIM0dIqb0o0aNSoiIpYtWxYREeXl5bFy5cq8NVufb+9z4MXFxVFSUpL3AAAAureeRT3izKMrd7jmzKMr3a+b1HSK/8/aeluxQYMGRUREVVVV/P73v4/XX389t+aBBx6IkpKSGDHCPfQAAIDWm3H8iDjrmMrokcnf3iMTcdYx7tNNutp8efmaNWtyZ6QjImpra6Ompib69esXQ4cOjVWrVsXLL78c9fX1ERGxdOnSiNhyhrq8vDz+9Kc/xW233RbHH3989O/fP5555pm44IIL4phjjonDDjssIiLGjx8fI0aMiNNOOy3mzp0bDQ0Ncckll8T06dOjuLi4Pd43AADQjcw4fkR8efzB8ZPquli+al0M69cnTqsa7gw3qcskSbKTL9DP98gjj8TYsWObbZ8yZUosWLAgFixYEGeccUaz/ZdddlnMmjUrXnnll
Tj11FPj2WefjbVr18aQIUPiE5/4RFxyySV5l4QvX748vvjFL8YjjzwSffv2jSlTpsScOXOiqKh1/06QzWajtLQ0GhsbXWoOAABAu2ptc7Y5ursK0Q0AAEBaWtucrqUAAACAlIhuAAAASInoBgAAgJSIbgAAAEiJ6AYAAICUiG4AAABIiegGAACAlIhuAAAASInoBgAAgJSIbgAAAEiJ6AYAAICUiG4AAABIiegGAACAlIhuAAAASInoBgAAgJSIbgAAAEiJ6AYAAICUiG4AAABISVGhBwAAADqfxnWb4vMLHo/6xvVRUdorbvrch6K0z96FHgu6HNENAADk+ci8h2P5X9/KPV/RuD5GfvP+GNa/d/zXVz9WwMmg63F5OQAAkLNtcL/T8r++FR+Z93AHTwRdm+gGAAAiYssl5dsL7q2W//WtaFy3qYMmgq5PdAMAABER8fkFj7frOkB0AwAA/6e+cX27rgNENwAA8H8qSnu16zpAdAMAAP/nps99qF3XAaIbAAD4P6V99o5h/XvvcM2w/r3drxvaQHQDAAA5//XVj203vN2nG9quqNADAAAAnct/ffVj0bhuU3x+weNR37g+Kkp7xU2f+5Az3LALRDcAANBMaZ+94xdnf7jQY0CX5/JyAAAASInoBgAAgJSIbgAAAEiJ6AYAAICUiG4AAABIiegGAACAlIhuAAAASInoBgAAgJSIbgAAAEiJ6AYAAICUiG4AAABIiegGAACAlIhuAAAASInoBgAAgJSIbgAAAEiJ6AYAAICUiG4AAABIiegGAACAlIhuAAAASElRoQcAAIA9wfOvZuMf//XRaIotZ7Z+ec7RMWJwSaHHAgpMdAMAwG4afvE9ec+bIuL4f300IiLq5kwqwERAZ+HycgAA2A3bBndb9wN7NtENAAC76PlXs+26DtjziG4AANhF//h/l5C31zpgzyO6AQBgFzW18zpgzyO6AQBgF7X2l2m/dEP35e8/AADsol+ec3S7rgP2PKIbAAB2UWvvw+1+3dB9iW4AANgNO7sPt/t0Q/dWVOgBAACgq6ubMymefzUb//ivj0ZTbDmz9ctzjnaGGxDdAADQHkYMLomXnNUGttHmy8sXL14cJ5xwQlRUVEQmk4mFCxfm7b/zzjtj/Pjx0b9//8hkMlFTU5O3f9WqVXHuuefGQQcdFL17946hQ4fGeeedF42NjXnrMplMs8ftt9/e5jcIAAAAhdLm6F67dm2MHDky5s+fv939Y8aMiSuvvLLF/fX19VFfXx9XXXVVPPvss7FgwYK47777YurUqc3W3nzzzbFixYrcY/LkyW0dFwAAAAqmzZeXT5w4MSZOnLjd/aeddlpERNTV1bW4/33ve1/84he/yD0/8MAD49vf/naceuqpsXnz5igq+ttI++23X5SXl7d1RAAAAOgUOsW3lzc2NkZJSUlecEdETJ8+PQYMGBAf+tCH4qabbookSQo0IQAAALRdwb9I7S9/+UtcccUVMW3atLzt3/zmN+NjH/tY9OnTJ+6///44++yzY82aNXHeeee1+DobNmyIDRs25J5ns9lU5wYAAICdKWh0Z7PZmDRpUowYMSJmzZqVt2/mzJm5/3zEEUfE2rVrY968eduN7tmzZ8fll1+e5rgAAADQJgW7vPzNN9+M4447Lvbdd9+46667Yu+9997h+lGjRsWrr76adzb7nWbMmBGNjY25xyuvvJLG2AAAANBqBTnTnc1mY8KECVFcXByLFi2KXr167fRnampqYv/994/i4uIW9xcXF293HwAAABRCm6N7zZo1sWzZstzz2traqKmpiX79+sXQoUNj1apV8fLLL0d9fX1ERCxdujQiIsrLy6O8vDyy2WyMHz8+1q1bF7feemtks9nc568POOCA2GuvveLuu++OlStXxlFHHRW9evWKBx54IL7zne/EV77ylfZ4zwAAANAhMkkbvxL8kUceibFjxzbbPmXKlFiwYEEsWLAgzjjjjGb7L7vsspg1a
9Z2fz5iS8APHz487rvvvpgxY0YsW7YskiSJd7/73fHFL34xzjzzzOjRo3VXxGez2SgtLc19MzoAAAC0l9Y2Z5uju6sQ3QAAAKSltc3ZKe7TDQAAAHsi0Q0AAAApEd0AAACQEtENAAAAKRHdAAAAkBLRDQAAACkR3QAAAJAS0Q0AAAApKSr0AAAAdD/3PfFafOEXNbnn133q8Djug+8q3EAAKckkSZIUeog0ZLPZKC0tjcbGxigpKSn0OAAA/J/hF9+z3X11cyZ14CQAu661zenycgAAOsyOgrs1+wG6GtENAECHuO+J19p1HUBXILoBAOgQ7/wMd3usA+gKRDcAAACkRHQDAABASkQ3AAAd4rpPHd6u6wC6AtENAECHaO19uN2vG9iTiG4AADrMzu7D7T7dwJ5GdAMA0KHq5kxqdgn5dZ86XHADe6SiQg8AAED3c9wH3xV1LiMHugFnugEAACAlohsAAABSIroBAAAgJaIbAAAAUiK6AQAAICWiGwAAAFIiugEAACAlohsAAABSIroBAAAgJaIbAAAAUiK6AQAAICWiGwAAAFIiugEAACAlohsAAABSIroBAAAgJaIbAAAAUiK6AQAAICWiGwAAAFIiugEAACAlRYUeAACAnbvkF9Vx6xOrcs9P/WC/+Nanqgo4EQCt4Uw3AEAnN/zie/KCOyLi1idWxfCL7ynQRAC0lugGAOjEdhbWwhugcxPdAACd1CW/qG7XdQB0PNENANBJbXtJ+e6uA6DjiW4AAABIiegGAACAlIhuAIBO6tQP9mvXdQB0PNENANBJtfY+3O7XDdB5iW4AgE6sbs6k3doPQGGJbgCATq5uzqRml5Cf+sF+ghugC8gkSZIUeog0ZLPZKC0tjcbGxigpKSn0OAAAAOxBWtucznQDAABASkQ3AAAApER0AwAAQEpENwAAAKREdAMAAEBKRDcAAACkRHQDAABASkQ3AAAApER0AwAAQEpENwAAAKSkzdG9ePHiOOGEE6KioiIymUwsXLgwb/+dd94Z48ePj/79+0cmk4mamppmr7F+/fqYPn169O/fP/bZZ5/41Kc+FStXrsxb8/LLL8ekSZOiT58+MXDgwPjqV78amzdvbuu4AAAAUDBtju61a9fGyJEjY/78+dvdP2bMmLjyyiu3+xoXXHBB3H333XHHHXfEf/3Xf0V9fX188pOfzO1/++23Y9KkSbFx48b43e9+F7fcckssWLAgLr300raOCwAAAAWTSZIk2eUfzmTirrvuismTJzfbV1dXF5WVlfH000/H4Ycfntve2NgYBxxwQNx2223x6U9/OiIi/vjHP8Z73/veqK6ujqOOOip+9atfxT/+4z9GfX19lJWVRUTEddddFxdddFH8+c9/jp49e+50tmw2G6WlpdHY2BglJSW7+hYBAACgmdY2Z4d/pvupp56KTZs2xbhx43LbDj744Bg6dGhUV1dHRER1dXUceuihueCOiJgwYUJks9l47rnnOnpkAAAA2CVFHf0HNjQ0RM+ePWO//fbL215WVhYNDQ25Ne8M7q37t+5ryYYNG2LDhg2559lsth2nBgAAgLbbY769fPbs2VFaWpp7DBkypNAjAQAA0M11eHSXl5fHxo0bY/Xq1XnbV65cGeXl5bk1236b+dbnW9dsa8aMGdHY2Jh7vPLKK+0/PAAAALRBh0f3kUceGXvvvXc89NBDuW1Lly6Nl19+OaqqqiIioqqqKn7/+9/H66+/nlvzwAMPRElJSYwYMaLF1y0uLo6SkpK8BwAAABRSmz/TvWbNmli2bFnueW1tbdTU1ES/fv1i6NChsWrVqnj55Zejvr4+IrYEdcSWM9Tl5eVRWloaU6dOjQsvvDD69esXJSUlce6550ZVVVUcddRRERExfvz4GDFiRJx22mkxd+7caGhoiEsuuSSmT58excXF7fG+AQAAIHVtvmXYI488EmPHjm22fcqUKbFgwYJYsGBBnHHGGc32X3bZZTFr1qyIiFi/f
n18+ctfjn//93+PDRs2xIQJE+Kaa67Ju3R8+fLl8cUvfjEeeeSR6Nu3b0yZMiXmzJkTRUWt+3cCtwwDgO7r+Mvuief/9v2qMaI44t7LJxVuIAD2OK1tzt26T3dnJroBoHsafvE9291XN0d4A9A+Ou19ugEA0rKj4G7NfgBob6IbANgjHH9Z64K6tesAoD2IbgBgj/DOz3C3xzoAaA+iGwAAAFIiugEAACAlohsA2COMKG7fdQDQHkQ3ALBHaO19uN2vG4COJLoBgD3Gzu7D7T7dAHQ00Q0A7FHq5kxqdgn5iGLBDUBhFBV6AACA9uYScgA6C2e6AQAAICWiGwAAAFIiugEAACAlohsAAABSIroBAAAgJaIbAAAAUiK6AQAAICWiGwAAAFIiugEAACAlohsAAABSIroBAAAgJaIbAAAAUiK6AQAAICWiGwAAAFIiugEAACAlohsAAABSIroBAAAgJaIbAAAAUiK6AQAAICVFhR4AACis4Rff02xb3ZxJBZgEAPY8znQDQDfWUnDvaDsA0DaiGwC6qZ2FtfAGgN0nugGgG2ptUAtvANg9ohsAAABSIroBAAAgJaIbAAAAUiK6AQAAICWiGwC6odbeh9v9ugFg94huAOimdhbUghsAdp/oBoBubHthLbgBoH0UFXoAAKCwBDYApMeZbgAAAEiJ6AYAAICUiG4AAABIiegGAACAlIhuAAAASInoBgAAgJSIbgAAAEiJ6AYAAICUiG4AAABIiegGAACAlIhuAAAASInoBgAAgJSIbgAAAEiJ6AYAAICUiG4AAABIiegGAACAlIhuAAAASInoBgAAgJSIbgAAAEhJUaEHAICuZvjF9zTbVjdnUgEmAQA6uzaf6V68eHGccMIJUVFREZlMJhYuXJi3P0mSuPTSS2PQoEHRu3fvGDduXLz44ou5/Y888khkMpkWH0888URERNTV1bW4/7HHHtu9dwsAu6ml4N7RdgCge2tzdK9duzZGjhwZ8+fPb3H/3Llz4+qrr47rrrsulixZEn379o0JEybE+vXrIyJi9OjRsWLFirzHP//zP0dlZWV84AMfyHutBx98MG/dkUceuQtvEQDax87CWngDANtq8+XlEydOjIkTJ7a4L0mS+MEPfhCXXHJJfPzjH4+IiB//+MdRVlYWCxcujJNPPjl69uwZ5eXluZ/ZtGlT/Od//mece+65kclk8l6vf//+eWsBoFBaG9TDL77HpeYAQE67fpFabW1tNDQ0xLhx43LbSktLY9SoUVFdXd3izyxatCj++te/xhlnnNFs34knnhgDBw6MMWPGxKJFi9pzVAAAAEhdu36RWkNDQ0RElJWV5W0vKyvL7dvWjTfeGBMmTIjBgwfntu2zzz7x3e9+Nz784Q9Hjx494he/+EVMnjw5Fi5cGCeeeGKLr7Nhw4bYsGFD7nk2m93dtwMAAAC7paDfXv7qq6/Gr3/96/j5z3+et33AgAFx4YUX5p5/8IMfjPr6+pg3b952o3v27Nlx+eWXpzovAAAAtEW7Xl6+9fPXK1euzNu+cuXKFj+bffPNN0f//v23G9LvNGrUqFi2bNl298+YMSMaGxtzj1deeaWN0wMAAED7atforqysjPLy8njooYdy27LZbCxZsiSqqqry1iZJEjfffHOcfvrpsffee+/0tWtqamLQoEHb3V9cXBwlJSV5DwBoL639cjRfogYAvFObLy9fs2ZN3hnn2traqKmpiX79+sXQoUPj/PPPj29961vxnve8JyorK2PmzJlRUVERkydPznudhx9+OGpra+Of//mfm/0Zt9xyS/Ts2TOOOOKIiIi4884746abbop/+7d/a+u4ANBu6uZM2uG3mAtuAGBbbY7uJ598MsaOHZt7vvWz11OmTIkFCxbE1772tVi7dm1MmzYtVq9eHWPGjIn77rsvevXqlfc6N954Y4wePToOPvjgFv+cK664IpYvXx5FRUVx8MEHx89+9rP49Kc/3dZxAaBdbS+8B
TcA0JJMkiRJoYdIQzabjdLS0mhsbHSpOQAAAO2qtc3Zrp/pBgAAAP5GdAMAAEBKRDcAAACkRHQDAABASkQ3AAAApER0AwAAQEpENwAAAKREdAMAAEBKRDcAAACkRHQDAABASkQ3AAAApER0AwAAQEpENwAAAKREdAMAAEBKRDcAAACkRHQDAABASkQ3AAAApER0AwAAQEpENwAAAKSkqNADALBnG37xPc221c2ZVIBJAAA6njPdAKSmpeDe0XYAgD2N6AYgFTsLa+ENAHQHohuAdtfaoBbeAMCeTnQDAABASkQ3AAAApER0AwAAQEpENwAAAKREdAPQ7lp7H2736wYA9nSiG4BU7CyoBTcA0B2IbgBSs72wFtwAQHdRVOgBANizCWwAoDtzphsAAABSIroBAAAgJaIbAAAAUiK6AQAAICWiGwAAAFIiugEAACAlohsAAABSIroBAAAgJaIbAAAAUiK6AQAAICWiGwAAAFIiugEAACAlohsAAABSIroBAAAgJaIbAAAAUiK6AQAAICWiGwAAAFIiugEAACAlohsAAABSUlToAQDIN/zie5ptq5szqQCTAACwu5zpBuhEWgruHW0HAKBzE90AncTOwlp4AwB0PaIboBNobVALbwCArkV0AwAAQEpENwAAAKREdAMAAEBKRDcAAACkRHQDdAKtvQ+3+3UDAHQtohugk9hZUAtuAICuR3QDdCLbC2vBDQDQNRUVegAA8glsAIA9R5vPdC9evDhOOOGEqKioiEwmEwsXLszbnyRJXHrppTFo0KDo3bt3jBs3Ll588cW8NcOHD49MJpP3mDNnTt6aZ555Jo4++ujo1atXDBkyJObOndv2dwcAAAAF1OboXrt2bYwcOTLmz5/f4v65c+fG1VdfHdddd10sWbIk+vbtGxMmTIj169fnrfvmN78ZK1asyD3OPffc3L5sNhvjx4+PYcOGxVNPPRXz5s2LWbNmxfXXX9/WcQEAAKBg2nx5+cSJE2PixIkt7kuSJH7wgx/EJZdcEh//+McjIuLHP/5xlJWVxcKFC+Pkk0/Ord13332jvLy8xdf56U9/Ghs3boybbropevbsGYccckjU1NTE9773vZg2bVpbRwYAAICCaNcvUqutrY2GhoYYN25cbltpaWmMGjUqqqur89bOmTMn+vfvH0cccUTMmzcvNm/enNtXXV0dxxxzTPTs2TO3bcKECbF06dJ44403WvyzN2zYENlsNu8BAAAAhdSuX6TW0NAQERFlZWV528vKynL7IiLOO++8eP/73x/9+vWL3/3udzFjxoxYsWJFfO9738u9TmVlZbPX2Lpv//33b/Znz549Oy6//PL2fDsAAACwWwry7eUXXnhh7j8fdthh0bNnzzjrrLNi9uzZUVxcvEuvOWPGjLzXzWazMWTIkN2eFQAAAHZVu15evvUz2itXrszbvnLlyu1+fjsiYtSoUbF58+aoq6vLvU5Lr/HOP2NbxcXFUVJSkvcAAACAQmrX6K6srIzy8vJ46KGHctuy2WwsWbIkqqqqtvtzNTU10aNHjxg4cGBERFRVVcXixYtj06ZNuTUPPPBAHHTQQS1eWg4AAACdUZsvL1+zZk0sW7Ys97y2tjZqamqiX79+MXTo0Dj//PPjW9/6VrznPe+JysrKmDlzZlRUVMTkyZMjYsuXpC1ZsiTGjh0b++67b1RXV8cFF1wQp556ai6oTznllLj88stj6tSpcdFFF8Wzzz4b//Iv/xLf//732+ddAwAAQAdoc3Q/+eSTMXbs2NzzrZ+jnjJlSixYsCC+9rWvxdq1a2PatGmxevXqGDNmTNx3333Rq1eviNhyGfjtt98es2bNig0bNkRlZWVccMEFeZ/HLi0tjfvvvz+mT58eRx55ZAwYMCAuvfRStwsDAACgS8kkSZIUeog0ZLPZKC0tjcbGRp/vBgAAoF21tjnb9TPdAAAAwN+IbgAAAEiJ6AYAAICUiG4AAABIiegGAACAlIhuAAAAS
\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.figure(figsize=(10,6))\n", + "plt.scatter(heights,salaries)\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1n-bXVOYIUXH" + }, + "source": [ + "Let's see what happens if the relation is not linear. 
Suppose our corporation decided to hide the obvious linear dependency between heights and salaries by introducing a non-linearity into the formula, such as `sin`:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "sdS7DPgcIUXH", + "outputId": "97f0efaf-ef7a-4b8d-95a9-b01d4c917c4f" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Correlation = 0.9835304456670827\n" + ] + } + ], + "source": [ + "salaries = 1000+np.sin((heights-heights.min())/(heights.max()-heights.mean()))*100\n", + "print(f\"Correlation = {np.corrcoef(heights, salaries)[0,1]}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gtPLXZGzIUXI" + }, + "source": [ + "In this case, the correlation is slightly smaller, but still quite high. To make the relation even less obvious, we can add some randomness: a uniformly distributed noise term in the range [-10, 10) added to each salary. Let's see what happens:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "gYKdvnQYIUXI", + "outputId": "51359d22-9747-4bdd-e08a-3cf388ac5258" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Correlation = 0.9332592309527241\n" + ] + } + ], + "source": [ + "salaries = 1000+np.sin((heights-heights.min())/(heights.max()-heights.mean()))*100+np.random.random(size=len(heights))*20-10\n", + "print(f\"Correlation = {np.corrcoef(heights, salaries)[0,1]}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 607 + }, + "id": "-ZYLE3UAIUXJ", + "outputId": "fb604472-ce5b-4f77-c407-6f69de3f9cc5" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "
Eo3hi3azrVOjPfrhotzB8zccHGuEuP5zwcAAADAwGgNGLZoPNd61WV5unF2bq8Zb79PunF2rlZdlufMwAAAAAB4Cvd0Y9ii9VzrVZfl6btzz9QTRfv0YXWjJqWn6LqCycxwAwAAADBG6cawZaYGbM25SWK8X0svnuL0MAAAAAB4FFN2GLaZuekKBQPqb+8xnzp2MZ+Zmz6SwwIAAAAAx1G6MWxxfp/WLOq4x7ln8e78fs2iPM+d1w0AAAAAw0Xphi3m54e0YfEMZQe7LyHPDga0YfEMT53TDQAAAAB24Z5u2GZ+fkiX5mWruLxaVfVNykztWFLODDcAAACAWEXphq3i/D4VTM1wehgAAAAA4AosLwcAAAAAIEIo3QAAAAAARAilGwAAAACACKF0AwAAAAAQIZRuAAAAAAAihNINAAAAAECEULoBAAAAAIgQSjcAAAAAABFC6QYAAAAAIELinR5ALGsPWyour1ZVfZMyUwOamZuuOL/P6WEBAAAAAGxC6XZIYWmF1m4rU0VtU9djoWBAaxblaX5+yMGRDQ8XEgAAAADgE5RuBxSWVmj55l2yejxeWduk5Zt3acPiGZ4s3tF6IQEAAAAAThb3dI+w9rCltdvKehVuSV2Prd1WpvZwXwn36ryQcGLhlj65kFBYWuHQyAAAAADAOZTuEVZcXt2rmJ7IklRR26Ti8uqRG9QwReuFBAAAAAAYLkr3CKuq779wn0zODaLxQgIAAAAA2IHSPcIyUwO25twgGi8kAAAAAIAdKN0jbGZuukLBgPrbz9unjs3HZuamj+SwhiUaLyQAAAAAgB0o3SMszu/TmkV5ktSreHd+v2ZRnqeO2YrGCwkAAAAAYAdKtwPm54e0YfEMZQe7z/xmBwOePC4sGi8kAAAAAIAdfJZlReWW0nV1dQoGg6qtrVVaWprTw+lTe9hScXm1quqblJnaMRPs5WLKOd0AAAAAYoVp56R0w1bRdiEBAAAAAPpi2jnjR3BMiAFxfp8KpmY4PQwAAAAAcAXu6QYAAAAAIEIo3QAAAAAARAilGwAAAACACKF0AwAAAAAQIZRuAAAAAAAihNINAAAAAECEULoBAAAAAIgQSjcAAAAAABES7/QAEF3aw5aKy6tVVd+kzNSAZuamK87vc3pYAAAAAOAISjdsU1haobXbylRR29T1WCgY0JpFeZqfH3JwZAAAAADgDJaXwxaFpRVavnlXt8ItSZW1TVq+eZcKSyscGhkAAAAAOIfSjWFrD1tau61MVh/PdT62dluZ2sN9JQAAAAAgeg25dO/YsUOLFi1STk6OfD6ftmzZ0u15y7K0evVqhUIhJScna86cOdq9e3e3THV1tb761a8qLS1NY8aM0dKlS3Xs2LFumbfeeksXX3yxAoGAJkyYoPXr1w/9p8OIKC6v7jXDfSJLUkVtk4rLq0duUAAAAADgAkMu3Q0NDZo+fboeeuihPp9fv369HnzwQW3cuFE7d+7UqFGjNG/ePDU1fVLKvvrVr+qdd97R888/r2eeeUY7duzQsmXLup6vq6vT3LlzNWnSJL3xxhu67777dOedd+rhhx8+iR8RkVZV33/hPpkcAAAAAESLIW+ktmDBAi1YsKDP5yzL0gMPPKDbb79dV1xxhSTp8ccfV1ZWlrZs2aJrrrlG7777rgoLC/Xaa6/p/PPPlyT97Gc/02WXXab7779fOTk5+vWvf62Wlhb96le/UmJios466yyVlJToxz/+cbdyDnfITA3YmgMAAACAaGHrPd3l5eWqrKzUnDlzuh4LBoOaNWuWioqKJElFRUUaM2ZMV+GWpDlz5sjv92vnzp1dmdmzZysxMbErM2/ePL333ns6evRon393c3Oz6urqun1hZMzMTVcoGFB/B4P51LGL+czc9JEcFgAAAAA4ztbSXVlZKUnKysrq9nhWVlbXc5WVlcrMzOz2fHx8vNLT07tl+nqNE/+OntatW6dgMNj1NWHChOH/QDAS5/dpzaI8SepVv
Du/X7Moj/O6AQAAAMScqNm9fNWqVaqtre36+uijj5weUkyZnx/ShsUzlB3svoQ8OxjQhsUzOKcbAAAAQEwa8j3dA8nOzpYkHTp0SKHQJyXr0KFDOuecc7oyVVVV3f5cW1ubqquru/58dna2Dh061C3T+X1npqekpCQlJSXZ8nPg5MzPD+nSvGwVl1erqr5JmakdS8qZ4QYAAAAQq2yd6c7NzVV2dra2b9/e9VhdXZ127typgoICSVJBQYFqamr0xhtvdGVefPFFhcNhzZo1qyuzY8cOtba2dmWef/55nXHGGRo7dqydQ4bN4vw+FUzN0BXnnKqCqRkUbgAAAAAxbcil+9ixYyopKVFJSYmkjs3TSkpKtH//fvl8Pt1yyy265557tHXrVr399tu6/vrrlZOToyuvvFKS9KlPfUrz58/XDTfcoOLiYv3lL3/RN7/5TV1zzTXKycmRJF177bVKTEzU0qVL9c477+jJJ5/UT3/6U61cudK2HxwAAAAAgEjzWZZlDeUP/OlPf9Ill1zS6/ElS5Zo06ZNsixLa9as0cMPP6yamhpddNFF+vnPf65/+qd/6spWV1frm9/8prZt2ya/368vfelLevDBBzV69OiuzFtvvaUVK1botdde07hx4/Stb31Lt912m/E46+rqFAwGVVtbq7S0tKH8iAAAAAAADMi0cw65dHsFpRsAAAAAECmmnTNqdi8HAAAAAMBtKN0AAAAAAEQIpRsAAAAAgAihdAMAAAAAECGUbgAAAAAAIoTSDQAAAABAhFC6AQAAAACIEEo3AAAAAAARQukGAAAAACBCKN0AAAAAAEQIpRsAAAAAgAihdAMAAAAAECGUbgAAAAAAIoTSDQAAAABAhFC6AQAAAACIEEo3AAAAAAARQukGAAAAACBCKN0AAAAAAEQIpRsAAAAAgAihdAMAAAAAECGUbgAAAAAAIoTSDQAAAABAhFC6AQAAAACIEEo3AAAAAAARQukGAAAAACBC4p0eQCxrD1sqLq9WVX2TMlMDmpmbrji/z+lhAQAAAABsQul2SGFphdZuK1NFbVPXY6FgQGsW5Wl+fsjBkQEAAAAA7MLycgcUllZo+eZd3Qq3JFXWNmn55l0qLK1waGQAAAAAADtRukdYe9jS2m1lsvp4rvOxtdvK1B7uKwEAAAAA8BJK9wgrLq/uNcN9IktSRW2TisurR25QAAAAAICIoHSPsKr6/gv3yeQAAAAAAO5F6R5hmakBW3MAAAAAAPeidI+wmbnpCgUD6u9gMJ86djGfmZs+ksMCAAAAAEQApXuExfl9WrMoT5J6Fe/O79csyuO8bgAAAACIApRuB8zPD2nD4hnKDnZfQp4dDGjD4hmc0w0AAAAAUSLe6QHEqvn5IV2al63i8mpV1TcpM7VjSTkz3AAAAAAQPSjdDorz+1QwNcPpYQAAAAAAIoTl5QAAAAAARAilGwAAAACACKF0AwAAAAAQIZRuAAAAAAAihNINAAAAAECEULoBAAAAAIgQSjcAAAAAABFC6QYAAAAAIEIo3QAAAAAARAilGwAAAACACKF0AwAAAAAQIZRuAAAAAAAihNINAAAAAECEULoBAAAAAIgQSjcAAAAAABFC6QYAAAAAIEIo3QAAAAAAREi80wOIFMuyJEl1dXUOjwQAAAAAEG06u2Zn9+xP1Jbu+vp6SdKECRMcHgkAAAAAIFrV19crGAz2+7zPGqyWe1Q4HNbBgweVmpoqn8/n9HBiSl1dnSZMmKCPPvpIaWlpTg8HfeA9cj/eI/fjPXI/3iP34z1yP94j9+M9co5lWaqvr1dOTo78/v7v3I7amW6/36/x48c7PYyYlpaWxn/4Lsd75H68R+7He+R+vEfux3vkfrxH7sd75IyBZrg7sZEaAAAAAAARQukGAAAAACBCKN2wXVJSktasWaOkpCSnh4J+8B65H++R+/EeuR/vkfvxHrkf75H78R65X9RupAYAAAAAgNOY6QYAAAAAIEIo3QAAA
AAARAilGwAAAACACKF0AwAAAAAQIZRunLQDBw5o8eLFysjIUHJysqZNm6bXX3+96/ljx47pm9/8psaPH6/k5GTl5eVp48aNDo44tkyePFk+n6/X14oVKyRJTU1NWrFihTIyMjR69Gh96Utf0qFDhxwedWwZ6D2qrq7Wt771LZ1xxhlKTk7WxIkTdfPNN6u2ttbpYceUwf476mRZlhYsWCCfz6ctW7Y4M9gYZfIeFRUV6fOf/7xGjRqltLQ0zZ49W8ePH3dw1LFlsPeosrJS1113nbKzszVq1CjNmDFDv/vd7xwedWxpb2/XHXfcodzcXCUnJ2vq1Km6++67deJ+y5ZlafXq1QqFQkpOTtacOXO0e/duB0cdewZ7n1pbW3Xbbbdp2rRpGjVqlHJycnT99dfr4MGDDo8c8U4PAN509OhRfeYzn9Ell1yi//7v/9Ypp5yi3bt3a+zYsV2ZlStX6sUXX9TmzZs1efJkPffcc/qXf/kX5eTk6PLLL3dw9LHhtddeU3t7e9f3paWluvTSS/XlL39ZkvSd73xHf/zjH/Xb3/5WwWBQ3/zmN3XVVVfpL3/5i1NDjjkDvUcHDx7UwYMHdf/99ysvL08ffvihbrrpJh08eFD/7//9PwdHHVsG+++o0wMPPCCfzzfSw4MGf4+Kioo0f/58rVq1Sj/72c8UHx+vN998U34/8w4jZbD36Prrr1dNTY22bt2qcePG6Te/+Y2uvvpqvf766zr33HOdGnZM+bd/+zdt2LBBjz32mM466yy9/vrr+vrXv65gMKibb75ZkrR+/Xo9+OCDeuyxx5Sbm6s77rhD8+bNU1lZmQKBgMM/QWwY7H1qbGzUrl27dMcdd2j69Ok6evSovv3tb+vyyy/vNjEGB1jASbjtttusiy66aMDMWWedZd11113dHpsxY4b1wx/+MJJDQz++/e1vW1OnTrXC4bBVU1NjJSQkWL/97W+7nn/33XctSVZRUZGDo4xtJ75HfXnqqaesxMREq7W1dYRHhk59vUd//etfrVNPPdWqqKiwJFl/+MMfnBsger1Hs2bNsm6//XaHR4UT9XyPRo0aZT3++OPdMunp6dYjjzzixPBi0sKFC61vfOMb3R676qqrrK9+9auWZVlWOBy2srOzrfvuu6/r+ZqaGispKcn6z//8zxEdaywb7H3qS3FxsSXJ+vDDDyM9PAyAy7w4KVu3btX555+vL3/5y8rMzNS5556rRx55pFvmwgsv1NatW3XgwAFZlqWXXnpJ77//vubOnevQqGNXS0uLNm/erG984xvy+Xx644031Nraqjlz5nRlzjzzTE2cOFFFRUUOjjR29XyP+lJbW6u0tDTFx7NIyQl9vUeNjY269tpr9dBDDyk7O9vhEaLne1RVVaWdO3cqMzNTF154obKysvTZz35Wf/7zn50easzq67+jCy+8UE8++aSqq6sVDof1X//1X2pqatLnPvc5ZwcbQy688EJt375d77//viTpzTff1J///GctWLBAklReXq7KyspunxuCwaBmzZrF54YRNNj71Jfa2lr5fD6NGTNmhEaJvvDJDSdl79692rBhg1auXKkf/OAHeu2113TzzTcrMTFRS5YskST97Gc/07JlyzR+/HjFx8fL7/frkUce0ezZsx0efezZsmWLampq9LWvfU1Sx/1ziYmJvf4BzsrKUmVl5cgPEL3eo54OHz6su+++W8uWLRvZgaFLX+/Rd77zHV144YW64oornBsYuvR8j/bu3StJuvPOO3X//ffrnHPO0eOPP64vfOELKi0t1emnn+7gaGNTX/8dPfXUU/rKV76ijIwMxcfHKyUlRX/4wx902mmnOTfQGPP9739fdXV1OvPMMxUXF6f29nb96Ec/0le/+lVJ6vpskJWV1e3P8blhZA32PvXU1NSk2267Tf/n//wfpaWljfBocSJKN05KOBzW+eefr3/913+VJJ177rkqLS3Vxo0bu5XuV199VVu3btWkSZO0Y8cOr
VixQjk5Od2ulCLyfvnLX2rBggXKyclxeijox0DvUV1dnRYuXKi8vDzdeeedIz84SOr9Hm3dulUvvvii/vrXvzo8MnTq+R6Fw2FJ0o033qivf/3rkjp+X23fvl2/+tWvtG7dOsfGGqv6+rfujjvuUE1NjV544QWNGzdOW7Zs0dVXX62XX35Z06ZNc3C0seOpp57Sr3/9a/3mN7/RWWedpZKSEt1yyy3Kycnp+lwH5w3lfWptbdXVV18ty7K0YcMGh0aMLk6vb4c3TZw40Vq6dGm3x37+859bOTk5lmVZVmNjo5WQkGA988wz3TJLly615s2bN2LjhGXt27fP8vv91pYtW7oe2759uyXJOnr0aLfsxIkTrR//+McjPEL09R51qqurswoKCqwvfOEL1vHjxx0YHSyr7/fo29/+tuXz+ay4uLiuL0mW3++3PvvZzzo32BjV13u0d+9eS5L1xBNPdMteffXV1rXXXjvSQ4x5fb1He/bssSRZpaWl3bJf+MIXrBtvvHGkhxizxo8fb/3Hf/xHt8fuvvtu64wzzrAsy7I++OADS5L117/+tVtm9uzZ1s033zxSw4x5g71PnVpaWqwrr7zSOvvss63Dhw+P5BDRD+7pxkn5zGc+o/fee6/bY++//74mTZokqePqWmtra6/dYePi4rpmHjAyHn30UWVmZmrhwoVdj5133nlKSEjQ9u3bux577733tH//fhUUFDgxzJjW13skdcxwz507V4mJidq6dSu7wzqor/fo+9//vt566y2VlJR0fUnST37yEz366KMOjTR29fUeTZ48WTk5OQP+vsLI6es9amxslCQ+LzissbFxwPcgNzdX2dnZ3T431NXVaefOnXxuGEGDvU/SJzPcu3fv1gsvvKCMjIyRHib64nTrhzcVFxdb8fHx1o9+9CNr9+7d1q9//WsrJSXF2rx5c1fms5/9rHXWWWdZL730krV3717r0UcftQKBgPXzn//cwZHHlvb2dmvixInWbbfd1uu5m266yZo4caL14osvWq+//rpVUFBgFRQUODDK2Nbfe1RbW2vNmjXLmjZtmrVnzx6roqKi66utrc2h0camgf476knsXu6Igd6jn/zkJ1ZaWpr129/+1tq9e7d1++23W4FAwNqzZ48DI41d/b1HLS0t1mmnnWZdfPHF1s6dO609e/ZY999/v+Xz+aw//vGPDo029ixZssQ69dRTrWeeecYqLy+3fv/731vjxo2zbr311q7Mvffea40ZM8Z6+umnrbfeesu64oorrNzcXFZhjaDB3qeWlhbr8ssvt8aPH2+VlJR0++zQ3Nzs8OhjG6UbJ23btm1Wfn6+lZSUZJ155pnWww8/3O35iooK62tf+5qVk5NjBQIB64wzzrD+/d//vd/jkGC///mf/7EkWe+9916v544fP279y7/8izV27FgrJSXF+uIXv2hVVFQ4MMrY1t979NJLL1mS+vwqLy93ZrAxaqD/jnqidDtjsPdo3bp11vjx462UlBSroKDAevnll0d4hBjoPXr//fetq666ysrMzLRSUlKss88+u9cRYoisuro669vf/rY1ceJEKxAIWFOmTLF++MMfditq4XDYuuOOO6ysrCwrKSnJ+sIXvmD07yLsM9j7VF5e3u9nh5deesnZwcc4n2VZ1ghPrgMAAAAAEBO4pxsAAAAAgAihdAMAAAAAECGUbgAAAAAAIoTSDQAAAABAhFC6AQAAAACIEEo3AAAAAAARQukGAAAAACBCKN0AAAAAAEQIpRsAAAAAgAihdAMAAAAAECGUbgAAAAAAIoTSDQAAAABAhPz/a10cYjkok9IAAAAASUVORK5CYII=\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.figure(figsize=(10,6))\n", + "plt.scatter(heights, salaries)\n", + "plt.tight_layout()\n", + 
"plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U_hRjJkMIUXJ" + }, + "source": [ + "> Can you guess why the dots line up into vertical lines like this?\n", + "\n", + "We have observed the correlation between an artificially engineered concept like salary and the observed variable *height*. Let's also see if the two observed variables, such as height and weight, correlate too:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ba1ty3GoIUXK", + "outputId": "56ecb028-8d9a-4777-b028-e72fc2028498" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[ 1., nan],\n", + " [nan, nan]])" + ] + }, + "metadata": {}, + "execution_count": 27 + } + ], + "source": [ + "np.corrcoef(df['Height'],df['Weight'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0o3CEkTHIUXL" + }, + "source": [ + "Unfortunately, we did not get any results - only some strange `nan` values. This is due to the fact that some of the values in our series are undefined, represented as `nan`, which causes the result of the operation to be undefined as well. By looking at the matrix we can see that `Weight` is the problematic column, because self-correlation between `Height` values has been computed.\n", + "\n", + "> This example shows the importance of **data preparation** and **cleaning**. Without proper data we cannot compute anything.\n", + "\n", + "Let's use `fillna` method to fill the missing values, and compute the correlation:" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "C1wzEgHGIUXM", + "outputId": "f953f94a-e9d6-41a3-e53f-7c5a72e38f8c" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[1. , 0.52959196],\n", + " [0.52959196, 1. 
]])" + ] + }, + "metadata": {}, + "execution_count": 28 + } + ], + "source": [ + "np.corrcoef(df['Height'],df['Weight'].fillna(method='pad'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "59HPAOScIUXM" + }, + "source": [ + "There is indeed a correlation, but not such a strong one as in our artificial example. Indeed, if we look at the scatter plot of one value against the other, the relation is much less obvious:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 607 + }, + "id": "LR27XJe9IUXM", + "outputId": "a1d4bb37-9c83-49b3-b299-d8b8bc365fd6" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.figure(figsize=(10,6))\n", + "plt.scatter(df['Height'],df['Weight'])\n", + "plt.xlabel('Height')\n", + "plt.ylabel('Weight')\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "O-i0GjqIIUXM" + }, + "source": [ + "## Conclusion\n", + "\n", + "In this notebook we have learnt how to perform basic operations on data to compute statistical functions. We now know how to use a sound apparatus of math and statistics in order to prove some hypotheses, and how to compute confidence intervals for arbitrary variables given a data sample." + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Challenge\n" + ], + "metadata": { + "id": "z9YubMYAuAG7" + } + }, + { + "cell_type": "markdown", + "source": [ + "Use the sample code in the notebook to test the following hypotheses:\n", + "\n", + "1. First basemen are older than second basemen\n", + "2. First basemen are taller than third basemen\n", + "3. Shortstops are taller than second basemen\n" + ], + "metadata": { + "id": "q0cxF4-2uQg5" + } + }, + { + "cell_type": "markdown", + "source": [ + "### First basemen are older than second basemen" + ], + "metadata": { + "id": "eqa9jPuIuWrz" + } + }, + { + "cell_type": "markdown", + "source": [ + "Let's test the hypothesis that First Basemen are older than Second Basemen. 
The simplest way to do this is to compare the confidence intervals:" + ], + "metadata": { + "id": "sDiN7_wpvPaZ" + } + }, + { + "cell_type": "code", + "source": [ + "for p in [0.85,0.9,0.95]:\n", + " m1, h1 = mean_confidence_interval(df.loc[df['Role']=='First_Baseman',['Age']],p)\n", + " m2, h2 = mean_confidence_interval(df.loc[df['Role']=='Second_Baseman',['Age']],p)\n", + " print(f'Conf={p:.2f}, 1st basemen age: {m1-h1[0]:.2f}..{m1+h1[0]:.2f}, 2nd basemen age: {m2-h2[0]:.2f}..{m2+h2[0]:.2f}')" + ], + "metadata": { + "id": "17SWW8n1uL-s", + "outputId": "5a4a6929-c3f0-4257-b7b4-b7257d7e773b", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 30, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Conf=0.85, 1st basemen age: 28.56..30.39, 2nd basemen age: 28.18..29.87\n", + "Conf=0.90, 1st basemen age: 28.42..30.53, 2nd basemen age: 28.06..29.99\n", + "Conf=0.95, 1st basemen age: 28.22..30.73, 2nd basemen age: 27.87..30.18\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "We can see that the intervals overlap at every confidence level, so the intervals alone do not support the hypothesis.\n", + "\n", + "A statistically more correct way to test the hypothesis is to use a **Student t-test**:\n" + ], + "metadata": { + "id": "_kjnTN57voE1" + } + }, + { + "cell_type": "code", + "source": [ + "tval, pval = ttest_ind(df.loc[df['Role']=='First_Baseman',['Age']], df.loc[df['Role']=='Second_Baseman',['Age']],equal_var=False)\n", + "print(f\"T-value = {tval[0]:.2f}\\nP-value: {pval[0]}\")" + ], + "metadata": { + "id": "MHUMgyBGv1Wo", + "outputId": "09f8e24e-6f67-4eaf-a6c1-f58260f31064", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 31, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "T-value = 0.53\n", + "P-value: 0.6005513264471434\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "\n", + "\n", + "* The p-value is the probability of obtaining a result at least as extreme as the one 
observed, assuming that the null hypothesis is true (note that it is not the probability that the two distributions have the same mean). In our case, the null hypothesis is that there is no difference in mean age between first basemen and second basemen. A p-value of 0.60 indicates that, assuming the null hypothesis is true, there is a 60% probability of obtaining a result at least as extreme as the one observed. In general, if the p-value is less than a predefined significance level (for example, 0.05), the null hypothesis is rejected and it is concluded that there is a significant difference between the groups. However, since our p-value is greater than 0.05, there is not enough evidence to reject the null hypothesis and conclude that there is a significant difference in ages between the two groups of players.\n" + ], + "metadata": { + "id": "-Qx9SQL7wETA" + } + }, + { + "cell_type": "markdown", + "source": [ + "* The t-value is a statistic that measures the size of the difference between two groups in relation to the variation in the sample data. In other words, the t-value is the calculated difference expressed in units of standard error. The larger the magnitude of the t-value, the greater the evidence against the null hypothesis.\n", + "\n", + " In our case, the t-value of 0.53 indicates that the difference between the two groups is small in relation to the variation in the sample data. A small t-value suggests that there is not enough evidence to reject the null hypothesis and conclude that there is a significant difference between the two groups." + ], + "metadata": { + "id": "fVtApZXSy7-k" + } + }, + { + "cell_type": "markdown", + "source": [ + "### First basemen are taller than third basemen" + ], + "metadata": { + "id": "5m3HHZvDzcch" + } + }, + { + "cell_type": "markdown", + "source": [ + "Let's test the hypothesis that First Basemen are taller than Third Basemen. 
The simplest way to do this is to compare the confidence intervals:" + ], + "metadata": { + "id": "THSfoe-r0BAn" + } + }, + { + "cell_type": "code", + "source": [ + "for p in [0.85,0.9,0.95]:\n", + " m1, h1 = mean_confidence_interval(df.loc[df['Role']=='First_Baseman',['Height']],p)\n", + " m3, h3 = mean_confidence_interval(df.loc[df['Role']=='Third_Baseman',['Height']],p)\n", + " print(f'Conf={p:.2f}, 1st basemen height: {m1-h1[0]:.2f}..{m1+h1[0]:.2f}, 3rd basemen height: {m3-h3[0]:.2f}..{m3+h3[0]:.2f}')" + ], + "metadata": { + "id": "HtLzYWJ5v3KO", + "outputId": "2b93373b-5669-4f67-ccf5-58fc8e611b8c", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 32, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Conf=0.85, 1st basemen height: 73.62..74.38, 3rd basemen height: 72.58..73.51\n", + "Conf=0.90, 1st basemen height: 73.56..74.44, 3rd basemen height: 72.51..73.58\n", + "Conf=0.95, 1st basemen height: 73.47..74.53, 3rd basemen height: 72.40..73.68\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "We can see that the intervals do not overlap at 85% confidence, but they overlap slightly at 90% and 95%.\n", + "\n", + "Let's confirm this with a **Student t-test**:" + ], + "metadata": { + "id": "WeSVWGn10FDZ" + } + }, + { + "cell_type": "code", + "source": [ + "tval, pval = ttest_ind(df.loc[df['Role']=='First_Baseman',['Height']], df.loc[df['Role']=='Third_Baseman',['Height']],equal_var=False)\n", + "print(f\"T-value = {tval[0]:.2f}\\nP-value: {pval[0]}\")" + ], + "metadata": { + "id": "j__x6Szdz67t", + "outputId": "60441609-435f-40d5-bc1c-f446935f39ae", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 37, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "T-value = 2.32\n", + "P-value: 0.02285634157510527\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "\n", + "\n", + "* A t-value of 2.32 indicates that there is a moderate difference in heights between the two groups in relation to the 
variation in the sample data.\n", + "* A p-value of 0.023 indicates that, assuming the null hypothesis is true (i.e., that there is no difference in heights between the two groups), there is a 2.3% probability of obtaining a result at least as extreme as the one observed. Since this is below the 0.05 significance level, we can reject the null hypothesis." + ], + "metadata": { + "id": "RY5vm27t1IaE" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Shortstops are taller than second basemen" + ], + "metadata": { + "id": "DKbjAJeX1kIu" + } + }, + { + "cell_type": "markdown", + "source": [ + "Let's test the hypothesis that shortstops are taller than second basemen, starting with the confidence intervals:" + ], + "metadata": { + "id": "d-Vwzkzw1p0W" + } + }, + { + "cell_type": "code", + "source": [ + "for p in [0.85,0.9,0.95]:\n", + " m1, h1 = mean_confidence_interval(df.loc[df['Role']=='Shortstop',['Height']],p)\n", + " m2, h2 = mean_confidence_interval(df.loc[df['Role']=='Second_Baseman',['Height']],p)\n", + " print(f'Conf={p:.2f}, Shortstop height: {m1-h1[0]:.2f}..{m1+h1[0]:.2f}, 2nd basemen height: {m2-h2[0]:.2f}..{m2+h2[0]:.2f}')" + ], + "metadata": { + "id": "CtimHO1V0Zu1", + "outputId": "abdb0d0c-5460-4e1e-b970-003f6ce30ad1", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 36, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Conf=0.85, Shortstop height: 71.54..72.27, 2nd basemen height: 71.04..71.69\n", + "Conf=0.90, Shortstop height: 71.49..72.32, 2nd basemen height: 70.99..71.73\n", + "Conf=0.95, Shortstop height: 71.40..72.40, 2nd basemen height: 70.92..71.81\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "We can see that the intervals overlap at every confidence level.\n", + "\n", + "Let's also run a **Student t-test**:" + ], + "metadata": { + "id": "7IiPnuFR2NRW" + } + }, + { + "cell_type": "code", + "source": [ + "tval, pval = ttest_ind(df.loc[df['Role']=='Shortstop',['Height']], df.loc[df['Role']=='Second_Baseman',['Height']],equal_var=False)\n", + "print(f\"T-value = {tval[0]:.2f}\\nP-value: {pval[0]}\")" + ], + "metadata": { + "id": "xWDTpnDI127B", + "outputId": 
"ce861612-fb9b-4f23-9bbd-a43cae0657a0", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": 38, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "T-value = 1.62\n", + "P-value: 0.10763413630751067\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "* A t-value of 1.62 indicates that there is a small difference in heights between the two groups in relation to the variation in the sample data.\n", + "* A p-value of 0.10763413630751067 indicates that, assuming the null hypothesis is true (i.e., that there is no difference in heights between the two groups), there is a 10.76% probability of obtaining a result at least as extreme as the one observed." + ], + "metadata": { + "id": "NE6XLRMX23qY" + } + } + ], + "metadata": { + "interpreter": { + "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5" + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.12" + }, + "colab": { + "provenance": [], + "toc_visible": true, + "include_colab_link": true } - ], - "source": [ - "plt.figure(figsize=(10,6))\n", - "plt.scatter(df['Height'],df['Weight'])\n", - "plt.xlabel('Height')\n", - "plt.ylabel('Weight')\n", - "plt.tight_layout()\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Conclusion\n", - "\n", - "In this notebook we have learnt how to perform basic operations on data to compute statistical functions. We now know how to use a sound apparatus of math and statistics in order to prove some hypotheses, and how to compute confidence intervals for arbitrary variables given a data sample. 
" - ] - } - ], - "metadata": { - "interpreter": { - "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.12" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file