|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Top and Bottom Performing\n", |
| 8 | + "Let's look at how we might get the top performing stocks for a single period. For this example, we'll look at just a single month of closing prices:" |
| 9 | + ] |
| 10 | + }, |
| 11 | + { |
| 12 | + "cell_type": "code", |
| 13 | + "execution_count": 1, |
| 14 | + "metadata": {}, |
| 15 | + "outputs": [ |
| 16 | + { |
| 17 | + "data": { |
| 18 | + "text/html": [ |
| 19 | + "<div>\n", |
| 20 | + "<style scoped>\n", |
| 21 | + " .dataframe tbody tr th:only-of-type {\n", |
| 22 | + " vertical-align: middle;\n", |
| 23 | + " }\n", |
| 24 | + "\n", |
| 25 | + " .dataframe tbody tr th {\n", |
| 26 | + " vertical-align: top;\n", |
| 27 | + " }\n", |
| 28 | + "\n", |
| 29 | + " .dataframe thead th {\n", |
| 30 | + " text-align: right;\n", |
| 31 | + " }\n", |
| 32 | + "</style>\n", |
| 33 | + "<table border=\"1\" class=\"dataframe\">\n", |
| 34 | + " <thead>\n", |
| 35 | + " <tr style=\"text-align: right;\">\n", |
| 36 | + " <th></th>\n", |
| 37 | + " <th>A</th>\n", |
| 38 | + " <th>B</th>\n", |
| 39 | + " <th>C</th>\n", |
| 40 | + " <th>D</th>\n", |
| 41 | + " <th>E</th>\n", |
| 42 | + " <th>F</th>\n", |
| 43 | + " <th>G</th>\n", |
| 44 | + " <th>H</th>\n", |
| 45 | + " </tr>\n", |
| 46 | + " </thead>\n", |
| 47 | + " <tbody>\n", |
| 48 | + " <tr>\n", |
| 49 | + " <th>2018-02-01</th>\n", |
| 50 | + " <td>1</td>\n", |
| 51 | + " <td>12</td>\n", |
| 52 | + " <td>35</td>\n", |
| 53 | + " <td>3</td>\n", |
| 54 | + " <td>79</td>\n", |
| 55 | + " <td>2</td>\n", |
| 56 | + " <td>15</td>\n", |
| 57 | + " <td>59</td>\n", |
| 58 | + " </tr>\n", |
| 59 | + " </tbody>\n", |
| 60 | + "</table>\n", |
| 61 | + "</div>" |
| 62 | + ], |
| 63 | + "text/plain": [ |
| 64 | + " A B C D E F G H\n", |
| 65 | + "2018-02-01 1 12 35 3 79 2 15 59" |
| 66 | + ] |
| 67 | + }, |
| 68 | + "execution_count": 1, |
| 69 | + "metadata": {}, |
| 70 | + "output_type": "execute_result" |
| 71 | + } |
| 72 | + ], |
| 73 | + "source": [ |
| 74 | + "import pandas as pd\n", |
| 75 | + "\n", |
| 76 | + "month = pd.to_datetime('02/01/2018')\n", |
| 77 | + "close_month = pd.DataFrame(\n", |
| 78 | + " {\n", |
| 79 | + " 'A': 1,\n", |
| 80 | + " 'B': 12,\n", |
| 81 | + " 'C': 35,\n", |
| 82 | + " 'D': 3,\n", |
| 83 | + " 'E': 79,\n", |
| 84 | + " 'F': 2,\n", |
| 85 | + " 'G': 15,\n", |
| 86 | + " 'H': 59},\n", |
| 87 | + " [month])\n", |
| 88 | + "\n", |
| 89 | + "close_month" |
| 90 | + ] |
| 91 | + }, |
| 92 | + { |
| 93 | + "cell_type": "markdown", |
| 94 | + "metadata": {}, |
| 95 | + "source": [ |
| 96 | + "`close_month` gives us the prices for the month of February 2018 for all the stocks in this universe (A, B, C, ...). Looking at these prices, we can see that the top 2 performing stocks for that month were E and H, with prices of 79 and 59.\n", |
| 97 | + "\n", |
| 98 | + "To get this using code, we can use the [`Series.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nlargest.html) function. This function returns the *n* largest values in a Series, sorted in descending order, together with their index labels. For the example we just talked about, our *n* is 2." |
| 99 | + ] |
| 100 | + }, |
| 101 | + { |
| 102 | + "cell_type": "code", |
| 103 | + "execution_count": 2, |
| 104 | + "metadata": {}, |
| 105 | + "outputs": [ |
| 106 | + { |
| 107 | + "name": "stdout", |
| 108 | + "output_type": "stream", |
| 109 | + "text": [ |
| 110 | + "Error: nlargest() missing 1 required positional argument: 'columns'\n" |
| 111 | + ] |
| 112 | + } |
| 113 | + ], |
| 114 | + "source": [ |
| 115 | + "try:\n", |
| 116 | + " # Attempt to run nlargest\n", |
| 117 | + " close_month.nlargest(2)\n", |
| 118 | + "except TypeError as err:\n", |
| 119 | + " print('Error: {}'.format(err))" |
| 120 | + ] |
| 121 | + }, |
| 122 | + { |
| 123 | + "cell_type": "markdown", |
| 124 | + "metadata": {}, |
| 125 | + "source": [ |
| 126 | + "What happened here? It turns out we're not calling the [`Series.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nlargest.html) function; we're actually calling [`DataFrame.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.nlargest.html), since `close_month` is a DataFrame, and that version requires a `columns` argument. Let's get the Series from the DataFrame using `.loc[month]`, where `month` is the 2018-02-01 index created above." |
| 127 | + ] |
| 128 | + }, |
| 129 | + { |
| 130 | + "cell_type": "code", |
| 131 | + "execution_count": 3, |
| 132 | + "metadata": {}, |
| 133 | + "outputs": [ |
| 134 | + { |
| 135 | + "data": { |
| 136 | + "text/plain": [ |
| 137 | + "E 79\n", |
| 138 | + "H 59\n", |
| 139 | + "Name: 2018-02-01 00:00:00, dtype: int64" |
| 140 | + ] |
| 141 | + }, |
| 142 | + "execution_count": 3, |
| 143 | + "metadata": {}, |
| 144 | + "output_type": "execute_result" |
| 145 | + } |
| 146 | + ], |
| 147 | + "source": [ |
| 148 | + "close_month.loc[month].nlargest(2)" |
| 149 | + ] |
| 150 | + }, |
| 151 | + { |
| 152 | + "cell_type": "markdown", |
| 153 | + "metadata": {}, |
| 154 | + "source": [ |
| 155 | + "Perfect! That gives us the top performing tickers for that month. Now, how do we get the bottom performing tickers? There are two ways to do this. You can use pandas' [`Series.nsmallest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nsmallest.html) function, or just flip the sign on the prices and then apply [`Series.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nlargest.html). Either way is fine. For this course, we'll flip the sign and use nlargest. This allows us to reuse any function created with nlargest to get the smallest.\n", |
| 156 | + "\n", |
| 157 | + "To get the bottom 2 performing tickers from `close_month`, we'll flip the sign." |
| 158 | + ] |
| 159 | + }, |
| 160 | + { |
| 161 | + "cell_type": "code", |
| 162 | + "execution_count": 4, |
| 163 | + "metadata": {}, |
| 164 | + "outputs": [ |
| 165 | + { |
| 166 | + "data": { |
| 167 | + "text/plain": [ |
| 168 | + "A -1\n", |
| 169 | + "F -2\n", |
| 170 | + "Name: 2018-02-01 00:00:00, dtype: int64" |
| 171 | + ] |
| 172 | + }, |
| 173 | + "execution_count": 4, |
| 174 | + "metadata": {}, |
| 175 | + "output_type": "execute_result" |
| 176 | + } |
| 177 | + ], |
| 178 | + "source": [ |
| 179 | + "(-1 * close_month).loc[month].nlargest(2)" |
| 180 | + ] |
| 181 | + }, |
| 182 | + { |
| 183 | + "cell_type": "markdown", |
| 184 | + "metadata": {}, |
| 185 | + "source": [ |
| 186 | + "That gives us the bottom performing tickers, but the prices are negated. To recover the actual prices, we can flip the sign of the output of nlargest." |
| 187 | + ] |
| 188 | + }, |
| 189 | + { |
| 190 | + "cell_type": "code", |
| 191 | + "execution_count": 5, |
| 192 | + "metadata": {}, |
| 193 | + "outputs": [ |
| 194 | + { |
| 195 | + "data": { |
| 196 | + "text/plain": [ |
| 197 | + "A 1\n", |
| 198 | + "F 2\n", |
| 199 | + "Name: 2018-02-01 00:00:00, dtype: int64" |
| 200 | + ] |
| 201 | + }, |
| 202 | + "execution_count": 5, |
| 203 | + "metadata": {}, |
| 204 | + "output_type": "execute_result" |
| 205 | + } |
| 206 | + ], |
| 207 | + "source": [ |
| 208 | + "(-1 * close_month).loc[month].nlargest(2) * -1" |
| 209 | + ] |
| 210 | + }, |
| 211 | + { |
| 212 | + "cell_type": "markdown", |
| 213 | + "metadata": {}, |
| 214 | + "source": [ |
| 215 | + "Now you've seen how to get the top and bottom performing prices in a single month. Let's see if you can apply this knowledge.\n", |
| 216 | + "## Quiz\n", |
| 217 | + "Implement `date_top_industries` to find the top performing closing prices and return their sectors for a single date. The function should return only the [set](https://docs.python.org/3/tutorial/datastructures.html#sets) of sectors, so there shouldn't be any duplicates.\n", |
| 218 | + "\n", |
| 219 | + "- The number of top performing prices to look at is represented by the parameter `top_n`.\n", |
| 220 | + "- The `date` parameter is the date to look for the top performing prices in the `prices` DataFrame.\n", |
| 221 | + "- The sector information for each ticker is located in the `sector` parameter.\n", |
| 222 | + "\n", |
| 223 | + "For example:\n", |
| 224 | + "```\n", |
| 225 | + " Prices\n", |
| 226 | + " A B C D E\n", |
| 227 | + "2013-07-08 2 2 7 2 6\n", |
| 228 | + "2013-07-09 5 3 6 7 5\n", |
| 229 | + "... ... ... ...\n", |
| 230 | + "\n", |
| 231 | + " Sector\n", |
| 232 | + "A \"Utilities\" \n", |
| 233 | + "B \"Health Care\" \n", |
| 234 | + "C \"Real Estate\"\n", |
| 235 | + "D \"Real Estate\"\n", |
| 236 | + "E \"Information Technology\"\n", |
| 237 | + "\n", |
| 238 | + "Date: 2013-07-09\n", |
| 239 | + "Top N: 3\n", |
| 240 | + "```\n", |
| 241 | + "The set created from the function `date_top_industries` should be the following:\n", |
| 242 | + "```\n", |
| 243 | + "{\"Utilities\", \"Real Estate\"}\n", |
| 244 | + "```\n", |
| 245 | + "*Note: Stocks A and E have the same price for the date, but only A's sector was returned. We'll keep it simple and only take the first occurrence when there are ties.*" |
| 246 | + ] |
| 247 | + }, |
| 248 | + { |
| 249 | + "cell_type": "code", |
| 250 | + "execution_count": 12, |
| 251 | + "metadata": {}, |
| 252 | + "outputs": [ |
| 253 | + { |
| 254 | + "name": "stdout", |
| 255 | + "output_type": "stream", |
| 256 | + "text": [ |
| 257 | + "Tests Passed\n" |
| 258 | + ] |
| 259 | + } |
| 260 | + ], |
| 261 | + "source": [ |
| 262 | + "import project_tests\n", |
| 263 | + "\n", |
| 264 | + "\n", |
| 265 | + "def date_top_industries(prices, sector, date, top_n):\n", |
| 266 | + " \"\"\"\n", |
| 267 | + " Get the set of the top industries for the date\n", |
| 268 | + " \n", |
| 269 | + " Parameters\n", |
| 270 | + " ----------\n", |
| 271 | + " prices : DataFrame\n", |
| 272 | + " Prices for each ticker and date\n", |
| 273 | + " sector : Series\n", |
| 274 | + " Sector name for each ticker\n", |
| 275 | + " date : Date\n", |
| 276 | + " Date to get the top performers\n", |
| 277 | + " top_n : int\n", |
| 278 | + " Number of top performers to get\n", |
| 279 | + " \n", |
| 280 | + " Returns\n", |
| 281 | + " -------\n", |
| 282 | + " top_industries : set\n", |
| 283 | + " Top industries for the date\n", |
| 284 | + " \"\"\"\n", |
| 285 | + " # Take the top_n tickers by price on `date`, then map them to their sectors\n", |
| 286 | + " top_tickers = prices.loc[date].nlargest(top_n).index\n", |
| 287 | + " top_industries = set(sector.loc[top_tickers])\n", |
| 288 | + "\n", |
| 289 | + " return top_industries\n", |
| 290 | + "\n", |
| 291 | + "\n", |
| 292 | + "project_tests.test_date_top_industries(date_top_industries)" |
| 293 | + ] |
| 294 | + }, |
| 295 | + { |
| 296 | + "cell_type": "markdown", |
| 297 | + "metadata": {}, |
| 298 | + "source": [ |
| 299 | + "## Quiz Solution\n", |
| 300 | + "If you're having trouble, you can check out the quiz solution [here](top_and_bottom_performing_solution.ipynb)." |
| 301 | + ] |
| 302 | + } |
| 303 | + ], |
| 304 | + "metadata": { |
| 305 | + "kernelspec": { |
| 306 | + "display_name": "Python 3", |
| 307 | + "language": "python", |
| 308 | + "name": "python3" |
| 309 | + }, |
| 310 | + "language_info": { |
| 311 | + "codemirror_mode": { |
| 312 | + "name": "ipython", |
| 313 | + "version": 3 |
| 314 | + }, |
| 315 | + "file_extension": ".py", |
| 316 | + "mimetype": "text/x-python", |
| 317 | + "name": "python", |
| 318 | + "nbconvert_exporter": "python", |
| 319 | + "pygments_lexer": "ipython3", |
| 320 | + "version": "3.6.3" |
| 321 | + } |
| 322 | + }, |
| 323 | + "nbformat": 4, |
| 324 | + "nbformat_minor": 2 |
| 325 | +} |