Commit 17deff8

Added 3c bonus notebook.
1 parent 0f22ff9 commit 17deff8

1 file changed

+282
-0
lines changed

notebooks/3c_bonus.ipynb

Lines changed: 282 additions & 0 deletions
@@ -0,0 +1,282 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "b4f88aa3",
6+
"metadata": {},
7+
"source": [
8+
"# Bonus: Extended topics\n",
9+
"\n",
10+
"In this notebook, we will work with the following topics:\n",
11+
"\n",
12+
"1. Polars, an alternative to Pandas\n",
13+
"1. Type hints"
14+
]
15+
},
16+
{
17+
"cell_type": "code",
18+
"execution_count": null,
19+
"id": "bb59a9cc",
20+
"metadata": {},
21+
"outputs": [],
22+
"source": [
23+
"import pandas as pd\n",
24+
"import polars as pl"
25+
]
26+
},
27+
{
28+
"cell_type": "markdown",
29+
"id": "1818c085",
30+
"metadata": {},
31+
"source": [
32+
"# Polars\n",
33+
"\n",
34+
"[Polars](https://pola.rs) is a high-performance dataframe package written in the Rust programming language.\n",
35+
"It also has some conveniences for working with large datasets that will not fit in memory, particularly when we only need a subset or aggregation of the data.\n",
36+
"\n",
37+
"[Modern Polars](https://kevinheavey.github.io/modern-polars/) shows many examples of analogous Pandas and Polars code.\n",
38+
"\n",
39+
"Below, let's rework our Pandas code from before.\n",
40+
"Note that there are some efficiencies we could wring out here, but we're aiming for a similar flow to the original to make the comparison clearer."
41+
]
42+
},
43+
{
44+
"cell_type": "code",
45+
"execution_count": null,
46+
"id": "4fa0f93f",
47+
"metadata": {},
48+
"outputs": [],
49+
"source": [
50+
"LOOKUP = {\"Microsoft\": \"MSFT\", \"Google\": \"GOOG\"}\n",
51+
"\n",
52+
"firmyear = (\n",
53+
" pl.from_pandas(pd.read_stata(\"../data/firmyear.dta\"))\n",
54+
" .with_columns(\n",
55+
" [\n",
56+
" pl.col(\"year\").cast(pl.Int32),\n",
57+
" pl.col(\"count_of_employees\").cast(pl.Int32),\n",
58+
" pl.col(\"name\").replace(LOOKUP).alias(\"id_ticker\"),\n",
59+
" ]\n",
60+
" )\n",
61+
" .rename({\"count_of_employees\": \"size_emp\"})\n",
62+
" .sort([\"name\", \"year\"])\n",
63+
" .with_columns(\n",
64+
" [\n",
65+
" pl.col(\"size_emp\").diff().over(\"name\").alias(\"size_emp_change\"),\n",
66+
" pl.col(\"id_ticker\").str.to_lowercase(),\n",
67+
" ]\n",
68+
" )\n",
69+
")\n",
70+
"\n",
71+
"firmyear.head()"
72+
]
73+
},
74+
{
75+
"cell_type": "code",
76+
"execution_count": null,
77+
"id": "d5eb81c7",
78+
"metadata": {},
79+
"outputs": [],
80+
"source": [
81+
"stock = pl.read_csv(\"../data/stock.csv\")\n",
82+
"\n",
83+
"firmyear = firmyear.join(\n",
84+
" stock, how=\"left\", left_on=[\"id_ticker\", \"year\"], right_on=[\"tic\", \"yr\"]\n",
85+
")\n",
86+
"\n",
87+
"firmyear.head(6)"
88+
]
89+
},
90+
{
91+
"cell_type": "code",
92+
"execution_count": null,
93+
"id": "49c5374e",
94+
"metadata": {},
95+
"outputs": [],
96+
"source": [
97+
"msft_nyt = pl.read_csv(\"../data/msft_nyt.csv\").with_columns(\n",
98+
" [pl.col(\"pub_date\").str.strptime(pl.Date, \"%Y-%m-%d %H:%M:%S\")],\n",
99+
")\n",
100+
"\n",
101+
"msft_nyt.head()"
102+
]
103+
},
104+
{
105+
"cell_type": "code",
106+
"execution_count": null,
107+
"id": "46f3cf5c",
108+
"metadata": {},
109+
"outputs": [],
110+
"source": [
111+
"def query_docs(data: pl.DataFrame, ticker: str, year: int) -> pl.DataFrame:\n",
112+
" filtered = data.filter(\n",
113+
" (pl.col(\"id_ticker\") == ticker) & (pl.col(\"pub_date\").dt.year() == year)\n",
114+
" )\n",
115+
" agg = filtered.select(\n",
116+
" [\n",
117+
" pl.col(\"word_count\").mean().alias(\"wc_mean\"),\n",
118+
" pl.col(\"word_count\").sum().alias(\"wc_sum\"),\n",
119+
" ]\n",
120+
" )\n",
121+
" return agg.with_columns(\n",
122+
" [pl.lit(ticker).alias(\"id_ticker\"), pl.lit(year).alias(\"year\")]\n",
123+
" )"
124+
]
125+
},
126+
{
127+
"cell_type": "code",
128+
"execution_count": null,
129+
"id": "ebec4464",
130+
"metadata": {},
131+
"outputs": [],
132+
"source": [
133+
"results = pl.concat(\n",
134+
" [\n",
135+
" query_docs(msft_nyt, row[0], row[1])\n",
136+
" for row in firmyear.select([\"id_ticker\", \"year\"]).rows()\n",
137+
" ]\n",
138+
")\n",
139+
"\n",
140+
"firmyear = firmyear.join(results, on=[\"id_ticker\", \"year\"], how=\"left\")\n",
141+
"\n",
142+
"firmyear.head(6)"
143+
]
144+
},
145+
{
146+
"cell_type": "markdown",
147+
"id": "378441db",
148+
"metadata": {},
149+
"source": [
150+
"As we can see above, we get substantially the same result, though Polars nicely omits the duplicated `tic` and `yr` join-key columns that appeared in the pandas example."
151+
]
152+
},
153+
{
154+
"cell_type": "markdown",
155+
"id": "6a3b9ec2",
156+
"metadata": {},
157+
"source": [
158+
"# Type hints\n",
159+
"\n",
160+
"Python supports something called [type hints](https://docs.python.org/3/library/typing.html), which are a way of annotating our code to express what types we think variables and arguments should be.\n",
161+
"\n",
162+
"It is important to note that Python itself does not enforce these (unlike statically typed languages, which do), and it will run code that violates the type hints with no warnings or errors.\n",
163+
"However, editors such as VS Code, together with type checkers and language servers, can use type hints to help us spot errors in our own logic and to provide richer information as we write code.\n",
164+
"\n",
165+
"If you hover over the use of `query_docs()` above, you'll see annotated types for the arguments and the return type of the function itself.\n",
166+
"If you go back to our original pandas code without type hints, you'll see those as `Unknown`."
167+
]
168+
},
169+
{
170+
"cell_type": "code",
171+
"execution_count": null,
172+
"id": "e1de036a",
173+
"metadata": {},
174+
"outputs": [],
175+
"source": [
176+
"NO_HINT = \"No type hint provided\"\n",
177+
"HINT: str = \"Type hint!\""
178+
]
179+
},
180+
{
181+
"cell_type": "code",
182+
"execution_count": null,
183+
"id": "7ba122c7",
184+
"metadata": {},
185+
"outputs": [],
186+
"source": [
187+
"def countdown_no_hint(count):\n",
188+
" if count < 1 or count > 5:\n",
189+
" count = 5\n",
190+
" for i in range(count, 0, -1):\n",
191+
" print(f\"Counting down: {i}\")\n",
192+
" print(\"Done!\")"
193+
]
194+
},
195+
{
196+
"cell_type": "code",
197+
"execution_count": null,
198+
"id": "6a8b506d",
199+
"metadata": {},
200+
"outputs": [],
201+
"source": [
202+
"countdown_no_hint(3)"
203+
]
204+
},
205+
{
206+
"cell_type": "code",
207+
"execution_count": null,
208+
"id": "cd1b60db",
209+
"metadata": {},
210+
"outputs": [],
211+
"source": [
212+
"def countdown_hint(count: int) -> None:\n",
213+
" if count < 1 or count > 5:\n",
214+
" count = 5\n",
215+
" for i in range(count, 0, -1):\n",
216+
" print(f\"Counting down: {i}\")\n",
217+
" print(\"Done!\")"
218+
]
219+
},
220+
{
221+
"cell_type": "code",
222+
"execution_count": null,
223+
"id": "ac250024",
224+
"metadata": {},
225+
"outputs": [],
226+
"source": [
227+
"countdown_hint(3)"
228+
]
229+
},
230+
{
231+
"cell_type": "markdown",
232+
"id": "cd65c968",
233+
"metadata": {},
234+
"source": [
235+
"Similarly, if you type in this code,\n",
236+
"\n",
237+
"```python\n",
238+
"countdown_hint(3.0)\n",
239+
"```\n",
240+
"\n",
241+
"you will see a red underline noting that there is a type issue.\n",
242+
"But there's no such warning with this code:\n",
243+
"\n",
244+
"```python\n",
245+
"countdown_no_hint(3.0)\n",
246+
"```\n"
247+
]
248+
},
249+
{
250+
"cell_type": "markdown",
251+
"id": "275ded91",
252+
"metadata": {},
253+
"source": [
254+
"**My advice:** use type hints for functions whenever you can.\n",
255+
"It's not always practical, because not every third-party package supports typing.\n",
256+
"However, a lot of them do now, so it's often straightforward to do so.\n",
257+
"This benefits you, through better results from tools, and anyone reading your code, who can more clearly see your intent."
258+
]
259+
}
260+
],
261+
"metadata": {
262+
"kernelspec": {
263+
"display_name": "Python 3",
264+
"language": "python",
265+
"name": "python3"
266+
},
267+
"language_info": {
268+
"codemirror_mode": {
269+
"name": "ipython",
270+
"version": 3
271+
},
272+
"file_extension": ".py",
273+
"mimetype": "text/x-python",
274+
"name": "python",
275+
"nbconvert_exporter": "python",
276+
"pygments_lexer": "ipython3",
277+
"version": "3.12.2"
278+
}
279+
},
280+
"nbformat": 4,
281+
"nbformat_minor": 5
282+
}
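
The type-hints section of the notebook states that Python itself does not enforce annotations. A minimal sketch of that behaviour, using only the standard library, might look like the following. `countdown_hint` is repeated from the notebook so the block runs on its own; `describe` is a hypothetical helper introduced here purely for illustration.

```python
from typing import get_type_hints


def countdown_hint(count: int) -> None:
    """Same function as in the notebook, repeated so this block is self-contained."""
    if count < 1 or count > 5:
        count = 5
    for i in range(count, 0, -1):
        print(f"Counting down: {i}")
    print("Done!")


def describe(value: int) -> str:
    """Hypothetical helper: annotated as taking an int, but never checks it."""
    return f"got {value!r}"


# The annotations are stored as metadata on the function object ...
hints = get_type_hints(countdown_hint)

# ... but Python does not check them when the function is called.
unchecked = describe("not an int")  # runs without any warning or error

# countdown_hint(3.0) does fail at runtime, but only because range()
# rejects floats, not because of the `count: int` annotation.
try:
    countdown_hint(3.0)
    failure = None
except TypeError as exc:
    failure = str(exc)

print(hints)
print(unchecked)
print(failure)
```

One consequence worth noting: `countdown_no_hint(3.0)` would fail at runtime for the same `range()` reason, so the annotation changes only what the editor can tell us ahead of time, not what happens when the code runs.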
