Commit c5941d6

fixes
1 parent 91124fd commit c5941d6

1 file changed

+49
-35
lines changed

polars/01_why_polars.py

Lines changed: 49 additions & 35 deletions
@@ -9,7 +9,7 @@
 
 import marimo
 
-__generated_with = "0.11.0"
+__generated_with = "0.11.8"
 app = marimo.App(width="medium")
 
 
@@ -19,19 +19,41 @@ def _():
     return (mo,)
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
         # An introduction to Polars
 
         This notebook provides a birds-eye overview of [Polars](https://pola.rs/), a fast and user-friendly data manipulation library for Python, and compares it to alternatives like Pandas and PySpark.
-
-        Like Pandas and PySpark, the central data structure in Polars is **the DataFrame**, a tabular data structure consisting of named columns. For example, the next cell constructs a DataFrame that records the gender, age, and height in centimeters for a number of individuals.
-
-        <INSERT CODE CELL>
-
-        Unlike Python's earliest DataFrame library Pandas, Polars was designed with performance and usability in mind — Polars can scale to large datasets with ease while maintaining a simple and intuitive API.
+
+        Like Pandas and PySpark, the central data structure in Polars is **the DataFrame**, a tabular data structure consisting of named columns. For example, the next cell constructs a DataFrame that records the gender, age, and height in centimeters for a number of individuals.
+        """
+    )
+    return
+
+
+@app.cell
+def _():
+    import polars as pl
+
+    df_pl = pl.DataFrame(
+        {
+            "gender": ["Male", "Female", "Male", "Female", "Male", "Female",
+                       "Male", "Female", "Male", "Female"],
+            "age": [13, 15, 17, 19, 21, 23, 25, 27, 29, 31],
+            "height_cm": [150.0, 170.0, 146.5, 142.0, 155.0, 165.0, 170.8, 130.0, 132.5, 162.0]
+        }
+    )
+    df_pl
+    return df_pl, pl
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(
+        """
+        Unlike Python's earliest DataFrame library Pandas, Polars was designed with performance and usability in mind — Polars can scale to large datasets with ease while maintaining a simple and intuitive API.
 
         Polars' performance is due to a number of factors, including its implementation and rust and its ability to perform operations in a parallelized and vectorized manner. It supports a wide range of data types, advanced query optimizations, and seamless integration with other Python libraries, making it a versatile tool for data scientists, engineers, and analysts. Additionally, Polars provides a lazy API for deferred execution, allowing users to optimize their workflows by chaining operations and executing them in a single pass.
 
@@ -41,27 +63,26 @@ def _(mo):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
         ## Choosing Polars over Pandas
 
-
         In this section we'll give a few reasons why Polars is a better choice than Pandas, along with examples.
         """
     )
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
         ### Intuitive syntax
 
         Polars' syntax is similar to PySpark and intuitive like SQL, making heavy use of **method chaining**. This makes it easy for data professionals to transition to Polars, and leads to an API that is more concise and readable than Pandas.
-
+
         **Example.** In the next few cells, we contrast the code to perform a basic filter and aggregation of data with Pandas to the code required to accomplish the same task with `Polars`.
         """
     )
@@ -92,21 +113,15 @@ def _():
     return df_pd, filtered_df_pd, pd, result_pd
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
-    mo.md(
-        r"""
-        The same example can be worked out in Polars more concisely, using method chaining. Notice how the Polars code is essentially as readable as English.
-        """
-    )
+    mo.md(r"""The same example can be worked out in Polars more concisely, using method chaining. Notice how the Polars code is essentially as readable as English.""")
     return
 
 
 @app.cell
-def _():
-    import polars as pl
-
-    df_pl = pl.DataFrame(
+def _(pl):
+    data_pl = pl.DataFrame(
         {
             "Gender": ["Male", "Female", "Male", "Female", "Male", "Female",
                        "Male", "Female", "Male", "Female"],
@@ -118,31 +133,30 @@ def _():
     # query: average height of male and female after the age of 15 years
 
     # filter, groupby and aggregation using method chaining
-    result_pl = df_pl.filter(pl.col("Age") > 15).group_by("Gender").agg(pl.mean("Height_CM"))
+    result_pl = data_pl.filter(pl.col("Age") > 15).group_by("Gender").agg(pl.mean("Height_CM"))
     result_pl
-    return df_pl, pl, result_pl
+    return data_pl, result_pl
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
         Notice how Polars uses a *method-chaining* approach, similar to PySpark, which makes the code more readable and expressive while using a *single line* to design the query.
-
         Additionally, Polars supports SQL-like operations *natively*, that allows you to write SQL queries directly on polars dataframe:
         """
     )
     return
 
 
 @app.cell
-def _(df_pl):
-    result = df_pl.sql("SELECT Gender, AVG(Height_CM) FROM self WHERE Age > 15 GROUP BY Gender")
+def _(data_pl):
+    result = data_pl.sql("SELECT Gender, AVG(Height_CM) FROM self WHERE Age > 15 GROUP BY Gender")
     result
     return (result,)
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
@@ -154,7 +168,7 @@ def _(mo):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
@@ -178,7 +192,7 @@ def _(mo):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
@@ -211,7 +225,7 @@ def _(mo):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
@@ -249,7 +263,7 @@ def _(mo):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
@@ -268,7 +282,7 @@ def _(mo):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
@@ -282,7 +296,7 @@ def _(mo):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
