polars/01_why_polars.py
+49 -35 (49 additions & 35 deletions)
@@ -9,7 +9,7 @@
 
 import marimo
 
-__generated_with = "0.11.0"
+__generated_with = "0.11.8"
 app = marimo.App(width="medium")
 
 
@@ -19,19 +19,41 @@ def _():
     return (mo,)
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
         # An introduction to Polars
 
         This notebook provides a bird's-eye overview of [Polars](https://pola.rs/), a fast and user-friendly data manipulation library for Python, and compares it to alternatives like Pandas and PySpark.
-
-        Like Pandas and PySpark, the central data structure in Polars is **the DataFrame**, a tabular data structure consisting of named columns. For example, the next cell constructs a DataFrame that records the gender, age, and height in centimeters for a number of individuals.
-
-        <INSERT CODE CELL>
-
-        Unlike Python's earliest DataFrame library Pandas, Polars was designed with performance and usability in mind: Polars can scale to large datasets with ease while maintaining a simple and intuitive API.
+
+        Like Pandas and PySpark, the central data structure in Polars is **the DataFrame**, a tabular data structure consisting of named columns. For example, the next cell constructs a DataFrame that records the gender, age, and height in centimeters for a number of individuals.
+        Unlike Python's earliest DataFrame library Pandas, Polars was designed with performance and usability in mind: Polars can scale to large datasets with ease while maintaining a simple and intuitive API.
 
         Polars' performance is due to a number of factors, including its implementation in Rust and its ability to perform operations in a parallelized and vectorized manner. It supports a wide range of data types, advanced query optimizations, and seamless integration with other Python libraries, making it a versatile tool for data scientists, engineers, and analysts. Additionally, Polars provides a lazy API for deferred execution, allowing users to optimize their workflows by chaining operations and executing them in a single pass.
 
@@ -41,27 +63,26 @@ def _(mo):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
         ## Choosing Polars over Pandas
 
-
         In this section we'll give a few reasons why Polars is a better choice than Pandas, along with examples.
         """
     )
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         """
         ### Intuitive syntax
 
         Polars' syntax is similar to PySpark and intuitive like SQL, making heavy use of **method chaining**. This makes it easy for data professionals to transition to Polars, and leads to an API that is more concise and readable than Pandas.
-
+
         **Example.** In the next few cells, we contrast the code to perform a basic filter and aggregation of data with Pandas to the code required to accomplish the same task with `Polars`.
         """
     )
@@ -92,21 +113,15 @@ def _():
     return df_pd, filtered_df_pd, pd, result_pd
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
-    mo.md(
-        r"""
-        The same example can be worked out in Polars more concisely, using method chaining. Notice how the Polars code is essentially as readable as English.
-        """
-    )
+    mo.md(r"""The same example can be worked out in Polars more concisely, using method chaining. Notice how the Polars code is essentially as readable as English.""")
         Notice how Polars uses a *method-chaining* approach, similar to PySpark, which makes the code more readable and expressive while using a *single line* to design the query.
-
         Additionally, Polars supports SQL-like operations *natively*, which allows you to write SQL queries directly on a Polars DataFrame:
         """
     )
     return
 
 
 @app.cell
-def _(df_pl):
-    result = df_pl.sql("SELECT Gender, AVG(Height_CM) FROM self WHERE Age > 15 GROUP BY Gender")
+def _(data_pl):
+    result = data_pl.sql("SELECT Gender, AVG(Height_CM) FROM self WHERE Age > 15 GROUP BY Gender")