polars/09_strings.py
Lines changed: 35 additions & 35 deletions
```diff
@@ -14,7 +14,7 @@
 app = marimo.App(width="medium")
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         r"""
```
```diff
@@ -30,7 +30,7 @@ def _(mo):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         r"""
```
```diff
@@ -43,7 +43,7 @@ def _(mo):
     return
 
 
-@app.cell(hide_code=True)
+@app.cell
 def _(pl):
     pip_metadata_raw_df = pl.DataFrame(
         [
```
```diff
@@ -56,7 +56,7 @@ def _(pl):
     return (pip_metadata_raw_df,)
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""We can use the [`json_decode`](https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.str.json_decode.html) expression to parse the raw JSON strings into Polars-native structs and we can use the [unnest](https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.unnest.html) dataframe operation to have a dedicated column per parsed attribute.""")
```
```diff
     mo.md(r"""This is already a much friendlier representation of the data we started out with, but note that since the JSON entries had only string attributes, all values are strings, even the temporal `released_at` and numerical `size_mb` columns.""")
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""As we know that the `size_mb` column should have a decimal representation, we go ahead and use [`to_decimal`](https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.str.to_decimal.html#polars.Expr.str.to_decimal) to perform the conversion.""")
     return
```
```diff
@@ -91,7 +91,7 @@ def _(pip_metadata_df, pl):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         r"""
```
```diff
@@ -127,7 +127,7 @@ def _(pip_metadata_df, pl):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""Alternatively, instead of using three different functions to perform the conversion to date, we can use a single one, [`strptime`](https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.str.strptime.html) which takes the desired temporal data type as its first parameter.""")
     return
```
```diff
@@ -145,7 +145,7 @@ def _(pip_metadata_df, pl):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""And to wrap up this section on parsing and conversion, let's consider a final scenario. What if we don't want to parse the entire raw JSON string, because we only need a subset of its attributes? Well, in this case we can leverage the [`json_path_match`](https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.str.json_path_match.html) expression to extract only the desired attributes using standard [JSONPath](https://goessner.net/articles/JsonPath/) syntax.""")
```
```diff
     mo.md(r"""As the following visualization shows, `str` is one of the richest Polars expression namespaces with multiple dozens of functions in it.""")
     return
```
```diff
@@ -232,7 +232,7 @@ def _(alt, expressions_df):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         r"""
```
```diff
@@ -260,7 +260,7 @@ def _(expressions_df, pl):
     return (docstring_length_df,)
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""As the dataframe preview above and the scatterplot below show, the docstring length measured in bytes is almost always bigger than the length expressed in characters. This is due to the fact that the docstrings include characters which require more than a single byte to represent, such as "╞" for displaying dataframe header and body separators.""")
```
```diff
     mo.md(r"""For scenarios where we want to combine multiple substrings to check for, we can use the [`contains`](https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.str.contains.html) expression to check for the presence of various patterns.""")
     return
```
```diff
@@ -476,7 +476,7 @@ def _(expressions_df, pl):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         r"""
```
```diff
@@ -506,7 +506,7 @@ def _(expressions_df, pl):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""A related application example is to *find* the first index where a particular pattern is present, so that it can be used for downstream processing such as slicing. Below we use the [`find`](https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.str.find.html) expression to determine the index at which a code example starts in the docstring - identified by the Python shell substring `">>>"`.""")
     return
```
```diff
@@ -522,7 +522,7 @@ def _(expressions_df, pl):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         r"""
```
```diff
@@ -562,7 +562,7 @@ def _(mo, slice, sliced_df):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(
         r"""
```
```diff
@@ -589,7 +589,7 @@ def _(expressions_df, pl):
     return
 
 
-@app.cell
+@app.cell(hide_code=True)
 def _(mo):
     mo.md(r"""As a more practical example, we can use the `split` expression with some aggregation to count the number of times a particular word occurs in member names across all namespaces. This enables us to create a word cloud of the API members' constituents!""")
```
```diff
     mo.md(r"""And of course, you can convert back into a human-readable representation using the [`decode`](https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.str.decode.html) expression.""")
```