Commit e8c781e

update the polars cookbook walkthrough (#1657)
1 parent b0ab1be commit e8c781e

cookbook/polars_v_pandas_v_nushell.md

Lines changed: 9 additions & 28 deletions
@@ -9,11 +9,11 @@ A dataframe example based on https://studioterabyte.nl/en/blog/polars-vs-pandas
 ## 1. Opening the file and show the shape of the DataFrame

 ```nu
-let df = (dfr open NYCTaxi.csv)
+let df = polars open NYCTaxi.csv
 ```

 ```nu
-$df | shape
+$df | polars shape
 ```

 Output:
@@ -23,15 +23,13 @@ Output:
 │ # │ rows │ columns │
 ├───┼─────────┼─────────┤
 │ 0 │ 1458644 │ 11 │
-├───┼─────────┼─────────┤
-│ # │ rows │ columns │
 ╰───┴─────────┴─────────╯
 ```

 ## 2. Opening the file and show the first 5 rows

 ```nu
-$df | first 5
+$df | polars first 5 | polars collect
 ```

 Output:
@@ -51,17 +49,14 @@ Output:
 │ │ │ │ 19:32:31 │ 19:39:40 │ │ │ │ │ │ │ │
 │ 4 │ id2181028 │ 2 │ 2016-03-26 │ 2016-03-26 │ 1 │ -73.97 │ 40.79 │ -73.97 │ 40.78 │ N │ 435 │
 │ │ │ │ 13:30:55 │ 13:38:10 │ │ │ │ │ │ │ │
-├───┼───────────┼───────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼──────────────┼──────────────┤
-│ # │ id │ vendor_id │ pickup_dateti │ dropoff_datet │ passenger_cou │ pickup_longit │ pickup_latitu │ dropoff_longi │ dropoff_latit │ store_and_fw │ trip_duratio │
-│ │ │ │ me │ ime │ nt │ ude │ de │ tude │ ude │ d_flag │ n │
 ╰───┴───────────┴───────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┴──────────────┴──────────────╯
 ```

 ## 3. Opening the file and get the length of all strings in the "id" column

 ```nu
-let ids = ($df | first 5 | get id | str-lengths)
-$df | first 5 | append $ids | rename id_x vendor_id_length
+let ids = $df | polars first 5 | polars get id | polars str-lengths
+$df | polars first 5 | polars append $ids | polars rename id_x vendor_id_length
 ```

 Output:
@@ -81,16 +76,13 @@ Output:
 │ │ │ │ 19:32:31 │ 19:39:40 │ │ │ │ │ │ │ │ │
 │ 4 │ id2181028 │ 2 │ 2016-03-26 │ 2016-03-26 │ 1 │ -73.97 │ 40.79 │ -73.97 │ 40.78 │ N │ 435 │ 9 │
 │ │ │ │ 13:30:55 │ 13:38:10 │ │ │ │ │ │ │ │ │
-├───┼───────────┼───────────┼──────────────┼──────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┤
-│ # │ id │ vendor_id │ pickup_datet │ dropoff_date │ passenger_c │ pickup_long │ pickup_lati │ dropoff_lon │ dropoff_lat │ store_and_f │ trip_durati │ vendor_id_l │
-│ │ │ │ ime │ time │ ount │ itude │ tude │ gitude │ itude │ wd_flag │ on │ ength │
 ╰───┴───────────┴───────────┴──────────────┴──────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────╯
 ```

 Here's an alternate approach using `with-column`

 ```nu
-$df | first 5 | with-column ($df | first 5 | get id | str-lengths) --name vendor_id_length
+$df | polars with-column (polars col id | polars str-lengths | polars as vendor_id_lengths) | polars first 5 | polars collect
 ```

 Output:
@@ -110,16 +102,13 @@ Output:
 │ │ │ │ 19:32:31 │ 19:39:40 │ │ │ │ │ │ │ │ │
 │ 4 │ id2181028 │ 2 │ 2016-03-26 │ 2016-03-26 │ 1 │ -73.97 │ 40.79 │ -73.97 │ 40.78 │ N │ 435 │ 9 │
 │ │ │ │ 13:30:55 │ 13:38:10 │ │ │ │ │ │ │ │ │
-├───┼───────────┼───────────┼──────────────┼──────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┼─────────────┤
-│ # │ id │ vendor_id │ pickup_datet │ dropoff_date │ passenger_c │ pickup_long │ pickup_lati │ dropoff_lon │ dropoff_lat │ store_and_f │ trip_durati │ vendor_id_l │
-│ │ │ │ ime │ time │ ount │ itude │ tude │ gitude │ itude │ wd_flag │ on │ ength │
 ╰───┴───────────┴───────────┴──────────────┴──────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────┴─────────────╯
 ```

 ## 4. Opening the file and apply a function to the "trip_duration" to divide the number by 60 to go from the second value to a minute value

 ```nu
-$df | first 5 | with-column ((col trip_duration) / 60.0)
+$df | polars first 5 | polars with-column ((polars col trip_duration) / 60.0) | polars collect
 ```

 Output:
@@ -139,16 +128,13 @@ Output:
 │ │ │ │ 19:32:31 │ 19:39:40 │ │ │ │ │ │ │ │
 │ 4 │ id2181028 │ 2 │ 2016-03-26 │ 2016-03-26 │ 1 │ -73.97 │ 40.79 │ -73.97 │ 40.78 │ N │ 7.25 │
 │ │ │ │ 13:30:55 │ 13:38:10 │ │ │ │ │ │ │ │
-├───┼───────────┼───────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼──────────────┼──────────────┤
-│ # │ id │ vendor_id │ pickup_dateti │ dropoff_datet │ passenger_cou │ pickup_longit │ pickup_latitu │ dropoff_longi │ dropoff_latit │ store_and_fw │ trip_duratio │
-│ │ │ │ me │ ime │ nt │ ude │ de │ tude │ ude │ d_flag │ n │
 ╰───┴───────────┴───────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┴──────────────┴──────────────╯
 ```

 ## 5. Opening the file and filtering out all rows with a trip duration shorter than 500 seconds

 ```nu
-$df | filter-with ((col trip_duration) >= 500) | first 5
+$df | polars filter-with ((polars col trip_duration) >= 500) | polars first 5 | polars collect
 ```

 Output:
@@ -168,16 +154,13 @@ Output:
 │ │ │ │ 21:45:01 │ 22:05:26 │ │ │ │ │ │ │ │
 │ 4 │ id1436371 │ 2 │ 2016-05-10 │ 2016-05-10 │ 1 │ -73.98 │ 40.76 │ -74.00 │ 40.73 │ N │ 1274 │
 │ │ │ │ 22:08:41 │ 22:29:55 │ │ │ │ │ │ │ │
-├───┼───────────┼───────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼──────────────┼──────────────┤
-│ # │ id │ vendor_id │ pickup_dateti │ dropoff_datet │ passenger_cou │ pickup_longit │ pickup_latitu │ dropoff_longi │ dropoff_latit │ store_and_fw │ trip_duratio │
-│ │ │ │ me │ ime │ nt │ ude │ de │ tude │ ude │ d_flag │ n │
 ╰───┴───────────┴───────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┴───────────────┴──────────────┴──────────────╯
 ```

 ## 6. Opening the file, filtering out all the rows with a "Y" store_and_fwd_flag value, group by ID and calculate the mean duration time

 ```nu
-$df | filter-with ((col store_and_fwd_flag) == "N") | group-by id | agg (col trip_duration | mean) | sort-by id | first 5
+$df | polars filter-with ((polars col store_and_fwd_flag) == "N") | polars group-by id | polars agg (polars col trip_duration | polars mean) | polars sort-by id | polars first 5 | polars collect
 ```

 Output:
@@ -191,7 +174,5 @@ Output:
 │ 2 │ id0000005 │ 368.00 │
 │ 3 │ id0000008 │ 303.00 │
 │ 4 │ id0000009 │ 547.00 │
-├───┼───────────┼───────────────┤
-│ # │ id │ trip_duration │
 ╰───┴───────────┴───────────────╯
 ```
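
The rewritten pipelines in this commit share one pattern: each dataframe command gains the `polars` prefix, and most pipelines now end in `polars collect`. A minimal sketch of that pattern, chaining the filter from section 5 with the minutes conversion from section 4, might look like the following. It assumes the polars plugin is installed and registered and that `NYCTaxi.csv` is in the working directory. The combined pipeline is illustrative only and does not appear in the cookbook.

```nu
# Assumption: the polars plugin is registered and NYCTaxi.csv is in the working directory.
let df = polars open NYCTaxi.csv

# Keep trips of at least 500 seconds, convert trip_duration from seconds to minutes,
# take the first 5 rows, then materialize the result with `polars collect`.
$df | polars filter-with ((polars col trip_duration) >= 500) | polars with-column ((polars col trip_duration) / 60.0) | polars first 5 | polars collect
```

Every command used here is taken from the added lines of the diff. The trailing `polars collect` suggests that the earlier stages only describe the query and that evaluation happens at the end. That reading is an inference from the diff, not something the commit message states.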
