@@ -1195,7 +1195,7 @@ We show an example of this below.
 
 First we create a seemingly innocuous NA
 in the first row of the `region_lang` data frame,
-in the most_at_home column:
+in the `most_at_home` column:
 
 ```{r}
 region_lang_na <- region_lang
@@ -1262,12 +1262,14 @@ group_by(region_lang, region)
 
 ### Calculating summary statistics on many columns
 
+#### `summarize` + `across` for calculating summary statistics on many columns
+
 Sometimes we need to summarize statistics across many columns.
 In such a case, using `summarize` alone means that we have to
 type out the name of each column we want to summarize.
 To do this more efficiently, we can pair `summarize` with `across`
 and use the same syntax we use with the `select` function to
-specify which columns we would like to perform the statistical summarries on,
+specify which columns we would like to perform the statistical summaries on,
 as well as which function to use to calculate these.
 Here we demonstrate finding the maximum value of each of the numeric
 columns of the `region_lang` data set.
@@ -1277,6 +1279,29 @@ region_lang |>
   summarize(across(mother_tongue:lang_known, max))
 ```
 
+> **Note on calculating summary statistics with `summarize` + `across`**
+> **when there are NAs**:
+>
+> Similarly to when we use base R statistical summary functions
+> (e.g., `max`, `min`, `mean`, `sum`, etc.) with `summarize` alone,
+> the `summarize` + `across` functions paired
+> with base R statistical summary functions
+> also return NAs when we apply them to columns that
+> contain NAs in the data frame.
+>
+> To avoid this, again we need to add the argument `na.rm = TRUE`,
+> but we need to use it a little bit differently.
+> Here, we add a `,` and then `na.rm = TRUE`,
+> after specifying the function we want `summarize` + `across` to apply,
+> as illustrated below:
+>
+> ```{r}
+> region_lang_na |>
+>   summarize(across(mother_tongue:lang_known, max, na.rm = TRUE))
+> ```
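A small sketch of the same idea on a toy data frame may be useful alongside this note. The tibble `toy_lang` and its values are made up for illustration; only the column names mirror `region_lang`. The sketch uses an anonymous function rather than passing `na.rm = TRUE` as an extra argument. To my knowledge, dplyr 1.1.0 and later deprecate passing extra arguments through `across` and emit a warning, so this form may be the safer one to show readers.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for region_lang_na
toy_lang <- tibble(
  region = c("A", "B", "C"),
  mother_tongue = c(NA, 10, 25),
  most_at_home = c(5, NA, 15),
  lang_known = c(30, 40, 20)
)

# Drop NAs inside an anonymous function instead of passing
# na.rm = TRUE as an extra argument to across()
toy_max <- toy_lang |>
  summarize(across(mother_tongue:lang_known, \(x) max(x, na.rm = TRUE)))

toy_max
```

Each column still gets a maximum, because the missing value is dropped before `max` runs.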
+
+#### `map` for calculating summary statistics on many columns
+
 An alternative to `summarize` and `across`
 for applying a function to many columns is the `map` family of functions.
 Let's again find the maximum value of each column of the
@@ -1339,21 +1364,23 @@ region_lang |>
 Which `map` function you choose depends on what you want to do with the
 output; you don't always have to pick `map_dfc`!
 
-Similarly to when we use base R statistical summary functions
-(e.g., `max`, `mix`, `mean`, `sum`, etc) with `summarize`,
-`map` functions paired with base R statistical summary functions
-also return NA's when we apply them to columns that
-contain NAs in the data frame.
-
-To avoid this, again we need to add the argument `na.rm = TRUE`.
-When we use this with `map` we do this by adding a `,` and then `na.rm = TRUE`,
-after specifying the function we want map to apply, as illustrated below:
-
-```{r}
-region_lang |>
-  select(mother_tongue:lang_known) |>
-  map_dfc(max, na.rm = TRUE)
-```
+> **Note on calculating summary statistics with `map` when there are NAs**:
+>
+> Similarly to when we use base R statistical summary functions
+> (e.g., `max`, `min`, `mean`, `sum`, etc.) with `summarize`,
+> `map` functions paired with base R statistical summary functions
+> also return NAs when we apply them to columns that
+> contain NAs in the data frame.
+>
+> To avoid this, again we need to add the argument `na.rm = TRUE`.
+> When we use this with `map` we do this by adding a `,` and then `na.rm = TRUE`,
+> after specifying the function we want `map` to apply, as illustrated below:
+>
+> ```{r}
+> region_lang_na |>
+>   select(mother_tongue:lang_known) |>
+>   map_dfc(max, na.rm = TRUE)
+> ```
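To see the effect described in this note in isolation, here is a minimal sketch on a made-up tibble. `toy_cols` and its values are hypothetical and are not part of the `region_lang` data. It assumes purrr forwards extra arguments given after the function on to that function, which is how the note's `map_dfc(max, na.rm = TRUE)` call works.

```r
library(purrr)
library(tibble)

# Hypothetical toy columns, each containing one missing value
toy_cols <- tibble(
  a = c(1, NA, 3),
  b = c(NA, 8, 2)
)

# Without na.rm, max() returns NA for any column that contains an NA
with_na <- map_dfc(toy_cols, max)

# Arguments after the function name are passed along to max()
without_na <- map_dfc(toy_cols, max, na.rm = TRUE)

with_na
without_na
```

The first result holds `NA` in both columns, while the second holds the maximum of the remaining values.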
 
 The `map` family functions are generally quite useful for solving many problems
 involving repeatedly applying functions in R.
@@ -1480,9 +1507,9 @@ Table: (#tab:summary-functions-table) Summary of wrangling functions
 `pivot_longer`/`pivot_wider` and `separate`, but also covers missing values
 and additional wrangling functions (like `unite`). The [data
 transformation](https://r4ds.had.co.nz/transform.html) chapter covers
-`select`, `filter`, `arrange`, `mutate`, and `summarize`. And the [`map_*`
+`select`, `filter`, `arrange`, `mutate`, and `summarize`. And the [`map`
 functions](https://r4ds.had.co.nz/iteration.html#the-map-functions) chapter
-provides more about the `map_*` functions.
+provides more about the `map` functions.
 - You will occasionally encounter a case where you need to iterate over items
 in a data frame, but none of the above functions are flexible enough to do
 what you want. In that case, you may consider using [a for