edited wording in rowwise section

ttimbers · ttimbers · commit d03b1d0ca469 · 2021-09-22T23:48:39.000-07:00
diff --git a/wrangling.Rmd b/wrangling.Rmd
@@ -790,10 +790,10 @@ lang_messy_longer <- pivot_longer(lang_messy,
                cols = Toronto:Edmonton,
                names_to = "region",
                values_to = "value")
-tidy_lang <- separate(lang_messy_longer, col = value,
+tidy_lang_chr <- separate(lang_messy_longer, col = value,
            into = c("most_at_home", "most_at_work"),
            sep = "/") 
-official_langs_chr <- filter(tidy_lang, category == "Official languages")
+official_langs_chr <- filter(tidy_lang_chr, category == "Official languages")
 
 official_langs_chr 
 ```
@@ -1129,7 +1129,7 @@ and pipe that into more functions after that.
 
 As a part of many data analyses, we need to calculate a summary value for the
 data (a *summary statistic*). Examples of summary statistics we might want to calculate are the
-number of observations, the average/mean value for a column, the minimum value etc. 
+number of observations, the average/mean value for a column, the minimum value, etc. 
 A useful `dplyr` function for calculating summary statistics is
 `summarize`, where the first argument is the data frame and the proceeding arguments
 are the summaries we want to perform. Below we show how to use the `summarize` function to
@@ -1150,45 +1150,13 @@ primary language at home is spoken by
 `r format(lang_summary$most_most_at_home[1], scientific = FALSE, big.mark = ",")`
 people.
 
-<!-- Suppose we wanted to find the maximum value for all the numeric columns in the `tidy_lang` data set.  -->
-
-<!-- We could apply `summarize` in the same way that we did above to find the maximum values:  -->
-
-<!-- ```{r} -->
-
-<!-- lang_summary_max <- summarize(tidy_lang,  -->
-
-<!--              most_most_at_home = max(most_at_home),  -->
-
-<!--              most_most_at_work = max(most_at_work)) -->
-
-<!-- lang_summary_max -->
-
-<!-- ``` -->
-
-<!-- The approach above is a valid way to do this, but if we had many numeric columns in our data set then this method would take a lot of time since we would have to explicitly write out the name of each column! A faster and less error-prone way to apply function(s) to columns that satisfy a certain condition is to use the `summarize_if` function. The first argument is the data set we want to summarize (`tidy_lang`). The second argument is the required condition, here if a particular column is numeric then the function will be applied. The third argument is the function we want to summarize with, here `max`. Therefore we write: -->
-
-<!-- ```{r 02-summarize-if} -->
-
-<!-- summarize_if(tidy_lang,  -->
-
-<!--              is.numeric,  -->
-
-<!--              max) -->
-
-<!-- ``` -->
-
-<!-- Notice that we get the same output as we did above! From the table, we see that the most commonly spoken  -->
-
-<!-- primary language at home is spoken by X people and the most commonly spoken language at work is spoken by X people.  -->
-
 ### Calculating group summary statistics:
 
 A common pairing with `summarize` is `group_by`. Pairing these functions
 together can let you summarize values for subgroups within a data set. For
-example, here, we can use `group_by` to group the regions and then calculate the
-minimum and maximum number of Canadians reporting the language as the primary
-language at home for each of the groups.
+example, here, we can use `group_by` to group the regions of the `tidy_lang` data frame
+and then calculate the minimum and maximum number of Canadians 
+reporting the language as the primary language at home for each of the groups.
 
 The `group_by` function takes at least two arguments. The first is the data
 frame that will be grouped, and the second and onwards are columns to use in the
@@ -1205,7 +1173,7 @@ lang_summary_by_region
 ```
 
 Notice that `group_by` on its own doesn't change the way the data looks. In the output below 
-the data set looks the same, and it doesn't *appear* to be grouped by `region`. 
+the grouped data set looks the same, and it doesn't *appear* to be grouped by `region`. 
 Instead, `group_by` simply changes how other functions work with the data, as we saw with `summarize` above.  
 
 ```{r}
@@ -1387,42 +1355,40 @@ iteration. Additionally, their use is not limited to columns of a data frame;
 `map_*` functions can be used to apply functions to elements of a vector or
 list, and even to lists of data frames, or nested data frames.
 
-## Iterating over rows in a data frame with `rowwise()`
+## Apply functions across columns within one row with `rowwise`
 
-
-What if you want to apply a function across rows instead of columns? 
+What if you want to apply a function across columns but within one row? 
 For instance, suppose we want to know the maximum value between `mother_tongue`,
-`most_at_home`, `most_at_work` and `lang_known` for each language in Vancouver.
+`most_at_home`, `most_at_work` and `lang_known` for each language in the `region_lang` data set.
 In other words, we want to apply the `max` function row-wise. We will use the aptly 
-named function `rowwise` to accomplish this task. First, we `filter` the data for 
-only the languages in Vancouver. We also `select` specific columns simply 
-so we can see all the columns in the data frame output 
-but note that this step is not strictly necessary.
+named function `rowwise` in combination with `mutate` to accomplish this task. 
+>**Note:** Before we apply `rowwise`, we will `select` only the count columns 
+so that all the columns in the data frame's output are easy to see in the book. 
 
-```{r vancouver_filter}
-vancouver_lang <- region_lang |>
-  filter(region == "Vancouver") |> 
-  select(region, language:lang_known)
-vancouver_lang
-```
-Similar to `group_by`, `rowwise` doesn't do anything when it is called by itself, 
-however, we can apply `rowwise` in combination with other functions to change how 
-these other functions operate on the data. We will use `rowwise` and `mutate` 
-to find the maximum count for each language in the data set.  
 ```{r}
-vancouver_lang |> 
+region_lang |> 
+  select(mother_tongue:lang_known) |>
   rowwise() |> 
   mutate(maximum = max(c(mother_tongue, most_at_home, most_at_work, lang_known)))
 ```
+
+Similar to `group_by`, `rowwise` doesn't do anything when it is called by itself. 
+However, we can apply `rowwise` in combination with other functions to change how 
+these other functions operate on the data.  
 Notice if we used `mutate` without `rowwise`, we would have computed the maximum 
-value across *all* rows rather than the maximum value for *each* row. Therefore in the output below
-`r format(vancouver_lang |>  mutate(maximum = max(c(mother_tongue, most_at_home, most_at_work, lang_known))) |> slice(1) |> pull(maximum),  scientific = FALSE, big.mark = ",")` is reported as the maximum value in every single row since it is 
+value across *all* rows rather than the maximum value for *each* row. 
+Therefore, in the output below, the same maximum value is reported 
+in every single row, since it is 
 the maximum value among *all* the rows, so this code is not doing what we want. 
 
 ```{r}
-vancouver_lang |> 
+region_lang |> 
+  select(mother_tongue:lang_known) |>
   mutate(maximum = max(c(mother_tongue, most_at_home, most_at_work, lang_known)))
 ```
+
+## Summary
+
 Cleaning and wrangling data can be a very time-consuming process, however, 
 it is a critical step in any data analysis. We have explored many different
 functions for cleaning and wrangling data into a tidy format. 
@@ -1435,16 +1401,17 @@ Table: (#tab:summary-functions-table) Summary of wrangling functions
 
 | Function | Description |
 | ---      | ----------- | 
+| `across` | allows you to apply function(s) to multiple columns  | 
+| `filter` | subsets rows of a data frame | 
+| `group_by` |  allows you to apply function(s) to groups of rows |
+| `mutate` | adds or modifies columns in a data frame |
+| `map` | general iteration function |
 | `pivot_longer` | generally makes the data frame longer and narrower |
+| `rowwise` | applies functions across columns within one row | 
 | `pivot_wider` | generally makes a data frame wider and decreases the number of rows | 
 | `separate` | splits up a character column into multiple columns  | 
 | `select` | subsets columns of a data frame |
-| `filter` | subsets rows of a data frame | 
-| `mutate` | adds or modifies columns in a data frame | 
 | `summarize` | calculates summaries of inputs | 
-| `group_by` |  allows you to apply function(s) to groups of rows |
-| `across` | allows you to apply function(s) to multiple columns  | 
-| `rowwise` | allows you to apply function(s) across rows of a data frame | 
 
 ## Additional resources