Merge pull request #514 from UBC-DSCI/mutate-in-ch1

trevorcampbell · web-flow · commit aa2201c43879 · 2023-08-06T11:16:29.000-07:00
Added mutate to ch1
diff --git a/source/intro.Rmd b/source/intro.Rmd
@@ -445,7 +445,7 @@ selected_lang <- select(aboriginal_lang, language, mother_tongue)
 selected_lang
 ```
 
-### Using `arrange` to order and `slice` to select rows by index number
+## Using `arrange` to order and `slice` to select rows by index number
 
 We have used `filter` and `select` to obtain a table with only the Aboriginal
 languages in the data set and their associated counts. However, we want to know
@@ -484,19 +484,49 @@ ten_lang <- slice(arranged_lang, 1:10)
 ten_lang
 ```
 
-We have now answered our initial question by generating this table!
+## Adding and modifying columns using `mutate`
+
+Recall that our data analysis question referred to the *count* of Canadians
+that speak each of the top ten most commonly reported Aboriginal languages as
+their mother tongue, and the `ten_lang` data frame indeed contains those
+counts. But perhaps, seeing these numbers, we became curious about the
+*percentage* of the population of Canada associated with each count. It is
+common to come up with new data analysis questions in the process of answering
+a first one, so fear not and explore! To answer this small
+question-along-the-way, we need to divide each count in the `mother_tongue`
+column by the total Canadian population according to the 2016
+census (i.e., 35,151,728) and multiply the result by 100. We can perform
+this computation using the `mutate` function. We pass the `ten_lang`
+data frame as its first argument, then specify the equation that computes the percentages
+in the second argument. If the name on the left-hand side of the equation is new,
+`mutate` creates a new column in the data frame; if it is the name of an existing
+column, `mutate` modifies that column instead. In this case, we will opt to
+create a new column called `mother_tongue_percent`.
+
+```{r}
+canadian_population <- 35151728
+ten_lang_percent <- mutate(ten_lang,
+                           mother_tongue_percent = 100 * mother_tongue / canadian_population)
+ten_lang_percent
+```
+
+The `ten_lang_percent` data frame shows that each of
+the ten Aboriginal languages in the `ten_lang` data frame was reported
+as a mother tongue by between 0.008% and 0.18% of the Canadian population.
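+
+The `mutate` call above used a new name, so it added a column. As a minimal
+sketch of the other case, the chunk below uses a small made-up data frame
+(`toy_lang`, a hypothetical name holding hypothetical counts that are *not*
+census values) and calls `mutate` twice: first with a new name, which adds a
+column, and then with that same, now existing, name, which overwrites the
+column with its values rounded to three decimal places using `round`.
+
+```{r}
+library(dplyr)
+
+# hypothetical toy data: made-up counts, not 2016 census values
+toy_lang <- data.frame(
+  language = c("language A", "language B"),
+  mother_tongue = c(1000, 250)
+)
+canadian_population <- 35151728
+
+# a new name on the left-hand side creates a new column
+toy_lang <- mutate(toy_lang,
+                   mother_tongue_percent = 100 * mother_tongue / canadian_population)
+
+# an existing name on the left-hand side modifies that column in place
+toy_lang <- mutate(toy_lang,
+                   mother_tongue_percent = round(mother_tongue_percent, 3))
+toy_lang
+```
+
+The data frame still has three columns after the second call, since
+`mutate` replaced `mother_tongue_percent` rather than adding another column.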
+
+
+## Exploring data with visualizations
+
+We have now answered our initial question by generating the `ten_lang` table!
 Are we done? Well, not quite; tables are almost never the best way to present
-the result of your analysis to your audience. Even the simple table above with
+the result of your analysis to your audience. Even the `ten_lang` table with
 only two columns presents some difficulty: for example, you have to scrutinize
 the table quite closely to get a sense for the relative numbers of speakers of 
 each language. When you move on to more complicated analyses, this issue only 
 gets worse. In contrast, a *visualization* would convey this information in a much 
 more easily understood format. 
 Visualizations are a great tool for summarizing information to help you
-effectively communicate with your audience. 
-
-## Exploring data with visualizations
-Creating effective data visualizations \index{visualization} is an essential component of any data
+effectively communicate with your audience, and
+creating effective data visualizations \index{visualization} is an essential component of any data
 analysis. In this section we will develop a visualization of the 
  ten Aboriginal languages that were most often reported in 2016 as mother tongues in
 Canada, as well as the number of people that speak each of them.