Commit aa2201c

Merge pull request #514 from UBC-DSCI/mutate-in-ch1
Added mutate to ch1
2 parents ec1c98b + be551ec commit aa2201c

1 file changed (+37, -7 lines)

source/intro.Rmd

Lines changed: 37 additions & 7 deletions
@@ -445,7 +445,7 @@ selected_lang <- select(aboriginal_lang, language, mother_tongue)
 selected_lang
 ```
 
-### Using `arrange` to order and `slice` to select rows by index number
+## Using `arrange` to order and `slice` to select rows by index number
 
 We have used `filter` and `select` to obtain a table with only the Aboriginal
 languages in the data set and their associated counts. However, we want to know
@@ -484,19 +484,49 @@ ten_lang <- slice(arranged_lang, 1:10)
 ten_lang
 ```
 
-We have now answered our initial question by generating this table!
+## Adding and modifying columns using `mutate`
+
+Recall that our data analysis question referred to the *count* of Canadians
+that speak each of the top ten most commonly reported Aboriginal languages as
+their mother tongue, and the `ten_lang` data frame indeed contains those
+counts... But perhaps, seeing these numbers, we became curious about the
+*percentage* of the population of Canada associated with each count. It is
+common to come up with new data analysis questions in the process of answering
+a first one&mdash;so fear not and explore! To answer this small
+question-along-the-way, we need to divide each count in the `mother_tongue`
+column by the total Canadian population according to the 2016
+census&mdash;i.e., 35,151,728&mdash;and multiply the result by 100. We can perform
+this computation using the `mutate` function. We pass the `ten_lang`
+data frame as its first argument, then give the computation as the second argument
+in the form `name = expression`. If the name on the left-hand side of the `=` is new,
+`mutate` creates a new column in the data frame; if it is the name of an existing
+column, `mutate` overwrites that column with the computed values. In this case, we will
+create a new column called `mother_tongue_percent`.
+
+```{r}
+canadian_population <- 35151728
+ten_lang_percent <- mutate(ten_lang, mother_tongue_percent = 100 * mother_tongue / canadian_population)
+ten_lang_percent
+```
+
+The `ten_lang_percent` data frame shows that
+the ten Aboriginal languages in the `ten_lang` data frame were spoken
+as a mother tongue by between 0.008% and 0.18% of the Canadian population.
+
+
+## Exploring data with visualizations
+
+We have now answered our initial question by generating the `ten_lang` table!
 Are we done? Well, not quite; tables are almost never the best way to present
-the result of your analysis to your audience. Even the simple table above with
+the result of your analysis to your audience. Even the `ten_lang` table with
 only two columns presents some difficulty: for example, you have to scrutinize
 the table quite closely to get a sense for the relative numbers of speakers of
 each language. When you move on to more complicated analyses, this issue only
 gets worse. In contrast, a *visualization* would convey this information in a much
 more easily understood format.
 Visualizations are a great tool for summarizing information to help you
-effectively communicate with your audience.
-
-## Exploring data with visualizations
-Creating effective data visualizations \index{visualization} is an essential component of any data
+effectively communicate with your audience, and
+creating effective data visualizations \index{visualization} is an essential component of any data
 analysis. In this section we will develop a visualization of the
 ten Aboriginal languages that were most often reported in 2016 as mother tongues in
 Canada, as well as the number of people that speak each of them.
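The `mutate` behavior described in the new section (a new name on the left-hand side of `=` adds a column, while an existing name overwrites that column) can be checked with a minimal sketch. It assumes the dplyr package is installed. The data frame `toy_lang`, its counts, and `total_population` are hypothetical values chosen to make the arithmetic easy; they are not figures from the 2016 census.

```r
library(dplyr)

# Hypothetical toy data (NOT census values): two made-up languages and counts
toy_lang <- data.frame(
  language = c("Language A", "Language B"),
  mother_tongue = c(5000, 1000)
)

# Hypothetical total population, chosen so the percentages come out as whole numbers
total_population <- 100000

# A new name on the left-hand side of `=` adds a column and keeps the raw counts
toy_percent <- mutate(toy_lang, mother_tongue_percent = 100 * mother_tongue / total_population)

# An existing name replaces that column in place, so the raw counts are lost
toy_overwritten <- mutate(toy_lang, mother_tongue = 100 * mother_tongue / total_population)

toy_percent
toy_overwritten
```

`toy_percent` keeps `mother_tongue` and gains `mother_tongue_percent` (5 and 1), whereas `toy_overwritten` still has only two columns, with `mother_tongue` now holding the percentages. Storing the percentages under a new name, as the commit does with `mother_tongue_percent`, keeps the original counts available for later steps.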
