"our" -> "the" in many places; fix some minor typos

trevorcampbell · trevorcampbell · commit 63c76f3916f9 · 2021-12-07T16:37:27.000-08:00
diff --git a/reading.Rmd b/reading.Rmd
@@ -758,8 +758,8 @@ Databases are beneficial in a large-scale setting:
 - They provide security and data access control.
 - They allow multiple users to access data simultaneously and remotely without conflicts and errors.
   For example, [there are billions of Google searches conducted daily](https://www.internetlivestats.com/google-search-statistics/). 
-  Can you imagine if Google stored all of the data from those searches in a single `.csv
-  file`!? Chaos would ensue! 
+  Can you imagine if Google stored all of the data from those searches in a single `.csv` file!? 
+  Chaos would ensue! 
 
 ## Writing data from R to a `.csv` file
 
diff --git a/wrangling.Rmd b/wrangling.Rmd
@@ -311,7 +311,7 @@ Figure \@ref(fig:02-wide-to-long).
 knitr::include_graphics("img/pivot_functions/pivot_functions.001.jpeg")
 ```
 
-We can achieve this effect in R using the `pivot_longer` function.
+We can achieve this effect in R using the `pivot_longer` function from the `tidyverse` package.
 The `pivot_longer` function combines columns, 
 and is usually used during tidying data 
 when we need to make the data frame longer and narrower. 
@@ -329,7 +329,7 @@ lang_wide <- read_csv("data/region_lang_top5_cities_wide.csv")
 lang_wide
 ```
 
-What is wrong with our untidy format above? 
+What is wrong with the untidy format above? 
 The table on the left in Figure \@ref(fig:img-pivot-longer-with-table) 
 represents the data in the "wide" (messy) format.
 From a data analysis perspective, this format is not ideal because the values of 
@@ -356,8 +356,8 @@ to get the maximum value.
 knitr::include_graphics("img/pivot_functions/pivot_functions.003.jpeg")
 ```
 
-Figure \@ref(fig:img-pivot-longer) details what arguments we need to specify to
-use the `tidyverse` function, `pivot_longer`, to accomplish this data transformation.
+Figure \@ref(fig:img-pivot-longer) details the arguments that we need to specify 
+in the `pivot_longer` function to accomplish this data transformation.
 
 (ref:img-pivot-longer) Syntax for the `pivot_longer` function.
 
@@ -447,7 +447,7 @@ In this example, each observation is a language in a region.
 However, each observation is split across multiple rows: 
 one where the count for `most_at_home` is recorded, 
 and the other where the count for `most_at_work` is recorded. 
-Suppose our analysis goal with this data set was to 
+Suppose the goal with this data was to 
 visualize the relationship between the number of
 Canadians reporting their primary language at home and work. 
 Doing that would be difficult with this data in its current form,
@@ -461,8 +461,9 @@ will be tidied using the `pivot_wider` function.
 knitr::include_graphics("img/pivot_functions/pivot_functions.004.jpeg")
 ```
 
-Figure \@ref(fig:img-pivot-wider) details what we need to specify 
-to use the `pivot_wider` function.
+Figure \@ref(fig:img-pivot-wider) details the arguments that we need to specify 
+in the `pivot_wider` function.
+
 
 (ref:img-pivot-wider) Syntax for the `pivot_wider` function.
 
@@ -492,8 +493,8 @@ that this data is a tidy data set.
 3.  Each value is a single cell (i.e., its row, column position in the data
     frame is not shared with another value).
 
-You might notice that we have the same number of columns in our tidy data set as
-we did in our messy one. Therefore `pivot_wider` didn't really "widen" our data,
+You might notice that we have the same number of columns in the tidy data set as
+we did in the messy one. Therefore `pivot_wider` didn't really "widen" the data,
 as the name suggests. This is just because the original `type` column only had
 two categories in it. If it had more than two, `pivot_wider` would have created
 more columns, and we would see the data set "widen."
@@ -565,7 +566,7 @@ Is this data set now tidy? If we recall the three criteria for tidy data:
 We can see that this data now satisfies all three criteria, making it easier to
 analyze. But we aren't done yet! Notice in the table above that the word
 `<chr>` appears beneath each of the column names. The word under the column name
-indicates the data type of each column. Here all of our variables are
+indicates the data type of each column. Here all of the variables are
 "character" data types. Recall, character data types are letter(s) or digits(s)
 surrounded by quotes. In the previous example in Section \@ref(pivot-wider), the
 `most_at_home` and `most_at_work` variables were `<dbl>` (double)&mdash;you can
@@ -600,7 +601,7 @@ indicating they are integer data types (i.e., numbers)!
 
 ## Using `select` to extract a range of columns
 
-Now that our `tidy_lang` data is indeed *tidy*, we can start manipulating it \index{select!helpers}
+Now that the `tidy_lang` data is indeed *tidy*, we can start manipulating it \index{select!helpers}
 using the powerful suite of functions from the `tidyverse`. 
 For the first example, recall the `select` function from Chapter \@ref(intro), 
 which lets us create a subset of columns from a data frame. 
@@ -679,7 +680,7 @@ to compare the values of the `category` column
 with the value `"Official languages"`. 
 With these arguments, `filter` returns a data frame with all the columns 
 of the input data frame 
-but only the rows we asked for in our logical filter statement, i.e., 
+but only the rows we asked for in the logical statement, i.e., 
 those where the `category` column holds the value `"Official languages"`.
 We name this data frame `official_langs`.
 
@@ -728,8 +729,8 @@ filter(official_langs, region == "Montréal" & language == "French")
 
 ### Extracting rows satisfying at least one condition using `|`
 
-Suppose we were interested in the rows for only the Albertan cities 
-in our `official_langs` data set (Edmonton and Calgary). 
+Suppose we were interested in only those rows corresponding to cities in Alberta
+in the `official_langs` data set (Edmonton and Calgary). 
 We can't use `,` as we did above because `region`
 cannot be both Edmonton *and* Calgary simultaneously. 
 Instead, we can use the vertical pipe (`|`) logical operator, 
@@ -925,11 +926,11 @@ for our five cities of focus in this chapter.
 To accomplish this, we will need to do two tasks 
 beforehand:
 
-1. Create a vector containing the population values for our cities.
+1. Create a vector containing the population values for the cities.
 2. Filter the `official_langs` data frame 
 so that we only keep the rows where the language is English.
 
-To create a vector containing the population values for our cities
+To create a vector containing the population values for the five cities
 (Toronto, Montréal, Vancouver, Calgary, Edmonton),
 we will use the `c` function (recall that `c` stands for "concatenate"):
 
@@ -977,10 +978,10 @@ Failing to do this would have resulted in the incorrect math being performed.
 <!--
 #### Creating a visualization with tidy data {-}
 
-Now that we have cleaned and wrangled our data, we can make visualizations or do 
-statistical analyses to answer questions about our data! Let's suppose we want to
+Now that we have cleaned and wrangled the data, we can make visualizations or do 
+statistical analyses to answer questions about it! Let's suppose we want to
 answer the question "what proportion of people in each city speak English 
-as their primary language at home in these five cities?" Since our data is
+as their primary language at home in these five cities?" Since the data is
 cleaned already, in a few short lines of code, we can use `ggplot` to create a
 data visualization to answer this question! Here we create a bar plot to represent the proportions for
 each region and color the proportions by language.
@@ -1086,7 +1087,7 @@ output <- data |>
 
 ### Using `|>` to combine `filter` and `select`
 
-Let's work with our tidy `tidy_lang` data set from Section \@ref(separate), 
+Let's work with the tidy `tidy_lang` data set from Section \@ref(separate), 
 which contains the number of Canadians reporting their primary language at home 
 and work for five major cities 
 (Toronto, Montréal, Vancouver, Calgary, and Edmonton):
@@ -1125,7 +1126,7 @@ van_data_selected <- tidy_lang |>
 van_data_selected
 ```
 
-But wait...Why do our `select` and `filter` function calls 
+But wait...Why do the `select` and `filter` function calls 
 look different in these two examples? 
 Remember: when you use the pipe, 
 the output of the first function is automatically provided 
@@ -1273,8 +1274,8 @@ region_lang_na[["most_at_home"]][1] <- NA
 region_lang_na
 ```
 
-Now if we apply our `summarize` function as above, 
-we see that no longer get the minimum and maximum returned, 
+Now if we apply the `summarize` function as above, 
+we see that we no longer get the minimum and maximum returned, 
 but just an `NA` instead!
 
 ```{r}
@@ -1409,7 +1410,7 @@ region_lang |>
 > `purrr` is part of the tidyverse, once we call `library(tidyverse)` we 
 > do not need to load the `purrr` package separately.
 
-Our output looks a bit weird... we passed in a data frame, but our output
+The output looks a bit weird... we passed in a data frame, but the output
 doesn't look like a data frame. As it so happens, it is *not* a data frame, but
 rather a plain list:
 
@@ -1547,7 +1548,7 @@ region_lang |>
 ```
 
 Now we apply `rowwise` before `mutate`, to tell R that we would like
-our mutate function to be applied across, and within, a row,
+the mutate function to be applied across, and within, a row,
 as opposed to being applied on a column 
 (which is the default behavior of `mutate`):
 
@@ -1561,7 +1562,7 @@ region_lang |>
                          lang_known)))
 ```
 
-We see that we get an additional column added to our data frame, 
+We see that we get an additional column added to the data frame, 
 named `maximum`, which is the maximum value between `mother_tongue`,
 `most_at_home`, `most_at_work` and `lang_known` for each language
 and region.