
Commit 63c76f3

our -> the in a lot of places, some minor typos
1 parent 69e945a commit 63c76f3


2 files changed: 29 additions & 28 deletions


reading.Rmd

Lines changed: 2 additions & 2 deletions
@@ -758,8 +758,8 @@ Databases are beneficial in a large-scale setting:
 - They provide security and data access control.
 - They allow multiple users to access data simultaneously and remotely without conflicts and errors.
 For example, [there are billions of Google searches conducted daily](https://www.internetlivestats.com/google-search-statistics/).
-Can you imagine if Google stored all of the data from those searches in a single `.csv
-file`!? Chaos would ensue!
+Can you imagine if Google stored all of the data from those searches in a single `.csv` file!?
+Chaos would ensue!
 
 ## Writing data from R to a `.csv` file
 

wrangling.Rmd

Lines changed: 27 additions & 26 deletions
@@ -311,7 +311,7 @@ Figure \@ref(fig:02-wide-to-long).
 knitr::include_graphics("img/pivot_functions/pivot_functions.001.jpeg")
 ```
 
-We can achieve this effect in R using the `pivot_longer` function.
+We can achieve this effect in R using the `pivot_longer` function from the `tidyverse` package.
 The `pivot_longer` function combines columns,
 and is usually used during tidying data
 when we need to make the data frame longer and narrower.
@@ -329,7 +329,7 @@ lang_wide <- read_csv("data/region_lang_top5_cities_wide.csv")
 lang_wide
 ```
 
-What is wrong with our untidy format above?
+What is wrong with the untidy format above?
 The table on the left in Figure \@ref(fig:img-pivot-longer-with-table)
 represents the data in the "wide" (messy) format.
 From a data analysis perspective, this format is not ideal because the values of
@@ -356,8 +356,8 @@ to get the maximum value.
 knitr::include_graphics("img/pivot_functions/pivot_functions.003.jpeg")
 ```
 
-Figure \@ref(fig:img-pivot-longer) details what arguments we need to specify to
-use the `tidyverse` function, `pivot_longer`, to accomplish this data transformation.
+Figure \@ref(fig:img-pivot-longer) details the arguments that we need to specify
+in the `pivot_longer` function to accomplish this data transformation.
 
 (ref:img-pivot-longer) Syntax for the `pivot_longer` function.
 
@@ -447,7 +447,7 @@ In this example, each observation is a language in a region.
 However, each observation is split across multiple rows:
 one where the count for `most_at_home` is recorded,
 and the other where the count for `most_at_work` is recorded.
-Suppose our analysis goal with this data set was to
+Suppose the goal with this data was to
 visualize the relationship between the number of
 Canadians reporting their primary language at home and work.
 Doing that would be difficult with this data in its current form,
@@ -461,8 +461,9 @@ will be tidied using the `pivot_wider` function.
 knitr::include_graphics("img/pivot_functions/pivot_functions.004.jpeg")
 ```
 
-Figure \@ref(fig:img-pivot-wider) details what we need to specify
-to use the `pivot_wider` function.
+Figure \@ref(fig:img-pivot-wider) details the arguments that we need to specify
+in the `pivot_wider` function.
+
 
 (ref:img-pivot-wider) Syntax for the `pivot_wider` function.
 
@@ -492,8 +493,8 @@ that this data is a tidy data set.
 3. Each value is a single cell (i.e., its row, column position in the data
 frame is not shared with another value).
 
-You might notice that we have the same number of columns in our tidy data set as
-we did in our messy one. Therefore `pivot_wider` didn't really "widen" our data,
+You might notice that we have the same number of columns in the tidy data set as
+we did in the messy one. Therefore `pivot_wider` didn't really "widen" the data,
 as the name suggests. This is just because the original `type` column only had
 two categories in it. If it had more than two, `pivot_wider` would have created
 more columns, and we would see the data set "widen."
@@ -565,7 +566,7 @@ Is this data set now tidy? If we recall the three criteria for tidy data:
 We can see that this data now satisfies all three criteria, making it easier to
 analyze. But we aren't done yet! Notice in the table above that the word
 `<chr>` appears beneath each of the column names. The word under the column name
-indicates the data type of each column. Here all of our variables are
+indicates the data type of each column. Here all of the variables are
 "character" data types. Recall, character data types are letter(s) or digits(s)
 surrounded by quotes. In the previous example in Section \@ref(pivot-wider), the
 `most_at_home` and `most_at_work` variables were `<dbl>` (double)&mdash;you can
@@ -600,7 +601,7 @@ indicating they are integer data types (i.e., numbers)!
 
 ## Using `select` to extract a range of columns
 
-Now that our `tidy_lang` data is indeed *tidy*, we can start manipulating it \index{select!helpers}
+Now that the `tidy_lang` data is indeed *tidy*, we can start manipulating it \index{select!helpers}
 using the powerful suite of functions from the `tidyverse`.
 For the first example, recall the `select` function from Chapter \@ref(intro),
 which lets us create a subset of columns from a data frame.
@@ -679,7 +680,7 @@ to compare the values of the `category` column
 with the value `"Official languages"`.
 With these arguments, `filter` returns a data frame with all the columns
 of the input data frame
-but only the rows we asked for in our logical filter statement, i.e.,
+but only the rows we asked for in the logical statement, i.e.,
 those where the `category` column holds the value `"Official languages"`.
 We name this data frame `official_langs`.
 
@@ -728,8 +729,8 @@ filter(official_langs, region == "Montréal" & language == "French")
 
 ### Extracting rows satisfying at least one condition using `|`
 
-Suppose we were interested in the rows for only the Albertan cities
-in our `official_langs` data set (Edmonton and Calgary).
+Suppose we were interested in only those rows corresponding to cities in Alberta
+in the `official_langs` data set (Edmonton and Calgary).
 We can't use `,` as we did above because `region`
 cannot be both Edmonton *and* Calgary simultaneously.
 Instead, we can use the vertical pipe (`|`) logical operator,
@@ -925,11 +926,11 @@ for our five cities of focus in this chapter.
 To accomplish this, we will need to do two tasks
 beforehand:
 
-1. Create a vector containing the population values for our cities.
+1. Create a vector containing the population values for the cities.
 2. Filter the `official_langs` data frame
 so that we only keep the rows where the language is English.
 
-To create a vector containing the population values for our cities
+To create a vector containing the population values for the five cities
 (Toronto, Montréal, Vancouver, Calgary, Edmonton),
 we will use the `c` function (recall that `c` stands for "concatenate"):
 
@@ -977,10 +978,10 @@ Failing to do this would have resulted in the incorrect math being performed.
 <!--
 #### Creating a visualization with tidy data {-}
 
-Now that we have cleaned and wrangled our data, we can make visualizations or do
-statistical analyses to answer questions about our data! Let's suppose we want to
+Now that we have cleaned and wrangled the data, we can make visualizations or do
+statistical analyses to answer questions about it! Let's suppose we want to
 answer the question "what proportion of people in each city speak English
-as their primary language at home in these five cities?" Since our data is
+as their primary language at home in these five cities?" Since the data is
 cleaned already, in a few short lines of code, we can use `ggplot` to create a
 data visualization to answer this question! Here we create a bar plot to represent the proportions for
 each region and color the proportions by language.
@@ -1086,7 +1087,7 @@ output <- data |>
 
 ### Using `|>` to combine `filter` and `select`
 
-Let's work with our tidy `tidy_lang` data set from Section \@ref(separate),
+Let's work with the tidy `tidy_lang` data set from Section \@ref(separate),
 which contains the number of Canadians reporting their primary language at home
 and work for five major cities
 (Toronto, Montréal, Vancouver, Calgary, and Edmonton):
@@ -1125,7 +1126,7 @@ van_data_selected <- tidy_lang |>
 van_data_selected
 ```
 
-But wait...Why do our `select` and `filter` function calls
+But wait...Why do the `select` and `filter` function calls
 look different in these two examples?
 Remember: when you use the pipe,
 the output of the first function is automatically provided
@@ -1273,8 +1274,8 @@ region_lang_na[["most_at_home"]][1] <- NA
 region_lang_na
 ```
 
-Now if we apply our `summarize` function as above,
-we see that no longer get the minimum and maximum returned,
+Now if we apply the `summarize` function as above,
+we see that we no longer get the minimum and maximum returned,
 but just an `NA` instead!
 
 ```{r}
@@ -1409,7 +1410,7 @@ region_lang |>
 > `purrr` is part of the tidyverse, once we call `library(tidyverse)` we
 > do not need to load the `purrr` package separately.
 
-Our output looks a bit weird... we passed in a data frame, but our output
+The output looks a bit weird... we passed in a data frame, but the output
 doesn't look like a data frame. As it so happens, it is *not* a data frame, but
 rather a plain list:
 
@@ -1547,7 +1548,7 @@ region_lang |>
 ```
 
 Now we apply `rowwise` before `mutate`, to tell R that we would like
-our mutate function to be applied across, and within, a row,
+the mutate function to be applied across, and within, a row,
 as opposed to being applied on a column
 (which is the default behavior of `mutate`):
 
@@ -1561,7 +1562,7 @@ region_lang |>
 lang_known)))
 ```
 
-We see that we get an additional column added to our data frame,
+We see that we get an additional column added to the data frame,
 named `maximum`, which is the maximum value between `mother_tongue`,
 `most_at_home`, `most_at_work` and `lang_known` for each language
 and region.
