Merge pull request #403 from UBC-DSCI/rohan-edits

trevorcampbell · web-flow · commit 9cb2c093bff6 · 2021-12-09T12:33:58.000-08:00
Rohan Alexander edits
diff --git a/clustering.Rmd b/clustering.Rmd
@@ -1098,4 +1098,4 @@ please follow the instructions for computer setup needed to run the worksheets
 found in Chapter \@ref(move-to-your-own-machine).
 
 ## Additional resources
-- Chapter 10 of [*An Introduction to Statistical Learning*](https://www.statlearning.com/) [-@james2013introduction] provides a great next stop in the process of learning about clustering and unsupervised learning in general. In the realm of clustering specifically, it provides a great companion introduction to K-means, but also covers *hierarchical* clustering for when you expect there to be subgroups, and then subgroups within subgroups, etc., in your data. In the realm of more general unsupervised learning, it covers *principal components analysis (PCA)*, which is a very popular technique in scientific applications for reducing the number of predictors in a dataset. 
+- Chapter 10 of [*An Introduction to Statistical Learning*](https://www.statlearning.com/) [-@james2013introduction] provides a great next stop in the process of learning about clustering and unsupervised learning in general. In the realm of clustering specifically, it provides a great companion introduction to K-means, but also covers *hierarchical* clustering for when you expect there to be subgroups, and then subgroups within subgroups, etc., in your data. In the realm of more general unsupervised learning, it covers *principal components analysis (PCA)*, which is a very popular technique for reducing the number of predictors in a dataset. 
diff --git a/intro.Rmd b/intro.Rmd
@@ -1,6 +1,6 @@
 \mainmatter
 
-# R and the tidyverse {#intro}
+# R and the Tidyverse {#intro}
 
 ```{r intro-setup, include=FALSE}
 library(magick)
diff --git a/reading.Rmd b/reading.Rmd
@@ -758,8 +758,8 @@ Databases are beneficial in a large-scale setting:
 - They provide security and data access control.
 - They allow multiple users to access data simultaneously and remotely without conflicts and errors.
   For example, [there are billions of Google searches conducted daily](https://www.internetlivestats.com/google-search-statistics/). 
-  Can you imagine if Google stored all of the data from those searches in a single `.csv
-  file`!? Chaos would ensue! 
+  Can you imagine if Google stored all of the data from those searches in a single `.csv` file!? 
+  Chaos would ensue! 
 
 ## Writing data from R to a `.csv` file
 
@@ -1220,9 +1220,9 @@ please follow the instructions for computer setup needed to run the worksheets
 found in Chapter \@ref(move-to-your-own-machine).
 
 ## Additional resources
-- The [`readr` page on the tidyverse website](https://readr.tidyverse.org/) is where you should look if you want to learn more about the functions in this chapter, the full set of arguments you can use, and other related functions. The site also provides a very nice cheat sheet that summarizes many of the data wrangling functions from this chapter.
+- The [`readr` page on the Tidyverse website](https://readr.tidyverse.org/) is where you should look if you want to learn more about the functions in this chapter, the full set of arguments you can use, and other related functions. The site also provides a very nice cheat sheet that summarizes many of the data wrangling functions from this chapter.
 - Sometimes you might run into data in such poor shape that none of the reading functions we cover in this chapter works. In that case, you can consult the [data import chapter](https://r4ds.had.co.nz/data-import.html) from [R for Data Science](https://r4ds.had.co.nz/), which goes into a lot more detail about how R parses text from files into data frames.
-- The documentation for many of the reading functions we cover in this chapter can be found [on the tidyverse website](https://readr.tidyverse.org/reference/read_delim.html). This site shows you the full set of arguments available for each function.
+- The documentation for many of the reading functions we cover in this chapter can be found [on the Tidyverse website](https://readr.tidyverse.org/reference/read_delim.html). This site shows you the full set of arguments available for each function.
 - The [`here` package](https://cran.r-project.org/web/packages/here/index.html) provides a way for you to construct or find your files' paths. 
 - The [`readxl` documentation](https://readxl.tidyverse.org/) provides more details on reading data from Excel, such as reading in data with multiple sheets, or specifying the cells to read in. 
 - The [`rio` package](https://github.com/leeper/rio) provides an alternative set of tools for reading and writing data in R. It aims to be a "Swiss army knife" for data reading/writing/converting, and supports a wide variety of data types (including data formats generated by other statistical software like SPSS and SAS).
diff --git a/regression2.Rmd b/regression2.Rmd
@@ -29,13 +29,12 @@ print_tidymodels <- function(tidymodels_object) {
 ## Overview 
 Up to this point, we have solved all of our predictive problems&mdash;both classification
 and regression&mdash;using K-nearest neighbors (KNN)-based approaches. In the context of regression, 
-there is another method commonly used 
-in scientific disciplines known as *linear regression*. This chapter provides an introduction
+there is another commonly used method known as *linear regression*. This chapter provides an introduction
 to the basic concept of linear regression, shows how to use `tidymodels` to perform linear regression in R,
 and characterizes its strengths and weaknesses compared to KNN regression. The focus is, as usual,
 on the case where there is a single predictor and single response variable of interest; but the chapter
- concludes with an example using *multivariable linear regression* when there is more than one
- predictor.
+concludes with an example using *multivariable linear regression* when there is more than one
+predictor.
 
 ## Chapter learning objectives 
 By the end of the chapter, readers will be able to do the following:
@@ -50,11 +49,10 @@ By the end of the chapter, readers will be able to do the following:
 At the end of the previous chapter, we noted some limitations of KNN regression.
 While the method is simple and easy to understand, KNN regression does not
 predict well beyond the range of the predictors in the training data, and
-the method gets significantly slower as the training data set grows.
-\index{regression!linear}
+the method gets significantly slower as the training data set grows. \index{regression!linear}
 Fortunately, there is an alternative to KNN regression&mdash;*linear regression*&mdash;that addresses
-both of these limitations. Linear regression is also much more commonly used in practice, especially
-in scientific applications, because it provides an interpretable mathematical equation that describes
+both of these limitations. Linear regression is also very commonly 
+used in practice because it provides an interpretable mathematical equation that describes
 the relationship between the predictor and response variables. In this first part of the chapter, we will focus on *simple* linear regression,
 which involves only one predictor variable and one response variable; later on, we will consider
  *multivariable* linear regression, which involves multiple predictor variables.
diff --git a/viz.Rmd b/viz.Rmd
@@ -1512,7 +1512,7 @@ please follow the instructions for computer setup needed to run the worksheets
 found in Chapter \@ref(move-to-your-own-machine).
 
 ## Additional resources
-- The [`ggplot2` page on the tidyverse website](https://ggplot2.tidyverse.org) is where you should look if you want to learn more about the functions in this chapter, the full set of arguments you can use, and other related functions. The site also provides a very nice cheat sheet that summarizes many of the data wrangling functions from this chapter.
+- The [`ggplot2` page on the Tidyverse website](https://ggplot2.tidyverse.org) is where you should look if you want to learn more about the functions in this chapter, the full set of arguments you can use, and other related functions. The site also provides a very nice cheat sheet that summarizes many of the data wrangling functions from this chapter.
 - The [Fundamentals of Data Visualization](https://serialmentor.com/dataviz/) has a wealth of information on designing effective visualizations. It is not specific to any particular programming language or library. If you want to improve your visualization skills, this is the next place to look.
 - [*R for Data Science*](https://r4ds.had.co.nz/) has a chapter on [creating visualizations using `ggplot2`](https://r4ds.had.co.nz/data-visualisation.html). This reference is specific to R and `ggplot2`, but provides a much more detailed introduction to the full set of tools that `ggplot2` provides. This chapter is where you should look if you want to learn how to make more intricate visualizations in `ggplot2` than what is included in this chapter.
 - The [`theme` function documentation](https://ggplot2.tidyverse.org/reference/theme.html)
diff --git a/wrangling.Rmd b/wrangling.Rmd