Commit 9cb2c09

Merge pull request #403 from UBC-DSCI/rohan-edits
Rohan Alexander edits
2 parents 47d458a + 57be6d1 commit 9cb2c09

6 files changed: +45 -47 lines changed

clustering.Rmd

Lines changed: 1 addition & 1 deletion
@@ -1098,4 +1098,4 @@ please follow the instructions for computer setup needed to run the worksheets
 found in Chapter \@ref(move-to-your-own-machine).

 ## Additional resources
-- Chapter 10 of [*An Introduction to Statistical Learning*](https://www.statlearning.com/) [-@james2013introduction] provides a great next stop in the process of learning about clustering and unsupervised learning in general. In the realm of clustering specifically, it provides a great companion introduction to K-means, but also covers *hierarchical* clustering for when you expect there to be subgroups, and then subgroups within subgroups, etc., in your data. In the realm of more general unsupervised learning, it covers *principal components analysis (PCA)*, which is a very popular technique in scientific applications for reducing the number of predictors in a dataset.
+- Chapter 10 of [*An Introduction to Statistical Learning*](https://www.statlearning.com/) [-@james2013introduction] provides a great next stop in the process of learning about clustering and unsupervised learning in general. In the realm of clustering specifically, it provides a great companion introduction to K-means, but also covers *hierarchical* clustering for when you expect there to be subgroups, and then subgroups within subgroups, etc., in your data. In the realm of more general unsupervised learning, it covers *principal components analysis (PCA)*, which is a very popular technique for reducing the number of predictors in a dataset.

intro.Rmd

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 \mainmatter

-# R and the tidyverse {#intro}
+# R and the Tidyverse {#intro}

 ```{r intro-setup, include=FALSE}
 library(magick)

reading.Rmd

Lines changed: 4 additions & 4 deletions
@@ -758,8 +758,8 @@ Databases are beneficial in a large-scale setting:
 - They provide security and data access control.
 - They allow multiple users to access data simultaneously and remotely without conflicts and errors.
 For example, [there are billions of Google searches conducted daily](https://www.internetlivestats.com/google-search-statistics/).
-Can you imagine if Google stored all of the data from those searches in a single `.csv
-file`!? Chaos would ensue!
+Can you imagine if Google stored all of the data from those searches in a single `.csv` file!?
+Chaos would ensue!

 ## Writing data from R to a `.csv` file

@@ -1220,9 +1220,9 @@ please follow the instructions for computer setup needed to run the worksheets
 found in Chapter \@ref(move-to-your-own-machine).

 ## Additional resources
-- The [`readr` page on the tidyverse website](https://readr.tidyverse.org/) is where you should look if you want to learn more about the functions in this chapter, the full set of arguments you can use, and other related functions. The site also provides a very nice cheat sheet that summarizes many of the data wrangling functions from this chapter.
+- The [`readr` page on the Tidyverse website](https://readr.tidyverse.org/) is where you should look if you want to learn more about the functions in this chapter, the full set of arguments you can use, and other related functions. The site also provides a very nice cheat sheet that summarizes many of the data wrangling functions from this chapter.
 - Sometimes you might run into data in such poor shape that none of the reading functions we cover in this chapter works. In that case, you can consult the [data import chapter](https://r4ds.had.co.nz/data-import.html) from [R for Data Science](https://r4ds.had.co.nz/), which goes into a lot more detail about how R parses text from files into data frames.
-- The documentation for many of the reading functions we cover in this chapter can be found [on the tidyverse website](https://readr.tidyverse.org/reference/read_delim.html). This site shows you the full set of arguments available for each function.
+- The documentation for many of the reading functions we cover in this chapter can be found [on the Tidyverse website](https://readr.tidyverse.org/reference/read_delim.html). This site shows you the full set of arguments available for each function.
 - The [`here` package](https://cran.r-project.org/web/packages/here/index.html) provides a way for you to construct or find your files' paths.
 - The [`readxl` documentation](https://readxl.tidyverse.org/) provides more details on reading data from Excel, such as reading in data with multiple sheets, or specifying the cells to read in.
 - The [`rio` package](https://github.com/leeper/rio) provides an alternative set of tools for reading and writing data in R. It aims to be a "Swiss army knife" for data reading/writing/converting, and supports a wide variety of data types (including data formats generated by other statistical software like SPSS and SAS).

regression2.Rmd

Lines changed: 6 additions & 8 deletions
@@ -29,13 +29,12 @@ print_tidymodels <- function(tidymodels_object) {
 ## Overview
 Up to this point, we have solved all of our predictive problems&mdash;both classification
 and regression&mdash;using K-nearest neighbors (KNN)-based approaches. In the context of regression,
-there is another method commonly used
-in scientific disciplines known as *linear regression*. This chapter provides an introduction
+there is another commonly used method known as *linear regression*. This chapter provides an introduction
 to the basic concept of linear regression, shows how to use `tidymodels` to perform linear regression in R,
 and characterizes its strengths and weaknesses compared to KNN regression. The focus is, as usual,
 on the case where there is a single predictor and single response variable of interest; but the chapter
-concludes with an example using *multivariable linear regression* when there is more than one
-predictor.
+concludes with an example using *multivariable linear regression* when there is more than one
+predictor.

 ## Chapter learning objectives
 By the end of the chapter, readers will be able to do the following:
@@ -50,11 +49,10 @@ By the end of the chapter, readers will be able to do the following:
 At the end of the previous chapter, we noted some limitations of KNN regression.
 While the method is simple and easy to understand, KNN regression does not
 predict well beyond the range of the predictors in the training data, and
-the method gets significantly slower as the training data set grows.
-\index{regression!linear}
+the method gets significantly slower as the training data set grows. \index{regression!linear}
 Fortunately, there is an alternative to KNN regression&mdash;*linear regression*&mdash;that addresses
-both of these limitations. Linear regression is also much more commonly used in practice, especially
-in scientific applications, because it provides an interpretable mathematical equation that describes
+both of these limitations. Linear regression is also very commonly
+used in practice because it provides an interpretable mathematical equation that describes
 the relationship between the predictor and response variables. In this first part of the chapter, we will focus on *simple* linear regression,
 which involves only one predictor variable and one response variable; later on, we will consider
 *multivariable* linear regression, which involves multiple predictor variables.

viz.Rmd

Lines changed: 1 addition & 1 deletion
@@ -1512,7 +1512,7 @@ please follow the instructions for computer setup needed to run the worksheets
 found in Chapter \@ref(move-to-your-own-machine).

 ## Additional resources
-- The [`ggplot2` page on the tidyverse website](https://ggplot2.tidyverse.org) is where you should look if you want to learn more about the functions in this chapter, the full set of arguments you can use, and other related functions. The site also provides a very nice cheat sheet that summarizes many of the data wrangling functions from this chapter.
+- The [`ggplot2` page on the Tidyverse website](https://ggplot2.tidyverse.org) is where you should look if you want to learn more about the functions in this chapter, the full set of arguments you can use, and other related functions. The site also provides a very nice cheat sheet that summarizes many of the data wrangling functions from this chapter.
 - The [Fundamentals of Data Visualization](https://serialmentor.com/dataviz/) has a wealth of information on designing effective visualizations. It is not specific to any particular programming language or library. If you want to improve your visualization skills, this is the next place to look.
 - [*R for Data Science*](https://r4ds.had.co.nz/) has a chapter on [creating visualizations using `ggplot2`](https://r4ds.had.co.nz/data-visualisation.html). This reference is specific to R and `ggplot2`, but provides a much more detailed introduction to the full set of tools that `ggplot2` provides. This chapter is where you should look if you want to learn how to make more intricate visualizations in `ggplot2` than what is included in this chapter.
 - The [`theme` function documentation](https://ggplot2.tidyverse.org/reference/theme.html)
