Commit 47d458a

Merge pull request #400 from UBC-DSCI/index-refs-edits
Copyedits for references, index
2 parents a2a078f + fdcfc98 commit 47d458a

8 files changed: 24 additions, 24 deletions

classification2.Rmd

Lines changed: 4 additions & 4 deletions
@@ -643,7 +643,7 @@ knitr::include_graphics("img/cv.png")
 ```

 To perform 5-fold cross-validation in R with `tidymodels`, we use another
-function: `vfold_cv`. \index{tidymodels!vfold\_cv}\index{cross validation!vfold\_cv} This function splits our training data into `v` folds
+function: `vfold_cv`. \index{tidymodels!vfold\_cv}\index{cross-validation!vfold\_cv} This function splits our training data into `v` folds
 automatically. We set the `strata` argument to the categorical label variable
 (here, `Class`) to ensure that the training and validation subsets contain the
 right proportions of each category of observation.
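
Reviewer note on the hunk above: the stratification that `strata = Class` asks for can be illustrated in base R. This is only a minimal sketch of the idea and is not how `vfold_cv` is implemented; the toy data frame `toy`, its columns, and the `fold_counts` table are hypothetical names invented for illustration.

```r
# Hypothetical toy data: 60 "B" and 40 "M" observations (not the cancer data).
set.seed(1)
toy <- data.frame(
  Class = factor(rep(c("B", "M"), times = c(60, 40))),
  x = rnorm(100)
)

# Assign fold labels 1..5 separately within each class, so that every fold
# keeps the same class proportions as the full data set.
toy$fold <- NA_integer_
for (cls in levels(toy$Class)) {
  idx <- which(toy$Class == cls)
  toy$fold[idx] <- sample(rep(1:5, length.out = length(idx)))
}

# Rows are folds, columns are classes.
fold_counts <- table(toy$fold, toy$Class)
print(fold_counts)
```

Because the fold labels are shuffled within each class, every fold holds the same 60/40 split of `B` and `M` as the full toy data.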
@@ -653,7 +653,7 @@ cancer_vfold <- vfold_cv(cancer_train, v = 5, strata = Class)
 cancer_vfold
 ```

-Then, when we create our data analysis workflow, we use the `fit_resamples` function \index{cross validation!fit\_resamples}\index{tidymodels!fit\_resamples}
+Then, when we create our data analysis workflow, we use the `fit_resamples` function \index{cross-validation!fit\_resamples}\index{tidymodels!fit\_resamples}
 instead of the `fit` function for training. This runs cross-validation on each
 train/validation split.

@@ -679,7 +679,7 @@ knn_fit <- workflow() |>
 knn_fit
 ```

-The `collect_metrics` \index{tidymodels!collect\_metrics}\index{cross validation!collect\_metrics} function is used to aggregate the *mean* and *standard error*
+The `collect_metrics` \index{tidymodels!collect\_metrics}\index{cross-validation!collect\_metrics} function is used to aggregate the *mean* and *standard error*
 of the classifier's validation accuracy across the folds. You will find results
 related to the accuracy in the row with `accuracy` listed under the `.metric` column.
 You should consider the mean (`mean`) to be the estimated accuracy, while the standard
@@ -747,7 +747,7 @@ knn_spec <- nearest_neighbor(weight_func = "rectangular",
 set_mode("classification")
 ```

-Then instead of using `fit` or `fit_resamples`, we will use the `tune_grid` function \index{cross validation!tune\_grid}\index{tidymodels!tune\_grid}
+Then instead of using `fit` or `fit_resamples`, we will use the `tune_grid` function \index{cross-validation!tune\_grid}\index{tidymodels!tune\_grid}
 to fit the model for each value in a range of parameter values.
 In particular, we first create a data frame with a `neighbors`
 variable that contains the sequence of values of $K$ to try; below we create the `k_vals`
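
Reviewer note on the `collect_metrics` hunk in this file: the aggregation it describes (a mean and a standard error across folds) can be sketched in base R. The fold accuracies below are made-up numbers, and the formula used for the standard error (sample standard deviation divided by the square root of the number of folds) is an assumption about the usual definition rather than something stated in the diff.

```r
# Hypothetical validation accuracies from five folds (made-up numbers).
fold_accuracy <- c(0.88, 0.91, 0.86, 0.90, 0.85)

# The mean across folds is the estimated accuracy.
mean_accuracy <- mean(fold_accuracy)

# Assumed formula: standard error = sample sd / sqrt(number of folds).
std_err <- sd(fold_accuracy) / sqrt(length(fold_accuracy))

# Arrange the result like one row of a metrics table.
summary_row <- data.frame(
  .metric = "accuracy",
  mean = mean_accuracy,
  n = length(fold_accuracy),
  std_err = std_err
)
print(summary_row)
```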

clustering.Rmd

Lines changed: 3 additions & 3 deletions
@@ -591,7 +591,7 @@ These, however, are beyond the scope of this book.

 ### Random restarts

-Unlike the classification and regression models we studied in previous chapters, K-means \index{K-means!restart,nstart} can get "stuck" in a bad solution.
+Unlike the classification and regression models we studied in previous chapters, K-means \index{K-means!restart, nstart} can get "stuck" in a bad solution.
 For example, Figure \@ref(fig:10-toy-kmeans-bad-init) illustrates an unlucky random initialization by K-means.

 ```{r 10-toy-kmeans-bad-init, echo = FALSE, warning = FALSE, message = FALSE, fig.height = 3.5, fig.width = 3.75, fig.align = "center", fig.cap = "Random initialization of labels."}
@@ -859,7 +859,7 @@ each other. Therefore, the *scale* of each of the variables in the data
 will influence which cluster data points end up being assigned.
 Variables with a large scale will have a much larger
 effect on deciding cluster assignment than variables with a small scale.
-To address this problem, we typically standardize \index{standardization!K-means}\index{K-means!stanardization} our data before clustering,
+To address this problem, we typically standardize \index{standardization!K-means}\index{K-means!standardization} our data before clustering,
 which ensures that each variable has a mean of 0 and standard deviation of 1.
 The `scale` function in R can be used to do this.
 We show an example of how to use this function
@@ -1050,7 +1050,7 @@ But why is there a "bump" in the total WSSD plot here?
 Shouldn't total WSSD always decrease as we add more clusters?
 Technically yes, but remember: K-means can get "stuck" in a bad solution.
 Unfortunately, for K = 8 we had an unlucky initialization
-and found a bad clustering! \index{K-means!restart,nstart}
+and found a bad clustering! \index{K-means!restart, nstart}
 We can help prevent finding a bad clustering
 by trying a few different random initializations
 via the `nstart` argument (Figure \@ref(fig:10-choose-k-nstart)
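
Reviewer note on the clustering hunks above: the two ideas they index, standardizing with `scale` and using several random restarts via `nstart`, can be combined in a short base R sketch. The toy data frame and its column names are hypothetical, and this is not the chapter's own example.

```r
# Hypothetical toy data with two variables on very different scales.
set.seed(2024)
toy <- data.frame(
  small_scale = c(rnorm(20, mean = 0), rnorm(20, mean = 5)),
  large_scale = c(rnorm(20, mean = 0, sd = 100), rnorm(20, mean = 500, sd = 100))
)

# Standardize so each column has mean 0 and standard deviation 1.
standardized <- scale(toy)

# Run K-means from 10 random initializations and keep the best result
# (the one with the lowest total within-cluster sum of squares).
clustering <- kmeans(standardized, centers = 2, nstart = 10)

print(colMeans(standardized))
print(clustering$tot.withinss)
```

With `nstart = 1`, `kmeans` would use a single random initialization, which is the situation where an unlucky start can leave it "stuck".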

jupyter.Rmd

Lines changed: 1 addition & 1 deletion
@@ -144,7 +144,7 @@ that indicates the status of your kernel. If the circle is empty (`r fa("circle"
 the kernel is idle and ready to execute code. If the circle is filled in (`r fa("circle", fill = "black", stroke = "black", stroke_width = "10px", height = "12px")`),
 the kernel is busy running some code.

-You may run into problems where your kernel \index{kernel!interrupt,restart} is stuck for an excessive amount
+You may run into problems where your kernel \index{kernel!interrupt, restart} is stuck for an excessive amount
 of time, your notebook is very slow and unresponsive, or your kernel loses its
 connection. If this happens, try the following steps:

reading.Rmd

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ into R, but before we can talk about *how* we read the data into R with these
 functions, we first need to talk about *where* the data lives. When you load a
 data set into R, you first need to tell R where those files live. The file
 could live on your computer (*local*)
-\index{location|see{path}} \index{path!local,remote,relative,absolute}
+\index{location|see{path}} \index{path!local, remote, relative, absolute}
 or somewhere on the internet (*remote*).

 The place where the file lives on your computer is called the "path". You can

references.bib

Lines changed: 8 additions & 8 deletions
@@ -16,7 +16,7 @@ @misc{cancensus2016

 @misc{language2016,
 author = {{Statistics Canada}},
-title = {The Aboriginal languages of First Nations people, M\'etis and Inuit},
+title = {The {A}boriginal languages of {F}irst {N}ations people, {M}\'etis and {I}nuit},
 year = {2016},
 url = {https://www12.statcan.gc.ca/census-recensement/2016/as-sa/98-200-x/2016022/98-200-x2016022-eng.cfm}
 }
@@ -94,10 +94,10 @@ @Manual{Rlanguage
 url = {https://www.R-project.org/},
 }

-@misc{tidyversestyleguide,
+@book{tidyversestyleguide,
 year = {2020},
 author = {Hadley Wickham},
-title = {The tidyverse style guide},
+title = {The Tidyverse Style Guide},
 url = {https://style.tidyverse.org/}
 }

@@ -120,7 +120,7 @@ @book{wilson2018

 @article{walker2017,
 author = {Nick Walker},
-title = {Mapping Indigenous languages in Canada},
+title = {Mapping indigenous languages in {C}anada},
 journal = {Canadian Geographic},
 year = {2017},
 publisher = {The Royal Canadian Geographical Society},
@@ -143,17 +143,17 @@ @misc{penguinsimage
 url = {https://upload.wikimedia.org/wikipedia/commons/0/00/Brown_Bluff-2016-Tabarin_Peninsula%E2%80%93Gentoo_penguin_%28Pygoscelis_papua%29_03.jpg}}

 @book{children2012,
-title={They came for the children: Canada, aboriginal peoples, and the residential schools},
+title={They Came for the Children: Canada, Aboriginal Peoples, and the Residential Schools},
 author={{Truth and Reconciliation Commission of Canada}},
 year={2012},
 publisher = {Public Works \& Government Services Canada}
 }

-@misc{calls2015,
-title={Calls to Action},
+@book{calls2015,
+title={Calls to {A}ction},
 author={{Truth and Reconciliation Commission of Canada}},
 year={2015},
-url={http://trc.ca/assets/pdf/Calls_to_Action_English2.pdf}
+url={https://www2.gov.bc.ca/assets/gov/british-columbians-our-governments/indigenous-people/aboriginal-peoples-documents/calls_to_action_english2.pdf}
 }

 @misc{maunadata,

regression1.Rmd

Lines changed: 1 addition & 1 deletion
@@ -442,7 +442,7 @@ is beyond the scope of this chapter; but roughly, if your estimated mean is 100,
 error is 1,000, you can expect the *true* RMSPE to be somewhere roughly between 99,000 and 101,000 (although it may
 fall outside this range). You may ignore the other columns in the metrics data frame,
 as they do not provide any additional insight.
-\index{cross validation!collect\_metrics}
+\index{cross-validation!collect\_metrics}

 ```{r 07-choose-k-knn-results}
 gridvals <- tibble(neighbors = seq(from = 1, to = 200, by = 3))

version-control.Rmd

Lines changed: 1 addition & 1 deletion
@@ -203,7 +203,7 @@ Once you reach a point that you want Git to keep a record
 of the current version of your work, you need to commit
 (i.e., snapshot) your changes. A prerequisite to this is telling Git which
 files should be included in that snapshot. We call this step **adding** the
-files to the **staging area**. \index{git!add,staging area}
+files to the **staging area**. \index{git!add, staging area}
 Note that the staging area is not a real physical location on your computer;
 it is instead a conceptual placeholder for these files until they are committed.
 The benefit of the Git version control system using a staging area is that you

viz.Rmd

Lines changed: 5 additions & 5 deletions
@@ -203,7 +203,7 @@ curated by [Dr. Pieter Tans, NOAA/GML](https://www.esrl.noaa.gov/gmd/staff/Piete
 and [Dr. Ralph Keeling, Scripps Institution of Oceanography,](https://scrippsco2.ucsd.edu/)
 records the atmospheric concentration of carbon dioxide
 (CO$_{\text{2}}$, in parts per million)
-at the Mauna Loa research station in \index{Mauna Loa CO2} Hawaii
+at the Mauna Loa research station in \index{Mauna Loa} Hawaii
 from 1959 onward [@maunadata].
 For this book, we are going to focus on the last 40 years of the data set,
 1980-2020.
@@ -665,11 +665,11 @@ to assess a few key characteristics of the data:

 - **Direction:** if the y variable tends to increase when the x variable increases, then y has a **positive** relationship with x. If
 y tends to decrease when x increases, then y has a **negative** relationship with x. If y does not meaningfully increase or decrease
-as x increases, then y has **little or no** relationship with x. \index{relationship!positive,negative,none}
+as x increases, then y has **little or no** relationship with x. \index{relationship!positive, negative, none}
 - **Strength:** if the y variable *reliably* increases, decreases, or stays flat as x increases,
-then the relationship is **strong**. Otherwise, the relationship is **weak**. Intuitively, \index{relationship!strong,weak}
+then the relationship is **strong**. Otherwise, the relationship is **weak**. Intuitively, \index{relationship!strong, weak}
 the relationship is strong when the scatter points are close together and look more like a "line" or "curve" than a "cloud."
-- **Shape:** if you can draw a straight line roughly through the data points, the relationship is **linear**. Otherwise, it is **nonlinear**. \index{relationship!linear,nonlinear}
+- **Shape:** if you can draw a straight line roughly through the data points, the relationship is **linear**. Otherwise, it is **nonlinear**. \index{relationship!linear, nonlinear}

 In Figure \@ref(fig:03-mother-tongue-vs-most-at-home-scale-props), we see that
 as the percentage of people who have a language as their mother tongue increases,
@@ -1335,7 +1335,7 @@ a story:
 Below are two examples of how one might take these four steps in describing the example visualizations that appeared earlier in this chapter.
 Each of the steps is denoted by its numeral in parentheses, e.g. (3).

-**Mauna Loa Atmospheric CO$_{\text{2}}$ Measurements:** (1) \index{Mauna Loa CO2} Many
+**Mauna Loa Atmospheric CO$_{\text{2}}$ Measurements:** (1) \index{Mauna Loa} Many
 current forms of energy generation and conversion&mdash;from automotive
 engines to natural gas power plants&mdash;rely on burning fossil fuels and produce
 greenhouse gases, typically primarily carbon dioxide (CO$_{\text{2}}$), as a
