Merge pull request #400 from UBC-DSCI/index-refs-edits

trevorcampbell · web-flow · commit 47d458a704ee · 2021-12-08T16:56:51.000-08:00
Copyedits for references, index
diff --git a/classification2.Rmd b/classification2.Rmd
@@ -643,7 +643,7 @@ knitr::include_graphics("img/cv.png")
 ```
 
 To perform 5-fold cross-validation in R with `tidymodels`, we use another
-function: `vfold_cv`. \index{tidymodels!vfold\_cv}\index{cross validation!vfold\_cv} This function splits our training data into `v` folds
+function: `vfold_cv`. \index{tidymodels!vfold\_cv}\index{cross-validation!vfold\_cv} This function splits our training data into `v` folds
 automatically. We set the `strata` argument to the categorical label variable
 (here, `Class`) to ensure that the training and validation subsets contain the
 right proportions of each category of observation.
@@ -653,7 +653,7 @@ cancer_vfold <- vfold_cv(cancer_train, v = 5, strata = Class)
 cancer_vfold
 ```
 
-Then, when we create our data analysis workflow, we use the `fit_resamples` function \index{cross validation!fit\_resamples}\index{tidymodels!fit\_resamples}
+Then, when we create our data analysis workflow, we use the `fit_resamples` function \index{cross-validation!fit\_resamples}\index{tidymodels!fit\_resamples}
 instead of the `fit` function for training. This runs cross-validation on each
 train/validation split. 
 
@@ -679,7 +679,7 @@ knn_fit <- workflow() |>
 knn_fit
 ```
 
-The `collect_metrics` \index{tidymodels!collect\_metrics}\index{cross validation!collect\_metrics} function is used to aggregate the *mean* and *standard error*
+The `collect_metrics` \index{tidymodels!collect\_metrics}\index{cross-validation!collect\_metrics} function is used to aggregate the *mean* and *standard error*
 of the classifier's validation accuracy across the folds. You will find results
 related to the accuracy in the row with `accuracy` listed under the `.metric` column. 
 You should consider the mean (`mean`) to be the estimated accuracy, while the standard 
@@ -747,7 +747,7 @@ knn_spec <- nearest_neighbor(weight_func = "rectangular",
   set_mode("classification")
 ```
 
-Then instead of using `fit` or `fit_resamples`, we will use the `tune_grid` function \index{cross validation!tune\_grid}\index{tidymodels!tune\_grid}
+Then instead of using `fit` or `fit_resamples`, we will use the `tune_grid` function \index{cross-validation!tune\_grid}\index{tidymodels!tune\_grid}
 to fit the model for each value in a range of parameter values. 
 In particular, we first create a data frame with a `neighbors`
 variable that contains the sequence of values of $K$ to try; below we create the `k_vals`
diff --git a/clustering.Rmd b/clustering.Rmd
@@ -591,7 +591,7 @@ These, however, are beyond the scope of this book.
 
 ### Random restarts
 
-Unlike the classification and regression models we studied in previous chapters, K-means \index{K-means!restart,nstart} can get "stuck" in a bad solution.
+Unlike the classification and regression models we studied in previous chapters, K-means \index{K-means!restart, nstart} can get "stuck" in a bad solution.
 For example, Figure \@ref(fig:10-toy-kmeans-bad-init) illustrates an unlucky random initialization by K-means.
 
 ```{r 10-toy-kmeans-bad-init, echo = FALSE, warning = FALSE, message = FALSE, fig.height = 3.5, fig.width = 3.75, fig.align = "center", fig.cap = "Random initialization of labels."}
@@ -859,7 +859,7 @@ each other. Therefore, the *scale* of each of the variables in the data
 will influence which cluster data points end up being assigned.
 Variables with a large scale will have a much larger 
 effect on deciding cluster assignment than variables with a small scale. 
-To address this problem, we typically standardize \index{standardization!K-means}\index{K-means!stanardization} our data before clustering,
+To address this problem, we typically standardize \index{standardization!K-means}\index{K-means!standardization} our data before clustering,
 which ensures that each variable has a mean of 0 and standard deviation of 1.
 The `scale` function in R can be used to do this. 
 We show an example of how to use this function 
@@ -1050,7 +1050,7 @@ But why is there a "bump" in the total WSSD plot here?
 Shouldn't total WSSD always decrease as we add more clusters? 
 Technically yes, but remember:  K-means can get "stuck" in a bad solution. 
 Unfortunately, for K = 8 we had an unlucky initialization
-and found a bad clustering! \index{K-means!restart,nstart} 
+and found a bad clustering! \index{K-means!restart, nstart} 
 We can help prevent finding a bad clustering 
 by trying a few different random initializations 
 via the `nstart` argument (Figure \@ref(fig:10-choose-k-nstart) 
diff --git a/jupyter.Rmd b/jupyter.Rmd
@@ -144,7 +144,7 @@ that indicates the status of your kernel. If the circle is empty (`r fa("circle"
 the kernel is idle and ready to execute code. If the circle is filled in (`r fa("circle", fill = "black", stroke = "black", stroke_width = "10px", height = "12px")`), 
 the kernel is busy running some code.
 
-You may run into problems where your kernel \index{kernel!interrupt,restart} is stuck for an excessive amount 
+You may run into problems where your kernel \index{kernel!interrupt, restart} is stuck for an excessive amount 
 of time, your notebook is very slow and unresponsive, or your kernel loses its
 connection. If this happens, try the following steps:
 
diff --git a/reading.Rmd b/reading.Rmd
@@ -61,7 +61,7 @@ into R, but before we can talk about *how* we read the data into R with these
 functions, we first need to talk about *where* the data lives. When you load a
 data set into R, you first need to tell R where those files live. The file
 could live on your  computer (*local*) 
-\index{location|see{path}} \index{path!local,remote,relative,absolute} 
+\index{location|see{path}} \index{path!local, remote, relative, absolute} 
 or somewhere on the internet (*remote*). 
 
 The place where the file lives on your computer is called the "path". You can
diff --git a/references.bib b/references.bib
@@ -16,7 +16,7 @@ @misc{cancensus2016
 
 @misc{language2016,
   author = {{Statistics Canada}},
-  title = {The Aboriginal languages of First Nations people, M\'etis and Inuit},
+  title = {The {A}boriginal languages of {F}irst {N}ations people, {M}\'etis and {I}nuit},
   year = {2016},
   url = {https://www12.statcan.gc.ca/census-recensement/2016/as-sa/98-200-x/2016022/98-200-x2016022-eng.cfm}
 }
@@ -94,10 +94,10 @@ @Manual{Rlanguage
     url = {https://www.R-project.org/},
   }
 
-@misc{tidyversestyleguide,
+@book{tidyversestyleguide,
   year = {2020},
   author = {Hadley Wickham},
-  title = {The tidyverse style guide},
+  title = {The Tidyverse Style Guide},
   url = {https://style.tidyverse.org/}
 }
 
@@ -120,7 +120,7 @@ @book{wilson2018
 
 @article{walker2017,
   author = {Nick Walker},
-  title = {Mapping Indigenous languages in Canada},
+  title = {Mapping indigenous languages in {C}anada},
   journal = {Canadian Geographic},
   year = {2017},
   publisher = {The Royal Canadian Geographical Society},
@@ -143,17 +143,17 @@ @misc{penguinsimage
     url = {https://upload.wikimedia.org/wikipedia/commons/0/00/Brown_Bluff-2016-Tabarin_Peninsula%E2%80%93Gentoo_penguin_%28Pygoscelis_papua%29_03.jpg}}
 
 @book{children2012,
-  title={They came for the children: Canada, aboriginal peoples, and the residential schools},
+  title={They Came for the Children: Canada, Aboriginal Peoples, and the Residential Schools},
   author={{Truth and Reconciliation Commission of Canada}},
   year={2012},
   publisher = {Public Works \& Government Services Canada}
 }
 
-@misc{calls2015,
-  title={Calls to Action},
+@book{calls2015,
+  title={Calls to {A}ction},
   author={{Truth and Reconciliation Commission of Canada}},
   year={2015},
-  url={http://trc.ca/assets/pdf/Calls_to_Action_English2.pdf}
+  url={https://www2.gov.bc.ca/assets/gov/british-columbians-our-governments/indigenous-people/aboriginal-peoples-documents/calls_to_action_english2.pdf}
 }
 
 @misc{maunadata,
diff --git a/regression1.Rmd b/regression1.Rmd
@@ -442,7 +442,7 @@ is beyond the scope of this chapter; but roughly, if your estimated mean is 100,
 error is 1,000, you can expect the *true* RMSPE to be somewhere roughly between 99,000 and 101,000 (although it may
 fall outside this range). You may ignore the other columns in the metrics data frame,
 as they do not provide any additional insight.
-\index{cross validation!collect\_metrics}
+\index{cross-validation!collect\_metrics}
 
 ```{r 07-choose-k-knn-results}
 gridvals <- tibble(neighbors = seq(from = 1, to = 200, by = 3))
diff --git a/version-control.Rmd b/version-control.Rmd
@@ -203,7 +203,7 @@ Once you reach a point that you want Git to keep a record
 of the current version of your work, you need to commit 
 (i.e., snapshot) your changes. A prerequisite to this is telling Git which
 files should be included in that snapshot. We call this step **adding** the 
-files to the **staging area**. \index{git!add,staging area}
+files to the **staging area**. \index{git!add, staging area}
 Note that the staging area is not a real physical location on your computer; 
 it is instead a conceptual placeholder for these files until they are committed.
 The benefit of the Git version control system using a staging area is that you 
diff --git a/viz.Rmd b/viz.Rmd
@@ -203,7 +203,7 @@ curated by [Dr. Pieter Tans, NOAA/GML](https://www.esrl.noaa.gov/gmd/staff/Piete
 and [Dr. Ralph Keeling, Scripps Institution of Oceanography,](https://scrippsco2.ucsd.edu/)
 records the atmospheric concentration of carbon dioxide 
 (CO$_{\text{2}}$, in parts per million) 
-at the Mauna Loa research station in \index{Mauna Loa CO2} Hawaii 
+at the Mauna Loa research station in \index{Mauna Loa} Hawaii 
 from 1959 onward [@maunadata].
 For this book, we are going to focus on the last 40 years of the data set,
 1980-2020.
@@ -665,11 +665,11 @@ to assess a few key characteristics of the data:
 
 - **Direction:** if the y variable tends to increase when the x variable increases, then y has a **positive** relationship with x. If 
   y tends to decrease when x increases, then y has a **negative** relationship with x. If y does not meaningfully increase or decrease 
-  as x increases, then y has **little or no** relationship with x. \index{relationship!positive,negative,none}
+  as x increases, then y has **little or no** relationship with x. \index{relationship!positive, negative, none}
 - **Strength:** if the y variable *reliably* increases, decreases, or stays flat as x increases,
-  then the relationship is **strong**. Otherwise, the relationship is **weak**. Intuitively, \index{relationship!strong,weak}
+  then the relationship is **strong**. Otherwise, the relationship is **weak**. Intuitively, \index{relationship!strong, weak}
   the relationship is strong when the scatter points are close together and look more like a "line" or "curve" than a "cloud."
-- **Shape:** if you can draw a straight line roughly through the data points, the relationship is **linear**. Otherwise, it is **nonlinear**. \index{relationship!linear,nonlinear}
+- **Shape:** if you can draw a straight line roughly through the data points, the relationship is **linear**. Otherwise, it is **nonlinear**. \index{relationship!linear, nonlinear}
 
 In Figure \@ref(fig:03-mother-tongue-vs-most-at-home-scale-props), we see that 
 as the percentage of people who have a language as their mother tongue increases, 
@@ -1335,7 +1335,7 @@ a story:
 Below are two examples of how one might take these four steps in describing the example visualizations that appeared earlier in this chapter.
 Each of the steps is denoted by its numeral in parentheses, e.g. (3).
 
-**Mauna Loa Atmospheric CO$_{\text{2}}$ Measurements:** (1) \index{Mauna Loa CO2} Many 
+**Mauna Loa Atmospheric CO$_{\text{2}}$ Measurements:** (1) \index{Mauna Loa} Many 
 current forms of energy generation and conversion&mdash;from automotive
 engines to natural gas power plants&mdash;rely on burning fossil fuels and produce
 greenhouse gases, typically primarily carbon dioxide (CO$_{\text{2}}$), as a