cls1 index

trevorcampbell · trevorcampbell · commit 11256f454b78 · 2023-11-16T13:00:47.000-08:00
diff --git a/source/classification1.Rmd b/source/classification1.Rmd
@@ -1295,7 +1295,7 @@ upsampled_plot
 
 ### Missing data
 
-One of the most common issues in real data sets in the wild is *missing data*,
+One of the most common issues in real data sets in the wild is *missing data*,\index{missing data}
 i.e., observations where the values of some of the variables were not recorded.
 Unfortunately, as common as it is, handling missing data properly is very
 challenging and generally relies on expert knowledge about the data, setting,
@@ -1329,7 +1329,7 @@ data.  So how can we perform K-nearest neighbors classification in the presence
 of missing data?  Well, since there are not too many observations with missing
 entries, one option is to simply remove those observations prior to building
 the K-nearest neighbors classifier. We can accomplish this by using the
-`drop_na` function from `tidyverse` prior to working with the data.
+`drop_na` function from `tidyverse` prior to working with the data.\index{missing data!drop\_na}
 
 ```{r 05-naomit}
 no_missing_cancer <- missing_cancer |> drop_na()
@@ -1342,7 +1342,8 @@ possible approach is to *impute* the missing entries, i.e., fill in synthetic
 values based on the other observations in the data set. One reasonable choice
 is to perform *mean imputation*, where missing entries are filled in using the
 mean of the present entries in each variable. To perform mean imputation, we
-add the `step_impute_mean` step to the `tidymodels` preprocessing recipe.
+add the `step_impute_mean` \index{recipe!step\_impute\_mean}\index{missing data!mean imputation}
+step to the `tidymodels` preprocessing recipe.
 ```{r 05-impute, results=FALSE, message=FALSE, echo=TRUE}
 impute_missing_recipe <- recipe(Class ~ ., data = missing_cancer) |>
   step_impute_mean(all_predictors()) |>