more folds we choose, the more computation it takes, and hence the more time
it takes to run the analysis. So when you do cross-validation, you need to
consider the size of the data, the speed of the algorithm (e.g., $K$-nearest
neighbor), and the speed of your computer. In practice, this is a
trial-and-error process, but typically $C$ is chosen to be either 5 or 10. Here we show
how the standard error decreases when we use 10-fold cross-validation rather
than 5-fold:

```{r 06-10-fold}
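# NOTE: the body of this chunk was missing from the source; the code below is
# a reconstructed sketch of 10-fold cross-validation, not the chapter's exact
# code. The object names (cancer_train, knn_recipe, knn_spec) and the label
# column `Class` are assumed from earlier in the chapter.
cancer_vfold_10 <- vfold_cv(cancer_train, v = 10, strata = Class)

vfold_metrics_10 <- workflow() |>
  add_recipe(knn_recipe) |>
  add_model(knn_spec) |>
  fit_resamples(resamples = cancer_vfold_10) |>
  collect_metrics()

# the std_err column shows the smaller standard error under 10 folds
vfold_metrics_10
```
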
that doesn't mean the classifier is actually more accurate with this parameter
value! Generally, when selecting $K$ (and other parameters for other predictive
models), we are looking for a value where:

- we get roughly optimal accuracy, so that our model will likely be accurate;
- changing the value to a nearby one (e.g., adding or subtracting a small number) doesn't decrease accuracy too much, so that our choice is reliable in the presence of uncertainty;
- the cost of training the model is not prohibitive (e.g., in our situation, if $K$ is too large, predicting becomes expensive!).

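When weighing these criteria, it helps to look at the estimates for the top few values of $K$ side by side rather than only the single best one. The short sketch below (illustrative only, and therefore not evaluated) simply sorts the `accuracies` data frame of per-$K$ accuracy estimates that is also used in the inline code below:

```{r, eval = FALSE}
# sort the per-K accuracy estimates and show the top few candidates;
# values of K whose estimated accuracies are close are essentially tied
accuracies |>
  arrange(desc(mean)) |>
  slice_head(n = 5)
```
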
We know that $K =$ `r (accuracies |> arrange(desc(mean)) |> head(1))$neighbors`
provides the highest estimated accuracy. Further, Figure \@ref(fig:06-find-k) shows that the estimated accuracy

The overall workflow for performing $K$-nearest neighbors classification using `tidymodels` is as follows:

1. Use the `initial_split` function to split the data into a training and test set. Set the `strata` argument to the class label variable. Put the test set aside for now.
2. Use the `vfold_cv` function to split up the training data for cross-validation.
3. Create a `recipe` that specifies the class label and predictors, as well as preprocessing steps for all variables. Pass the training data as the `data` argument of the recipe.
4. Create a `nearest_neighbor` model specification, with `neighbors = tune()`.
5. Add the recipe and model specification to a `workflow()`, and use the `tune_grid` function on the train/validation splits to estimate the classifier accuracy for a range of $K$ values. (A schematic code sketch of these steps appears after this list.)
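
The sketch below strings steps 1 through 5 together in code. It is schematic rather than the chapter's exact analysis: the data frame `cancer`, the label column `Class`, the preprocessing steps, the training proportion, and the grid of $K$ values are all assumptions made for illustration.

```{r, eval = FALSE}
# Schematic sketch of steps 1-5 (object and column names are placeholders);
# assumes the tidyverse and tidymodels packages are loaded, as elsewhere here.
cancer_split <- initial_split(cancer, prop = 0.75, strata = Class)
cancer_train <- training(cancer_split)
cancer_test <- testing(cancer_split)     # step 1: set the test set aside

cancer_vfold <- vfold_cv(cancer_train, v = 5, strata = Class)     # step 2

knn_recipe <- recipe(Class ~ ., data = cancer_train) |>           # step 3
  step_scale(all_predictors()) |>
  step_center(all_predictors())

knn_spec <- nearest_neighbor(weight_func = "rectangular",         # step 4
                             neighbors = tune()) |>
  set_engine("kknn") |>
  set_mode("classification")

k_vals <- tibble(neighbors = seq(from = 1, to = 100, by = 5))

knn_results <- workflow() |>                                      # step 5
  add_recipe(knn_recipe) |>
  add_model(knn_spec) |>
  tune_grid(resamples = cancer_vfold, grid = k_vals) |>
  collect_metrics()
```
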
However, $K$-nearest neighbors:

1. becomes very slow as the training data gets larger,
2. may not perform well with a large number of predictors, and
3. may not perform well when classes are imbalanced.

## Predictor variable selection

This procedure is indeed a well-known variable selection method referred to
as *best subset selection*. \index{variable selection!best subset}\index{predictor selection|see{variable selection}}
In particular, you

1. create a separate model for every possible subset of predictors,
2. tune each one using cross-validation, and
3. pick the subset of predictors that gives you the highest cross-validation accuracy.

Best subset selection is applicable to any classification method ($K$-NN or otherwise).
However, it becomes very slow when you have even a moderate number of predictors.
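
To get a concrete sense of the growth, the small sketch below (not part of the chapter's analysis; the predictor names are hypothetical) enumerates every non-empty subset of a handful of predictors and counts them:

```{r, eval = FALSE}
# hypothetical predictor names; each non-empty subset is one candidate model
predictors <- c("Perimeter", "Concavity", "Smoothness", "Area", "Symmetry")

# list every non-empty subset of the predictors
subsets <- unlist(
  lapply(seq_along(predictors),
         function(size) combn(predictors, size, simplify = FALSE)),
  recursive = FALSE
)

length(subsets)   # 2^5 - 1 = 31 candidate models to tune
2^20 - 1          # with 20 predictors, over a million candidate models!
```
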
Another idea is to iteratively build up a model by adding one predictor variable
at a time. This method, known as *forward selection*, is also widely \index{variable selection!forward}
applicable and fairly straightforward. It involves the following steps:

1. Start with a model having no predictors.
2. Run the following 3 steps until you run out of predictors:
    1. For each unused predictor, add it to the model to form a *candidate model*.
    2. Tune all of the candidate models.
    3. Update the model to be the candidate model with the highest cross-validation accuracy.
3. Select the model that provides the best trade-off between accuracy and simplicity.

Say you have $m$ total predictors to work with. In the first iteration, you have to make
$m$ candidate models, each with 1 predictor. Then in the second iteration, you have
to make $m - 1$ candidate models, each with 2 predictors, and so on.

Finally, we need to write some code that performs the task of sequentially
finding the best predictor to add to the model.
If you recall the end of the wrangling chapter, we mentioned
that sometimes one needs more flexible forms of iteration than what
we have used earlier, and in these cases, one typically resorts to
[a for loop](https://r4ds.had.co.nz/iteration.html#iteration).
This is one of those cases! Here we will use two for loops:
one over increasing predictor set sizes, and another over the
remaining predictors to decide which one to add in each round.
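
A rough sketch of that double loop is shown below. It is schematic, not the chapter's exact code: the data frame `cancer_subset`, the tuned model specification `knn_spec`, and the cross-validation folds `cancer_vfold` are assumed to exist from earlier steps, and the bookkeeping is condensed for brevity.

```{r, eval = FALSE}
# Schematic sketch of forward selection using two nested for loops.
names <- colnames(cancer_subset |> select(-Class))   # unused predictors
selected <- c()                                      # predictors chosen so far
accuracies <- tibble(size = integer(),
                     model_string = character(),
                     accuracy = numeric())

for (i in 1:length(names)) {
  # inner loop: try adding each unused predictor to the current model
  accs <- numeric(length(names))
  models <- character(length(names))
  for (j in 1:length(names)) {
    model_string <- paste("Class ~", paste(c(selected, names[j]), collapse = " + "))
    cand_recipe <- recipe(as.formula(model_string), data = cancer_subset) |>
      step_scale(all_predictors()) |>
      step_center(all_predictors())
    accs[j] <- workflow() |>
      add_recipe(cand_recipe) |>
      add_model(knn_spec) |>
      tune_grid(resamples = cancer_vfold, grid = 10) |>
      collect_metrics() |>
      filter(.metric == "accuracy") |>
      summarize(best_acc = max(mean)) |>
      pull()
    models[j] <- model_string
  }
  # keep the best candidate from this round and remove it from the pool
  best <- which.max(accs)
  accuracies <- accuracies |>
    add_row(size = i, model_string = models[best], accuracy = accs[best])
  selected <- c(selected, names[best])
  names <- names[-best]
}

accuracies
```
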
We typically look for the *elbow*
in Figure \@ref(fig:06-fwdsel-3), i.e., the place on the plot where the accuracy
levels off or begins to decrease. The elbow in Figure \@ref(fig:06-fwdsel-3) appears to occur at the model with
3 predictors; after that point the accuracy levels off. So here the right trade-off of accuracy and number of predictors
occurs with 3 variables: `Class ~ Perimeter + Concavity + Smoothness`. In other words, we have successfully removed irrelevant
predictors from the model! It is always worth remembering, however, that what cross-validation gives you
is an *estimate* of the true accuracy; you have to use your judgement when looking at this plot to decide
where the elbow occurs, and whether adding a variable provides a meaningful increase in accuracy.
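
If you want to produce a plot like this for your own data, a minimal sketch using `ggplot2` is shown below; it assumes an `accuracies` tibble with `size` and `accuracy` columns, as in the forward selection sketch above.

```{r, eval = FALSE}
# minimal accuracy-versus-model-size plot for spotting the elbow
ggplot(accuracies, aes(x = size, y = accuracy)) +
  geom_line() +
  geom_point() +
  labs(x = "Number of predictors", y = "Estimated accuracy")
```
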
found in Chapter \@ref(move-to-your-own-machine).

## Additional resources

- The [`tidymodels` website](https://tidymodels.org/packages) is an excellent reference for more details on, and advanced usage of, the functions and packages in the past two chapters. Aside from that, it also has a [nice beginner's tutorial](https://www.tidymodels.org/start/) and [an extensive list of more advanced examples](https://www.tidymodels.org/learn/) that you can use to continue learning beyond the scope of this book. It's worth noting that the `tidymodels` package does a lot more than just classification, and so the examples on the website similarly go beyond classification as well. In the next two chapters, you'll learn about another kind of predictive modeling setting, so it might be worth visiting the website only after reading through those chapters.
- [*An Introduction to Statistical Learning*](https://www.statlearning.com/) [-@james2013introduction] provides a great next stop in the process of learning about classification. Chapter 4 discusses additional basic techniques for classification that we do not cover, such as logistic regression, linear discriminant analysis, and naive Bayes. Chapter 5 goes into much more detail about cross-validation. Chapters 8 and 9 cover decision trees and support vector machines, two very popular but more advanced classification methods. Finally, Chapter 6 covers a number of methods for selecting predictor variables. Note that while this book is still a very accessible introductory text, it requires a bit more mathematical background than we assume here.