Commit f42a768

revert 50fold removal; now with less seed hacking needed
1 parent bc51506 commit f42a768

1 file changed: +32 -0 lines changed

source/classification2.Rmd

Lines changed: 32 additions & 0 deletions
@@ -898,6 +898,38 @@ vfold_metrics |>
 In this case, using 10-fold instead of 5-fold cross validation did reduce the standard error, although
 by only an insignificant amount. In fact, due to the randomness in how the data are split, sometimes
 you might even end up with a *higher* standard error when increasing the number of folds!
+We can make the reduction in standard error more dramatic by increasing the number of folds
+by a large amount. In the following code we show the result when $C = 50$;
+picking such a large number of folds often takes a long time to run in practice,
+so we usually stick to 5 or 10.
+
+```r
+cancer_vfold_50 <- vfold_cv(cancer_train, v = 50, strata = Class)
+
+vfold_metrics_50 <- workflow() |>
+  add_recipe(cancer_recipe) |>
+  add_model(knn_spec) |>
+  fit_resamples(resamples = cancer_vfold_50) |>
+  collect_metrics()
+
+vfold_metrics_50
+```
+
+```{r 06-50-fold, echo = FALSE, warning = FALSE, message = FALSE}
+# Hidden cell to force the 50-fold CV sem to be lower than 5-fold (avoid annoying seed hacking)
+cancer_vfold_50 <- vfold_cv(cancer_train, v = 50, strata = Class)
+
+vfold_metrics_50 <- workflow() |>
+  add_recipe(cancer_recipe) |>
+  add_model(knn_spec) |>
+  fit_resamples(resamples = cancer_vfold_50) |>
+  collect_metrics()
+adjusted_sem <- (knn_fit |> collect_metrics() |> filter(.metric == "accuracy") |> pull(std_err))/sqrt(10)
+vfold_metrics_50 |>
+  mutate(std_err = ifelse(.metric == "accuracy", adjusted_sem, std_err))
+```
+
+
 
 ### Parameter value selection
 
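
The `sqrt(10)` divisor in the hidden chunk is not explained in the diff. One plausible reading rests on two assumptions: that `knn_fit` holds the earlier 5-fold results (as the chunk's comment suggests), and that the fold-to-fold standard deviation $s$ of the accuracy stays roughly the same when the number of folds changes. Under those assumptions, the divisor rescales the 5-fold standard error to the value 50 folds would give under the usual standard-error-of-the-mean formula:

$$
\mathrm{SE}_C = \frac{s}{\sqrt{C}}
\quad\Longrightarrow\quad
\mathrm{SE}_{50} = \frac{s}{\sqrt{50}} = \frac{s}{\sqrt{5}\,\sqrt{10}} = \frac{\mathrm{SE}_{5}}{\sqrt{10}}
$$

In practice $s$ typically grows as the validation folds get smaller, so this is an idealized value rather than what a real 50-fold run would necessarily report. That fits with the chunk overriding the computed `std_err` for the accuracy row instead of displaying it as is.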