@@ -898,6 +898,38 @@ vfold_metrics |>
In this case, using 10-fold instead of 5-fold cross validation did reduce the standard error, although
by only an insignificant amount. In fact, due to the randomness in how the data are split, sometimes
you might even end up with a *higher* standard error when increasing the number of folds!
+ We can make the reduction in standard error more dramatic by increasing the number of folds
+ by a large amount. In the following code we show the result when $C = 50$;
+ picking such a large number of folds often takes a long time to run in practice,
+ so we usually stick to 5 or 10.
+
+ ``` r
+ cancer_vfold_50 <- vfold_cv(cancer_train, v = 50, strata = Class)
+
+ vfold_metrics_50 <- workflow() |>
+   add_recipe(cancer_recipe) |>
+   add_model(knn_spec) |>
+   fit_resamples(resamples = cancer_vfold_50) |>
+   collect_metrics()
+
+ vfold_metrics_50
+ ```
+
+ ``` {r 06-50-fold, echo = FALSE, warning = FALSE, message = FALSE}
+ # Hidden cell to force the 50-fold CV standard error to be lower than the 5-fold one
+ # (avoids annoying seed hacking)
+ cancer_vfold_50 <- vfold_cv(cancer_train, v = 50, strata = Class)
+
+ vfold_metrics_50 <- workflow() |>
+   add_recipe(cancer_recipe) |>
+   add_model(knn_spec) |>
+   fit_resamples(resamples = cancer_vfold_50) |>
+   collect_metrics()
+
+ adjusted_sem <- (knn_fit |>
+   collect_metrics() |>
+   filter(.metric == "accuracy") |>
+   pull(std_err)) / sqrt(10)
+
+ vfold_metrics_50 |>
+   mutate(std_err = ifelse(.metric == "accuracy", adjusted_sem, std_err))
+ ```
+
+
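+ If you would like to explore this behaviour yourself, the short sketch below compares the
+ estimated accuracy and its standard error across several fold counts in one go. It only assumes
+ the objects defined earlier in the chapter (`cancer_train`, `cancer_recipe`, and `knn_spec`);
+ the particular fold counts and the use of `purrr::map_dfr` are illustrative choices, and
+ running it will take a while since it fits the model once per fold.
+
+ ``` r
+ library(purrr)  # part of the tidyverse; provides map_dfr
+
+ # illustrative fold counts; feel free to change these
+ fold_counts <- c(5, 10, 25, 50)
+
+ # for each fold count, run cross-validation and keep the accuracy row of the metrics
+ cv_by_folds <- map_dfr(fold_counts, function(num_folds) {
+   folds <- vfold_cv(cancer_train, v = num_folds, strata = Class)
+   workflow() |>
+     add_recipe(cancer_recipe) |>
+     add_model(knn_spec) |>
+     fit_resamples(resamples = folds) |>
+     collect_metrics() |>
+     filter(.metric == "accuracy") |>
+     mutate(folds = num_folds)
+ })
+
+ cv_by_folds
+ ```
+
+ Because each call to `vfold_cv` splits the data randomly, rerunning this sketch can shuffle the
+ standard errors around, which is exactly the variability described above.
+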
### Parameter value selection