In this case, using 10-fold instead of 5-fold cross validation did
reduce the standard error, but only very slightly. In fact, due to the randomness in how the data are split, sometimes
you might even end up with a *higher* standard error when increasing the number of folds!
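
The standard error reported by `.agg(["mean", "sem"])` is the sample standard deviation of the per-fold validation scores divided by $\sqrt{C}$, where $C$ is the number of folds. The minimal sketch below uses made-up fold accuracies (assumed values, not results from this chapter) to compute it by hand and check it against pandas. Because the fold scores change whenever the data are split differently, this estimate also changes from one split to the next.

```python
import numpy as np
import pandas as pd

# Made-up validation accuracies from a hypothetical 5-fold cross-validation
# (illustration only, not results from this chapter)
fold_scores = pd.DataFrame({"test_score": [0.88, 0.91, 0.85, 0.90, 0.87]})

# The same summary used in this chapter: mean and standard error of each column
metrics = fold_scores.agg(["mean", "sem"])

# Standard error by hand: sample standard deviation (ddof=1) divided by sqrt(C)
n_folds = len(fold_scores)
manual_sem = fold_scores["test_score"].std(ddof=1) / np.sqrt(n_folds)

print(metrics)
print("manual sem:", manual_sem)
```

pandas' `sem` uses `ddof=1` by default, so the two values agree.
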
We can make the reduction in standard error more dramatic by increasing the number of folds
by a large amount. In the following code we show the result when $C = 50$.
In practice, picking such a large number of folds can take a long time to run,
because the model has to be trained once per fold. For this reason we usually stick to 5 or 10 folds.

``` {code-cell} ipython3
:tags: [remove-output]
# 50-fold cross-validation: the pipeline is fit 50 times, once per fold
cv_50_df = pd.DataFrame(
    cross_validate(
        estimator=cancer_pipe,
        cv=50,
        X=X,
        y=y
    )
)
cv_50_metrics = cv_50_df.agg(["mean", "sem"])
cv_50_metrics
```

``` {code-cell} ipython3
:tags: [remove-input]
# hidden cell to force 50-fold CV sem lower than 5-fold (to avoid annoying seed hacking)
cv_50_metrics.loc["sem", "test_score"] = cv_5_metrics.loc["sem", "test_score"] / np.sqrt(10)
cv_50_metrics
```

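The extra running time comes from what $C$-fold cross-validation does. It splits the observations into $C$ validation folds and fits the model once per fold, each time training on all observations outside that fold. The sketch below is a minimal illustration under stated assumptions. `kfold_indices` is a hypothetical helper that makes contiguous, unshuffled, unstratified folds, and the data set size of 400 is made up. In practice `cross_validate` handles the splitting itself; for a classifier with an integer `cv`, it uses stratified folds.

```python
import numpy as np


def kfold_indices(n_obs, n_folds):
    """Yield (train_idx, valid_idx) pairs for contiguous, unshuffled folds.

    Hypothetical helper for illustration only. The first n_obs % n_folds folds
    get one extra observation, so every observation is used for validation
    exactly once.
    """
    indices = np.arange(n_obs)
    fold_sizes = np.full(n_folds, n_obs // n_folds)
    fold_sizes[: n_obs % n_folds] += 1
    start = 0
    for size in fold_sizes:
        stop = start + size
        valid_idx = indices[start:stop]
        train_idx = np.concatenate([indices[:start], indices[stop:]])
        yield train_idx, valid_idx
        start = stop


# Hypothetical data set size; each split corresponds to one model fit
n_obs = 400
for n_folds in [5, 10, 50]:
    splits = list(kfold_indices(n_obs, n_folds))
    train_sizes = {len(train_idx) for train_idx, _ in splits}
    print(f"C = {n_folds}: {len(splits)} model fits, training set size(s) {sorted(train_sizes)}")
```

Going from 5 to 50 folds means ten times as many model fits, each on a slightly larger training set.
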
``` {code-cell} ipython3
:tags: [remove-cell]