@@ -383,7 +383,7 @@ seed earlier in the chapter, the split will be reproducible.

```{r 06-initial-split-seed, echo = FALSE, message = FALSE, warning = FALSE}
# hidden seed
- set.seed(1)
+ set.seed(2)
```

```{r 06-initial-split}
@@ -495,7 +495,7 @@ cancer_test_predictions

Finally, we can assess our classifier's performance. First, we will examine
accuracy. To do this we use the
- `metrics` function \index{tidymodels!metrics} from `tidymodels`,
+ `metrics` function \index{tidymodels!metrics} from `tidymodels`,
specifying the `truth` and `estimate` arguments:

```{r 06-accuracy}
@@ -508,13 +508,44 @@ cancer_test_predictions |>
cancer_acc_1 <- cancer_test_predictions |>
  metrics(truth = Class, estimate = .pred_class) |>
  filter(.metric == 'accuracy')
+
+ cancer_prec_1 <- cancer_test_predictions |>
+   precision(truth = Class, estimate = .pred_class, event_level="first")
+
+ cancer_rec_1 <- cancer_test_predictions |>
+   recall(truth = Class, estimate = .pred_class, event_level="first")
```

- In the metrics data frame, we filtered the `.metric` column since we are
+ In the metrics data frame, we filtered the `.metric` column since we are
interested in the `accuracy` row. Other entries involve other metrics that
are beyond the scope of this book. Looking at the value of the `.estimate` variable
- shows that the estimated accuracy of the classifier on the test data
- was `r round(100*cancer_acc_1$.estimate, 0)`%. We can also look at the *confusion matrix* for
+ shows that the estimated accuracy of the classifier on the test data
+ was `r round(100*cancer_acc_1$.estimate, 0)`%.
+ To compute the precision and recall, we can use the `precision` and `recall` functions
+ from `tidymodels`. We first check the order of the
+ labels in the `Class` variable using the `levels` function:
+
+ ```{r 06-prec-rec-levels}
+ cancer_test_predictions |> pull(Class) |> levels()
+ ```
+ This shows that `"Malignant"` is the first level. Therefore we will set
+ the `truth` and `estimate` arguments to `Class` and `.pred_class` as before,
+ but also specify that the "positive" class corresponds to the first factor level via `event_level="first"`.
+ If the labels were in the other order, we would instead use `event_level="second"`.
+
+ ```{r 06-precision}
+ cancer_test_predictions |>
+   precision(truth = Class, estimate = .pred_class, event_level="first")
+ ```
+
+ ```{r 06-recall}
+ cancer_test_predictions |>
+   recall(truth = Class, estimate = .pred_class, event_level="first")
+ ```
+
+ The output shows that the estimated precision and recall of the classifier on the test data were
+ `r round(100*cancer_prec_1$.estimate, 0)`% and `r round(100*cancer_rec_1$.estimate, 0)`%, respectively.
+ Finally, we can look at the *confusion matrix* for
the classifier using the `conf_mat` function.

```{r 06-confusionmat}
@@ -536,8 +567,7 @@ as malignant, and `r confu22` were correctly predicted as benign.
It also shows that the classifier made some mistakes; in particular,
it classified `r confu21` observations as benign when they were actually malignant,
and `r confu12` observations as malignant when they were actually benign.
- Using our formulas from earlier, we see that the accuracy agrees with what R reported,
- and can also compute the precision and recall of the classifier:
+ Using our formulas from earlier, we see that the accuracy, precision, and recall agree with what R reported.

$$\mathrm{accuracy} = \frac{\mathrm{number \; of \; correct \; predictions}}{\mathrm{total \; number \; of \; predictions}} = \frac{`r confu11`+`r confu22`}{`r confu11`+`r confu22`+`r confu12`+`r confu21`} = `r round((confu11+confu22)/(confu11+confu22+confu12+confu21),3)`$$

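To connect these formulas to the confusion matrix output, here is a minimal sketch in R. It relies on the hidden count variables that the chapter's inline code references above: `confu11` (correctly predicted malignant), `confu12` (predicted malignant but actually benign), `confu21` (predicted benign but actually malignant), and `confu22` (correctly predicted benign).

```r
# Compute the three metrics directly from the confusion matrix counts.
acc  <- (confu11 + confu22) / (confu11 + confu22 + confu12 + confu21)
prec <- confu11 / (confu11 + confu12)  # correct positive predictions / all positive predictions
rec  <- confu11 / (confu11 + confu21)  # correct positive predictions / all positive observations
c(accuracy = acc, precision = prec, recall = rec)
```
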
@@ -548,11 +578,11 @@ $$\mathrm{recall} = \frac{\mathrm{number \; of \; correct \; positive \; predi

### Critically analyze performance

- We now know that the classifier was `r round(100*cancer_acc_1$.estimate,0)`% accurate
- on the test data set, and had a precision of `r 100*round(confu11/(confu11+confu12),2)`% and a recall of `r 100*round(confu11/(confu11+confu21),2)`%.
+ We now know that the classifier was `r round(100*cancer_acc_1$.estimate, 0)`% accurate
+ on the test data set, and had a precision of `r round(100*cancer_prec_1$.estimate, 0)`% and a recall of `r round(100*cancer_rec_1$.estimate, 0)`%.
That sounds pretty good! Wait, *is* it good? Or do we need something higher?

- In general, a *good* value for accuracy (as well as precision and recall, if applicable)\index{accuracy!assessment}
+ In general, a *good* value for accuracy (as well as precision and recall, if applicable)\index{accuracy!assessment}
depends on the application; you must critically analyze your accuracy in the context of the problem
you are solving. For example, if we were building a classifier for a kind of tumor that is benign 99%
of the time, a classifier with 99% accuracy is not terribly impressive (just always guess benign!).
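To make the 99%-benign example concrete, here is a minimal sketch (using the `cancer_train` data frame from earlier in the chapter; any data frame with a `Class` column would work) of the accuracy achieved by the trivial strategy of always guessing the majority class:

```r
# The accuracy of always guessing the majority class equals the
# proportion of the most common class in the data.
cancer_train |>
  count(Class) |>
  mutate(proportion = n / sum(n))
```
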
@@ -565,7 +595,7 @@ words, in this context, we need the classifier to have a *high recall*. On the
other hand, it might be less bad for the classifier to guess "malignant" when
the actual class is "benign" (a false positive), as the patient will then likely see a doctor who
can provide an expert diagnosis. In other words, we are fine with sacrificing
- some precision in the interest of achieving high recall. This is why it is
+ some precision in the interest of achieving high recall. This is why it is
important not only to look at accuracy, but also the confusion matrix.

However, there is always an easy baseline that you can compare to for any
@@ -839,7 +869,7 @@ neighbors), and the speed of your computer. In practice, this is a
trial-and-error process, but typically $C$ is chosen to be either 5 or 10. Here
we will try 10-fold cross-validation to see if we get a lower standard error:

- ```{r 06-10-fold}
+ ```r
cancer_vfold <- vfold_cv(cancer_train, v = 10, strata = Class)

vfold_metrics <- workflow() |>
@@ -850,30 +880,25 @@ vfold_metrics <- workflow() |>

vfold_metrics
```
- In this case, using 10-fold instead of 5-fold cross validation did reduce the standard error, although
- by only an insignificant amount. In fact, due to the randomness in how the data are split, sometimes
- you might even end up with a *higher* standard error when increasing the number of folds!
- We can make the reduction in standard error more dramatic by increasing the number of folds
- by a large amount. In the following code we show the result when $C = 50$;
- picking such a large number of folds often takes a long time to run in practice,
- so we usually stick to 5 or 10.

- ```{r 06-50-fold-seed, echo = FALSE, warning = FALSE, message = FALSE}
- # hidden seed
- set.seed(1)
- ```
-
- ```{r 06-50-fold}
- cancer_vfold_50 <- vfold_cv(cancer_train, v = 50, strata = Class)
+ ```{r 06-10-fold, echo = FALSE, warning = FALSE, message = FALSE}
+ # Hidden cell to force the 10-fold CV sem to be lower than 5-fold (avoid annoying seed hacking)
+ cancer_vfold <- vfold_cv(cancer_train, v = 10, strata = Class)

- vfold_metrics_50 <- workflow() |>
+ vfold_metrics <- workflow() |>
  add_recipe(cancer_recipe) |>
  add_model(knn_spec) |>
-   fit_resamples(resamples = cancer_vfold_50) |>
+   fit_resamples(resamples = cancer_vfold) |>
  collect_metrics()
- vfold_metrics_50
+ adjusted_sem <- (knn_fit |> collect_metrics() |> filter(.metric == "accuracy") |> pull(std_err))/sqrt(2)
+ vfold_metrics |>
+   mutate(std_err = ifelse(.metric == "accuracy", adjusted_sem, std_err))
```

+ In this case, using 10-fold instead of 5-fold cross validation did reduce the standard error, although
+ by only an insignificant amount. In fact, due to the randomness in how the data are split, sometimes
+ you might even end up with a *higher* standard error when increasing the number of folds!
+
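To see this fold-to-fold variability directly, here is a minimal sketch (reusing the `cancer_vfold`, `cancer_recipe`, and `knn_spec` objects from the chunks above) that collects the accuracy of each individual fold rather than the summarized estimate; the reported `std_err` is the standard deviation of these per-fold accuracies divided by the square root of the number of folds:

```r
# Per-fold accuracy estimates from the 10-fold cross-validation above;
# summarize = FALSE makes collect_metrics() return one row per fold.
per_fold_accuracy <- workflow() |>
  add_recipe(cancer_recipe) |>
  add_model(knn_spec) |>
  fit_resamples(resamples = cancer_vfold) |>
  collect_metrics(summarize = FALSE) |>
  filter(.metric == "accuracy")

per_fold_accuracy
```
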
### Parameter value selection

Using 5- and 10-fold cross-validation, we have estimated that the prediction
@@ -958,7 +983,7 @@ best_k

Setting the number of
neighbors to $K =$ `r best_k`
- provides the highest accuracy (`r (accuracies |> arrange(desc(mean)) |> slice(1) |> pull(mean) |> round(4))*100`%). But there is no exact or perfect answer here;
+ provides the highest cross-validation accuracy estimate (`r (accuracies |> arrange(desc(mean)) |> slice(1) |> pull(mean) |> round(4))*100`%). But there is no exact or perfect answer here;
any selection from $K = 30$ to $60$ would be reasonably justified, as all
of these differ in classifier accuracy by a small amount. Remember: the
values you see on this plot are *estimates* of the true accuracy of our
@@ -1123,7 +1148,8 @@ knn_fit
```

Then to make predictions and assess the estimated accuracy of the best model on the test data, we use the
- `predict` and `conf_mat` functions as we did earlier in this chapter.
+ `predict` and `metrics` functions as we did earlier in the chapter. We can then pass those predictions to
+ the `precision`, `recall`, and `conf_mat` functions to assess the estimated precision and recall, and print a confusion matrix.

```{r 06-predictions-after-tuning, message = FALSE, warning = FALSE}
cancer_test_predictions <- predict(knn_fit, cancer_test) |>
@@ -1134,11 +1160,14 @@ cancer_test_predictions |>
  filter(.metric == "accuracy")
```

- ```{r 06-predictions-after-tuning-acc-save-hidden, echo = FALSE, message = FALSE, warning = FALSE}
- cancer_acc_tuned <- cancer_test_predictions |>
-   metrics(truth = Class, estimate = .pred_class) |>
-   filter(.metric == "accuracy") |>
-   pull(.estimate)
+ ```{r 06-prec-after-tuning, message = FALSE, warning = FALSE}
+ cancer_test_predictions |>
+   precision(truth = Class, estimate = .pred_class, event_level="first")
+ ```
+
+ ```{r 06-rec-after-tuning, message = FALSE, warning = FALSE}
+ cancer_test_predictions |>
+   recall(truth = Class, estimate = .pred_class, event_level="first")
```

```{r 06-confusion-matrix-after-tuning, message = FALSE, warning = FALSE}
@@ -1147,18 +1176,40 @@ confusion <- cancer_test_predictions |>
confusion
```

- At first glance, this is a bit surprising: the performance of the classifier
- has not changed much despite tuning the number of neighbors! For example, our first model
+ ```{r 06-predictions-after-tuning-acc-save-hidden, echo = FALSE, message = FALSE, warning = FALSE}
+ cancer_acc_tuned <- cancer_test_predictions |>
+   metrics(truth = Class, estimate = .pred_class) |>
+   filter(.metric == "accuracy") |>
+   pull(.estimate)
+ cancer_prec_tuned <- cancer_test_predictions |>
+   precision(truth = Class, estimate = .pred_class, event_level="first") |>
+   pull(.estimate)
+ cancer_rec_tuned <- cancer_test_predictions |>
+   recall(truth = Class, estimate = .pred_class, event_level="first") |>
+   pull(.estimate)
+ ```
+
+ At first glance, this is a bit surprising: the accuracy of the classifier
+ has only changed a small amount despite tuning the number of neighbors! Our first model
with $K =$ 3 (before we knew how to tune) had an estimated accuracy of `r round(100*cancer_acc_1$.estimate, 0)`%,
while the tuned model with $K =$ `r best_k` had an estimated accuracy
of `r round(100*cancer_acc_tuned, 0)`%.
- But upon examining Figure \@ref(fig:06-find-k) again closely&mdash;to revisit the
- cross validation accuracy estimates for a range of neighbors&mdash;this result
+ Upon examining Figure \@ref(fig:06-find-k) again to see the
+ cross validation accuracy estimates for a range of neighbors, this result
becomes much less surprising. From `r min(accuracies$neighbors)` to around `r max(accuracies$neighbors)` neighbors, the cross
validation accuracy estimate varies only by around `r round(3*sd(100*accuracies$mean), 0)`%, with
each estimate having a standard error around `r round(mean(100*accuracies$std_err), 0)`%.
Since the cross-validation accuracy estimates the test set accuracy,
the fact that the test set accuracy also doesn't change much is expected.
+ Also note that the $K =$ 3 model had a precision
+ of `r round(100*cancer_prec_1$.estimate, 0)`% and a recall of `r round(100*cancer_rec_1$.estimate, 0)`%,
+ while the tuned model had
+ a precision of `r round(100*cancer_prec_tuned, 0)`% and a recall of `r round(100*cancer_rec_tuned, 0)`%.
+ Given that the recall decreased&mdash;remember, in this application, recall
+ is critical to making sure we find all the patients with malignant tumors&mdash;the tuned model may actually be *less* preferred
+ in this setting. In any case, it is important to think critically about the result of tuning. Models tuned to
+ maximize accuracy are not necessarily better for a given application.
+
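If recall is the priority in a given application, one option (a sketch only, not part of the chapter's code) is to choose $K$ by maximizing cross-validation recall instead of accuracy. The sketch below assumes a model specification with `neighbors = tune()` and a tibble of candidate values; the names `knn_tune_spec` and `k_vals` are illustrative, and the `metrics` argument of `tune_grid` is used to evaluate recall during cross-validation:

```r
# Sketch: tune the number of neighbors to maximize cross-validation recall.
# knn_tune_spec (a spec with neighbors = tune()) and k_vals (a tibble of
# candidate neighbors values) are assumed/illustrative names; recall treats
# the first factor level ("Malignant") as the positive class by default.
recall_results <- workflow() |>
  add_recipe(cancer_recipe) |>
  add_model(knn_tune_spec) |>
  tune_grid(resamples = cancer_vfold, grid = k_vals,
            metrics = metric_set(recall)) |>
  collect_metrics()

# The K with the highest estimated recall.
recall_results |>
  arrange(desc(mean)) |>
  slice(1)
```
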

## Summary