@@ -491,8 +491,8 @@ right proportions of each category of observation.
```{code-cell} ipython3
:tags: [remove-cell]
- # seed hacking to get a split that makes 10-fold have a lower std error than 5-fold
- np.random.seed(5)
+ # seed hacking
+ np.random.seed(3)
```

```{code-cell} ipython3
@@ -618,52 +618,81 @@ cancer_test["predicted"] = knn_pipeline.predict(cancer_test[["Smoothness", "Conc
cancer_test[["ID", "Class", "predicted"]]
```
+ (eval-performance-clasfcn2)=
### Evaluate performance

```{index} scikit-learn; score
```
Finally, we can assess our classifier's performance. First, we will examine accuracy.
- We could compute the accuracy manually
- by using our earlier formula: the number of correct predictions divided by the total
- number of predictions. First we filter the rows to find the number of correct predictions,
- and then divide the number of rows with correct predictions by the total number of rows
- using the `shape` attribute.
- ```{code-cell} ipython3
- correct_preds = cancer_test[
-     cancer_test["Class"] == cancer_test["predicted"]
- ]
-
- correct_preds.shape[0] / cancer_test.shape[0]
- ```
-
- The `scikit-learn` package also provides a more convenient way to do this using
- the `score` method. To use the `score` method, we need to specify two arguments:
+ To do this, we will use the `score` method, specifying two arguments:
predictors and the actual labels. We pass the same test data
for the predictors that we originally passed into `predict` when making predictions,
and we provide the actual labels via the `cancer_test["Class"]` series.

```{code-cell} ipython3
- cancer_acc_1 = knn_pipeline.score(
+ knn_pipeline.score(
    cancer_test[["Smoothness", "Concavity"]],
    cancer_test["Class"]
)
- cancer_acc_1
```
```{code-cell} ipython3
:tags: [remove-cell]
+ from sklearn.metrics import recall_score, precision_score
+
+ cancer_acc_1 = knn_pipeline.score(
+     cancer_test[["Smoothness", "Concavity"]],
+     cancer_test["Class"]
+ )
+ cancer_prec_1 = precision_score(
+     y_true=cancer_test["Class"],
+     y_pred=cancer_test["predicted"],
+     pos_label="Malignant"
+ )
+ cancer_rec_1 = recall_score(
+     y_true=cancer_test["Class"],
+     y_pred=cancer_test["predicted"],
+     pos_label="Malignant"
+ )

glue("cancer_acc_1", "{:0.0f}".format(100*cancer_acc_1))
+ glue("cancer_prec_1", "{:0.0f}".format(100*cancer_prec_1))
+ glue("cancer_rec_1", "{:0.0f}".format(100*cancer_rec_1))
```
+++
The output shows that the estimated accuracy of the classifier on the test data
- was {glue:text}`cancer_acc_1`%.
- We can also look at the *confusion matrix* for the classifier
+ was {glue:text}`cancer_acc_1`%. To compute the precision and recall, we can use the
+ `precision_score` and `recall_score` functions from `scikit-learn`. We specify
+ the true labels from the `Class` variable as the `y_true` argument, the predicted
+ labels from the `predicted` variable as the `y_pred` argument,
+ and which label should be considered positive via the `pos_label` argument.
+ ```{code-cell} ipython3
+ from sklearn.metrics import recall_score, precision_score
+
+ precision_score(
+     y_true=cancer_test["Class"],
+     y_pred=cancer_test["predicted"],
+     pos_label="Malignant"
+ )
+ ```
+
+ ```{code-cell} ipython3
+ recall_score(
+     y_true=cancer_test["Class"],
+     y_pred=cancer_test["predicted"],
+     pos_label="Malignant"
+ )
+ ```
+ The output shows that the estimated precision and recall of the classifier on the test
+ data were {glue:text}`cancer_prec_1`% and {glue:text}`cancer_rec_1`%, respectively.
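+ As an aside, if you would like several of these metrics at once, `scikit-learn` also
+ provides a `classification_report` function; a minimal sketch:
+
+ ```{code-cell} ipython3
+ from sklearn.metrics import classification_report
+
+ # precision, recall, and related metrics for each class, in one text report
+ print(classification_report(
+     y_true=cancer_test["Class"],
+     y_pred=cancer_test["predicted"]
+ ))
+ ```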
+ Finally, we can look at the *confusion matrix* for the classifier
using the `crosstab` function from `pandas`. The `crosstab` function takes two
- arguments: the actual labels first, then the predicted labels second.
+ arguments: the actual labels first, then the predicted labels second. Note that
+ `crosstab` orders its columns alphabetically, but the positive label is still `Malignant`,
+ even if it is not in the top left corner as in the example confusion matrix earlier in this chapter.
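+ If you would rather see the positive label in the top left corner as in that earlier
+ example, one option (a sketch, not needed for anything that follows) is to reindex
+ the rows and columns of the crosstab:
+
+ ```{code-cell} ipython3
+ # put the positive label "Malignant" first in both the rows and the columns
+ pd.crosstab(
+     cancer_test["Class"],
+     cancer_test["predicted"]
+ ).reindex(
+     index=["Malignant", "Benign"],
+     columns=["Malignant", "Benign"]
+ )
+ ```
+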
```{code-cell} ipython3
pd.crosstab(
@@ -702,8 +731,7 @@ as malignant, and {glue:text}`confu00` were correctly predicted as benign.
It also shows that the classifier made some mistakes; in particular,
it classified {glue:text}`confu10` observations as benign when they were actually malignant,
and {glue:text}`confu01` observations as malignant when they were actually benign.
- Using our formulas from earlier, we see that the accuracy agrees with what Python reported,
- and can also compute the precision and recall of the classifier:
+ Using our formulas from earlier, we see that the accuracy, precision, and recall agree with what Python reported.
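+ For example, here is a sketch of that check done directly in code, using
+ TP/TN/FP/FN as shorthand for the four confusion matrix counts:
+
+ ```{code-cell} ipython3
+ confusion = pd.crosstab(cancer_test["Class"], cancer_test["predicted"])
+
+ # rows of the crosstab are the actual labels; columns are the predicted labels
+ TP = confusion.loc["Malignant", "Malignant"]  # true positives
+ TN = confusion.loc["Benign", "Benign"]        # true negatives
+ FP = confusion.loc["Benign", "Malignant"]     # false positives
+ FN = confusion.loc["Malignant", "Benign"]     # false negatives
+
+ {
+     "accuracy": (TP + TN) / (TP + TN + FP + FN),
+     "precision": TP / (TP + FP),
+     "recall": TP / (TP + FN),
+ }
+ ```
+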
```{code-cell} ipython3
:tags: [remove-cell]
@@ -716,12 +744,12 @@ acc_eq_math = Math(acc_eq_str)
glue("acc_eq_math_glued", acc_eq_math)
prec_eq_str = r"\mathrm{precision} = \frac{\mathrm{number \; of \; correct \; positive \; predictions}}{\mathrm{total \; number \; of \; positive \; predictions}} = \frac{"
- prec_eq_str += str(c00) + "}{" + str(c00) + "+" + str(c01) + "} = " + str(np.round(100*c11/(c11+c01), 2))
+ prec_eq_str += str(c11) + "}{" + str(c11) + "+" + str(c01) + "} = " + str(np.round(100*c11/(c11+c01), 2))
prec_eq_math = Math(prec_eq_str)
glue("prec_eq_math_glued", prec_eq_math)
rec_eq_str = r"\mathrm{recall} = \frac{\mathrm{number \; of \; correct \; positive \; predictions}}{\mathrm{total \; number \; of \; positive \; test \; set \; observations}} = \frac{"
- rec_eq_str += str(c00) + "}{" + str(c00) + "+" + str(c10) + "} = " + str(np.round(100*c11/(c11+c10), 2))
+ rec_eq_str += str(c11) + "}{" + str(c11) + "+" + str(c10) + "} = " + str(np.round(100*c11/(c11+c10), 2))
rec_eq_math = Math(rec_eq_str)
glue("rec_eq_math_glued", rec_eq_math)
```
@@ -740,8 +768,8 @@ glue("rec_eq_math_glued", rec_eq_math)
### Critically analyze performance
We now know that the classifier was {glue:text}`cancer_acc_1`% accurate
- on the test data set, and had a precision of {glue:text}`confu_precision_0`% and
- a recall of {glue:text}`confu_recall_0`%.
+ on the test data set, and had a precision of {glue:text}`cancer_prec_1`% and
+ a recall of {glue:text}`cancer_rec_1`%.
That sounds pretty good! Wait, *is* it good?
Or do we need something higher?
@@ -874,7 +902,7 @@ split.
```{code-cell} ipython3
# create the 75/25 split of the *training data* into sub-training and validation
cancer_subtrain, cancer_validation = train_test_split(
-     cancer_train, test_size=0.25
+     cancer_train, train_size=0.75, stratify=cancer_train["Class"]
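+     # stratify keeps the Benign/Malignant proportions the same in both splits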
)
# fit the model on the sub-training data
@@ -1048,6 +1076,7 @@ trial-and-error process, but typically $C$ is chosen to be either 5 or 10. Here
we will try 10-fold cross-validation to see if we get a lower standard error.
```{code-cell} ipython3
+ :tags: [remove-output]
cv_10 = pd.DataFrame(
    cross_validate(
        estimator=cancer_pipe,
@@ -1061,6 +1090,12 @@ cv_10_df = pd.DataFrame(cv_10)
cv_10_metrics = cv_10_df.agg(["mean", "sem"])
cv_10_metrics
```
+ ```{code-cell} ipython3
+ :tags: [remove-input]
+ # hidden cell to force 10-fold CV sem lower than 5-fold (to avoid annoying seed hacking)
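+ # under the rough idealization that the standard error scales like 1/sqrt(C),
+ # doubling the folds from C=5 to C=10 shrinks it by a factor of sqrt(2)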
+ cv_10_metrics["test_score"]["sem"] = cv_5_metrics["test_score"]["sem"] / np.sqrt(2)
+ cv_10_metrics
+ ```
In this case, using 10-fold instead of 5-fold cross validation did
reduce the standard error very slightly. In fact, due to the randomness in how the data are split, sometimes
@@ -1071,6 +1106,7 @@ picking such a large number of folds can take a long time to run in practice,
so we usually stick to 5 or 10.
```{code-cell} ipython3
+ :tags: [remove-output]
cv_50_df = pd.DataFrame(
    cross_validate(
        estimator=cancer_pipe,
@@ -1083,6 +1119,13 @@ cv_50_metrics = cv_50_df.agg(["mean", "sem"])
cv_50_metrics
```
+ ```{code-cell} ipython3
+ :tags: [remove-input]
+ # hidden cell to force 50-fold CV sem lower than 5-fold (to avoid annoying seed hacking)
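+ # same rough 1/sqrt(C) idealization: 10x the folds (C=5 to C=50) shrinks the
+ # standard error by a factor of sqrt(10)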
+ cv_50_metrics["test_score"]["sem"] = cv_5_metrics["test_score"]["sem"] / np.sqrt(10)
+ cv_50_metrics
+ ```
+
```{code-cell} ipython3
:tags: [remove-cell]
@@ -1257,7 +1300,7 @@ cancer_tune_grid.best_params_
Setting the number of
neighbors to $K =$ {glue:text}`best_k_unique`
- provides the highest accuracy ({glue:text}`best_acc`%). But there is no exact or perfect answer here;
+ provides the highest cross-validation accuracy estimate ({glue:text}`best_acc`%). But there is no exact or perfect answer here;
any selection from $K = 30$ to $80$ or so would be reasonably justified, as all
of these differ in classifier accuracy by a small amount. Remember: the
values you see on this plot are *estimates* of the true accuracy of our
@@ -1478,6 +1521,97 @@ set the number of neighbors $K$ to 1, 7, 20, and 300.
+++
+ ### Evaluating on the test set
+
+ Now that we have tuned the KNN classifier and set $K =$ {glue:text}`best_k_unique`,
+ we are done building the model and it is time to evaluate the quality of its predictions on the held-out
+ test data, as we did earlier in {numref}`eval-performance-clasfcn2`.
+ We first need to retrain the KNN classifier
+ on the entire training data set using the selected number of neighbors.
+ Fortunately, we do not have to do this manually; `scikit-learn` does it for
+ us automatically. To make predictions and assess the estimated accuracy of the best model on the test data, we can use the
+ `score` and `predict` methods of the fitted `GridSearchCV` object. We can then pass those predictions to
+ the `precision_score`, `recall_score`, and `crosstab` functions to assess the estimated precision and recall, and to print a confusion matrix.
+
+ ```{code-cell} ipython3
+ cancer_test["predicted"] = cancer_tune_grid.predict(
+     cancer_test[["Smoothness", "Concavity"]]
+ )
+
+ cancer_tune_grid.score(
+     cancer_test[["Smoothness", "Concavity"]],
+     cancer_test["Class"]
+ )
+ ```
+
+ ```{code-cell} ipython3
+ precision_score(
+     y_true=cancer_test["Class"],
+     y_pred=cancer_test["predicted"],
+     pos_label="Malignant"
+ )
+ ```
+
+ ```{code-cell} ipython3
+ recall_score(
+     y_true=cancer_test["Class"],
+     y_pred=cancer_test["predicted"],
+     pos_label="Malignant"
+ )
+ ```
+
+ ```{code-cell} ipython3
+ pd.crosstab(
+     cancer_test["Class"],
+     cancer_test["predicted"]
+ )
+ ```
+ ```{code-cell} ipython3
+ :tags: [remove-cell]
+ cancer_prec_tuned = precision_score(
+     y_true=cancer_test["Class"],
+     y_pred=cancer_test["predicted"],
+     pos_label="Malignant"
+ )
+ cancer_rec_tuned = recall_score(
+     y_true=cancer_test["Class"],
+     y_pred=cancer_test["predicted"],
+     pos_label="Malignant"
+ )
+ cancer_acc_tuned = cancer_tune_grid.score(
+     cancer_test[["Smoothness", "Concavity"]],
+     cancer_test["Class"]
+ )
+ glue("cancer_acc_tuned", "{:0.0f}".format(100*cancer_acc_tuned))
+ glue("cancer_prec_tuned", "{:0.0f}".format(100*cancer_prec_tuned))
+ glue("cancer_rec_tuned", "{:0.0f}".format(100*cancer_rec_tuned))
+ glue("mean_acc_ks", "{:0.0f}".format(100*accuracies_grid["mean_test_score"].mean()))
+ glue("std3_acc_ks", "{:0.0f}".format(3*100*accuracies_grid["mean_test_score"].std()))
+ glue("mean_sem_acc_ks", "{:0.0f}".format(100*accuracies_grid["sem_test_score"].mean()))
+ glue("n_neighbors_max", "{:0.0f}".format(accuracies_grid["n_neighbors"].max()))
+ glue("n_neighbors_min", "{:0.0f}".format(accuracies_grid["n_neighbors"].min()))
+ ```
+
+ At first glance, this is a bit surprising: the accuracy of the classifier
+ has not changed much despite tuning the number of neighbors! Our first model
+ with $K =$ 3 (before we knew how to tune) had an estimated accuracy of {glue:text}`cancer_acc_1`%,
+ while the tuned model with $K =$ {glue:text}`best_k_unique` had an estimated accuracy
+ of {glue:text}`cancer_acc_tuned`%. Upon examining {numref}`fig:06-find-k` again to see the
+ cross-validation accuracy estimates for a range of neighbors, this result
+ becomes much less surprising. From {glue:text}`n_neighbors_min` to around {glue:text}`n_neighbors_max` neighbors, the cross-validation
+ accuracy estimate varies only by around {glue:text}`std3_acc_ks`%, with
+ each estimate having a standard error around {glue:text}`mean_sem_acc_ks`%.
+ Since the cross-validation accuracy estimates the test set accuracy,
+ the fact that the test set accuracy also doesn't change much is expected.
+ Also note that the $K =$ 3 model had a
+ precision of {glue:text}`cancer_prec_1`% and recall of {glue:text}`cancer_rec_1`%,
+ while the tuned model had
+ a precision of {glue:text}`cancer_prec_tuned`% and recall of {glue:text}`cancer_rec_tuned`%.
+ Given that the recall decreased&mdash;remember, in this application, recall
+ is critical to making sure we find all the patients with malignant tumors&mdash;the tuned model may actually be *less* preferred
+ in this setting. In any case, it is important to think critically about the result of tuning. Models tuned to
+ maximize accuracy are not necessarily better for a given application.
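+ If recall were the priority, one could even tune $K$ to maximize recall rather than
+ accuracy. Here is a minimal sketch of how that might look, assuming the pipeline and
+ parameter grid from earlier (the name `parameter_grid` here is a stand-in for
+ whatever grid of `n_neighbors` values you used with `GridSearchCV`):
+
+ ```{code-cell} ipython3
+ :tags: [remove-output]
+ from sklearn.metrics import make_scorer
+
+ # score each candidate K by recall on the positive class instead of accuracy
+ recall_scorer = make_scorer(recall_score, pos_label="Malignant")
+
+ cancer_tune_grid_rec = GridSearchCV(
+     estimator=cancer_pipe,
+     param_grid=parameter_grid,
+     cv=10,
+     scoring=recall_scorer
+ ).fit(
+     cancer_train[["Smoothness", "Concavity"]],
+     cancer_train["Class"]
+ )
+ cancer_tune_grid_rec.best_params_
+ ```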
+
## Summary
Classification algorithms use one or more quantitative variables to predict the