@@ -75,7 +75,7 @@ and the classifier will be asked to decide whether the tumor is benign or
malignant. The key word here is *new*: our classifier is "good" if it provides
accurate predictions on data *not seen during training*, as this implies that
it has actually learned about the relationship between the predictor variables and response variable,
- as opposed to simply memorizing and regurgitating individual training data examples.
+ as opposed to simply memorizing the labels of individual training data examples.
But then, how can we evaluate our classifier without visiting the hospital to collect more
tumor images?
@@ -142,9 +142,10 @@ $$\mathrm{accuracy} = \frac{\mathrm{number \; of \; correct \; predictions}}{\
But we can also see that the classifier only identified 1 out of 4 total malignant
tumors; in other words, it misclassified 75% of the malignant cases present in the
- data set! Since we are particularly interested in identifying malignant cases
- in this data analysis context, this classifier would likely be unacceptable
- even with an accuracy of 89%.
+ data set! In this example, misclassifying a malignant tumor is a potentially
+ disastrous error, since it may lead to a patient who requires treatment not receiving it.
+ Since we are particularly interested in identifying malignant cases, this
+ classifier would likely be unacceptable even with an accuracy of 89%.
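
To see how these two numbers can coexist, here is a minimal sketch in Python using made-up confusion-matrix counts. The counts are not taken from the tumor image data set; they are only chosen so that the totals line up with the figures quoted above (4 malignant tumors, 1 correctly identified, and roughly 89% overall accuracy).

```python
# Hypothetical counts, chosen only to illustrate the point above;
# they are NOT the actual counts from the tumor image data set.
true_positives = 1    # malignant tumors correctly predicted malignant
false_negatives = 3   # malignant tumors incorrectly predicted benign
true_negatives = 31   # benign tumors correctly predicted benign
false_positives = 1   # benign tumors incorrectly predicted malignant

total_predictions = true_positives + false_negatives + true_negatives + false_positives
correct_predictions = true_positives + true_negatives

accuracy = correct_predictions / total_predictions
# fraction of the malignant cases that the classifier actually catches
malignant_caught = true_positives / (true_positives + false_negatives)

print(f"accuracy: {accuracy:.0%}")                        # 89%
print(f"malignant cases caught: {malignant_caught:.0%}")  # 25%
```

Even though the overall accuracy is high, the quantity we arguably care most about here, the fraction of malignant tumors that are detected, is only 25%.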
Focusing more on one label than the other is
common in classification problems. In such cases, we typically refer to the label we are more