Commit fa8b892

Commit message: true to actual
1 parent bc659ef commit fa8b892

File tree: 1 file changed (+10, -10 lines)

source/classification2.Rmd

Lines changed: 10 additions & 10 deletions
@@ -81,8 +81,8 @@ tumor images?
 
 The trick is to split the data into a **training set** \index{training set} and **test set** \index{test set} (Figure \@ref(fig:06-training-test))
 and use only the **training set** when building the classifier.
-Then, to evaluate the performance of the classifier, we first set aside the true labels from the **test set**,
-and then use the classifier to predict the labels in the **test set**. If our predictions match the true
+Then, to evaluate the performance of the classifier, we first set aside the labels from the **test set**,
+and then use the classifier to predict the labels in the **test set**. If our predictions match the actual
 labels for the observations in the **test set**, then we have some
 confidence that our classifier might also accurately predict the class
 labels for new observations without known class labels.
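As a rough sketch of what this split looks like in code (the `cancer` data frame, the 75/25 proportion, and the seed are assumptions based on the rest of the chapter, not part of this hunk):

```r
# Minimal sketch of a stratified train/test split with tidymodels.
# `cancer`, the 75% proportion, and the seed are assumed, not from this commit.
library(tidymodels)

set.seed(1)  # make the random split reproducible

cancer_split <- initial_split(cancer, prop = 0.75, strata = Class)
cancer_train <- training(cancer_split)  # used to build the classifier
cancer_test  <- testing(cancer_split)   # held out for evaluation
```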
@@ -97,12 +97,12 @@ labels for new observations without known class labels.
 knitr::include_graphics("img/classification2/training_test.jpeg")
 ```
 
-How exactly can we assess how well our predictions match the true labels for
+How exactly can we assess how well our predictions match the actual labels for
 the observations in the test set? One way we can do this is to calculate the
 prediction **accuracy**. \index{prediction accuracy|see{accuracy}}\index{accuracy} This is the fraction of examples for which the
 classifier made the correct prediction. To calculate this, we divide the number
 of correct predictions by the number of predictions made.
-The process for assessing if our predictions match the true labels in the
+The process for assessing if our predictions match the actual labels in the
 test set is illustrated in Figure \@ref(fig:06-ML-paradigm-test).
 
 $$\mathrm{accuracy} = \frac{\mathrm{number \; of \; correct \; predictions}}{\mathrm{total \; number \; of \; predictions}}$$
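A tiny worked example of this accuracy formula (made-up label vectors, for illustration only):

```r
# Accuracy as the fraction of correct predictions, following the formula above.
# The label vectors are made up purely for illustration.
predicted <- c("Malignant", "Benign", "Benign",    "Malignant")
actual    <- c("Malignant", "Benign", "Malignant", "Malignant")

sum(predicted == actual) / length(actual)  # 3 correct out of 4, so 0.75
```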
@@ -125,7 +125,7 @@ a test set of 65 observations.
 
 Table: (\#tab:confusion-matrix) An example confusion matrix for the tumor image data.
 
-| | Truly Malignant | Truly Benign |
+| | Actually Malignant | Actually Benign |
 | ---------------------- | --------------- | -------------- |
 | **Predicted Malignant** | 1 | 4 |
 | **Predicted Benign** | 3 | 57 |
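Working through the counts in this example confusion matrix (a sketch, treating malignant as the positive class):

```r
# Counts from the example confusion matrix above.
tp <- 1   # predicted malignant, actually malignant
fp <- 4   # predicted malignant, actually benign
fn <- 3   # predicted benign,    actually malignant
tn <- 57  # predicted benign,    actually benign

(tp + tn) / (tp + fp + fn + tn)  # accuracy  = 58/65, about 0.89
tp / (tp + fp)                   # precision = 1/5, i.e. 0.20
tp / (tp + fn)                   # recall    = 1/4, i.e. 0.25
```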
@@ -480,7 +480,7 @@ hidden_print(knn_fit)
 Now that we have a $K$-nearest neighbors classifier object, we can use it to
 predict the class labels for our test set. We use the `bind_cols` \index{bind\_cols} to add the
 column of predictions to the original test data, creating the
-`cancer_test_predictions` data frame. The `Class` variable contains the true
+`cancer_test_predictions` data frame. The `Class` variable contains the actual
 diagnoses, while the `.pred_class` contains the predicted diagnoses from the
 classifier.
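A sketch of the prediction step described in this hunk (`knn_fit` and `cancer_test` are assumed from earlier in the chapter):

```r
# Predict on the test set and attach the predictions with bind_cols().
# `knn_fit` and `cancer_test` are assumed from earlier in the chapter.
cancer_test_predictions <- predict(knn_fit, cancer_test) |>
  bind_cols(cancer_test)

cancer_test_predictions |>
  select(Class, .pred_class)  # actual vs. predicted diagnoses
```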

@@ -536,8 +536,8 @@ confu22 <- (confusionmt |> filter(name == "cell_2_2"))$value
 The confusion matrix shows `r confu11` observations were correctly predicted
 as malignant, and `r confu22` were correctly predicted as benign.
 It also shows that the classifier made some mistakes; in particular,
-it classified `r confu21` observations as benign when they were truly malignant,
-and `r confu12` observations as malignant when they were truly benign.
+it classified `r confu21` observations as benign when they were actually malignant,
+and `r confu12` observations as malignant when they were actually benign.
 Using our formulas from earlier, we see that the accuracy agrees with what R reported,
 and can also compute the precision and recall of the classifier:
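For reference, the confusion matrix and the metrics discussed here can be computed with yardstick along these lines (a sketch, assuming the `cancer_test_predictions` data frame from the previous hunk, with malignant as the first factor level of `Class`):

```r
# Sketch: confusion matrix, accuracy, precision, and recall with yardstick.
# Assumes `cancer_test_predictions` from the previous step; yardstick treats
# the first factor level of Class (malignant here) as the positive class by default.
cancer_test_predictions |> conf_mat(truth = Class, estimate = .pred_class)
cancer_test_predictions |> accuracy(truth = Class, estimate = .pred_class)
cancer_test_predictions |> precision(truth = Class, estimate = .pred_class)
cancer_test_predictions |> recall(truth = Class, estimate = .pred_class)
```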

@@ -561,11 +561,11 @@ of the time, a classifier with 99% accuracy is not terribly impressive (just alw
 And beyond just accuracy, we need to consider the precision and recall: as mentioned
 earlier, the *kind* of mistake the classifier makes is
 important in many applications as well. In the previous example with 99% benign observations, it might be very bad for the
-classifier to predict "benign" when the true class is "malignant" (a false negative), as this
+classifier to predict "benign" when the actual class is "malignant" (a false negative), as this
 might result in a patient not receiving appropriate medical attention. In other
 words, in this context, we need the classifier to have a *high recall*. On the
 other hand, it might be less bad for the classifier to guess "malignant" when
-the true class is "benign" (a false positive), as the patient will then likely see a doctor who
+the actual class is "benign" (a false positive), as the patient will then likely see a doctor who
 can provide an expert diagnosis. In other words, we are fine with sacrificing
 some precision in the interest of achieving high recall. This is why it is
 important not only to look at accuracy, but also the confusion matrix.
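The 99%-benign scenario referenced in this hunk can be made concrete with a small sketch (made-up labels): always guessing "benign" yields 99% accuracy but zero recall for the malignant class.

```r
# Made-up labels: with 99% benign observations, always predicting "benign"
# gives high accuracy but zero recall for the malignant class.
actual    <- factor(c(rep("Benign", 99), "Malignant"),
                    levels = c("Malignant", "Benign"))
predicted <- factor(rep("Benign", 100), levels = c("Malignant", "Benign"))

mean(predicted == actual)                      # accuracy: 0.99
sum(predicted == "Malignant" & actual == "Malignant") /
  sum(actual == "Malignant")                   # recall: 0
```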
