@@ -81,8 +81,8 @@ tumor images?

The trick is to split the data into a **training set** \index{training set} and **test set** \index{test set} (Figure \@ref(fig:06-training-test))
and use only the **training set** when building the classifier.
- Then, to evaluate the performance of the classifier, we first set aside the true labels from the **test set**,
- and then use the classifier to predict the labels in the **test set**. If our predictions match the true
+ Then, to evaluate the performance of the classifier, we first set aside the labels from the **test set**,
+ and then use the classifier to predict the labels in the **test set**. If our predictions match the actual
labels for the observations in the **test set**, then we have some
confidence that our classifier might also accurately predict the class
labels for new observations without known class labels.
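
For concreteness, here is a minimal sketch of how such a split might be created with the `rsample` package; the data frame name `cancer`, the 75/25 proportion, and the stratification by `Class` are assumptions for illustration, not necessarily the chapter's exact code.

```r
# Minimal sketch of a train/test split with rsample (assumed names and proportion).
library(tidymodels)

set.seed(1)  # make the random split reproducible
cancer_split <- initial_split(cancer, prop = 0.75, strata = Class)
cancer_train <- training(cancer_split)  # used to build the classifier
cancer_test  <- testing(cancer_split)   # held out for evaluation only
```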
@@ -97,12 +97,12 @@ labels for new observations without known class labels.
knitr::include_graphics("img/classification2/training_test.jpeg")
```

- How exactly can we assess how well our predictions match the true labels for
+ How exactly can we assess how well our predictions match the actual labels for
the observations in the test set? One way we can do this is to calculate the
prediction **accuracy**. \index{prediction accuracy|see{accuracy}}\index{accuracy} This is the fraction of examples for which the
classifier made the correct prediction. To calculate this, we divide the number
of correct predictions by the number of predictions made.
- The process for assessing if our predictions match the true labels in the
+ The process for assessing if our predictions match the actual labels in the
test set is illustrated in Figure \@ref(fig:06-ML-paradigm-test).

$$ \mathrm{accuracy} = \frac{\mathrm{number \; of \; correct \; predictions}}{\mathrm{total \; number \; of \; predictions}} $$
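
As a small illustration of this formula (with made-up vectors, not the chapter's data), accuracy is simply the fraction of predictions that equal the corresponding labels:

```r
# Hypothetical vectors for illustration: 3 of the 4 predictions are correct.
predicted <- c("Benign", "Malignant", "Benign", "Benign")
actual    <- c("Benign", "Malignant", "Malignant", "Benign")
mean(predicted == actual)  # 3 correct / 4 predictions = 0.75
```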
@@ -125,7 +125,7 @@ a test set of 65 observations.

Table: (\#tab:confusion-matrix) An example confusion matrix for the tumor image data.

- | | Truly Malignant | Truly Benign |
+ | | Actually Malignant | Actually Benign |
| ---------------------- | --------------- | -------------- |
| **Predicted Malignant** | 1 | 4 |
| **Predicted Benign** | 3 | 57 |
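
As a worked check of the accuracy formula against this example table (a reading of the table, not text from the chapter): the classifier made $1 + 57 = 58$ correct predictions out of $65$, while misclassifying $4$ benign tumors as malignant and $3$ malignant tumors as benign, so

$$ \mathrm{accuracy} = \frac{1 + 57}{65} \approx 0.89 $$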
@@ -480,7 +480,7 @@ hidden_print(knn_fit)
Now that we have a $K$-nearest neighbors classifier object, we can use it to
predict the class labels for our test set. We use the `bind_cols` \index{bind\_cols} function to add the
column of predictions to the original test data, creating the
- `cancer_test_predictions` data frame. The `Class` variable contains the true
+ `cancer_test_predictions` data frame. The `Class` variable contains the actual
diagnoses, while the `.pred_class` column contains the predicted diagnoses from the
classifier.
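
A minimal sketch of this step (assuming the fitted classifier `knn_fit` and the test set `cancer_test` from the chapter; not necessarily its exact code):

```r
# Predict on the test set and attach the predictions to the original test data.
cancer_test_predictions <- predict(knn_fit, cancer_test) |>
  bind_cols(cancer_test)
```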
@@ -536,8 +536,8 @@ confu22 <- (confusionmt |> filter(name == "cell_2_2"))$value
The confusion matrix shows `r confu11` observations were correctly predicted
as malignant, and `r confu22` were correctly predicted as benign.
It also shows that the classifier made some mistakes; in particular,
- it classified `r confu21` observations as benign when they were truly malignant,
- and `r confu12` observations as malignant when they were truly benign.
+ it classified `r confu21` observations as benign when they were actually malignant,
+ and `r confu12` observations as malignant when they were actually benign.
Using our formulas from earlier, we see that the accuracy agrees with what R reported,
and can also compute the precision and recall of the classifier:
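
One way these computations might look with the `yardstick` metric functions (a sketch using the assumed column names from above; the chapter's own code may differ):

```r
# Accuracy, precision, and recall of the test set predictions; by default the
# first factor level of Class is treated as the "positive" class.
cancer_test_predictions |> accuracy(truth = Class, estimate = .pred_class)
cancer_test_predictions |> precision(truth = Class, estimate = .pred_class)
cancer_test_predictions |> recall(truth = Class, estimate = .pred_class)
```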
@@ -561,11 +561,11 @@ of the time, a classifier with 99% accuracy is not terribly impressive (just alw
And beyond just accuracy, we need to consider the precision and recall: as mentioned
earlier, the *kind* of mistake the classifier makes is
important in many applications as well. In the previous example with 99% benign observations, it might be very bad for the
- classifier to predict "benign" when the true class is "malignant" (a false negative), as this
+ classifier to predict "benign" when the actual class is "malignant" (a false negative), as this
might result in a patient not receiving appropriate medical attention. In other
words, in this context, we need the classifier to have a *high recall*. On the
other hand, it might be less bad for the classifier to guess "malignant" when
- the true class is "benign" (a false positive), as the patient will then likely see a doctor who
+ the actual class is "benign" (a false positive), as the patient will then likely see a doctor who
can provide an expert diagnosis. In other words, we are fine with sacrificing
some precision in the interest of achieving high recall. This is why it is
important to look not only at accuracy, but also at the confusion matrix.
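
To make the 99%-benign example concrete, here is a small hypothetical simulation (not from the chapter) showing that an always-benign classifier gets high accuracy but zero recall for the malignant class:

```r
# 1000 hypothetical observations, 99% benign, with a classifier that always
# predicts "Benign": accuracy is 0.99, but recall for "Malignant" is 0.
library(tidyverse)
library(yardstick)

toy <- tibble(
  Class = factor(c(rep("Malignant", 10), rep("Benign", 990)),
                 levels = c("Malignant", "Benign")),
  .pred_class = factor(rep("Benign", 1000),
                       levels = c("Malignant", "Benign"))
)

accuracy(toy, truth = Class, estimate = .pred_class)  # estimate = 0.99
recall(toy, truth = Class, estimate = .pred_class)    # estimate = 0
```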