Using 5- and 10-fold cross-validation, we have estimated that the prediction
accuracy of our classifier is somewhere around `r round(100*(vfold_metrics |> filter(.metric == "accuracy"))$mean,0)`%.
Whether that is good or not
depends entirely on the downstream application of the data analysis. In the
present situation, we are trying to predict a tumor diagnosis, with expensive,
damaging chemo/radiation therapy or patient death as potential consequences of
misprediction. Hence, we might like to
do better than `r round(100*(vfold_metrics |> filter(.metric == "accuracy"))$mean,0)`% for this application.

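For reference, `vfold_metrics` can be produced with `fit_resamples()` from tidymodels. The chunk below is a minimal, non-evaluated sketch; the preprocessing recipe `cancer_recipe`, the model specification `knn_spec`, and the training set `cancer_train` are assumed to be defined earlier in the chapter.

```{r 06-vfold-metrics-sketch, eval = FALSE}
library(tidymodels)

# split the training data into 10 folds, stratified by the class label
# (v = 5 gives the 5-fold estimate instead; Class is an assumed column name)
cancer_vfold <- vfold_cv(cancer_train, v = 10, strata = Class)

# fit the (assumed) recipe + K-NN model on each fold and aggregate the metrics
vfold_metrics <- workflow() |>
  add_recipe(cancer_recipe) |>
  add_model(knn_spec) |>
  fit_resamples(resamples = cancer_vfold) |>
  collect_metrics()
```
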
In order to improve our classifier, we have one choice of parameter: the number of
neighbors, $K$. Since cross-validation helps us evaluate the accuracy of our
classifier, we can use cross-validation to calculate an accuracy for each value of
$K$ in a reasonable range, and then pick the value of $K$ that gives us the best
accuracy.

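One way this tuning could be carried out is sketched below; this is a minimal, non-evaluated version that assumes the `cancer_recipe` and `cancer_vfold` objects from above, and an assumed grid of odd values of $K$ from 1 to 15.

```{r 06-tune-grid-sketch, eval = FALSE}
# K-NN specification with the number of neighbors left to be tuned
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) |>
  set_engine("kknn") |>
  set_mode("classification")

# an assumed grid of candidate K values
k_vals <- tibble(neighbors = seq(from = 1, to = 15, by = 2))

# cross-validate each candidate K and keep the accuracy estimates
accuracies <- workflow() |>
  add_recipe(cancer_recipe) |>
  add_model(knn_spec) |>
  tune_grid(resamples = cancer_vfold, grid = k_vals) |>
  collect_metrics() |>
  filter(.metric == "accuracy")

# plot estimated accuracy against the number of neighbors
accuracy_vs_k <- ggplot(accuracies, aes(x = neighbors, y = mean)) +
  geom_point() +
  geom_line() +
  labs(x = "Neighbors", y = "Accuracy Estimate")
```
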
```{r 06-find-k, fig.cap = "Plot of estimated accuracy versus the number of neighbors"}
accuracy_vs_k
```

Setting the number of
neighbors to $K =$ `r (accuracies |> arrange(desc(mean)) |> head(1))$neighbors`
provides the highest accuracy (`r (accuracies |> arrange(desc(mean)) |> slice(1) |> pull(mean) |> round(4))*100`%). But there is no exact or perfect answer here;
any selection from $K = 3$ to $15$ would be reasonably justified, as all
of these differ in classifier accuracy by a small amount. Remember: the
values you see on this plot are *estimates* of the true accuracy of our
classifier. Although the
$K =$ `r (accuracies |> arrange(desc(mean)) |> head(1))$neighbors` value is
higher than the others on this plot,
that doesn't mean the classifier is actually more accurate with this parameter
value! Generally, when selecting $K$ (and other parameters for other predictive
models), we are looking for a value where:

- we get roughly optimal accuracy, so that our model will likely be accurate (a quick way to check this is sketched just after this list)
- changing the value to a nearby one (e.g., adding or subtracting 1) doesn't decrease accuracy too much, so that our choice is reliable in the presence of uncertainty
- the cost of training the model is not prohibitive (e.g., in our situation, if $K$ is too large, predicting becomes expensive!)

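A minimal sketch of that check, mirroring the inline expressions used throughout this section, pulls the single highest-accuracy value of $K$ out of `accuracies`:

```{r 06-best-k-sketch, eval = FALSE}
# the value of K with the highest estimated accuracy
best_k <- accuracies |>
  arrange(desc(mean)) |>
  slice(1) |>
  pull(neighbors)

best_k
```
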
We know that $K =$ `r (accuracies |> arrange(desc(mean)) |> head(1))$neighbors`
provides the highest estimated accuracy. Further, Figure \@ref(fig:06-find-k) shows that the estimated accuracy
changes by only a small amount if we increase or decrease $K$ near $K =$ `r (accuracies |> arrange(desc(mean)) |> head(1))$neighbors`.
And finally, $K =$ `r (accuracies |> arrange(desc(mean)) |> head(1))$neighbors` does not incur a prohibitively expensive
computational cost of training. Considering these three points, we would indeed select
$K =$ `r (accuracies |> arrange(desc(mean)) |> head(1))$neighbors` for the classifier.
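
Once $K$ is selected, the natural next step is to specify the classifier with that fixed value and fit it on the full training set. The following is a sketch only, again assuming the hypothetical `cancer_recipe` and `cancer_train` objects and the `best_k` value computed above:

```{r 06-final-fit-sketch, eval = FALSE}
# refit the classifier using the selected number of neighbors
knn_spec_final <- nearest_neighbor(weight_func = "rectangular",
                                   neighbors = best_k) |>
  set_engine("kknn") |>
  set_mode("classification")

knn_fit <- workflow() |>
  add_recipe(cancer_recipe) |>
  add_model(knn_spec_final) |>
  fit(data = cancer_train)
```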