We can also obtain the number of neighbours with the highest accuracy
programmatically by accessing the `neighbors` variable in the `accuracies` data
frame where the `mean` variable is highest.
Note that it is still useful to visualize the results as
we did above, since the plot provides additional information on how the model
performance varies.

```{r 06-extract-k}
best_k <- accuracies |>
  arrange(desc(mean)) |>
  head(1) |>
  pull(neighbors)
best_k
```

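For readers using dplyr 1.0.0 or later, the same value can be extracted a little more directly with `slice_max`. This is an equivalent sketch, not the approach used in the text:

```{r 06-extract-k-alt}
# Equivalent sketch using slice_max (dplyr >= 1.0.0): keep the row with the
# largest `mean`, then pull out its `neighbors` value. Setting
# with_ties = FALSE guarantees a single row even if several values of K
# tie for the best mean accuracy.
accuracies |>
  slice_max(mean, n = 1, with_ties = FALSE) |>
  pull(neighbors)
```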
Setting the number of
neighbors to $K =$ `r best_k`
provides the highest accuracy (`r (accuracies |> arrange(desc(mean)) |> slice(1) |> pull(mean) |> round(4))*100`%). But there is no exact or perfect answer here;
any selection from $K = 30$ to $60$ would be reasonably justified, as all
of these differ in classifier accuracy by only a small amount. Remember: the
values you see on this plot are *estimates* of the true accuracy of our
classifier. Although the
accuracy estimate at $K =$ `r best_k` is
higher than at the other values on this plot,
that doesn't mean the classifier is actually more accurate with this parameter
value! Generally, when selecting $K$ (and other parameters for other predictive
models), we are looking for a value where:
- we get roughly optimal accuracy, so that our model will likely be accurate;
- changing the value to a nearby one (e.g., adding or subtracting a small number) doesn't decrease accuracy too much, so that our choice is reliable in the presence of uncertainty;
- the cost of training the model is not prohibitive (e.g., in our situation, if $K$ is too large, predicting becomes expensive!).

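To make the second criterion concrete, one could list every candidate $K$ whose estimated accuracy lies within some tolerance of the best. The sketch below uses a tolerance of one percentage point (0.01), a number chosen purely for illustration rather than taken from the text, and assumes the `accuracies` data frame from above with its `neighbors` and `mean` columns:

```{r 06-k-within-tol}
# Sketch: values of K whose mean accuracy is within 0.01 of the best.
# The 0.01 tolerance is illustrative, not prescribed by the text.
accuracies |>
  filter(mean >= max(mean) - 0.01) |>
  pull(neighbors)
```

A wide range of values returned here suggests the choice of $K$ is reliable; a single value suggests the accuracy estimate is more sensitive to $K$.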
We know that $K =$ `r best_k`
provides the highest estimated accuracy. Further, Figure \@ref(fig:06-find-k) shows that the estimated accuracy
changes by only a small amount if we increase or decrease $K$ near $K =$ `r best_k`.
And finally, $K =$ `r best_k` does not create a prohibitively expensive
computational cost of training. Considering these three points, we would indeed select
$K =$ `r best_k` for the classifier.