We can also obtain the number of neighbors with the highest accuracy
programmatically by accessing the `neighbors` variable in the `accuracies` data
frame where the `mean` variable is highest.
Note that it is still useful to visualize the results as
we did above since this provides additional information on how the model
performance varies.

```{r 06-extract-k}
best_k <- accuracies |>
  arrange(desc(mean)) |>
  head(1) |>
  pull(neighbors)
best_k
```
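
An equivalent way to extract this value, if you prefer a single verb, is `dplyr`'s `slice_max`. The sketch below (with a hypothetical chunk label) assumes the same `accuracies` data frame and uses `with_ties = FALSE` so that exactly one row is kept even if several values of $K$ tie for the highest mean accuracy.

```{r 06-extract-k-alt, eval = FALSE}
# keep the single row with the largest mean accuracy, then extract K
accuracies |>
  slice_max(mean, n = 1, with_ties = FALSE) |>
  pull(neighbors)
```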

Setting the number of
neighbors to $K =$ `r best_k`
provides the highest cross-validation accuracy estimate (`r (accuracies |> arrange(desc(mean)) |> slice(1) |> pull(mean) |> round(4))*100`%). But there is no exact or perfect answer here;
any selection between $K = 30$ and $60$ would be reasonably justified, as all
of these differ in classifier accuracy by only a small amount. Remember: the
values you see on this plot are *estimates* of the true accuracy of our
classifier. Although the estimated accuracy at $K =$ `r best_k` is higher than at the other values on this plot,
that doesn't mean the classifier is actually more accurate with this parameter
value! Generally, when selecting $K$ (and other parameters for other predictive
models), we are looking for a value where:

- we get roughly optimal accuracy, so that our model will likely be accurate;
- changing the value to a nearby one (e.g., adding or subtracting a small number) doesn't decrease accuracy too much, so that our choice is reliable in the presence of uncertainty;
- the cost of training the model is not prohibitive (e.g., in our situation, if $K$ is too large, predicting becomes expensive!).

We know that $K =$ `r best_k`
provides the highest estimated accuracy. Further, Figure \@ref(fig:06-find-k) shows that the estimated accuracy
changes by only a small amount if we increase or decrease $K$ near $K =$ `r best_k`.
And finally, $K =$ `r best_k` does not create a prohibitively expensive
computational cost of training. Considering these three points, we would indeed select
$K =$ `r best_k` for the classifier.
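
To check the second point numerically rather than only visually, one could look at the estimated accuracies for values of $K$ near `best_k`. The sketch below (with a hypothetical chunk label, and a window of 10 neighbors chosen purely for illustration) assumes the tuning grid stored in `accuracies` includes such nearby values.

```{r 06-nearby-k, eval = FALSE}
# estimated accuracies for values of K within 10 of the chosen best_k
accuracies |>
  filter(abs(neighbors - best_k) <= 10) |>
  select(neighbors, mean)
```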

Figure \@ref(fig:07-howK) visualizes the effect of different settings of $K$ on the
regression model. Each plot shows the predicted values for house sale price from
our KNN regression model on the training data for 6 different values for $K$: 1, 3, 25, `r kmin`, 250, and 680 (almost the entire training set).
For each model, we predict prices for the range of possible home sizes we
observed in the data set (here 500 to 5,000 square feet) and we plot the
predicted prices as a blue line.
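
As a rough sketch of how the predictions for one of these values of $K$ could be generated with tidymodels: the object names `sacramento_train`, `sqft`, and `price`, along with the chunk label, are assumptions made for illustration, and preprocessing steps such as standardization are omitted for brevity. The code actually used to produce the figure follows in the next chunk.

```{r 07-howK-sketch, eval = FALSE}
library(tidymodels)

# KNN regression specification for a single value of K (here 25, one of the six shown)
knn_sketch_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 25) |>
  set_engine("kknn") |>
  set_mode("regression")

# fit on the training data, then predict over the observed range of home sizes
knn_sketch_fit <- workflow() |>
  add_recipe(recipe(price ~ sqft, data = sacramento_train)) |>
  add_model(knn_sketch_spec) |>
  fit(data = sacramento_train)

size_grid <- tibble(sqft = seq(from = 500, to = 5000, by = 10))

sketch_preds <- knn_sketch_fit |>
  predict(size_grid) |>
  bind_cols(size_grid)
```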

```{r 07-howK, echo = FALSE, warning = FALSE, fig.height = 13, fig.width = 10, fig.cap = "Predicted values for house price (represented as a blue line) from KNN regression models for six different values for $K$."}
gridvals <- c(1, 3, 25, kmin, 250, 680)