classification2.Rmd: 6 additions & 7 deletions
@@ -751,7 +751,7 @@ Then instead of using `fit` or `fit_resamples`, we will use the `tune_grid` function
 to fit the model for each value in a range of parameter values.
 In particular, we first create a data frame with a `neighbors`
 variable that contains the sequence of values of $K$ to try; below we create the `k_vals`
-data frame with the `neighbors` variable containing each value from $K=1$ to $K=15$ using
+data frame with the `neighbors` variable containing values from 1 to 100 (stepping by 5) using
 the `seq` function.
 Then we pass that data frame to the `grid` argument of `tune_grid`.
@@ -761,7 +761,7 @@ set.seed(1)
 ```

 ```{r 06-range-cross-val-2}
-k_vals <- tibble(neighbors = seq(from = 1, to = 15, by = 1))
+k_vals <- tibble(neighbors = seq(from = 1, to = 100, by = 5))

 knn_results <- workflow() |>
   add_recipe(cancer_recipe) |>
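
For context, this hunk truncates the tuning pipeline. A minimal sketch of the surrounding code, assuming the `knn_spec` model specification and `cancer_vfold` cross-validation splits defined earlier in the chapter (neither appears in this diff):

```r
library(tidymodels)

# Candidate values of K, as changed in this commit: 1, 6, 11, ..., 96.
k_vals <- tibble(neighbors = seq(from = 1, to = 100, by = 5))

# Fit and evaluate the model once per candidate K via cross-validation.
# `knn_spec` and `cancer_vfold` are assumed from earlier in the chapter.
knn_results <- workflow() |>
  add_recipe(cancer_recipe) |>
  add_model(knn_spec) |>
  tune_grid(resamples = cancer_vfold, grid = k_vals) |>
  collect_metrics()

# Keep only the accuracy estimates (matches the header of the next hunk).
accuracies <- knn_results |>
  filter(.metric == "accuracy")
```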
@@ -775,9 +775,8 @@ accuracies <- knn_results |>
 accuracies
 ```

-We can select the best value of the number of neighbors (i.e., the one that results
-in the highest classifier accuracy estimate) by plotting the accuracy versus $K$
-in Figure \@ref(fig:06-find-k).
+We can decide which number of neighbors is best by plotting the accuracy versus $K$,
+as shown in Figure \@ref(fig:06-find-k).

 ```{r 06-find-k, fig.height = 3.5, fig.width = 4, fig.cap= "Plot of estimated accuracy versus the number of neighbors."}
 accuracy_vs_k <- ggplot(accuracies, aes(x = neighbors, y = mean)) +
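
The hunk ends mid-chunk; the rest of the plotting code is untouched by this commit. It plausibly continues along these lines (a sketch; the exact layers are an assumption, not copied from the diff):

```r
# Sketch of how the truncated chunk likely continues: a point-and-line
# plot of estimated accuracy against the number of neighbors.
accuracy_vs_k <- ggplot(accuracies, aes(x = neighbors, y = mean)) +
  geom_point() +
  geom_line() +
  labs(x = "Neighbors", y = "Accuracy Estimate")

accuracy_vs_k
```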
@@ -791,7 +790,7 @@ accuracy_vs_k
 Setting the number of
 neighbors to $K =$ `r (accuracies |> arrange(desc(mean)) |> head(1))$neighbors`
 provides the highest accuracy (`r (accuracies |> arrange(desc(mean)) |> slice(1) |> pull(mean) |> round(4))*100`%). But there is no exact or perfect answer here;
-any selection from $K = 3$ and $15$ would be reasonably justified, as all
+any selection from $K = 30$ and $60$ would be reasonably justified, as all
 of these differ in classifier accuracy by a small amount. Remember: the
 values you see on this plot are *estimates* of the true accuracy of our
 classifier. Although the
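
The inline `r ...` expressions in this hunk compute the best $K$ and its accuracy; unrolled into ordinary code, they amount to:

```r
# Unrolled version of the inline expressions above: the value of K with
# the highest estimated accuracy, and that accuracy as a percentage.
best_k <- accuracies |>
  arrange(desc(mean)) |>
  head(1) |>
  pull(neighbors)

best_acc <- accuracies |>
  arrange(desc(mean)) |>
  slice(1) |>
  pull(mean)

best_k                     # the K reported in the text
round(best_acc, 4) * 100   # its accuracy estimate, in percent
```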
@@ -802,7 +801,7 @@ value! Generally, when selecting $K$ (and other parameters for other predictive
 models), we are looking for a value where:

 - we get roughly optimal accuracy, so that our model will likely be accurate
-- changing the value to a nearby one (e.g., adding or subtracting 1) doesn't decrease accuracy too much, so that our choice is reliable in the presence of uncertainty
+- changing the value to a nearby one (e.g., adding or subtracting a small number) doesn't decrease accuracy too much, so that our choice is reliable in the presence of uncertainty
 - the cost of training the model is not prohibitive (e.g., in our situation, if $K$ is too large, predicting becomes expensive!)

 We know that $K =$ `r (accuracies |> arrange(desc(mean)) |> head(1))$neighbors`
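
The second criterion (robustness to nearby values of $K$) can be spot-checked directly from `accuracies`. A hypothetical check, not from the chapter:

```r
# Hypothetical (not in the chapter): how much does estimated accuracy vary
# among candidate K values near the best one? A narrow range backs the claim
# that any nearby choice of K is about equally good.
accuracies |>
  filter(abs(neighbors - best_k) <= 15) |>
  summarize(min_acc = min(mean), max_acc = max(mean))
```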