@@ -863,23 +863,6 @@ regardless of what the new observation looks like. In general, if the model
*isn't influenced enough* by the training data, it is said to **underfit** the
data.

-**Overfitting:** \index{overfitting!classification} In contrast, when we decrease the number of neighbors, each
-individual data point has a stronger and stronger vote regarding nearby points.
-Since the data themselves are noisy, this causes a more "jagged" boundary
-corresponding to a *less simple* model. If you take this case to the extreme,
-setting $K = 1$, then the classifier is essentially just matching each new
-observation to its closest neighbor in the training data set. This is just as
-problematic as the large $K$ case, because the classifier becomes unreliable on
-new data: if we had a different training set, the predictions would be
-completely different. In general, if the model *is influenced too much* by the
-training data, it is said to **overfit** the data.
-
-Both overfitting and underfitting are problematic and will lead to a model
-that does not generalize well to new data. When fitting a model, we need to strike
-a balance between the two. You can see these two effects in Figure
-\@ref(fig:06-decision-grid-K), which shows how the classifier changes as
-we set the number of neighbors $K$ to 1, 7, 20, and 300.
-
```{r 06-decision-grid-K, echo = FALSE, message = FALSE, fig.height = 10, fig.width = 10, fig.pos = "H", out.extra="", fig.cap = "Effect of K in overfitting and underfitting."}
ks <- c(1, 7, 20, 300)
plots <- list()
@@ -935,6 +918,23 @@ p_grid <- plot_grid(plotlist = p_no_legend, ncol = 2)
plot_grid(p_grid, legend, ncol = 1, rel_heights = c(1, 0.2))
```

+**Overfitting:** \index{overfitting!classification} In contrast, when we decrease the number of neighbors, each
+individual data point has a stronger and stronger vote regarding nearby points.
+Since the data themselves are noisy, this causes a more "jagged" boundary
+corresponding to a *less simple* model. If you take this case to the extreme,
+setting $K = 1$, then the classifier is essentially just matching each new
+observation to its closest neighbor in the training data set. This is just as
+problematic as the large $K$ case, because the classifier becomes unreliable on
+new data: if we had a different training set, the predictions would be
+completely different. In general, if the model *is influenced too much* by the
+training data, it is said to **overfit** the data.
+
+Both overfitting and underfitting are problematic and will lead to a model
+that does not generalize well to new data. When fitting a model, we need to strike
+a balance between the two. You can see these two effects in Figure
+\@ref(fig:06-decision-grid-K), which shows how the classifier changes as
+we set the number of neighbors $K$ to 1, 7, 20, and 300.
+
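To make the trade-off concrete, here is a minimal R sketch (an illustrative aside, not code from the chapter): it simulates a small two-class data set and queries `knn()` from the `class` package with a very small and a very large $K$. The simulated data, class sizes, and the specific $K$ values are assumptions chosen purely for illustration.

```r
# A minimal sketch (not from the chapter): simulate a two-class data set and
# see how the choice of K changes the prediction from class::knn().
library(class)

set.seed(1)
# Class "A": 150 points near (0, 0); class "B": 50 points near (3, 3).
train <- data.frame(
  x1 = c(rnorm(150, mean = 0), rnorm(50, mean = 3)),
  x2 = c(rnorm(150, mean = 0), rnorm(50, mean = 3))
)
labels <- factor(rep(c("A", "B"), times = c(150, 50)))

# A new observation sitting in the middle of class "B".
new_obs <- data.frame(x1 = 3, x2 = 3)

# K = 1: the prediction is simply the label of the single closest training
# point, so it tracks the local, possibly noisy, structure around the new
# observation (in this simulation that nearest point almost surely has label "B").
knn(train, new_obs, cl = labels, k = 1)

# K = 180: almost all 200 training points get a vote. Since only 50 points have
# label "B", at least 130 of the 180 votes must come from class "A", so "A"
# wins regardless of where the new observation lies.
knn(train, new_obs, cl = labels, k = 180)
```

With $K = 1$ the classifier just echoes the label of the single nearest training point, while with $K = 180$ the overall majority class wins no matter where the new observation lies; this is the same overfitting versus underfitting behavior that Figure \@ref(fig:06-decision-grid-K) displays for $K$ = 1, 7, 20, and 300.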
## Summary

Classification algorithms use one or more quantitative variables to predict the