@@ -863,23 +863,6 @@ regardless of what the new observation looks like. In general, if the model
*isn't influenced enough* by the training data, it is said to **underfit** the
data.

-**Overfitting:** \index{overfitting!classification} In contrast, when we decrease the number of neighbors, each
-individual data point has a stronger and stronger vote regarding nearby points.
-Since the data themselves are noisy, this causes a more "jagged" boundary
-corresponding to a *less simple* model. If you take this case to the extreme,
-setting $K = 1$, then the classifier is essentially just matching each new
-observation to its closest neighbor in the training data set. This is just as
-problematic as the large $K$ case, because the classifier becomes unreliable on
-new data: if we had a different training set, the predictions would be
-completely different. In general, if the model *is influenced too much* by the
-training data, it is said to **overfit** the data.
-
-Both overfitting and underfitting are problematic and will lead to a model
-that does not generalize well to new data. When fitting a model, we need to strike
-a balance between the two. You can see these two effects in Figure
-\@ref(fig:06-decision-grid-K), which shows how the classifier changes as
-we set the number of neighbors $K$ to 1, 7, 20, and 300.
-
```{r 06-decision-grid-K, echo = FALSE, message = FALSE, fig.height = 10, fig.width = 10, fig.pos = "H", out.extra="", fig.cap = "Effect of K in overfitting and underfitting."}
ks <- c(1, 7, 20, 300)
plots <- list()
@@ -935,6 +918,23 @@ p_grid <- plot_grid(plotlist = p_no_legend, ncol = 2)
plot_grid(p_grid, legend, ncol = 1, rel_heights = c(1, 0.2))
```

+**Overfitting:** \index{overfitting!classification} In contrast, when we decrease the number of neighbors, each
+individual data point has a stronger and stronger vote regarding nearby points.
+Since the data themselves are noisy, this causes a more "jagged" boundary
+corresponding to a *less simple* model. If you take this case to the extreme,
+setting $K = 1$, then the classifier is essentially just matching each new
+observation to its closest neighbor in the training data set. This is just as
+problematic as the large $K$ case, because the classifier becomes unreliable on
+new data: if we had a different training set, the predictions would be
+completely different. In general, if the model *is influenced too much* by the
+training data, it is said to **overfit** the data.
+
+Both overfitting and underfitting are problematic and will lead to a model
+that does not generalize well to new data. When fitting a model, we need to strike
+a balance between the two. You can see these two effects in Figure
+\@ref(fig:06-decision-grid-K), which shows how the classifier changes as
+we set the number of neighbors $K$ to 1, 7, 20, and 300.
+
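To make the trade-off concrete, here is a minimal R sketch (an illustrative aside, not code from the chapter): it simulates a small two-class data set and queries `knn()` from the `class` package with a very small and a very large $K$. The simulated data, class sizes, and the specific $K$ values are assumptions chosen purely for illustration.

```r
# A minimal sketch (not from the chapter): simulate a two-class data set and
# see how the choice of K changes the prediction from class::knn().
library(class)

set.seed(1)
# Class "A": 150 points near (0, 0); class "B": 50 points near (3, 3).
train <- data.frame(
  x1 = c(rnorm(150, mean = 0), rnorm(50, mean = 3)),
  x2 = c(rnorm(150, mean = 0), rnorm(50, mean = 3))
)
labels <- factor(rep(c("A", "B"), times = c(150, 50)))

# A new observation sitting in the middle of class "B".
new_obs <- data.frame(x1 = 3, x2 = 3)

# K = 1: the prediction is simply the label of the single closest training
# point, so it tracks the local, possibly noisy, structure around the new
# observation (in this simulation that nearest point almost surely has label "B").
knn(train, new_obs, cl = labels, k = 1)

# K = 180: almost all 200 training points get a vote. Since only 50 points have
# label "B", at least 130 of the 180 votes must come from class "A", so "A"
# wins regardless of where the new observation lies.
knn(train, new_obs, cl = labels, k = 180)
```

With $K = 1$ the classifier just echoes the label of the single nearest training point, while with $K = 180$ the overall majority class wins no matter where the new observation lies; this is the same overfitting versus underfitting behavior that Figure \@ref(fig:06-decision-grid-K) displays for $K$ = 1, 7, 20, and 300.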
## Summary

Classification algorithms use one or more quantitative variables to predict the