Next, we can create a scatter plot using this data set
to see if we can detect subtypes or groups in our data set.

-```{r 10-toy-example-plot, warning = FALSE, fig.height = 3.5, fig.width = 3.75, fig.align = "center", fig.pos = "H", out.extra="", fig.cap = "Scatter plot of standardized bill length versus standardized flipper length."}
+\newpage
+
+```{r 10-toy-example-plot, warning = FALSE, fig.height = 3.25, fig.width = 3.5, fig.align = "center", fig.pos = "H", out.extra="", fig.cap = "Scatter plot of standardized bill length versus standardized flipper length."}
ggplot(data, aes(x = flipper_length_standardized,
                 y = bill_length_standardized)) +
  geom_point() +
@@ -203,7 +206,7 @@ This procedure will separate the data into groups;
Figure \@ref(fig:10-toy-example-clustering) shows these groups
denoted by colored scatter points.

-```{r 10-toy-example-clustering, echo = FALSE, warning = FALSE, fig.height = 3.5, fig.width = 4.5, fig.align = "center", fig.cap = "Scatter plot of standardized bill length versus standardized flipper length with colored groups."}
+```{r 10-toy-example-clustering, echo = FALSE, warning = FALSE, fig.height = 3.25, fig.width = 4.25, fig.align = "center", fig.cap = "Scatter plot of standardized bill length versus standardized flipper length with colored groups."}
ggplot(data, aes(y = bill_length_standardized,
                 x = flipper_length_standardized, color = cluster)) +
  geom_point() +
@@ -261,7 +264,7 @@ in Figure \@ref(fig:10-toy-example-clus1-center).

(ref:10-toy-example-clus1-center) Cluster 1 from the `penguin_data` data set example. Observations are in blue, with the cluster center highlighted in red.

base <- ggplot(data, aes(x = flipper_length_standardized, y = bill_length_standardized)) +
  geom_point() +
  xlab("Flipper Length (standardized)") +
@@ -308,7 +311,7 @@ These distances are denoted by lines in Figure \@ref(fig:10-toy-example-clus1-dists).

(ref:10-toy-example-clus1-dists) Cluster 1 from the `penguin_data` data set example. Observations are in blue, with the cluster center highlighted in red. The distances from the observations to the cluster center are represented as black lines.

(ref:10-toy-example-all-clus-dists) All clusters from the `penguin_data` data set example. Observations are in orange, blue, and yellow with the cluster center highlighted in red. The distances from the observations to each of the respective cluster centers are represented as black lines.
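For reference, here is a minimal sketch of the quantity these figures depict, assuming a hypothetical data frame `clus1` containing only the cluster 1 observations with the two standardized columns used above (the chapter's actual object names may differ):

```r
library(tidyverse)

# the cluster center is the mean of each variable over the cluster's points
center <- clus1 |>
  summarize(flipper_center = mean(flipper_length_standardized),
            bill_center = mean(bill_length_standardized))

# squared straight-line distance from each observation to that center;
# summing these gives the within-cluster sum of squared distances (WSSD)
clus1 |>
  mutate(dist_sq = (flipper_length_standardized - center$flipper_center)^2 +
                   (bill_length_standardized - center$bill_center)^2) |>
  summarize(wssd = sum(dist_sq))
```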
@@ -599,7 +602,7 @@ These, however, are beyond the scope of this book.
Unlike the classification and regression models we studied in previous chapters, K-means \index{K-means!restart, nstart} can get "stuck" in a bad solution.
For example, Figure \@ref(fig:10-toy-kmeans-bad-init) illustrates an unlucky random initialization by K-means.
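A common safeguard against such an unlucky start is to run the algorithm from several random initializations and keep the best result. A minimal sketch, assuming `standardized_data` is a data frame holding just the two standardized columns (a placeholder name, not the chapter's own object):

```r
set.seed(1)  # hypothetical seed, only so the sketch is reproducible

# a single random start can settle into a poor local optimum
single_start <- kmeans(standardized_data, centers = 3, nstart = 1)

# nstart = 10 runs K-means from 10 random starts and keeps the run with
# the smallest total within-cluster sum of squared distances
many_starts <- kmeans(standardized_data, centers = 3, nstart = 10)

c(single_start$tot.withinss, many_starts$tot.withinss)
```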
@@ -620,7 +623,7 @@ Figure \@ref(fig:10-toy-kmeans-bad-iter) shows what the iterations of K-means would look like in this case.

(ref:10-toy-kmeans-bad-iter) First five iterations of K-means clustering on the `penguin_data` example data set with a poor random initialization. Each pair of plots corresponds to an iteration. Within the pair, the first plot depicts the center update, and the second plot depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
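To make the center update and reassignment steps concrete, here is a rough sketch of one K-means iteration written out by hand; `points` and `centers` are hypothetical matrices with one row per observation (or center) and the two standardized variables as columns:

```r
# reassignment: assign each observation to its closest center
assignments <- apply(points, 1, function(p) {
  which.min(colSums((t(centers) - p)^2))
})

# center update: each center becomes the mean of its assigned observations
# (for simplicity, this sketch assumes every cluster keeps at least one point)
centers <- t(sapply(seq_len(nrow(centers)), function(k) {
  colMeans(points[assignments == k, , drop = FALSE])
}))

# K-means alternates these two steps until the assignments stop changing
```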
@@ -778,7 +781,7 @@ Figure \@ref(fig:10-toy-kmeans-vary-k) illustrates the impact of K
on K-means clustering of our penguin flipper and bill length data
by showing the different clusterings for K's ranging from 1 to 9.

-```{r 10-toy-kmeans-vary-k, echo = FALSE, warning = FALSE, fig.height = 6.25, fig.width = 6, fig.cap = "Clustering of the penguin data for K clusters ranging from 1 to 9. Cluster centers are indicated by larger points that are outlined in black."}
+```{r 10-toy-kmeans-vary-k, echo = FALSE, warning = FALSE, fig.height = 6.25, fig.width = 6, fig.pos = "H", out.extra="", fig.cap = "Clustering of the penguin data for K clusters ranging from 1 to 9. Cluster centers are indicated by larger points that are outlined in black."}
set.seed(3)

kclusts <- tibble(k = 1:9) |>
@@ -842,7 +845,7 @@ decrease the total WSSD, but by only a *diminishing amount*. If we plot the total WSSD versus the number of
clusters, we see that the decrease in total WSSD levels off (or forms an "elbow shape") \index{elbow method} when we reach roughly
the right number of clusters (Figure \@ref(fig:10-toy-kmeans-elbow)).

-```{r 10-toy-kmeans-elbow, echo = FALSE, warning = FALSE, fig.align = 'center', fig.height = 3.5, fig.width = 4.5, fig.cap = "Total WSSD for K clusters ranging from 1 to 9."}
+```{r 10-toy-kmeans-elbow, echo = FALSE, warning = FALSE, fig.align = 'center', fig.height = 3.25, fig.width = 4.25, fig.pos = "H", out.extra="", fig.cap = "Total WSSD for K clusters ranging from 1 to 9."}
p2 <- ggplot(clusterings, aes(x = k, y = tot.withinss)) +
  geom_point(size = 2) +
  geom_line() +
@@ -933,7 +936,7 @@ clustered_data
Now that we have this information in a tidy data frame, we can make a visualization
of the cluster assignments for each point, as shown in Figure \@ref(fig:10-plot-clusters-2).

-```{r 10-plot-clusters-2, fig.height = 3.5, fig.width = 4.5, fig.align = "center", fig.cap = "The data colored by the cluster assignments returned by K-means."}
+```{r 10-plot-clusters-2, fig.height = 3.25, fig.width = 4.25, fig.align = "center", fig.pos = "H", out.extra="", fig.cap = "The data colored by the cluster assignments returned by K-means."}
cluster_plot <- ggplot(clustered_data,
                       aes(x = flipper_length_mm,
                           y = bill_length_mm,
@@ -1042,7 +1045,7 @@ clustering_statistics
Now that we have `tot.withinss` and `k` as columns in a data frame, we can make a line plot
(Figure \@ref(fig:10-plot-choose-k)) and search for the "elbow" to find which value of K to use.

-```{r 10-plot-choose-k, fig.height = 3.5, fig.width = 4.5, fig.align = "center", fig.cap = "A plot showing the total WSSD versus the number of clusters."}
+```{r 10-plot-choose-k, fig.height = 3.25, fig.width = 4.25, fig.align = "center", fig.pos = "H", out.extra="", fig.cap = "A plot showing the total WSSD versus the number of clusters."}
elbow_plot <- ggplot(clustering_statistics, aes(x = k, y = tot.withinss)) +
  geom_point() +
  geom_line() +
@@ -1077,7 +1080,7 @@ but there is a trade-off that doing many clusterings
could take a long time.
So this is something that needs to be balanced.

-```{r 10-choose-k-nstart, fig.height = 3.5, fig.width = 4.5, message= FALSE, warning = FALSE, fig.align = "center", fig.cap = "A plot showing the total WSSD versus the number of clusters when K-means is run with 10 restarts."}
+```{r 10-choose-k-nstart, fig.height = 3.25, fig.width = 4.25, fig.pos = "H", out.extra="", message= FALSE, warning = FALSE, fig.align = "center", fig.cap = "A plot showing the total WSSD versus the number of clusters when K-means is run with 10 restarts."}
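The cost of extra restarts is roughly linear in `nstart`: each restart is a full run of the algorithm, so a large `nstart` combined with a sweep over many values of K adds up quickly. A small illustration of the trade-off, again using the placeholder `standardized_data`:

```r
# ten restarts do roughly ten times the work of a single restart
system.time(kmeans(standardized_data, centers = 3, nstart = 1))
system.time(kmeans(standardized_data, centers = 3, nstart = 10))
```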