Commit a03765b

Fixed fig placement in clustering
1 parent 7d28eaa commit a03765b

1 file changed: +16 -13 lines changed

clustering.Rmd

Lines changed: 16 additions & 13 deletions
@@ -91,6 +91,8 @@ principal component analysis, multidimensional scaling, and more;
 see the additional resources section at the end of this chapter
 for where to begin learning more about these other methods.
 
+\newpage
+
 > **Note:** There are also so-called *semisupervised* tasks, \index{semisupervised}
 > where only some of the data come with response variable labels/values,
 > but the vast majority don't.
@@ -164,11 +166,12 @@ penguin_data <- read_csv("data/penguins_standardized.csv")
 penguin_data
 ```
 
-
 Next, we can create a scatter plot using this data set
 to see if we can detect subtypes or groups in our data set.
 
-```{r 10-toy-example-plot, warning = FALSE, fig.height = 3.5, fig.width = 3.75, fig.align = "center", fig.pos = "H", out.extra="", fig.cap = "Scatter plot of standardized bill length versus standardized flipper length."}
+\newpage
+
+```{r 10-toy-example-plot, warning = FALSE, fig.height = 3.25, fig.width = 3.5, fig.align = "center", fig.pos = "H", out.extra="", fig.cap = "Scatter plot of standardized bill length versus standardized flipper length."}
 ggplot(data, aes(x = flipper_length_standardized,
 y = bill_length_standardized)) +
 geom_point() +
@@ -203,7 +206,7 @@ This procedure will separate the data into groups;
 Figure \@ref(fig:10-toy-example-clustering) shows these groups
 denoted by colored scatter points.
 
-```{r 10-toy-example-clustering, echo = FALSE, warning = FALSE, fig.height = 3.5, fig.width = 4.5, fig.align = "center", fig.cap = "Scatter plot of standardized bill length versus standardized flipper length with colored groups."}
+```{r 10-toy-example-clustering, echo = FALSE, warning = FALSE, fig.height = 3.25, fig.width = 4.25, fig.align = "center", fig.cap = "Scatter plot of standardized bill length versus standardized flipper length with colored groups."}
 ggplot(data, aes(y = bill_length_standardized,
 x = flipper_length_standardized, color = cluster)) +
 geom_point() +
@@ -261,7 +264,7 @@ in Figure \@ref(fig:10-toy-example-clus1-center).
 
 (ref:10-toy-example-clus1-center) Cluster 1 from the `penguin_data` data set example. Observations are in blue, with the cluster center highlighted in red.
 
-```{r 10-toy-example-clus1-center, echo = FALSE, warning = FALSE, fig.height = 3.5, fig.width = 3.75, fig.align = "center", fig.cap = "(ref:10-toy-example-clus1-center)"}
+```{r 10-toy-example-clus1-center, echo = FALSE, warning = FALSE, fig.height = 3.25, fig.width = 3.5, fig.align = "center", fig.cap = "(ref:10-toy-example-clus1-center)"}
 base <- ggplot(data, aes(x = flipper_length_standardized, y = bill_length_standardized)) +
 geom_point() +
 xlab("Flipper Length (standardized)") +
@@ -308,7 +311,7 @@ These distances are denoted by lines in Figure \@ref(fig:10-toy-example-clus1-di
 
 (ref:10-toy-example-clus1-dists) Cluster 1 from the `penguin_data` data set example. Observations are in blue, with the cluster center highlighted in red. The distances from the observations to the cluster center are represented as black lines.
 
-```{r 10-toy-example-clus1-dists, echo = FALSE, warning = FALSE, fig.height = 3.5, fig.width = 3.75, fig.align = "center", fig.cap = "(ref:10-toy-example-clus1-dists)"}
+```{r 10-toy-example-clus1-dists, echo = FALSE, warning = FALSE, fig.height = 3.25, fig.width = 3.5, fig.align = "center", fig.cap = "(ref:10-toy-example-clus1-dists)"}
 base <- ggplot(clus1) +
 geom_point(aes(y = bill_length_standardized,
 x = flipper_length_standardized),
@@ -347,7 +350,7 @@ Figure \@ref(fig:10-toy-example-all-clus-dists).
 
 (ref:10-toy-example-all-clus-dists) All clusters from the `penguin_data` data set example. Observations are in orange, blue, and yellow with the cluster center highlighted in red. The distances from the observations to each of the respective cluster centers are represented as black lines.
 
-```{r 10-toy-example-all-clus-dists, echo = FALSE, warning = FALSE, fig.height = 3.5, fig.width = 4.5, fig.align = "center", fig.cap = "(ref:10-toy-example-all-clus-dists)"}
+```{r 10-toy-example-all-clus-dists, echo = FALSE, warning = FALSE, fig.height = 3.25, fig.width = 4.25, fig.align = "center", fig.cap = "(ref:10-toy-example-all-clus-dists)"}
 
 
 all_clusters_base <- data |>
@@ -599,7 +602,7 @@ These, however, are beyond the scope of this book.
 Unlike the classification and regression models we studied in previous chapters, K-means \index{K-means!restart, nstart} can get "stuck" in a bad solution.
 For example, Figure \@ref(fig:10-toy-kmeans-bad-init) illustrates an unlucky random initialization by K-means.
 
-```{r 10-toy-kmeans-bad-init, echo = FALSE, warning = FALSE, message = FALSE, fig.height = 3.5, fig.width = 3.75, fig.align = "center", fig.cap = "Random initialization of labels."}
+```{r 10-toy-kmeans-bad-init, echo = FALSE, warning = FALSE, message = FALSE, fig.height = 3.25, fig.width = 3.75, fig.pos = "H", out.extra="", fig.align = "center", fig.cap = "Random initialization of labels."}
 penguin_data <- penguin_data |>
 mutate(label = as_factor(c(3L, 3L, 1L, 1L, 2L, 1L, 2L, 1L, 1L,
 1L, 3L, 1L, 2L, 2L, 2L, 3L, 3L, 3L)))
@@ -620,7 +623,7 @@ Figure \@ref(fig:10-toy-kmeans-bad-iter) shows what the iterations of K-means wo
 
 (ref:10-toy-kmeans-bad-iter) First five iterations of K-means clustering on the `penguin_data` example data set with a poor random initialization. Each pair of plots corresponds to an iteration. Within the pair, the first plot depicts the center update, and the second plot depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
 
-```{r 10-toy-kmeans-bad-iter, echo = FALSE, warning = FALSE, message = FALSE, fig.height = 6.75, fig.width = 8, fig.align = "center", fig.cap = "(ref:10-toy-kmeans-bad-iter)"}
+```{r 10-toy-kmeans-bad-iter, echo = FALSE, warning = FALSE, message = FALSE, fig.height = 6.75, fig.width = 8, fig.pos = "H", out.extra="", fig.align = "center", fig.cap = "(ref:10-toy-kmeans-bad-iter)"}
 list_plot_cntrs <- vector(mode = "list", length = 5)
 list_plot_lbls <- vector(mode = "list", length = 5)
 
@@ -778,7 +781,7 @@ Figure \@ref(fig:10-toy-kmeans-vary-k) illustrates the impact of K
 on K-means clustering of our penguin flipper and bill length data
 by showing the different clusterings for K's ranging from 1 to 9.
 
-```{r 10-toy-kmeans-vary-k, echo = FALSE, warning = FALSE, fig.height = 6.25, fig.width = 6, fig.cap = "Clustering of the penguin data for K clusters ranging from 1 to 9. Cluster centers are indicated by larger points that are outlined in black."}
+```{r 10-toy-kmeans-vary-k, echo = FALSE, warning = FALSE, fig.height = 6.25, fig.width = 6, fig.pos = "H", out.extra="", fig.cap = "Clustering of the penguin data for K clusters ranging from 1 to 9. Cluster centers are indicated by larger points that are outlined in black."}
 set.seed(3)
 
 kclusts <- tibble(k = 1:9) |>
@@ -842,7 +845,7 @@ decrease the total WSSD, but by only a *diminishing amount*. If we plot the tota
 clusters, we see that the decrease in total WSSD levels off (or forms an "elbow shape") \index{elbow method} when we reach roughly
 the right number of clusters (Figure \@ref(fig:10-toy-kmeans-elbow)).
 
-```{r 10-toy-kmeans-elbow, echo = FALSE, warning = FALSE, fig.align = 'center', fig.height = 3.5, fig.width = 4.5, fig.cap = "Total WSSD for K clusters ranging from 1 to 9."}
+```{r 10-toy-kmeans-elbow, echo = FALSE, warning = FALSE, fig.align = 'center', fig.height = 3.25, fig.width = 4.25, fig.pos = "H", out.extra="", fig.cap = "Total WSSD for K clusters ranging from 1 to 9."}
 p2 <- ggplot(clusterings, aes(x = k, y = tot.withinss)) +
 geom_point(size = 2) +
 geom_line() +
@@ -933,7 +936,7 @@ clustered_data
 Now that we have this information in a tidy data frame, we can make a visualization
 of the cluster assignments for each point, as shown in Figure \@ref(fig:10-plot-clusters-2).
 
-```{r 10-plot-clusters-2, fig.height = 3.5, fig.width = 4.5, fig.align = "center", fig.cap = "The data colored by the cluster assignments returned by K-means."}
+```{r 10-plot-clusters-2, fig.height = 3.25, fig.width = 4.25, fig.align = "center", fig.pos = "H", out.extra="", fig.cap = "The data colored by the cluster assignments returned by K-means."}
 cluster_plot <- ggplot(clustered_data,
 aes(x = flipper_length_mm,
 y = bill_length_mm,
@@ -1042,7 +1045,7 @@ clustering_statistics
 Now that we have `tot.withinss` and `k` as columns in a data frame, we can make a line plot
 (Figure \@ref(fig:10-plot-choose-k)) and search for the "elbow" to find which value of K to use.
 
-```{r 10-plot-choose-k, fig.height = 3.5, fig.width = 4.5, fig.align = "center", fig.cap = "A plot showing the total WSSD versus the number of clusters."}
+```{r 10-plot-choose-k, fig.height = 3.25, fig.width = 4.25, fig.align = "center", fig.pos = "H", out.extra="", fig.cap = "A plot showing the total WSSD versus the number of clusters."}
 elbow_plot <- ggplot(clustering_statistics, aes(x = k, y = tot.withinss)) +
 geom_point() +
 geom_line() +
@@ -1077,7 +1080,7 @@ but there is a trade-off that doing many clusterings
 could take a long time.
 So this is something that needs to be balanced.
 
-```{r 10-choose-k-nstart, fig.height = 3.5, fig.width = 4.5, message= FALSE, warning = FALSE, fig.align = "center", fig.cap = "A plot showing the total WSSD versus the number of clusters when K-means is run with 10 restarts."}
+```{r 10-choose-k-nstart, fig.height = 3.25, fig.width = 4.25, fig.pos = "H", out.extra="", message= FALSE, warning = FALSE, fig.align = "center", fig.cap = "A plot showing the total WSSD versus the number of clusters when K-means is run with 10 restarts."}
 penguin_clust_ks <- tibble(k = 1:9) |>
 rowwise() |>
 mutate(penguin_clusts = list(kmeans(standardized_data, nstart = 10, k)),
