clustering.Rmd: 10 additions & 10 deletions
@@ -164,7 +164,7 @@ penguin_data
Next, we can create a scatter plot using this data set
to see if we can detect subtypes or groups in our data set.

-```{r 10-toy-example-plot, warning = FALSE, fig.height = 4, fig.width = 4.35, fig.cap = "Scatter plot of standardized bill length versus standardized flipper length."}
+```{r 10-toy-example-plot, warning = FALSE, fig.height = 4, fig.width = 4.35, fig.align = "center", fig.cap = "Scatter plot of standardized bill length versus standardized flipper length."}
ggplot(data, aes(x = flipper_length_standardized,
y = bill_length_standardized)) +
geom_point() +
@@ -198,7 +198,7 @@ This procedure will separate the data into groups;
Figure \@ref(fig:10-toy-example-clustering) shows these groups
denoted by colored scatter points.

-```{r 10-toy-example-clustering, echo = FALSE, warning = FALSE, fig.height = 4, fig.width = 5, fig.cap = "Scatter plot of standardized bill length versus standardized flipper length with colored groups."}
+```{r 10-toy-example-clustering, echo = FALSE, warning = FALSE, fig.height = 4, fig.width = 5, fig.align = "center", fig.cap = "Scatter plot of standardized bill length versus standardized flipper length with colored groups."}
ggplot(data, aes(y = bill_length_standardized,
x = flipper_length_standardized, color = cluster)) +
geom_point() +
@@ -256,7 +256,7 @@ in Figure \@ref(fig:10-toy-example-clus1-center).
(ref:10-toy-example-clus1-center) Cluster 1 from the `penguin_data` data set example. Observations are in blue, with the cluster center highlighted in red.
base <- ggplot(data, aes(x = flipper_length_standardized, y = bill_length_standardized)) +
geom_point() +
xlab("Flipper Length (standardized)") +
@@ -303,7 +303,7 @@ These distances are denoted by lines in Figure \@ref(fig:10-toy-example-clus1-di
(ref:10-toy-example-clus1-dists) Cluster 1 from the `penguin_data` data set example. Observations are in blue, with the cluster center highlighted in red. The distances from the observations to the cluster center are represented as black lines.
(ref:10-toy-example-all-clus-dists) All clusters from the `penguin_data` data set example. Observations are in orange, blue, and yellow with the cluster center highlighted in red. The distances from the observations to each of the respective cluster centers are represented as black lines.
@@ -439,7 +439,7 @@ and the right column depicts the reassignment of data to clusters.
(ref:10-toy-kmeans-iter) First four iterations of K-means clustering on the `penguin_data` example data set. Each row corresponds to an iteration, where the left column depicts the center update, and the right column depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
@@ -546,7 +546,7 @@ These, however, are beyond the scope of this book.
Unlike the classification and regression models we studied in previous chapters, K-means \index{K-means!restart,nstart} can get "stuck" in a bad solution.
For example, Figure \@ref(fig:10-toy-kmeans-bad-init) illustrates an unlucky random initialization by K-means.
@@ -567,7 +567,7 @@ Figure \@ref(fig:10-toy-kmeans-bad-iter) shows what the iterations of K-means wo
(ref:10-toy-kmeans-bad-iter) First five iterations of K-means clustering on the `penguin_data` example data set with a poor random initialization. Each row corresponds to an iteration, where the left column depicts the center update, and the right column depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
@@ -959,7 +959,7 @@ but there is a trade-off that doing many clusterings
could take a long time.
So this is something that needs to be balanced.

-```{r 10-choose-k-nstart, fig.height = 4, fig.width = 4.35, message= F, warning = F, fig.cap = "A plot showing the total WSSD versus the number of clusters when K-means is run with 10 restarts."}
+```{r 10-choose-k-nstart, fig.height = 4, fig.width = 4.35, message= FALSE, warning = FALSE, fig.align = "center", fig.cap = "A plot showing the total WSSD versus the number of clusters when K-means is run with 10 restarts."}