Commit 7607675

fixing some of the text for new figs
1 parent 75f6d0d commit 7607675

1 file changed, +6 -6 lines changed

source/clustering.md

Lines changed: 6 additions & 6 deletions
@@ -358,7 +358,7 @@ in {numref}`toy-example-clus1-center`
 :figwidth: 700px
 :name: toy-example-clus1-center

-Cluster 1 from the penguin_data data set example. Observations are in blue, with the cluster center highlighted in red.
+Cluster 1 from the `penguin_data` data set example. Observations are in blue, with the cluster center highlighted in orange.
 :::

 ```{code-cell} ipython3
@@ -400,7 +400,7 @@ These distances are denoted by lines in {numref}`toy-example-clus1-dists` for th
 :figwidth: 700px
 :name: toy-example-clus1-dists

-Cluster 1 from the penguin_data data set example. Observations are in blue, with the cluster center highlighted in red. The distances from the observations to the cluster center are represented as black lines.
+Cluster 1 from the `penguin_data` data set example. Observations are in blue, with the cluster center highlighted in orange. The distances from the observations to the cluster center are represented as black lines.
 :::

 ```{code-cell} ipython3
@@ -450,7 +450,7 @@ These distances are denoted by black lines in
 :figwidth: 700px
 :name: toy-example-all-clus-dists

-All clusters from the penguin_data data set example. Observations are in orange, blue, and yellow with the cluster center highlighted in red. The distances from the observations to each of the respective cluster centers are represented as black lines.
+All clusters from the `penguin_data` data set example. Observations are in blue, orange, and red with the cluster center highlighted in orange. The distances from the observations to each of the respective cluster centers are represented as black lines.
 :::

 +++
@@ -584,7 +584,7 @@ and the right column depicts the reassignment of data to clusters.
 :figwidth: 700px
 :name: toy-kmeans-iter-1

-First three iterations of K-means clustering on the penguin_data example data set. Each pair of plots corresponds to an iteration. Within the pair, the first plot depicts the center update, and the second plot depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
+First three iterations of K-means clustering on the `penguin_data` example data set. Each pair of plots corresponds to an iteration. Within the pair, the first plot depicts the center update, and the second plot depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
 :::

 +++
@@ -662,7 +662,7 @@ glue('toy-kmeans-bad-iter-1', plot_kmean_iterations(4, penguin_data, centroid_in
 :figwidth: 700px
 :name: toy-kmeans-bad-iter-1

-First five iterations of K-means clustering on the penguin_data example data set with a poor random initialization. Each pair of plots corresponds to an iteration. Within the pair, the first plot depicts the center update, and the second plot depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
+First five iterations of K-means clustering on the `penguin_data` example data set with a poor random initialization. Each pair of plots corresponds to an iteration. Within the pair, the first plot depicts the center update, and the second plot depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black.
 :::

 This looks like a relatively bad clustering of the data, but K-means cannot improve it.
@@ -764,7 +764,7 @@ glue('toy-kmeans-elbow', elbow_plot, display=True)
 ```

 If we set K less than 3, then the clustering merges separate groups of data; this causes a large
-total WSSD, since the cluster center (denoted by an "x") is not close to any of the data in the cluster. On
+total WSSD, since the cluster center (denoted by large shapes with black outlines) is not close to any of the data in the cluster. On
 the other hand, if we set K greater than 3, the clustering subdivides subgroups of data; this does indeed still
 decrease the total WSSD, but by only a *diminishing amount*. If we plot the total WSSD versus the number of
 clusters, we see that the decrease in total WSSD levels off (or forms an "elbow shape") when we reach roughly
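
Note on the last hunk: the context lines claim that merging separate groups into one cluster inflates the total WSSD, because the merged center sits far from every observation. The sketch below illustrates that claim in plain Python. It is not code from `source/clustering.md`: the `total_wssd` helper and the toy points are hypothetical, chosen only to make the effect visible, and each cluster center is assumed to be the mean of its members.

```python
from collections import defaultdict


def total_wssd(points, labels):
    """Sum, over all clusters, of squared distances from each point to its cluster mean."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        n_dims = len(members[0])
        # Assumption: the cluster center is the coordinate-wise mean of its members.
        center = [sum(p[d] for p in members) / len(members) for d in range(n_dims)]
        total += sum((p[d] - center[d]) ** 2 for p in members for d in range(n_dims))
    return total


# Hypothetical toy data: two tight, well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

merged = total_wssd(points, [0, 0, 0, 0, 0, 0])    # one cluster covering both groups
separate = total_wssd(points, [0, 0, 0, 1, 1, 1])  # one cluster per group

print(f"merged WSSD:   {merged:.3f}")
print(f"separate WSSD: {separate:.3f}")
```

With these points the merged labeling gives a much larger total than the per-group labeling. Splitting either tight group further would lower the total only slightly, which is the flattening the elbow plot is meant to show.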
