Commit 80e6dd5

clustering lobjs
1 parent 70b4238 commit 80e6dd5

1 file changed: +10 -9 lines changed

source/clustering.md

Lines changed: 10 additions & 9 deletions
@@ -39,16 +39,17 @@ including techniques to choose the number of clusters.

 By the end of the chapter, readers will be able to do the following:

-* Describe a case where clustering is appropriate,
+* Describe a situation in which clustering is an appropriate technique to use,
 and what insight it might extract from the data.
 * Explain the K-means clustering algorithm.
 * Interpret the output of a K-means analysis.
-* Differentiate between clustering and classification.
-* Identify when it is necessary to scale variables before clustering and do this using Python
-* Perform k-means clustering in Python using `scikit-learn`
+* Differentiate between clustering, classification, and regression.
+* Identify when it is necessary to scale variables before clustering, and do this using Python.
+* Perform K-means clustering in Python using `scikit-learn`.
 * Use the elbow method to choose the number of clusters for K-means.
-* Visualize the output of k-means clustering in Python using a coloured scatter plot
-* Describe advantages, limitations and assumptions of the kmeans clustering algorithm.
+* Visualize the output of K-means clustering in Python using a colored scatter plot.
+* Describe advantages, limitations and assumptions of the K-means clustering algorithm.
+

 ## Clustering

@@ -912,7 +913,7 @@ penguin_clust[1].inertia_

 To calculate the total WSSD for a variety of Ks, we will
 create a data frame that contains different values of `k`
-and the WSSD of running KMeans with each values of k.
+and the WSSD of running K-means with each values of k.
 To create this dataframe,
 we will use what is called a "list comprehension" in Python,
 where we repeat an operation multiple times
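
The list comprehension mentioned in this hunk can be illustrated with a minimal sketch. It is not part of the commit, and the variable name `squares` is hypothetical. It uses the squaring example that the following hunk header refers to, reading "1-4" as the integers 1 through 4:

```python
# Repeat one operation (squaring) for each number from 1 through 4
# and collect the results in a list.
squares = [number**2 for number in range(1, 5)]
print(squares)  # [1, 4, 9, 16]
```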
@@ -934,7 +935,7 @@ we could square all the numbers from 1-4 and store them in a list:

 Next, we will use this approach to compute the WSSD for the K-values 1 through 9.
 For each value of K,
-we create a new KMeans model
+we create a new `KMeans` model
 and wrap it in a `scikit-learn` pipeline
 with the preprocessor we created earlier.
 We store the WSSD values in a list that we will use to create a dataframe
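
The procedure described in this hunk might look roughly like the sketch below. It is not the chapter's actual code. The synthetic `data` array, the use of `StandardScaler` as the preprocessor, the `n_init` and `random_state` settings, and the names `ks`, `wssds`, and `elbow_df` are all assumptions made for illustration. The fitted model's `inertia_` attribute holds the total WSSD, and it is read from the last pipeline step in the same way the file reads `penguin_clust[1].inertia_`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: three loose blobs in two dimensions.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(30, 2))
    for center in ([0, 0], [5, 5], [0, 5])
])

ks = range(1, 10)  # K = 1 through 9

# For each K, build a fresh pipeline (scale, then cluster), fit it,
# and read the total WSSD from the fitted KMeans step.
# StandardScaler is an assumed preprocessor; n_init and random_state
# are fixed only to make this sketch reproducible.
wssds = [
    make_pipeline(
        StandardScaler(),
        KMeans(n_clusters=k, n_init=10, random_state=0),
    ).fit(data)[-1].inertia_
    for k in ks
]

# Collect the results in a data frame, one row per value of K.
elbow_df = pd.DataFrame({"k": list(ks), "wssd": wssds})
print(elbow_df)
```

Plotting `wssd` against `k` from a data frame like this is what produces the elbow plot discussed in the next hunk.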
@@ -1008,7 +1009,7 @@ due to an unlucky initialization of the initial center positions
 as we mentioned earlier in the chapter.

 ```{note}
-It is rare that the KMeans function from `scikit-learn`
+It is rare that the implementation of K-means from `scikit-learn`
 gets stuck in a bad solution, because `scikit-learn` tries to choose
 the initial centers carefully to prevent this from happening.
 If you still find yourself in a situation where you have a bump in the elbow plot,
