@@ -39,16 +39,17 @@ including techniques to choose the number of clusters.
By the end of the chapter, readers will be able to do the following:
- * Describe a case where clustering is appropriate,
+ * Describe a situation in which clustering is an appropriate technique to use,
and what insight it might extract from the data.
* Explain the K-means clustering algorithm.
* Interpret the output of a K-means analysis.
- * Differentiate between clustering and classification.
- * Identify when it is necessary to scale variables before clustering and do this using Python
- * Perform k-means clustering in Python using `scikit-learn`
+ * Differentiate between clustering, classification, and regression.
+ * Identify when it is necessary to scale variables before clustering, and do this using Python.
+ * Perform K-means clustering in Python using `scikit-learn`.
* Use the elbow method to choose the number of clusters for K-means.
- * Visualize the output of k-means clustering in Python using a coloured scatter plot
- * Describe advantages, limitations and assumptions of the kmeans clustering algorithm.
+ * Visualize the output of K-means clustering in Python using a colored scatter plot.
+ * Describe advantages, limitations and assumptions of the K-means clustering algorithm.
+
## Clustering
@@ -912,7 +913,7 @@ penguin_clust[1].inertia_
To calculate the total WSSD for a variety of Ks, we will
create a data frame that contains different values of `k`
- and the WSSD of running KMeans with each values of k.
+ and the WSSD of running K-means with each value of k.
To create this dataframe,
we will use what is called a "list comprehension" in Python,
where we repeat an operation multiple times
@@ -934,7 +935,7 @@ we could square all the numbers from 1-4 and store them in a list:
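For concreteness, here is a minimal sketch of that squaring example; the chapter's own code block may differ slightly:

```python
# square each number from 1 through 4 and store the results in a list
squares = [n**2 for n in range(1, 5)]
print(squares)  # [1, 4, 9, 16]
```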
Next, we will use this approach to compute the WSSD for the K-values 1 through 9.
For each value of K,
- we create a new KMeans model
+ we create a new `KMeans` model
and wrap it in a `scikit-learn` pipeline
with the preprocessor we created earlier.
We store the WSSD values in a list that we will use to create a dataframe
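Putting those pieces together, a minimal sketch of this step might look like the following. It assumes the `penguins` data frame and a standardizing preprocessor like the one built earlier in the chapter; the column names and the variable names `X` and `elbow_df` are placeholders, not necessarily those used in the book:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# placeholder: the two columns used for clustering earlier in the chapter
X = penguins[["bill_length_mm", "flipper_length_mm"]]

ks = range(1, 10)

# for each value of K, build a pipeline (preprocessor + KMeans), fit it,
# and record the total WSSD, which scikit-learn exposes as `inertia_`
wssds = [
    make_pipeline(StandardScaler(), KMeans(n_clusters=k)).fit(X)[1].inertia_
    for k in ks
]

elbow_df = pd.DataFrame({"k": list(ks), "wssd": wssds})
```

Indexing the fitted pipeline with `[1]` pulls out the `KMeans` step, mirroring the `penguin_clust[1].inertia_` call shown above.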
@@ -1008,7 +1009,7 @@ due to an unlucky initialization of the initial center positions
as we mentioned earlier in the chapter.
```{note}
- It is rare that the KMeans function from `scikit-learn`
+ It is rare that the implementation of K-means from `scikit-learn`
gets stuck in a bad solution, because `scikit-learn` tries to choose
the initial centers carefully to prevent this from happening.
If you still find yourself in a situation where you have a bump in the elbow plot,