
Commit 45890e3

Merge pull request #394 from UBC-DSCI/clustering-edits
Copyediting for clustering
2 parents f8fa611 + fb539ae commit 45890e3

File tree

1 file changed: +7 −7 lines changed


clustering.Rmd

Lines changed: 7 additions & 7 deletions
@@ -28,7 +28,7 @@ using the K-means algorithm,
 including techniques to choose the number of clusters.
 
 ## Chapter learning objectives
-By the end of the chapter, readers will be able to:
+By the end of the chapter, readers will be able to do the following:
 
 * Describe a case where clustering is appropriate,
 and what insight it might extract from the data.
@@ -104,7 +104,7 @@ for where to begin learning more about these other methods.
 Here we will present an illustrative example using a data set \index{Palmer penguins} from the
 [{palmerpenguins} R data package](https://allisonhorst.github.io/palmerpenguins/). This data set was
 collected by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and
-the [Palmer Station, Antarctica Long Term Ecological Research Site](https://pal.lternet.edu/) and includes
+the [Palmer Station, Antarctica Long Term Ecological Research Site](https://pal.lternet.edu/), and includes
 measurements for adult penguins found near there [@palmerpenguins]. We have
 modified the data set for use in this chapter. Here we will focus on using two
 variables---penguin bill and flipper length, both in millimeters---to determine whether
@@ -184,7 +184,7 @@ including:
 2. a small flipper length, but large bill length group, and
 3. a large flipper and bill length group.
 
-Data visualization is a great tool to give us a rough sense for such patterns
+Data visualization is a great tool to give us a rough sense of such patterns
 when we have a small number of variables.
 But if we are to group data—and select the number of groups—as part of
 a reproducible analysis, we need something a bit more automated.
@@ -193,7 +193,7 @@ as we increase the number of variables we consider when clustering.
 The way to rigorously separate the data into groups
 is to use a clustering algorithm.
 In this chapter, we will focus on the *K-means* algorithm,
-\index{K-means} a widely-used and often very effective clustering method,
+\index{K-means} a widely used and often very effective clustering method,
 combined with the *elbow method* \index{elbow method}
 for selecting the number of clusters.
 This procedure will separate the data into groups;
@@ -911,7 +911,7 @@ As you can see above, the clustering object returned by `kmeans` has a lot of in
 that can be used to visualize the clusters, pick K, and evaluate the total WSSD.
 To obtain this information in a tidy format, we will call in help
 from the `broom` package. \index{broom} Let's start by visualizing the clustering
-as a colored scatter plot. To do that
+as a colored scatter plot. To do that,
 we use the `augment` function, \index{K-means!augment} \index{augment} which takes in the model and the original data
 frame, and returns a data frame with the data and the cluster assignments for
 each point:
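As a reading aid only (not part of the commit), here is a minimal sketch of the `kmeans` + `broom::augment` workflow this hunk describes. The data frame `toy_penguins` and its columns are hypothetical stand-ins, not the chapter's actual data set.

```r
# Hypothetical example data (not the chapter's data set).
library(tidyverse)
library(broom)

toy_penguins <- tibble(
  flipper_length_mm = c(180, 182, 186, 210, 214, 219),
  bill_length_mm    = c(38.5, 39.0, 45.5, 46.0, 50.2, 49.8)
)

# Fit K-means with three clusters.
toy_clust <- kmeans(toy_penguins, centers = 3)

# augment() takes the model and the original data frame, and returns the data
# with a .cluster column giving each point's cluster assignment.
clustered_data <- augment(toy_clust, toy_penguins)

# Colored scatter plot of the cluster assignments.
ggplot(clustered_data,
       aes(x = flipper_length_mm, y = bill_length_mm, color = .cluster)) +
  geom_point()
```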
@@ -965,7 +965,7 @@ Then we use `rowwise` \index{rowwise} + `mutate` to apply the `kmeans` function
 within each row to each K.
 However, given that the `kmeans` function
 returns a model object to us (not a vector),
-we will need to store the results as a list columm.
+we will need to store the results as a list column.
 This works because both vectors and lists are legitimate
 data structures for data frame columns.
 To make this work,
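Again as a reading aid rather than part of the commit, one way the `rowwise` + `mutate` list-column step described in this hunk could look is sketched below; the data frame `toy_penguins` and the range of K values are assumptions for illustration.

```r
# Hypothetical example data (not the chapter's data set).
library(tidyverse)
library(broom)

toy_penguins <- tibble(
  flipper_length_mm = c(180, 182, 186, 210, 214, 219),
  bill_length_mm    = c(38.5, 39.0, 45.5, 46.0, 50.2, 49.8)
)

elbow_stats <- tibble(k = 1:4) |>
  rowwise() |>
  # kmeans() returns a model object (not a vector), so wrap each fit in
  # list() to store it in a list column.
  mutate(clusters = list(kmeans(toy_penguins, centers = k, nstart = 10))) |>
  # glance() gives a one-row summary per model; tot.withinss is the total
  # WSSD used for the elbow plot.
  mutate(total_wssd = glance(clusters)$tot.withinss) |>
  ungroup()

elbow_stats
```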
@@ -1098,4 +1098,4 @@ please follow the instructions for computer setup needed to run the worksheets
 found in Chapter \@ref(move-to-your-own-machine).
 
 ## Additional resources
-- Chapter 10 of [An Introduction to Statistical Learning](https://www.statlearning.com/) [-@james2013introduction] provides a great next stop in the process of learning about clustering and unsupervised learning in general. In the realm of clustering specifically, it provides a great companion introduction to K-means, but also covers *hierarchical* clustering for when you expect there to be subgroups, and then subgroups within subgroups, etc. in your data. In the realm of more general unsupervised learning, it covers *principal components analysis (PCA)*, which is a very popular technique in scientific applications for reducing the number of predictors in a dataset.
+- Chapter 10 of [*An Introduction to Statistical Learning*](https://www.statlearning.com/) [-@james2013introduction] provides a great next stop in the process of learning about clustering and unsupervised learning in general. In the realm of clustering specifically, it provides a great companion introduction to K-means, but also covers *hierarchical* clustering for when you expect there to be subgroups, and then subgroups within subgroups, etc., in your data. In the realm of more general unsupervised learning, it covers *principal components analysis (PCA)*, which is a very popular technique in scientific applications for reducing the number of predictors in a dataset.
