
> Challenge:
>
- > - To learn about k-means, let's use the `iris` with the sepal and
+ > - To learn about k-means, let's use the `iris` dataset with the sepal and
> petal length variables only (to facilitate visualisation). Create
> such a data matrix and name it `x`.

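A minimal sketch of one way to build this matrix, assuming the standard `iris` column names `Sepal.Length` and `Petal.Length`:

```r
## Keep only the two length variables to ease visualisation
x <- iris[, c("Sepal.Length", "Petal.Length")]
head(x)
```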
@@ -63,7 +63,7 @@ cl <- kmeans(x, 3, nstart = 10)
> - The actual results of the algorithms, i.e. the cluster membership
> can be accessed in the `cluster` element of the clustering result
> output. Use it to colour the inferred clusters to generate a figure
- > like shown below.
+ > like that shown below.

```{r solkmplot, echo=FALSE, fig.cap = "k-means algorithm on sepal and petal lengths"}
plot(x, col = cl$cluster)
```
@@ -139,7 +139,7 @@ a global minimum.

> Challenge:
>
- > Repeat kmeans on our `x` data multiple times, setting the number of
+ > Repeat k-means on our `x` data multiple times, setting the number of
> iterations to 1 or greater and check whether you repeatedly obtain
> the same results. Try the same with random data of identical
> dimensions.
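A sketch of how this could be probed with base R's `kmeans`, assuming `iter.max` and `nstart` are the knobs meant here; the random matrix is generated purely for illustration:

```r
set.seed(1)
## Two independent single-start runs with few iterations
cl1 <- kmeans(x, centers = 3, nstart = 1, iter.max = 1)
cl2 <- kmeans(x, centers = 3, nstart = 1, iter.max = 1)
## Cross-tabulate memberships: identical runs give one non-zero
## entry per row and column, unstable runs do not
table(cl1$cluster, cl2$cluster)

## Repeat on random data of identical dimensions
xr <- matrix(rnorm(prod(dim(x))), ncol = ncol(x))
table(kmeans(xr, 3, nstart = 1)$cluster,
      kmeans(xr, 3, nstart = 1)$cluster)
```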
@@ -203,13 +203,13 @@ plot(ks, tot_within_ss, type = "b")

### How does hierarchical clustering work

- **Initialisation**: Starts by assigning each of the n point its own cluster
+ **Initialisation**: Starts by assigning each of the n points its own cluster

**Iteration**

1. Find the two nearest clusters, and join them together, leading to
   n-1 clusters
- 2. Continue merging cluster process until all are grouped into a
+ 2. Continue the cluster merging process until all are grouped into a
   single cluster

**Termination:** All observations are grouped within a single cluster.
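As a minimal illustration of this agglomerative procedure, base R's `hclust` can be applied to a distance matrix of the same `x` data (Euclidean distance and the default complete linkage are assumptions here):

```r
## Each point starts as its own cluster; the two nearest clusters
## are merged repeatedly until a single cluster remains
d <- dist(x)      ## pairwise Euclidean distances
hcl <- hclust(d)  ## complete linkage by default
plot(hcl)         ## dendrogram of the successive merges
```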
@@ -323,7 +323,7 @@ as well as supervised methods, as we will see in the next chapter.

A typical way to pre-process the data prior to learning is to scale
the data, or apply principal component analysis (next section). Scaling
- assures that all data columns have mean 0 and standard deviate 1.
+ ensures that all data columns have a mean of 0 and standard deviation of 1.

In R, scaling is done with the `scale` function.

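A quick sketch of the effect, continuing with the `x` matrix from above:

```r
xs <- scale(x)            ## centre and scale each column
round(colMeans(xs), 3)    ## column means are (essentially) 0
apply(xs, 2, sd)          ## column standard deviations are 1
```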
@@ -348,11 +348,11 @@ plot(hcl2, main = "scaled data")
## Principal component analysis (PCA)

**Dimensionality reduction** techniques are widely used and versatile
- techniques that can be used o
+ techniques that can be used to:

- find structure in features
- pre-process data for other ML algorithms, and
- - as an aid in visualisation.
+ - aid in visualisation.

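For a first look, a minimal sketch using base R's `prcomp` (running it on all four numeric `iris` variables and colouring by species are assumptions for illustration):

```r
pca <- prcomp(iris[, 1:4], scale. = TRUE)  ## PCA on scaled columns
summary(pca)                       ## variance explained per component
plot(pca$x[, 1:2], col = iris$Species)  ## data in the new PC space
```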
The basic principle of dimensionality reduction techniques is to
transform the data into a new space that summarises properties of the