Commit 450d484

Merge remote-tracking branch 'origin/master'
2 parents 00220dd + cdcbda9 commit 450d484

1 file changed: +7 -5 lines changed

docs/parameter_selection.rst

Lines changed: 7 additions & 5 deletions
@@ -15,11 +15,13 @@ Selecting ``min_cluster_size``
 
 The primary parameter to effect the resulting clustering is
 ``min_cluster_size``. Ideally this is a relatively intuitive parameter
-to select -- set it to the smallest size grouping that you sih to
+to select -- set it to the smallest size grouping that you wish to
 consider a cluster. It can have slightly non-obvious effects however.
 Let's consider the digits dataset from sklearn. We can project the data
 into two dimensions to visualize it via t-SNE.
 
+.. code:: python
+
 digits = datasets.load_digits()
 data = digits.data
 projection = TSNE().fit_transform(data)
@@ -29,7 +31,7 @@ into two dimensions to visualize it via t-SNE.
 .. image:: images/parameter_selection_3_1.png
 
 
-If we cluster this data in the full 64 dimensional space with hdbscan we
+If we cluster this data in the full 64 dimensional space with HDBSCAN\* we
 can see some effects from varying the ``min_cluster_size``.
 
 We start with a ``min_cluster_size`` of 15.
@@ -52,7 +54,7 @@ We start with a ``min_cluster_size`` of 15.
 Increasing the ``min_cluster_size`` to 30 reduces the number of
 clusters, merging some together. This is a result of HDBSCAN\*
 reoptimizing which flat clustering provides greater stability under a
-slightly different notion of what constitutes cluster.
+slightly different notion of what constitutes a cluster.
 
 .. code:: python
 
@@ -113,7 +115,7 @@ pruned out. Thus ``min_cluster_size`` does behave more closely to our
 intuitions, but only if we fix ``min_samples``. If you wish to explore
 different ``min_cluster_size`` settings with a fixed ``min_samples``
 value, especially for larger dataset sizes, you can cache the hard
-computation, and recompute onlythe relatively cheap flat cluster
+computation, and recompute only the relatively cheap flat cluster
 extraction using the ``memory`` parameter, which makes use of ``joblib``
 [link].
 
@@ -156,7 +158,7 @@ leaving the ``min_cluster_size`` at 60, but reducing ``min_samples`` to
 
 Now most points are clustered, and there are much fewer noise points.
 Steadily increasing ``min_samples`` will, as we saw in the examples
-above, make the clustering progressivly more conservative, culiminating
+above, make the clustering progressivly more conservative, culminating
 in the example above where ``min_samples`` was set to 60 and we had only
 two clusters with most points declared as noise.
 