Commit f3769a5

Merge pull request #177 from moi90/patch-1
Clean up spelling in advanced_hdbscan.rst
2 parents: d0869a5 + f469b51

1 file changed

docs/advanced_hdbscan.rst

Lines changed: 20 additions & 21 deletions

@@ -3,14 +3,13 @@ Getting More Information About a Clustering
 ===========================================
 
 Once you have the basics of clustering sorted you may want to dig a
-little deeper than just the cluster labels returned to you. Fortunately
-the hdbscan library provides you with the facilities to do this. During
+little deeper than just the cluster labels returned to you. Fortunately, the hdbscan library provides you with the facilities to do this. During
 processing HDBSCAN\* builds a hierarchy of potential clusters, from
 which it extracts the flat clustering returned. It can be informative to
 look at that hierarchy, and potentially make use of the extra
 information contained therein.
 
-Suppose we have a dataset for clustering. It is a binary file in nunpy format and it can be found at https://github.com/lmcinnes/hdbscan/blob/master/notebooks/clusterable_data.npy.
+Suppose we have a dataset for clustering. It is a binary file in NumPy format and it can be found at https://github.com/lmcinnes/hdbscan/blob/master/notebooks/clusterable_data.npy.
 
 .. code:: python
 
@@ -116,7 +115,7 @@ each branch representing the number of points in the cluster at that
 level. If we wish to know which branches were selected by the HDBSCAN\*
 algorithm we can pass ``select_clusters=True``. You can even pass a
 selection palette to color the selections according to the cluster
-labelling.
+labeling.
 
 .. code:: python
 
@@ -127,8 +126,8 @@ labelling.
 .. image:: images/advanced_hdbscan_11_1.png
 
 
-From this we can see, for example, that the yellow cluster, at the
-center of the plot, forms early (breaking off from the pale blue and
+From this, we can see, for example, that the yellow cluster at the
+center of the plot forms early (breaking off from the pale blue and
 purple clusters) and persists for a long time. By comparison the green
 cluster, which also forms early, quickly breaks apart and then
 vanishes altogether (shattering into clusters all smaller than the
@@ -141,7 +140,7 @@ for example, in the dark blue cluster.
 
 If this was a simple visual analysis of the condensed tree can tell you
 a lot more about the structure of your data. This is not all we can do
-with condensed trees however. For larger and more complex datasets the
+with condensed trees, however. For larger and more complex datasets the
 tree itself may be very complex, and it may be desirable to run more
 interesting analytics over the tree itself. This can be achieved via
 several converter methods: :py:meth:`~hdbscan.plots.CondensedTree.to_networkx`, :py:meth:`~hdbscan.plots.CondensedTree.to_pandas`, and
@@ -162,9 +161,9 @@ First we'll consider :py:meth:`~hdbscan.plots.CondensedTree.to_networkx`
 
 
 
-As you can see we get a networkx directed graph, which we can then use
-all the regular networkx tools and analytics on. The graph is richer
-than the visual plot above may lead you to believe however:
+As you can see we get a NetworkX directed graph, which we can then use
+all the regular NetworkX tools and analytics on. The graph is richer
+than the visual plot above may lead you to believe, however:
 
 .. code:: python
 
@@ -182,12 +181,12 @@ than the visual plot above may lead you to believe however:
 
 The graph actually contains nodes for all the points falling out of
 clusters as well as the clusters themselves. Each node has an associated
-``size`` attribute, and each edge has a ``weight`` of the lambda value
+``size`` attribute and each edge has a ``weight`` of the lambda value
 at which that edge forms. This allows for much more interesting
 analyses.
 
-Next we have the :py:meth:`~hdbscan.plots.CondensedTree.to_pandas` method, which returns a panda dataframe
-where each row corresponds to an edge of the networkx graph:
+Next, we have the :py:meth:`~hdbscan.plots.CondensedTree.to_pandas` method, which returns a panda DataFrame
+where each row corresponds to an edge of the NetworkX graph:
 
 .. code:: python
 
@@ -258,11 +257,11 @@ the id of the child cluster (or, if the child is a single data point
 rather than a cluster, the index in the dataset of that point), the
 ``lambda_val`` provides the lambda value at which the edge forms, and
 the ``child_size`` provides the number of points in the child cluster.
-As you can see the start of the dataframe has singleton points falling
+As you can see the start of the DataFrame has singleton points falling
 out of the root cluster, with each ``child_size`` equal to 1.
 
 If you want just the clusters, rather than all the individual points
-as well, simply select the rows of the dataframe with ``child_size``
+as well, simply select the rows of the DataFrame with ``child_size``
 greater than 1.
 
 .. code:: python
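
The condensed-tree rows described in the hunk above (``parent``, ``child``, ``lambda_val``, ``child_size``) can be mimicked without hdbscan, pandas or NetworkX. Below is a minimal stdlib-only sketch on a hand-made toy tree (hypothetical data, not real hdbscan output). It builds the directed graph the text describes, with a ``size`` for every node and each edge weighted by the lambda value at which it forms, and then keeps only the cluster rows by filtering on ``child_size > 1``. The names ``Edge``, ``ROWS``, ``to_graph`` and ``cluster_rows`` are illustrative assumptions, not part of hdbscan's API.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One condensed-tree row: `child` leaves `parent` at `lambda_val`."""
    parent: int
    child: int
    lambda_val: float
    child_size: int


# Hypothetical toy tree over 6 points (ids 0-5). Cluster ids start at the
# number of points, so 6 is the root and 7, 8 are its child clusters.
ROWS = [
    Edge(6, 5, 0.5, 1),  # point 5 falls out of the root early
    Edge(6, 7, 1.0, 3),  # the root splits into cluster 7 ...
    Edge(6, 8, 1.0, 2),  # ... and cluster 8
    Edge(7, 0, 2.0, 1),
    Edge(7, 1, 2.0, 1),
    Edge(7, 2, 2.5, 1),
    Edge(8, 3, 1.8, 1),
    Edge(8, 4, 1.8, 1),
]


def to_graph(rows):
    """Return ({parent: {child: lambda_val}}, {node: size})."""
    adjacency = defaultdict(dict)
    size = {}
    for r in rows:
        adjacency[r.parent][r.child] = r.lambda_val  # edge weight = lambda
        size[r.child] = r.child_size
    # The root never appears as a child. Each point is either a direct child
    # of the root or inside one of its child clusters, so sum their sizes.
    for node in set(adjacency) - set(size):
        size[node] = sum(size[c] for c in adjacency[node])
    return dict(adjacency), size


def cluster_rows(rows):
    """Drop single points falling out of clusters (child_size == 1)."""
    return [r for r in rows if r.child_size > 1]


graph, size = to_graph(ROWS)
print(size[6])                                # 6
print([r.child for r in cluster_rows(ROWS)])  # [7, 8]
```

On the real DataFrame the same selection is plain pandas boolean indexing, for example ``df[df.child_size > 1]``, where ``df`` stands for whatever variable holds the result of ``to_pandas()``.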
@@ -293,13 +292,13 @@ array:
 
 
 
-This is equivalent to the pandas dataframe, but is in pure numpy and
+This is equivalent to the pandas DataFrame but is in pure NumPy and
 hence has no pandas dependencies if you do not wish to use pandas.
 
 Single Linkage Trees
 --------------------
 
-We have still more data at our disposal however. As noted in the How
+We have still more data at our disposal, however. As noted in the How
 HDBSCAN Works section, prior to providing a condensed tree the algorithm
 builds a complete dendrogram. We have access to this too via the
 :py:attr:`~hdbscan.HDBSCAN.single_linkage_tree_` attribute of the clusterer.
@@ -333,13 +332,13 @@ As you can see we gain a lot from condensing the tree in terms of better
 presenting and summarising the data. There is a lot less to be gained
 from visual inspection of a plot like this (and it only gets worse for
 larger datasets). The plot function support most of the same
-fucntionality as the dendrogram plotting from
+functionality as the dendrogram plotting from
 ``scipy.cluster.hierarchy``, so you can view various truncations of the
 tree if necessary. In practice, however, you are more likely to be
 interested in access the raw data for further analysis. Again we have
 :py:meth:`~hdbscan.plots.SingleLinkageTree.to_networkx`, :py:meth:`~hdbscan.plots.SingleLinkageTree.to_pandas` and :py:meth:`~hdbscan.plots.SingleLinkageTree.to_numpy`. This time the
-:py:meth:`~hdbscan.plots.SingleLinkageTree.to_networkx` provides a direct networkx version of what you see
-above. The numpy and pandas results conform to the single linkage
+:py:meth:`~hdbscan.plots.SingleLinkageTree.to_networkx` provides a direct NetworkX version of what you see
+above. The NumPy and pandas results conform to the single linkage
 hierarchy format of ``scipy.cluster.hierarchy``, and can be passed to
 routines there if necessary.
 
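
The hunk above says the NumPy and pandas forms of the single linkage tree follow the ``scipy.cluster.hierarchy`` linkage format. In that format (as SciPy documents it), row ``i`` of an ``(n - 1) x 4`` matrix merges the nodes named in columns 0 and 1 at the distance in column 2, column 3 holds the number of points in the new node, and the new node gets id ``n + i``; ids below ``n`` are original data points. A minimal stdlib sketch that decodes such a matrix, using a hand-made four-point example rather than real hdbscan output (``decode_linkage`` and ``leaves_under`` are illustrative names):

```python
# Hypothetical linkage matrix for 4 points, in scipy.cluster.hierarchy layout:
# [left_id, right_id, merge_distance, n_points_in_new_node]
Z = [
    [0.0, 1.0, 0.1, 2.0],  # creates node 4 = {0, 1}
    [2.0, 3.0, 0.3, 2.0],  # creates node 5 = {2, 3}
    [4.0, 5.0, 0.9, 4.0],  # creates node 6 = all points (the root)
]
N_POINTS = 4


def decode_linkage(Z, n_points):
    """Yield (node_id, left, right, distance, size) for every merge row."""
    for i, (left, right, distance, size) in enumerate(Z):
        yield n_points + i, int(left), int(right), float(distance), int(size)


def leaves_under(Z, n_points, node):
    """Return the sorted point indices below `node` (iterative, no recursion)."""
    stack, leaves = [node], []
    while stack:
        current = stack.pop()
        if current < n_points:
            leaves.append(current)  # an original data point
        else:
            left, right = Z[current - n_points][:2]  # children of a merge node
            stack.extend((int(left), int(right)))
    return sorted(leaves)


for node, _left, _right, distance, size in decode_linkage(Z, N_POINTS):
    members = leaves_under(Z, N_POINTS, node)
    assert len(members) == size  # column 3 counts the points under the node
    print(node, distance, members)
```

With SciPy installed, a matrix in this layout is what routines such as ``scipy.cluster.hierarchy.dendrogram`` and ``scipy.cluster.hierarchy.fcluster`` expect as input.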
@@ -360,6 +359,6 @@ noise points (any cluster smaller than the ``minimum_cluster_size``).
 array([ 0, -1, 0, ..., -1, -1, 0])
 
 
-In this way it is possible to extract the DBSCAN clustering that would result
+In this way, it is possible to extract the DBSCAN clustering that would result
 for any given epsilon value, all from one run of hdbscan.
 

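The last hunk touches the passage on extracting the DBSCAN-style clustering for a given epsilon from a single run. The idea can be sketched on a scipy-format linkage matrix: apply only the merges at distance ``<= cut_distance``, read off the resulting connected components, and mark components smaller than a minimum cluster size as noise (``-1``). This is a minimal stdlib illustration under stated assumptions (merge distances never decrease down the rows, as single linkage produces; ``cut_single_linkage`` and the toy matrix are hypothetical), not hdbscan's own implementation. Labels here are numbered in order of each component's first point.

```python
from collections import Counter


def cut_single_linkage(Z, n_points, cut_distance, min_cluster_size):
    """Flat labels from cutting a scipy-format single linkage tree.

    Assumes merge distances never decrease down the rows, so a merge at or
    below the cut only joins nodes that were themselves formed below it.
    """
    parent = list(range(2 * n_points - 1))  # points first, then merge nodes
    for i, (left, right, distance, _size) in enumerate(Z):
        if distance <= cut_distance:
            node = n_points + i
            parent[int(left)] = node
            parent[int(right)] = node

    def top(x):
        # Walk up to the highest merge applied below the cut.
        while parent[x] != x:
            x = parent[x]
        return x

    tops = [top(p) for p in range(n_points)]
    counts = Counter(tops)
    label_of, labels = {}, []
    for t in tops:
        if counts[t] < min_cluster_size:
            labels.append(-1)  # component too small: noise
        else:
            labels.append(label_of.setdefault(t, len(label_of)))
    return labels


# Hypothetical linkage matrix for 4 points: {0,1} at 0.1, {2,3} at 0.3,
# then everything at 0.9.
Z = [[0, 1, 0.1, 2], [2, 3, 0.3, 2], [4, 5, 0.9, 4]]
print(cut_single_linkage(Z, 4, 0.5, 2))  # [0, 0, 1, 1]
print(cut_single_linkage(Z, 4, 0.2, 2))  # [0, 0, -1, -1]
print(cut_single_linkage(Z, 4, 1.0, 2))  # [0, 0, 0, 0]
```

Walking up from every point keeps the sketch short but can be quadratic on deep trees; a union-find with path compression would avoid that. In hdbscan itself the cut is offered on the single linkage tree object (as ``get_clusters(cut_distance, min_cluster_size)`` in the versions we are aware of), and there the merge heights are mutual reachability distances; check the API reference of your installed version.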