Commit a4b846d

update docs; clean propagate labels;
1 parent c66f5c1 commit a4b846d

4 files changed: +461 -16 lines changed

doc/detecting_branches.ipynb

Lines changed: 12 additions & 8 deletions
@@ -195,7 +195,7 @@
 "metadata": {},
 "source": [
 "Using the branch detection functionality is fairly straightforward. Using\n",
-"`fast_hdbscan` one can simply configure the ``BranchDetector`` class and fit is\n",
+"`fast_hdbscan` one can simply configure the ``BranchDetector`` class and fit it\n",
 "with the `fast_hdbscan.HDBSCAN` object. By default `BranchDetector` uses the\n",
 "values of the given `HDBSCAN` object for the parameters they share.\n",
 "\n",
@@ -236,10 +236,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"By default, the centers of clusters get a non-noise label different from the\n",
-"branches in the cluster. This behavior can be changed by setting the\n",
-"`propagate_labels=True` parameter or by calling `propagated_labels()` after\n",
-"fitting."
+"The centers of clusters get a non-noise label different from the branches in the\n",
+"cluster. This behavior can be changed by setting the `propagate_labels=True`\n",
+"parameter or by calling `propagated_labels()` after fitting."
 ]
 },
 {
@@ -306,7 +305,12 @@
 " default a cluster needs to have one bifurcation (Y-shape) before the detected\n",
 " branches are represented in the final labelling.\n",
 "\n",
-"Unlike `hdbscan.branches.BranchDetector`, the `fast_hdbscan` version does not support the `branch_detection_method` parameter. This implementation will always use a `\"core\"` graph to determine which points are connected within a cluster."
+"Unlike the `hdbscan` version, `fast_hdbscan`'s `BranchDetector` does not support\n",
+"the `branch_detection_method` parameter. This implementation will always use a\n",
+"`\"core\"` graph to determine which points are connected within a cluster. A\n",
+"`\"core\"` graph combines nearest neighbors and the minimum spanning tree of a\n",
+"cluster. It contains all connectivity within the points' core distances and\n",
+"forms a single connected component per cluster."
 ]
 },
 {
@@ -315,8 +319,8 @@
 "source": [
 "## Useful attributes\n",
 "\n",
-"Like the HDBSCAN class, the BranchDetector class contains several useful\n",
-"attributes for exploring datasets.\n",
+"The `BranchDetector` class contains several useful attributes for exploring\n",
+"datasets.\n",
 "\n",
 "### Branch hierarchy\n",
 "\n",
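
The notebook text added in this commit describes a `"core"` graph as the combination of nearest-neighbor edges and a cluster's minimum spanning tree, forming a single connected component. The sketch below illustrates that idea in plain Python. It is not `fast_hdbscan`'s implementation: the function names (`knn_edges`, `mst_edges`, `core_graph_edges`, `is_connected`), the use of mutual reachability distance for the spanning tree, and the value of `k` are all assumptions made for illustration.

```python
import math


def knn_edges(points, k):
    """Undirected edges from every point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def mst_edges(points, k):
    """Prim's algorithm over mutual reachability distances (assumed metric)."""
    core = core_distances(points, k)

    def mutual_reachability(i, j):
        return max(core[i], core[j], math.dist(points[i], points[j]))

    in_tree = {0}
    edges = set()
    while len(in_tree) < len(points):
        # Cheapest edge that leaves the current tree.
        i, j = min(
            ((a, b) for a in in_tree for b in range(len(points)) if b not in in_tree),
            key=lambda edge: mutual_reachability(*edge),
        )
        edges.add((min(i, j), max(i, j)))
        in_tree.add(j)
    return edges


def core_graph_edges(points, k):
    """Union of the k-nearest-neighbor edges and the spanning tree edges."""
    return knn_edges(points, k) | mst_edges(points, k)


def is_connected(n_points, edges):
    """Depth-first search from point 0 to check for a single component."""
    adjacency = {i: set() for i in range(n_points)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == n_points


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(is_connected(len(points), knn_edges(points, k=2)))         # False
print(is_connected(len(points), core_graph_edges(points, k=2)))  # True
```

In this toy input the nearest-neighbor edges alone leave the two groups disconnected; the spanning tree contributes the one bridging edge that makes the graph a single component.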

doc/for_developers.ipynb

Lines changed: 445 additions & 0 deletions
Large diffs are not rendered by default.

doc/index.rst

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ User Guide
 benchmarks
 comparable_clusterings
 detecting_branches
+for_developers
 
 
 ----------

fast_hdbscan/sub_clusters.py

Lines changed: 3 additions & 8 deletions
@@ -142,12 +142,10 @@ def propagate_labels_per_cluster(graph, sub_labels):
             undirected[idx][neighbor] = 1 / graph.weights[i]
 
     # repeat density-weighted majority votes on noise points until all are assigned
-    prev = 0
     while True:
         noise_idx = np.nonzero(sub_labels == -1)[0]
-        if noise_idx.shape[0] == prev:
+        if noise_idx.shape[0] == 0:
             break
-        prev = noise_idx.shape[0]
         for idx in noise_idx:
             candidates = {np.int64(0): np.float64(0.0) for _ in range(0)}
             for neighbor_idx, weight in undirected[idx].items():
@@ -165,7 +163,7 @@ def propagate_labels_per_cluster(graph, sub_labels):
                     max_weight = weight
                     max_candidate = candidate
             sub_labels[idx] = max_candidate
-    return sub_labels, prev
+    return sub_labels
 
 
 def propagate_sub_cluster_labels(labels, sub_labels, graph_list, points_list):
@@ -177,11 +175,9 @@ def propagate_sub_cluster_labels(labels, sub_labels, graph_list, points_list):
         unique_sub_labels = np.unique(sub_labels[points])
         has_noise = unique_sub_labels[0] == -1 and len(unique_sub_labels) > 1
         if has_noise:
-            sub_labels[points], remaining = propagate_labels_per_cluster(
+            sub_labels[points] = propagate_labels_per_cluster(
                 core_graph, sub_labels[points]
             )
-            if remaining > 0:
-                raise RuntimeError('Failed to propagate all labels in sub-cluster')
         labels[points] = sub_labels[points] + running_id
         running_id += len(unique_sub_labels) - int(has_noise)
 
@@ -488,7 +484,6 @@ def fit(
             self.sub_cluster_probabilities_,
             self._approximation_graphs,
             self._condensed_trees,
-
             self._linkage_trees,
             self._spanning_trees,
             self.lens_values_,
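
The loop changed in `propagate_labels_per_cluster` repeats a density-weighted majority vote until no noise points remain. A minimal standalone sketch of that technique follows, in plain Python rather than the library's NumPy code. The function name `propagate_noise_labels` and the adjacency format (a list of `{neighbor: distance}` dicts) are hypothetical. The `RuntimeError` guard is also an addition for this sketch, so that disconnected input cannot loop forever; the committed loop has no such guard and presumably relies on the core graph being a single connected component.

```python
def propagate_noise_labels(neighbors, labels):
    """Assign each noise point (-1) the label with the largest summed 1/distance.

    neighbors[i] maps a neighbor index to the edge distance (hypothetical format).
    Labels are updated in place during a sweep, so later points in the same
    sweep can already see labels assigned earlier in that sweep.
    """
    labels = list(labels)  # work on a copy, leave the caller's list untouched
    while True:
        noise = [i for i, label in enumerate(labels) if label == -1]
        if not noise:
            break
        progressed = False
        for i in noise:
            votes = {}
            for j, distance in neighbors[i].items():
                if labels[j] == -1:
                    continue  # noise neighbors do not vote
                votes[labels[j]] = votes.get(labels[j], 0.0) + 1.0 / distance
            if votes:
                labels[i] = max(votes, key=votes.get)
                progressed = True
        if not progressed:
            # Sketch-only guard: some noise point cannot reach a labelled point.
            raise RuntimeError("noise points are not connected to any label")
    return labels


# Path graph 0-1-2-3-4; the edge between 3 and 4 is shorter, so it weighs more.
neighbors = [
    {1: 1.0},
    {0: 1.0, 2: 1.0},
    {1: 1.0, 3: 1.0},
    {2: 1.0, 4: 0.5},
    {3: 0.5},
]
print(propagate_noise_labels(neighbors, [0, -1, -1, -1, 1]))  # [0, 0, 0, 1, 1]
```

Point 3 ends up with label 1 because its vote for label 1 has weight 1 / 0.5 = 2, which outweighs the weight-1 vote for label 0 that reached it through point 2 earlier in the same sweep.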
