
Commit 23a11a0

Add performance note per issue #167
1 parent eb9e50b commit 23a11a0

1 file changed (+11, -0 lines)

docs/faq.rst

Lines changed: 11 additions & 0 deletions
@@ -47,6 +47,17 @@ Despite the generative model having clearly different "clusters", without more
 data we simply cannot differentiate between these models, and hence no
 density based clustering will manage to cluster these according to the model.

+Q: I am not getting the claimed performance. Why not?
+-----------------------------------------------------
+
+The most likely explanation is the dimensionality of your input data.
+While HDBSCAN can perform well on low- to medium-dimensional data, its
+performance tends to decrease significantly as the number of dimensions
+increases. In general, HDBSCAN does well on data with up to around 50 or 100
+dimensions, but performance can drop significantly beyond that. Much also
+depends on the dataset, so you may still get good performance on
+high-dimensional data, but it is no longer guaranteed.
+
 Q: I want to predict the cluster of a new unseen point. How do I do this?
 -------------------------------------------------------------------------

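The new FAQ answer says performance degrades with dimensionality but not why. One commonly cited factor (an assumption here, not stated in the FAQ) is distance concentration: as the number of dimensions grows, a point's nearest and farthest neighbours end up at similar distances, which weakens the distance bounds that space-partitioning structures such as kd-trees and ball trees use to prune their searches. The sketch below uses only NumPy and uniformly random data (both illustrative assumptions, not HDBSCAN code) and a hypothetical helper, `relative_contrast`, to show the effect.

```python
import numpy as np


def relative_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for distances from one query point.

    Hypothetical helper for illustration only: points are drawn uniformly
    from the unit hypercube, which is an assumption, not real clustering data.
    """
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    # Euclidean distance from the query point to every data point.
    dists = np.linalg.norm(data - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


if __name__ == "__main__":
    for dims in (2, 10, 50, 100, 1000):
        # Smaller values mean near and far points are harder to tell apart.
        contrast = relative_contrast(2000, dims)
        print(f"{dims:>5} dims: relative contrast = {contrast:.3f}")
```

With a fixed seed, the printed contrast should shrink as `dims` grows. The exact values depend on the random draw, so none are quoted here. This sketch illustrates one plausible mechanism only and does not time HDBSCAN itself.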