Wrong exemplars returned when using cluster_selection_epsilon (exemplars from eps=0 are returned) #593

@lucetka

When using a model with cluster_selection_epsilon set within the effective range, the exemplars returned appear to be wrong: they are the exemplars of the clusters produced before the eps is applied.

I think this issue is also related to another issue I asked about, #571: the condensed tree returned is always the eps=0 tree, and it shows neither the new "superclusters" selected as a consequence of merging clusters nor the points falling out at the specified eps level. I've also noticed that related issues have been identified by others (#586). It would be great if this could be fixed.

Meanwhile, as an ultra-quick and very dirty workaround sufficient for my specific use, I map the labels from the clustering with epsilon to the clustering without. For each newly emerged supercluster I simply use the exemplars from all the eps=0 clusters that it engulfed. Instead of 3 exemplars I end up with, e.g., 6, which in my case (clustering documents) is not necessarily a bad thing, as it also gives an idea of the heterogeneity of the final cluster.

However, I know this is not really correct, because the resulting supercluster consists of more than just the engulfed clusters selected in the eps=0 clustering. The supercluster also sucks in all the points previously discarded as noise at every split that happened above the applied eps level. These points (noise in the eps=0 clustering, but cluster members once eps is applied) are not represented by the exemplars.
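For anyone who wants to try the same workaround, here is a minimal sketch of the label mapping described above. The helper name `pool_exemplars` is made up for illustration and is not part of hdbscan. It also assumes that entry `i` of `exemplars_` belongs to cluster label `i`, which I have not verified against the library source.

```python
from collections import defaultdict


def pool_exemplars(labels_eps, labels_base, exemplars_base):
    """Pool eps=0 exemplars into the clusters of the epsilon clustering.

    labels_eps     -- labels from the model fitted with cluster_selection_epsilon
    labels_base    -- labels from the same model settings with eps=0
    exemplars_base -- per-cluster exemplars from the eps=0 model; entry i is
                      ASSUMED to belong to label i
    Returns {eps_label: [exemplar points from every engulfed eps=0 cluster]}.
    """
    engulfed = defaultdict(set)
    for eps_label, base_label in zip(labels_eps, labels_base):
        # Skip noise (-1) on either side: eps=0 noise points have no exemplars,
        # which is exactly the limitation of this workaround.
        if eps_label >= 0 and base_label >= 0:
            engulfed[int(eps_label)].add(int(base_label))

    return {
        eps_label: [pt for b in sorted(base_labels) for pt in exemplars_base[b]]
        for eps_label, base_labels in engulfed.items()
    }


# Hypothetical usage (X, min_cluster_size and the eps value are placeholders):
#   import hdbscan
#   base = hdbscan.HDBSCAN(min_cluster_size=10).fit(X)
#   eps = hdbscan.HDBSCAN(min_cluster_size=10, cluster_selection_epsilon=0.5).fit(X)
#   pooled = pool_exemplars(eps.labels_, base.labels_, base.exemplars_)
```

Both models have to be fitted on the same data with the same remaining parameters, otherwise the two label arrays do not line up point by point.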

Edit: I realize I should have mentioned that I am using hdbscan 0.8.28 with Python 3.10.2 on Windows 10 64-bit.
