Skip to content

Add Cosine Distance#31

Open
carterharrison wants to merge 1 commit intotom-whitehead:masterfrom
carterharrison:master
Open

Add Cosine Distance#31
carterharrison wants to merge 1 commit intotom-whitehead:masterfrom
carterharrison:master

Conversation

@carterharrison
Copy link
Copy Markdown

@carterharrison carterharrison commented Oct 11, 2024

I implemented Cosine Distance as a DistanceMetric. It is a common distance metric used for text embeddings. Let me know what I can do to get this merged!

@carterharrison carterharrison changed the title Add Cosin Distance Add Cosine Distance Oct 11, 2024
@tom-whitehead
Copy link
Copy Markdown
Owner

tom-whitehead commented Oct 12, 2024

Hey @carterharrison, thanks for contributing! My HDBSCAN implementation uses a KD Tree nearest neighbours algorithm to calculate distances to the kth (min_samples-th) neighbour. I've shied away from implementing cosine distance in the past as it
it's not commonly used with KD Trees. As I understand it, KD Tree NN is largely designed to work efficiently with distance metrics that satisfy the properties of Euclidean space and queries using cosine distance can become inefficient quickly, particularly as dimensionality increases. It is more common to use cosine distance with the Ball Tree NN algorithm (which I haven't been able to support yet).

That said, I tried this out on a couple of test datasets and the clusters looked good, albeit very similar results to Euclidean distance. As such I have nothing against merging it.

Have you tested it at all? If so, are you happy with the quality of clusters and performance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants