Add the current count of vectors in a cluster in hierarchical k-means #132587

Open: 2 commits to be merged into main

iverase
Contributor

@iverase iverase commented Aug 8, 2025

This commit adds a new parameter to the k-means result that contains the current count of vectors in each cluster. The array is always kept up to date, so at any time it contains the number of vectors assigned to each cluster. It is used wherever we count the assigned vectors, both in the codec and in the algorithm itself. More importantly, it will allow us to limit the number of vectors in a cluster, if we wish to, in order to build more balanced clusters.

I did not notice any performance regression or change in recall after this change. The only difference from the previous version is that when we update the centroids after an assignment step, we now use all the assigned vectors, whereas before we used only the sampled vectors.
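
For readers who want a concrete picture, here is a minimal sketch of the idea, not the code in this PR. The class `ClusterCountSketch` and its methods `assign` and `updateCentroids` are hypothetical names chosen for illustration, and plain squared Euclidean distance over `float[]` vectors is an assumption. The point is that `counts` is adjusted at the moment an assignment changes, so it never needs to be recomputed, and that the centroid update averages over every assigned vector rather than only the sampled ones.

```java
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and are not Elasticsearch APIs.
final class ClusterCountSketch {
    final float[][] centroids;
    final int[] assignments; // cluster ordinal per vector, -1 while unassigned
    final int[] counts;      // current number of vectors assigned to each cluster

    ClusterCountSketch(float[][] centroids, int numVectors) {
        this.centroids = centroids;
        this.assignments = new int[numVectors];
        Arrays.fill(assignments, -1);
        this.counts = new int[centroids.length];
    }

    // Index of the centroid with the smallest squared Euclidean distance to v.
    private int nearest(float[] v) {
        int best = 0;
        float bestDist = Float.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            float dist = 0f;
            for (int d = 0; d < v.length; d++) {
                float diff = v[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Assigns the given vector ordinals (for example a sample) and keeps
    // counts in sync. Returns true if any assignment changed.
    boolean assign(float[][] vectors, int[] ordinals) {
        boolean changed = false;
        for (int ord : ordinals) {
            int best = nearest(vectors[ord]);
            int prev = assignments[ord];
            if (best != prev) {
                if (prev >= 0) {
                    counts[prev]--; // vector leaves its previous cluster
                }
                counts[best]++;     // and joins the new one
                assignments[ord] = best;
                changed = true;
            }
        }
        return changed;
    }

    // Recomputes each centroid as the mean of all vectors currently assigned
    // to it, using counts as the divisor. Empty clusters are left unchanged.
    void updateCentroids(float[][] vectors) {
        int dim = centroids[0].length;
        float[][] sums = new float[centroids.length][dim];
        for (int ord = 0; ord < vectors.length; ord++) {
            int c = assignments[ord];
            if (c < 0) {
                continue; // not assigned yet
            }
            for (int d = 0; d < dim; d++) {
                sums[c][d] += vectors[ord][d];
            }
        }
        for (int c = 0; c < centroids.length; c++) {
            if (counts[c] == 0) {
                continue;
            }
            for (int d = 0; d < dim; d++) {
                centroids[c][d] = sums[c][d] / counts[c];
            }
        }
    }
}
```

Because `counts` is always current, a size limit could in principle be checked inside `assign` before incrementing `counts[best]`; that check is left out here since the PR only describes it as a future possibility.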

@iverase iverase changed the title Add the current count of vector in a cluster in hierarchical k-means Add the current count of vectors in a cluster in hierarchical k-means Aug 8, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Aug 8, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Labels
>non-issue, :Search Relevance/Vectors, Team:Search Relevance, v9.2.0
2 participants