Add the current count of vectors in a cluster in hierarchical k-means #132587

iverase · 2025-08-08T15:06:15Z

This commit adds a new parameter to the k-means result that contains the current count of vectors in each cluster. The array is kept up to date, so at any time it contains the number of vectors assigned to each cluster. It is used wherever we count the number of assigned vectors, both in the codec and in the algorithm itself. More importantly, it will allow us to limit the number of vectors in a cluster if we wish to, in order to build more balanced clusters.
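To illustrate the idea of an always up-to-date count array, here is a minimal sketch in Python. It is not the code from this PR: the names `assign_with_counts`, `sq_dist`, `assignments` and `counts` are hypothetical, and the convention that `-1` marks an unassigned vector is an assumption made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_with_counts(vectors, centroids, assignments, counts):
    """Reassign every vector to its nearest centroid.

    `assignments[i]` is the cluster of vector i (-1 means unassigned, an
    assumption of this sketch) and `counts[c]` is the number of vectors
    currently assigned to cluster c. Both are updated in place so that
    `counts` matches `assignments` after every single reassignment.
    Returns True if any assignment changed.
    """
    changed = False
    for i, v in enumerate(vectors):
        best = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        prev = assignments[i]
        if best != prev:
            if prev != -1:
                counts[prev] -= 1  # vector leaves its previous cluster
            counts[best] += 1      # and joins the new one
            assignments[i] = best
            changed = True
    return changed
```

Because the counts are adjusted at the moment a vector moves, no separate counting pass is needed afterwards, and a size check such as `counts[best] < some_limit` could be evaluated during assignment if a cap were introduced later.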

I did not notice any performance regression or change in recall after this change. The only difference from the previous version is that when we update the centroids after an assignment step, we now use all the assigned vectors, whereas before we used only the sampled vectors.
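The centroid update described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `update_centroids` and its parameters are hypothetical names, and leaving an empty cluster's centroid unchanged is a choice made for this example only.

```python
def update_centroids(vectors, assignments, counts, centroids):
    """Recompute each centroid as the mean of ALL vectors assigned to it.

    `counts[c]` is the running number of vectors in cluster c, so it can
    be used directly as the divisor. Clusters with a count of zero keep
    their previous centroid (an assumption of this sketch).
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    for v, c in zip(vectors, assignments):
        for d in range(dim):
            sums[c][d] += v[d]
    for c, n in enumerate(counts):
        if n > 0:
            centroids[c] = [s / n for s in sums[c]]
    return centroids
```

Averaging over a sample instead would mean iterating over a subset of `vectors` and dividing by the number of sampled vectors per cluster, which no longer equals the running count.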

elasticsearchmachine · 2025-08-08T15:06:50Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Add the current count of vector in a cluster in hierarchical k-means

0f1cf08

iverase requested review from benwtrent and john-wagster August 8, 2025 15:06

iverase added >non-issue :Search Relevance/Vectors Vector search v9.2.0 labels Aug 8, 2025

iverase changed the title ~~Add the current count of vector in a cluster in hierarchical k-means~~ Add the current count of vectors in a cluster in hierarchical k-means Aug 8, 2025

elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Aug 8, 2025

Merge branch 'main' into runningCounts

94154a8
