
@iverase (Contributor) commented Aug 12, 2025

Currently, if a vector is numerically equivalent to its centroid (the distance between the vector and the centroid is lower than SOAR_MIN_DISTANCE), we spill the vector to the second nearest centroid. We think this is not needed: if the vector really is a near neighbour, we expect its centroid to be one of the centroids searched, so spilling it to the second nearest centroid provides little value. In addition, in many cases this situation indicates a degenerate distribution where the centroid is populated with copies of the same vector, so spilling all these vectors to the second nearest centroid is undesirable.
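A minimal sketch of the rule above, assuming flat `float[]` centroids and scoring spill candidates with the SOAR loss ||x − c'||² + λ·⟨r, x − c'⟩² / ||r||², where r = x − c is the residual to the primary centroid. The class name, `soarAssign`, `LAMBDA`, the value of `SOAR_MIN_DISTANCE`, and comparing it against the *squared* distance are all assumptions for illustration; the actual Elasticsearch code may differ. The sketch also shows one reason the degenerate case is awkward: the loss divides by ||r||², which is close to zero when the vector sits on its centroid.

```java
/** Hypothetical sketch of SOAR spill assignment with the degenerate-case rule. */
final class SoarAssignSketch {
    // Hypothetical threshold, compared here against the squared distance.
    static final float SOAR_MIN_DISTANCE = 1e-16f;
    // Hypothetical SOAR weight (lambda) on the orthogonality term.
    static final float LAMBDA = 1.0f;

    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Returns the index of the spill centroid, or -1 when the vector is
     * numerically equivalent to its primary centroid (no SOAR assignment).
     * Also returns -1 if there is no other centroid to spill to.
     */
    static int soarAssign(float[] vector, float[][] centroids, int primary) {
        float[] c = centroids[primary];
        float residualSqNorm = squaredDistance(vector, c);
        if (residualSqNorm < SOAR_MIN_DISTANCE) {
            return -1; // degenerate case: skip the spill entirely
        }
        int best = -1;
        float bestLoss = Float.MAX_VALUE;
        for (int j = 0; j < centroids.length; j++) {
            if (j == primary) {
                continue;
            }
            float[] cj = centroids[j];
            float dist = 0f; // ||x - c_j||^2
            float proj = 0f; // <x - c, x - c_j>
            for (int i = 0; i < vector.length; i++) {
                float r = vector[i] - c[i];
                float rj = vector[i] - cj[i];
                dist += rj * rj;
                proj += r * rj;
            }
            // SOAR loss: penalise spill residuals parallel to the primary residual.
            float loss = dist + LAMBDA * proj * proj / residualSqNorm;
            if (loss < bestLoss) {
                bestLoss = loss;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[][] centroids = {{0f, 0f}, {1f, 0f}, {0f, 1f}};
        // Vector identical to centroid 0: no SOAR assignment.
        System.out.println(soarAssign(new float[] {0f, 0f}, centroids, 0)); // prints -1
        // Ordinary vector: spilled to the centroid with the lowest SOAR loss.
        System.out.println(soarAssign(new float[] {0.1f, 0.4f}, centroids, 0)); // prints 2
    }
}
```

Returning -1 before the loop avoids both the pointless spill and the near-zero division.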

Therefore, this commit proposes that in the degenerate case, where the vector is equivalent to its centroid, the vector does not get a SOAR assignment, which is represented as -1 in the soar assignments array. This commit also adds a couple of tests with degenerate distributions of vectors to make sure the situation is handled downstream.
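Downstream code then has to treat -1 as "no secondary assignment". The following is a hypothetical sketch of one such consumer, which groups vector ordinals into per-centroid posting lists; `PostingListsSketch` and `buildPostingLists` are illustrative names, not Elasticsearch APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical consumer of primary and SOAR assignments. */
final class PostingListsSketch {
    /**
     * Adds each vector ordinal to its primary centroid and, when present,
     * to its SOAR centroid. A SOAR assignment of -1 is skipped.
     */
    static List<List<Integer>> buildPostingLists(int numCentroids, int[] assignments, int[] soarAssignments) {
        List<List<Integer>> postings = new ArrayList<>(numCentroids);
        for (int c = 0; c < numCentroids; c++) {
            postings.add(new ArrayList<>());
        }
        for (int ord = 0; ord < assignments.length; ord++) {
            postings.get(assignments[ord]).add(ord);
            int soar = soarAssignments[ord];
            if (soar != -1) { // -1 means the vector was degenerate: no spill
                postings.get(soar).add(ord);
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        int[] assignments = {0, 0, 1};
        int[] soar = {-1, 1, 0};
        System.out.println(buildPostingLists(2, assignments, soar)); // prints [[0, 1, 2], [1, 2]]
    }
}
```

Without the `soar != -1` check, `postings.get(-1)` would throw, which is the kind of downstream failure a degenerate-distribution test is meant to catch.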

@elasticsearchmachine added the Team:Search Relevance label Aug 12, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@benwtrent (Member) left a comment

:shipit:

@iverase iverase merged commit 4d1b7a6 into elastic:main Aug 12, 2025
34 checks passed
@iverase iverase deleted the soarDegenerated branch August 12, 2025 11:59
iverase added a commit to iverase/elasticsearch that referenced this pull request Aug 13, 2025
iverase added a commit that referenced this pull request Aug 13, 2025

Labels

>non-issue, :Search Relevance/Vectors, Team:Search Relevance, v9.2.0


3 participants