@iverase (Contributor) commented Jun 23, 2025

During merging, we need random access to the vectors in order to build the clusters. To achieve that, we write the vectors and docIds together to a temporary file. While testing on a memory-constrained node, I noticed in the flamegraph that we were spending a lot of time reading docIds:

[Image: flamegraph showing the time spent reading docIds]

Looking at this process I noticed we can do much better because:

  1. If the segment is dense, i.e. all documents have a vector, we don't need to write the docIds, as the docId is the ordinal of the vector.
  2. If the segment is not dense, we can write the docIds to a separate file, as they are accessed independently of the vectors.

This commit adds the logic above, which improved performance on memory-constrained nodes.
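
The two cases above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `TempVectorStore` and its methods are hypothetical names, and it uses plain `java.nio` file channels rather than Lucene's directory abstractions. It assumes `docs` is sorted in ascending order and that a segment counts as dense when the number of vectors equals `maxDoc`.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; the class and method names are not from the Elasticsearch code base.
final class TempVectorStore implements AutoCloseable {
    private final int dims;
    private final FileChannel vectors;
    private final FileChannel docIds; // null when the segment is dense

    private TempVectorStore(int dims, FileChannel vectors, FileChannel docIds) {
        this.dims = dims;
        this.vectors = vectors;
        this.docIds = docIds;
    }

    // Writes the vectors to one temporary file and, for sparse segments only,
    // the docIds to a second temporary file.
    static TempVectorStore write(float[][] vecs, int[] docs, int maxDoc) throws IOException {
        int dims = vecs[0].length;
        FileChannel vecCh = openTemp("vectors");
        ByteBuffer buf = ByteBuffer.allocate(dims * Float.BYTES);
        for (float[] v : vecs) {
            buf.clear();
            buf.asFloatBuffer().put(v); // the view does not move buf's own position
            writeFully(vecCh, buf);
        }
        // Dense: every document has a vector, so docId == ordinal and nothing is stored.
        if (vecs.length == maxDoc) {
            return new TempVectorStore(dims, vecCh, null);
        }
        // Sparse: docIds live in their own file, apart from the vector data.
        FileChannel docCh = openTemp("docids");
        ByteBuffer ids = ByteBuffer.allocate(docs.length * Integer.BYTES);
        ids.asIntBuffer().put(docs);
        writeFully(docCh, ids);
        return new TempVectorStore(dims, vecCh, docCh);
    }

    // Random-access read of the vector with the given ordinal.
    float[] vector(int ord) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(dims * Float.BYTES);
        readFully(vectors, buf, (long) ord * dims * Float.BYTES);
        float[] out = new float[dims];
        buf.asFloatBuffer().get(out);
        return out;
    }

    // Maps an ordinal to its docId without touching the vector file.
    int docId(int ord) throws IOException {
        if (docIds == null) {
            return ord;
        }
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
        readFully(docIds, buf, (long) ord * Integer.BYTES);
        return buf.getInt(0);
    }

    private static FileChannel openTemp(String prefix) throws IOException {
        Path p = Files.createTempFile(prefix, ".tmp");
        return FileChannel.open(p, StandardOpenOption.READ, StandardOpenOption.WRITE,
            StandardOpenOption.DELETE_ON_CLOSE);
    }

    private static void writeFully(FileChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    private static void readFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
        while (buf.hasRemaining()) {
            if (ch.read(buf, pos + buf.position()) < 0) {
                throw new EOFException();
            }
        }
        buf.flip();
    }

    @Override
    public void close() throws IOException {
        vectors.close();
        if (docIds != null) {
            docIds.close();
        }
    }
}
```

In this sketch, a dense segment never creates a docId file at all, and a sparse segment can resolve `docId(ord)` by reading four bytes from a small file instead of seeking through interleaved vector data.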

@elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Jun 23, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@benwtrent (Member) left a comment

I think separating out the doc ids vs the vectors is great!

@iverase merged commit 72b488c into elastic:main Jun 23, 2025
27 checks passed
@iverase deleted the ivfwriter_tmpfile branch June 23, 2025 12:44
kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Jun 23, 2025
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025

Labels

>non-issue, :Search Relevance/Vectors (Vector search), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch), v9.1.0

3 participants