benwtrent
Member

This encodes a group of doc IDs uniformly, using the "worst" encoding needed by any of them. This way we store the encoding value just once for the whole group, and can then encode every doc ID the same way and decode them a chunk at a time.

This is currently unused, but there are two things that I think could benefit from it:

  • doc id ranges being encoded for centroids (eager/lazy filtering)
  • encoding doc ids along with vector blocks to prevent potentially loading large integer arrays for overly large clusters
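
As a rough illustration of the idea described above (not the code merged in this PR), the sketch below packs a block of doc IDs at a single shared bit width, chosen from the largest ID in the block, and decodes them back in chunks. The class name `UniformDocIdBlock`, its fields, and the `encode` / `decodeChunk` methods are hypothetical, and the use of a plain bit width as the "encoding value" is an assumption.

```java
/** Hypothetical sketch: packs non-negative doc IDs using one shared bit width. */
final class UniformDocIdBlock {
    final int bitsPerValue; // the "worst" width, stored once for the whole block
    final int count;        // number of doc IDs in the block
    final long[] packed;    // doc IDs packed back to back, bitsPerValue bits each

    private UniformDocIdBlock(int bitsPerValue, int count, long[] packed) {
        this.bitsPerValue = bitsPerValue;
        this.count = count;
        this.packed = packed;
    }

    /** Encodes every doc ID with the width required by the largest one. */
    static UniformDocIdBlock encode(int[] docIds) {
        int max = 0;
        for (int id : docIds) {
            if (id < 0) {
                throw new IllegalArgumentException("doc IDs must be non-negative");
            }
            max = Math.max(max, id);
        }
        // Width needed by the largest value; at least 1 bit so zero is representable.
        int bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
        long[] packed = new long[(int) (((long) docIds.length * bits + 63) / 64)];
        for (int i = 0; i < docIds.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= ((long) docIds[i]) << shift;
            if (shift + bits > 64) {
                // The value straddles two words: write its high bits into the next one.
                packed[word + 1] |= ((long) docIds[i]) >>> (64 - shift);
            }
        }
        return new UniformDocIdBlock(bits, docIds.length, packed);
    }

    /** Decodes up to out.length doc IDs starting at index from; returns how many were written. */
    int decodeChunk(int from, int[] out) {
        int n = Math.max(0, Math.min(out.length, count - from));
        long mask = (1L << bitsPerValue) - 1;
        for (int i = 0; i < n; i++) {
            long bitPos = (long) (from + i) * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long value = packed[word] >>> shift;
            if (shift + bitsPerValue > 64) {
                // Pull the remaining high bits from the next word.
                value |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (value & mask);
        }
        return n;
    }
}
```

In this sketch a reader only needs `bitsPerValue` and `count` to locate any doc ID in the block, so a caller can pass a small reusable buffer to `decodeChunk` instead of materializing the full integer array for a very large cluster.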

@benwtrent benwtrent requested a review from iverase September 2, 2025 19:18
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Sep 2, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Contributor

@iverase iverase left a comment

This change makes sense to me.

@benwtrent benwtrent added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 3, 2025
@elasticsearchmachine elasticsearchmachine merged commit e721859 into elastic:main Sep 3, 2025
33 checks passed
@benwtrent benwtrent deleted the refactor/add-block-doc-id-encoding branch September 3, 2025 18:53
Labels

  • auto-merge-without-approval: Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!)
  • >non-issue
  • :Search Relevance/Vectors: Vector search
  • Team:Search Relevance: Meta label for the Search Relevance team in Elasticsearch
  • v9.2.0
