
@benwtrent
Member

To protect against clusters that accidentally end up very large, we shouldn't eagerly load ALL doc IDs of a postings list at once. This PR block-encodes them in chunks of BLOCK_SIZE (16).
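A minimal sketch of the idea, assuming a simple layout; this is not the actual Elasticsearch implementation. Sorted doc IDs are written as variable-length deltas in blocks of BLOCK_SIZE (16), and the reader decodes one block at a time into a single reused buffer, so memory per postings list stays bounded however large the cluster is. The names `BlockDocIds`, `encode` and `Reader.nextBlock`, and the delta + vInt encoding with a per-block delta base, are hypothetical choices made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch (not the Elasticsearch code): doc IDs of one postings list
// are written in blocks of BLOCK_SIZE, each block delta-encoded as vInts, so the
// reader only ever materializes BLOCK_SIZE decoded IDs at a time.
final class BlockDocIds {
    static final int BLOCK_SIZE = 16;

    // Assumed layout: [count] then, per block, the deltas of its sorted doc IDs.
    // Each block restarts its delta base at 0, so blocks decode independently.
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, sortedDocIds.length);
        for (int start = 0; start < sortedDocIds.length; start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, sortedDocIds.length);
            int prev = 0;
            for (int i = start; i < end; i++) {
                writeVInt(out, sortedDocIds[i] - prev);
                prev = sortedDocIds[i];
            }
        }
        return out.toByteArray();
    }

    // Streams doc IDs back one block at a time into a single reused buffer.
    static final class Reader {
        final int[] buffer = new int[BLOCK_SIZE];
        private final byte[] data;
        private int pos;
        private int remaining;

        Reader(byte[] data) {
            this.data = data;
            this.remaining = readVInt();
        }

        // Decodes the next block into buffer; returns the number of valid entries (0 when done).
        int nextBlock() {
            int n = Math.min(BLOCK_SIZE, remaining);
            int prev = 0;
            for (int i = 0; i < n; i++) {
                prev += readVInt();
                buffer[i] = prev;
            }
            remaining -= n;
            return n;
        }

        private int readVInt() {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            return value;
        }
    }

    // Writes a non-negative int using 7 bits per byte, high bit set on all but the last byte.
    private static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    public static void main(String[] args) {
        int[] docs = new int[40];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i * 3 + 1;
        }
        Reader reader = new Reader(encode(docs));
        int n;
        while ((n = reader.nextBlock()) > 0) {
            System.out.println(Arrays.toString(Arrays.copyOf(reader.buffer, n)));
        }
    }
}
```

Running `main` prints three blocks of 16, 16 and 8 doc IDs; at no point does the reader hold more than 16 decoded IDs.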

Running benchmarks, I found no measurable impact on indexing time. There MAY be a small increase in query latency, but the numbers aren't clear; if there is an impact, it seems very small.

As an aside, I am wondering whether our block size is too conservative and whether we should raise it to 32 for both vectors and doc IDs, since our typical hardware supports 256-bit (AVX2) and 512-bit (AVX-512) SIMD...
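For a rough sense of the numbers behind this aside, the sketch below counts how many SIMD registers one block fills, assuming 32-bit elements (for example `int` doc IDs or `float` components; quantized vectors would pack more values per register). `BlockLanes` is a hypothetical helper, not part of Elasticsearch, and tying block size to register width is an assumption about the motivation.

```java
// Hypothetical back-of-the-envelope helper: how many SIMD registers a block of
// 32-bit values fills for a given register width (assumes 32-bit elements).
final class BlockLanes {
    // Number of 32-bit lanes in a register of the given width in bits.
    static int lanes(int registerBits) {
        return registerBits / Integer.SIZE;
    }

    // Registers needed to hold blockSize 32-bit values, rounded up.
    static int registersPerBlock(int blockSize, int registerBits) {
        int lanes = lanes(registerBits);
        return (blockSize + lanes - 1) / lanes;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {256, 512}) {
            for (int block : new int[] {16, 32}) {
                System.out.printf("%d-bit register, block of %d: %d register(s)%n",
                        bits, block, registersPerBlock(block, bits));
            }
        }
    }
}
```

Under these assumptions a block of 16 fills exactly one 512-bit register (two 256-bit registers), while a block of 32 would fill two 512-bit registers (four 256-bit ones).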

@benwtrent requested a review from iverase September 4, 2025 20:20
@elasticsearchmachine added the v9.2.0 and needs:triage labels Sep 4, 2025
@benwtrent added the >non-issue and :Search Relevance/Vectors labels and removed the needs:triage label Sep 4, 2025
@elasticsearchmachine added the Team:Search Relevance label Sep 4, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Contributor

@tteofili left a comment

This seems reasonable, especially when running in memory-constrained setups.
I would change the bulk size in a separate PR, to isolate its performance impact.

Contributor

@iverase left a comment

LGTM

The reason for using 16 is that we used to filter SOAR assignments eagerly, so it made sense to keep the bulk size small.
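One plausible reading of eager SOAR filtering, sketched under stated assumptions: with SOAR, a vector can be assigned to more than one centroid, so the same doc ID may appear in several postings lists. In the sketch, each decoded block is compacted in place to drop doc IDs already seen via another centroid, before any scoring happens. `SoarBlockFilter` and `filterVisited` are hypothetical names, and tracking visited docs in a `BitSet` is an assumption, not a description of the Elasticsearch code.

```java
import java.util.BitSet;

// Hypothetical sketch: eagerly drop doc IDs already visited through another
// centroid's postings list (a SOAR spill) so only unseen docs get scored.
final class SoarBlockFilter {
    // Compacts block[0..n) in place, keeping only doc IDs not yet in visited,
    // marks the survivors as visited, and returns how many survived.
    static int filterVisited(int[] block, int n, BitSet visited) {
        int kept = 0;
        for (int i = 0; i < n; i++) {
            int doc = block[i];
            if (!visited.get(doc)) {
                visited.set(doc);
                block[kept++] = doc;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        BitSet visited = new BitSet();
        int[] fromCentroidA = {2, 4, 7, 9};
        int[] fromCentroidB = {4, 5, 9, 11}; // docs 4 and 9 were assigned to both centroids
        int keptA = filterVisited(fromCentroidA, fromCentroidA.length, visited);
        int keptB = filterVisited(fromCentroidB, fromCentroidB.length, visited);
        System.out.println(keptA + " then " + keptB); // prints "4 then 2"
    }
}
```

Because filtering happens per block, the number of survivors can drop well below the block size, which is consistent with keeping bulks small when filtering is eager.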

@benwtrent added the auto-merge-without-approval label Sep 5, 2025
@elasticsearchmachine merged commit b81985e into elastic:main Sep 5, 2025
33 checks passed
@benwtrent deleted the block-encode-doc-ids branch September 5, 2025 11:56