Refactor doc-id encoding for DiskBBQ to allow doc ids to be encoded in blocks #134013

benwtrent · 2025-09-02T19:18:56Z

This uniformly encodes a batch of doc IDs with the "worst" encoding, i.e. the least compact one that any of them needs. This way we store the encoding value just once, then encode the doc IDs uniformly and decode them a chunk at a time.

This is currently unused, but I think two things could benefit from it:

- doc ID ranges being encoded for centroids (eager/lazy filtering)
- encoding doc IDs along with vector blocks, to avoid potentially loading large integer arrays for overly large clusters
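
The idea described above (pick one worst-case encoding, store it once, then decode fixed-width IDs a block at a time) can be illustrated with a minimal sketch. This is not the code from the PR: the class name `BlockDocIdSketch`, the one-byte header, and the whole-byte widths are assumptions made for illustration, and the actual encodings used by DiskBBQ are not shown in this conversation.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of uniform "worst-case" doc ID encoding; not the Elasticsearch API. */
public final class BlockDocIdSketch {

    /** Smallest byte width (1 to 4) that can hold every ID: the "worst" encoding for the set. */
    public static int worstWidth(int[] docIds) {
        int max = 0;
        for (int id : docIds) {
            if (id < 0) {
                throw new IllegalArgumentException("doc ids must be non-negative");
            }
            max = Math.max(max, id);
        }
        int bits = 32 - Integer.numberOfLeadingZeros(max);
        return Math.max(1, (bits + 7) / 8);
    }

    /** Assumed layout: one header byte with the width, then every ID at that width, big-endian. */
    public static byte[] encode(int[] docIds) {
        int width = worstWidth(docIds);
        ByteBuffer out = ByteBuffer.allocate(1 + width * docIds.length);
        out.put((byte) width); // the encoding value is stored just once
        for (int id : docIds) {
            for (int shift = 8 * (width - 1); shift >= 0; shift -= 8) {
                out.put((byte) (id >>> shift));
            }
        }
        return out.array();
    }

    /** Decodes only IDs [from, from + count); the fixed width lets us seek straight to them. */
    public static int[] decodeBlock(byte[] encoded, int from, int count) {
        int width = encoded[0];
        int[] ids = new int[count];
        int pos = 1 + from * width;
        for (int i = 0; i < count; i++) {
            int value = 0;
            for (int b = 0; b < width; b++) {
                value = (value << 8) | (encoded[pos++] & 0xFF);
            }
            ids[i] = value;
        }
        return ids;
    }
}
```

Because every ID uses the same width, a reader can decode any block without touching the IDs before it, which is what would allow doc IDs to be read alongside vector blocks rather than loaded as one large array.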

elasticsearchmachine · 2025-09-02T19:19:20Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

iverase

This change makes sense to me.

Refactor doc-id encoding for DiskBBQ to allow doc ids to be encoded in blocks

9aee716

benwtrent requested a review from iverase September 2, 2025 19:18

benwtrent added the >non-issue, :Search Relevance/Vectors (Vector search), and v9.2.0 labels Sep 2, 2025

elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Sep 2, 2025

benwtrent and others added 2 commits September 2, 2025 17:08

Merge branch 'main' into refactor/add-block-doc-id-encoding

f2dcf33

Merge branch 'main' into refactor/add-block-doc-id-encoding

f652c80

iverase approved these changes Sep 3, 2025

Merge branch 'main' into refactor/add-block-doc-id-encoding

48018c2

benwtrent added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) Sep 3, 2025

Merge branch 'main' into refactor/add-block-doc-id-encoding

5f3fd23

elasticsearchmachine merged commit e721859 into elastic:main Sep 3, 2025
33 checks passed

benwtrent deleted the refactor/add-block-doc-id-encoding branch September 3, 2025 18:53
