@thecoop (Member) commented Oct 1, 2025

Adds an on_disk_rescore index option to disk BBQ. This enables on-disk rescoring against the original raw vectors, so random pages are not copied into memory unnecessarily.

@thecoop force-pushed the diskbbq-disk-rescoring branch from bf48533 to cf67fc8 October 3, 2025 10:16
@thecoop requested a review from benwtrent October 3, 2025 10:16
@thecoop marked this pull request as ready for review October 3, 2025 10:17
@elasticsearchmachine added the Team:Search Relevance label Oct 3, 2025
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine (Collaborator) commented:

Hi @thecoop, I've created a changelog YAML for you.

@thecoop (Member, Author) commented Oct 3, 2025

On serverless, the option should simply be a no-op. We need to check that this is already the case, since canUseDirectIO filters out serverless-specific dirs.

Comment on lines +69 to +72
@FunctionalInterface
public interface GetFormatReader {
    FlatVectorsReader getReader(String formatName, boolean useDirectIO) throws IOException;
}
Member:

Do we want to make this a more general interface? Maybe put it in its own class outside of IVF reader?

Member, Author:

Possibly - something to look at in #135931

*/
public abstract class IVFVectorsReader extends KnnVectorsReader {

    private record FlatVectorsReaderKey(String formatName, boolean useDirectIO) {
Member:

Maybe put it in its own class? Wouldn't we want this same thing for other formats eventually?

Member, Author:

I'm thinking maybe a shared superclass for the per-field specifications, but I want to do that refactor as part of #135931, after this is merged in.
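
For illustration, here is a minimal sketch of the pattern discussed in this thread: a loader callback with the same shape as GetFormatReader, plus a record key of (formatName, useDirectIO) used to cache one reader per combination. This is not the code from the PR. StubReader, ReaderLoader and ReaderCache are hypothetical names, and StubReader merely stands in for FlatVectorsReader so the example compiles without Lucene.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FlatVectorsReader; only its identity matters here.
record StubReader(String formatName, boolean useDirectIO) {}

// Same shape as the GetFormatReader interface quoted above, using the stub type.
@FunctionalInterface
interface ReaderLoader {
    StubReader getReader(String formatName, boolean useDirectIO) throws IOException;
}

// Hypothetical cache holding one reader per (formatName, useDirectIO) pair.
final class ReaderCache {
    // Records get value-based equals/hashCode, so they work directly as map keys.
    private record Key(String formatName, boolean useDirectIO) {}

    private final Map<Key, StubReader> readers = new HashMap<>();
    private final ReaderLoader loader;

    ReaderCache(ReaderLoader loader) {
        this.loader = loader;
    }

    StubReader get(String formatName, boolean useDirectIO) throws IOException {
        Key key = new Key(formatName, useDirectIO);
        StubReader existing = readers.get(key);
        if (existing != null) {
            return existing;
        }
        // Map.computeIfAbsent is avoided because the loader throws a checked exception.
        StubReader created = loader.getReader(formatName, useDirectIO);
        readers.put(key, created);
        return created;
    }

    int size() {
        return readers.size();
    }
}
```

Because the key is a standalone value type, a shape like this could be lifted out of the IVF reader and reused by other formats, which is the direction the review comments suggest for #135931.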

@benwtrent (Member) left a comment:

I really love how (relatively) simple this change is now.

I have some concerns about ensuring BWC test coverage, but overall I like the change.

@thecoop (Member, Author) commented Oct 6, 2025

> On serverless, the option should simply be a no-op. We need to check that this is already the case, since canUseDirectIO filters out serverless-specific dirs.

Yup, this is the case. The option on serverless can be set, but it doesn't do anything.
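
As a rough sketch of the behaviour described here (an assumption about the logic, not the PR's actual code): the effective direct I/O flag can be thought of as the requested option combined with whether the directory supports direct I/O. DirectIoGate and its parameter names are hypothetical; the second parameter stands in for the result of the canUseDirectIO check.

```java
// Hypothetical helper: the option only takes effect when the directory allows direct I/O.
final class DirectIoGate {
    private DirectIoGate() {}

    static boolean effectiveDirectIO(boolean onDiskRescoreRequested, boolean directorySupportsDirectIO) {
        // On a directory that is filtered out (e.g. a serverless-specific one), the
        // second argument is false, so the requested option is accepted but ignored.
        return onDiskRescoreRequested && directorySupportsDirectIO;
    }
}
```

Under this assumption, setting the option on serverless is harmless: the mapping is accepted, and reads fall back to the regular path.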

@thecoop requested review from benwtrent and tteofili October 6, 2025 15:17
@tteofili (Contributor) left a comment:

LGTM

@benwtrent (Member) left a comment:

Minor comments; overall, this is very nice :)

}

// for testing
KnnVectorsWriter version0FieldsWriter(SegmentWriteState state) throws IOException {
Member:

awesome, thank you!

}
}

public void testDirectIOBackwardsCompatibleRead() throws IOException {
Member:

I think this is OK, but I would prefer a new ES920V0DiskBBQVectorsFormatTests.java that runs ALL our test coverage for the byte change. But, eh, this is OK I think.

@thecoop merged commit b730620 into elastic:main Oct 7, 2025
34 checks passed
@thecoop deleted the diskbbq-disk-rescoring branch October 7, 2025 13:36
Labels: >feature, :Search Relevance/Vectors, Team:Search Relevance, v9.3.0