@thecoop
Member

@thecoop thecoop commented Jul 9, 2025

Change the JVM option to an index option direct_raw_vector_reads. This needs to be set on index creation, and cannot be changed afterwards. This is because it uses a special index format to indicate that direct IO should be used to access the flat vectors.

@thecoop thecoop force-pushed the direct-io-index-option branch from 7fec052 to 73e3cc0 Compare July 9, 2025 14:23
@thecoop
Member Author

thecoop commented Jul 14, 2025

I've added it as a separate option, but the name needs some work. use_direct_io is an implementation-specific term that users won't understand without context. It's used in low-memory scenarios, so maybe something like low_memory_io? low_memory_mode?

I went with disable_offheap_cache_rescoring

@thecoop thecoop requested a review from benwtrent July 14, 2025 13:28
Member

@benwtrent benwtrent left a comment

As we talked about before, I think this is likely the best way.

I don't know the best name for the parameter, but I agree it should be a field mapper setting that is statically set on field creation.

Our format composability will allow us to have a DirectIOES818HnswBinaryQuantizedVectorsFormat and a DirectIOES818BinaryQuantizedVectorsFormat.

Really, the only difference will be the name and a single parameter passed to the inner formats. So, we can likely refactor slightly and greatly reduce the code churn required to make this happen.
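
To illustrate the composition idea in this comment, here is a minimal sketch. Every type in it is a hypothetical stand-in, not the real Lucene or Elasticsearch class; it only assumes that a quantized format wraps a raw (flat) vector format, and that the direct IO variant differs by its name and by one flag handed to that inner format.

```java
// Hypothetical sketch: these are NOT the real Lucene/Elasticsearch classes.

/** Stand-in for the raw (flat) vector format; the single knob is whether reads use direct IO. */
final class SketchFlatVectorsFormat {
    final boolean useDirectIO;

    SketchFlatVectorsFormat(boolean useDirectIO) {
        this.useDirectIO = useDirectIO;
    }
}

/** Stand-in for a quantized format that delegates raw vector storage to an inner format. */
class SketchQuantizedVectorsFormat {
    static final String NAME = "SketchQuantizedVectorsFormat";

    final String name;
    final SketchFlatVectorsFormat rawVectorFormat;

    /** Default variant: regular (non-direct) reads of the raw vectors. */
    SketchQuantizedVectorsFormat() {
        this(NAME, new SketchFlatVectorsFormat(false));
    }

    /** Shared constructor: subclasses supply only a name and an inner format. */
    SketchQuantizedVectorsFormat(String name, SketchFlatVectorsFormat rawVectorFormat) {
        this.name = java.util.Objects.requireNonNull(name);
        this.rawVectorFormat = java.util.Objects.requireNonNull(rawVectorFormat);
    }
}

/** Direct IO variant: same behavior, different name, inner format built with the flag set. */
final class SketchDirectIOQuantizedVectorsFormat extends SketchQuantizedVectorsFormat {
    static final String NAME = "SketchDirectIOQuantizedVectorsFormat";

    SketchDirectIOQuantizedVectorsFormat() {
        super(NAME, new SketchFlatVectorsFormat(true));
    }
}
```

With this shape, the direct IO variant adds no read or write logic of its own, which is the reduction in code churn the comment is pointing at.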

@thecoop thecoop force-pushed the direct-io-index-option branch from 8e19c25 to 5749e57 Compare August 19, 2025 11:48
@thecoop thecoop changed the title from "Add a direct IO option to rescore_vector for bbq_hnsw" to "Add a direct IO option for rescoring to bbq_hnsw" Aug 19, 2025
@thecoop
Member Author

thecoop commented Aug 19, 2025

I've changed it to an index setting. The name still needs a bit of work, and the docs need to be updated for the setting. We also need to consider conversions (if any) between indices with and without this option, or whether it requires a reindex to change this setting on an existing index.

@thecoop thecoop removed the WIP label Aug 19, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Aug 19, 2025
@thecoop thecoop requested review from a team as code owners September 22, 2025 13:55
@thecoop thecoop changed the base branch from lucene_snapshot to lucene_snapshot_10_3 September 22, 2025 13:55
@romseygeek
Contributor

Is the plan to rebase this against main once the lucene_snapshot_10_3 branch is merged?

@benwtrent
Member

Is the plan to rebase this against main once the lucene_snapshot_10_3 branch is merged?

@romseygeek yep!

@thecoop thecoop force-pushed the direct-io-index-option branch from e45df42 to 222ccc1 Compare September 23, 2025 13:14
@thecoop thecoop changed the base branch from lucene_snapshot_10_3 to main September 23, 2025 13:27
@elasticsearchmachine
Collaborator

Hi @thecoop, I've created a changelog YAML for you.

@thecoop thecoop force-pushed the direct-io-index-option branch from f5a6952 to 67fb702 Compare September 23, 2025 13:43
@tvernum tvernum removed the request for review from a team September 24, 2025 06:11
Member

@benwtrent benwtrent left a comment

I think renaming the param is warranted. I think passing a boolean is ok.

I am surprised there aren't any necessary changes to the binary quantized format!

public DenseVectorIndexOptions parseIndexOptions(String fieldName, Map<String, ?> indexOptionsMap, IndexVersion indexVersion) {
    Object mNode = indexOptionsMap.remove("m");
    Object efConstructionNode = indexOptionsMap.remove("ef_construction");
    Object directRawVectorReadsNode = indexOptionsMap.remove("direct_raw_vector_reads");
Member

Let's call this on_disk_rescore. I think this captures the intent.

Let's confirm with others before we settle the name.
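
For context on how a boolean index option like this could be read from the options map, here is a rough sketch. The helper class and method names are hypothetical, and on_disk_rescore is only the name proposed in this review, not a settled one. The real mapper presumably uses Elasticsearch's own parsing helpers instead.

```java
// Hypothetical helper: not part of Elasticsearch. It only illustrates the
// remove-then-interpret pattern used by parseIndexOptions in the snippet under review.
final class IndexOptionsParsingSketch {

    private IndexOptionsParsingSketch() {}

    /**
     * Removes {@code key} from the options map and interprets the value as a boolean.
     * Returns {@code defaultValue} when the key is absent.
     */
    static boolean parseBooleanOption(java.util.Map<String, ?> options, String key, boolean defaultValue) {
        Object node = options.remove(key);
        if (node == null) {
            return defaultValue;
        }
        if (node instanceof Boolean) {
            return (Boolean) node;
        }
        // Accept the strings "true" and "false"; reject anything else explicitly.
        String text = node.toString();
        if ("true".equals(text)) {
            return true;
        }
        if ("false".equals(text)) {
            return false;
        }
        throw new IllegalArgumentException("[" + key + "] must be a boolean, got [" + text + "]");
    }
}
```

Removing the key while reading it means any entries still left in the map afterwards can be reported as unknown options.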

this(NAME, new Lucene99FlatVectorsFormat(FlatVectorScorerUtil.getLucene99FlatVectorsScorer()));
}

ES818BinaryQuantizedVectorsFormat(String name, FlatVectorsFormat rawVectorFormat) {
Member

this works now?

@elasticsearchmachine
Collaborator

Hi @thecoop, I've created a changelog YAML for you.

@thecoop
Member Author

thecoop commented Oct 2, 2025

This is obsoleted by #135343

@thecoop thecoop closed this Oct 2, 2025
@thecoop thecoop deleted the direct-io-index-option branch October 2, 2025 07:59
Labels

>feature :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.3.0

4 participants