-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add a direct IO option for rescoring to bbq_hnsw #130893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7fec052 to
73e3cc0
Compare
server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
|
I've added it as a separate option - but the name needs some work. I went with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we talked about before, I think this is likely the best way.
I don't know the best name for the parameter, but I agree it should be field mapper setting that is statically set on field creation.
Our format composability will allow us to have a
DirectIOES818HnswBinaryQuantizedVectorsFormat and DirectIOES818BinaryQuantizedVectorsFormat
Really, the only difference will be a single parameter passed to the inner formats & the name. So, we can likely refactor slightly and greatly reduce the code churn required to make happen.
8e19c25 to
5749e57
Compare
|
I've changed it to an index setting. The name still needs a bit of work, and the docs need to be updated for the setting. We also need to consider conversions (if any) between indices with and without this option, or whether it requires a reindex to change this setting on an existing index. |
…ect-io-index-option
|
Is the plan to rebase this against |
@romseygeek yep! |
e45df42 to
222ccc1
Compare
|
Hi @thecoop, I've created a changelog YAML for you. |
f5a6952 to
67fb702
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think renamign the param is warranted. I think passing a boolean is ok.
I am surprised there aren't any necessary changes to the binary quantized format!
| public DenseVectorIndexOptions parseIndexOptions(String fieldName, Map<String, ?> indexOptionsMap, IndexVersion indexVersion) { | ||
| Object mNode = indexOptionsMap.remove("m"); | ||
| Object efConstructionNode = indexOptionsMap.remove("ef_construction"); | ||
| Object directRawVectorReadsNode = indexOptionsMap.remove("direct_raw_vector_reads"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets call this on_disk_rescore I think this captures the intent.
Let's confirm with others before we settle the name.
| this(NAME, new Lucene99FlatVectorsFormat(FlatVectorScorerUtil.getLucene99FlatVectorsScorer())); | ||
| } | ||
|
|
||
| ES818BinaryQuantizedVectorsFormat(String name, FlatVectorsFormat rawVectorFormat) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this works now?
|
Hi @thecoop, I've created a changelog YAML for you. |
|
This is obsoleted by #135343 |
Change the JVM option to an index option
direct_raw_vector_reads. This needs to be set on index creation, and cannot be changed afterwards. This is because it uses a special index format to indicate that direct IO should be used to access the flat vectors.