-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Add on-disk rescoring to disk BBQ #135778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9af249c
to
bf48533
Compare
bf48533
to
cf67fc8
Compare
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
Hi @thecoop, I've created a changelog YAML for you. |
The option on serverless should just not do anything. We need to check this is the case already due to |
@FunctionalInterface | ||
public interface GetFormatReader { | ||
FlatVectorsReader getReader(String formatName, boolean useDirectIO) throws IOException; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to make this a more general interface? Maybe put it in its own class outside of IVF reader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly - something to look at in #135931
*/ | ||
public abstract class IVFVectorsReader extends KnnVectorsReader { | ||
|
||
private record FlatVectorsReaderKey(String formatName, boolean useDirectIO) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put it in its own class? Wouldn't we want this same thing for other formats eventually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking maybe a shared superclass for the per-field specifications, but I want to do that refactor as part of #135931, after this is merged in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really love how (relatively) simple this change is now.
I have some concerns on ensuring bwc tests. but overall I like the change.
server/src/main/java/org/elasticsearch/index/codec/vectors/diskbbq/IVFVectorsWriter.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/vectors/diskbbq/IVFVectorsWriter.java
Show resolved
Hide resolved
Yup, this is the case. The option on serverless can be set, but it doesn't do anything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, overall, this is very nice :)
} | ||
|
||
// for testing | ||
KnnVectorsWriter version0FieldsWriter(SegmentWriteState state) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome, thank you!
server/src/main/java/org/elasticsearch/index/codec/vectors/diskbbq/IVFVectorsWriter.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/vectors/diskbbq/IVFVectorsWriter.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
public void testDirectIOBackwardsCompatibleRead() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is OK, but I would prefer a new ES920V0DiskBBQVectorsFormatTests.java
that runs ALL our test coverage for the byte change. But, eh, this is OK I think.
Adds an
on_disk_rescore
index option to disk BBQ. This enables on-disk rescoring for the original raw vectors, so random pages are not copied into memory unnecessarily