Refactor bulk quantization writing into a unified class #130354
john-wagster left a comment
lgtm
writeQuantizedValue(postingsOutput, binaryValue, correction);
binarizedByteVectorValues.getCorrectiveTerms(ord);
postingsOutput.writeBytes(binaryValue, 0, binaryValue.length);
postingsOutput.writeInt(Float.floatToIntBits(correction.lowerInterval()));
postingsOutput.writeInt(Float.floatToIntBits(correction.upperInterval()));
postingsOutput.writeInt(Float.floatToIntBits(correction.additionalCorrection()));
assert correction.quantizedComponentSum() >= 0 && correction.quantizedComponentSum() <= 0xffff;
postingsOutput.writeShort((short) correction.quantizedComponentSum());
We were actually writing all vectors TWICE in the tail of postings. This (just a tad) negatively impacted recall. This is also why the skew index test started failing as the testing value there apparently relied on this :/
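For readers following the diff, here is a minimal, self-contained sketch of the per-vector layout those calls produce: the quantized bytes, then three floats stored as raw int bits, then the component sum as an unsigned 16-bit value. It is not the class this PR introduces. `QuantizedValueWriterSketch`, the `Correction` record, and `bytesPerVector` are hypothetical names, and `java.io.DataOutputStream` stands in for the postings output, so the byte order here (big-endian) may differ from the real index output. Records require Java 16 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for illustration; not the class added by the PR.
final class QuantizedValueWriterSketch {

    // Mirrors the accessor names used in the diff; the record itself is an assumption.
    record Correction(float lowerInterval, float upperInterval, float additionalCorrection, int quantizedComponentSum) {}

    // Bytes written per vector: payload + 3 floats (as int bits) + 1 unsigned short.
    static int bytesPerVector(int quantizedLength) {
        return quantizedLength + 3 * Integer.BYTES + Short.BYTES;
    }

    // Writes one quantized vector followed by its corrective terms.
    static void writeQuantizedValue(DataOutputStream out, byte[] binaryValue, Correction correction) throws IOException {
        out.write(binaryValue, 0, binaryValue.length);
        out.writeInt(Float.floatToIntBits(correction.lowerInterval()));
        out.writeInt(Float.floatToIntBits(correction.upperInterval()));
        out.writeInt(Float.floatToIntBits(correction.additionalCorrection()));
        int sum = correction.quantizedComponentSum();
        // The diff uses an assert for this range check; the sketch fails loudly instead.
        if (sum < 0 || sum > 0xffff) {
            throw new IllegalArgumentException("component sum out of unsigned 16-bit range: " + sum);
        }
        out.writeShort(sum); // only the low 16 bits are written
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            byte[] binaryValue = {0x0f, 0x33, 0x55, 0x7f};
            writeQuantizedValue(out, binaryValue, new Correction(-1.0f, 1.0f, 0.25f, 17));
        }
        System.out.println(buffer.size() + " bytes written");
    }
}
```

Keeping the whole layout behind one method means each vector is serialized by a single call site, which makes an accidental second write of the tail vectors easier to spot in review.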
Pinging @elastic/es-search-relevance (Team:Search Relevance)
This is a small refactor that lays the groundwork for more generalized bulk writing. I did some benchmarking and, as expected, there was no significant performance difference.