Introduce an Int7VectorScorer for scoring 7-bit quantized vectors #132154
Introduce a bulk scorer for 7-bit quantized vectors in order to leverage the fast implementations that already exist in Elasticsearch. The idea is to replace the Int4VectorScorer currently used in DiskBBQ with this one, as it is much faster and has higher quality. Performance benchmarks show the difference.
for Int4VectorScorer:
for the new Int7VectorScorer:
As we see, it is almost 2 times faster. There are two caveats for this implementation:
1.- Int7Scorer requires two more bytes per vector than Int4Scorer. This is because the component sum needs to be stored as an integer instead of a short.
2.- For Java 21 and a preferred bit size of 128, Int7Scorer is slower than Int4Scorer. I think this is an odd combination and clearly not recommended, so we shouldn't be optimizing for it.
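To illustrate the first caveat, here is a minimal sketch of why the component sum may no longer fit in a short once components are 7-bit. It is not the code in this PR: the class `Int7Sketch`, its method names, and the 1024-dimension example are hypothetical, and it assumes components are stored one per byte in the range [0, 127] and that the short would be read as unsigned 16 bits.

```java
// Hypothetical illustration only; not the scorer introduced by this PR.
final class Int7Sketch {
    static final int MAX_INT4 = 15;   // largest 4-bit component
    static final int MAX_INT7 = 127;  // largest 7-bit component

    // Sum of all components of a quantized vector (assumed values in [0, 127]).
    public static int componentSum(byte[] vector) {
        int sum = 0;
        for (byte b : vector) {
            sum += b;
        }
        return sum;
    }

    // Plain scalar dot product of two quantized vectors of equal length.
    public static int dotProduct(byte[] query, byte[] doc) {
        int acc = 0;
        for (int i = 0; i < query.length; i++) {
            acc += query[i] * doc[i];
        }
        return acc;
    }

    // True if the worst-case component sum fits in an unsigned 16-bit value.
    public static boolean fitsInUnsignedShort(int maxComponent, int dims) {
        return (long) maxComponent * dims <= 0xFFFF;
    }

    public static void main(String[] args) {
        int dims = 1024; // assumed dimension count, for illustration
        // 15 * 1024 = 15360, which fits in 16 bits
        System.out.println(fitsInUnsignedShort(MAX_INT4, dims));
        // 127 * 1024 = 130048, which does not fit in 16 bits
        System.out.println(fitsInUnsignedShort(MAX_INT7, dims));
    }
}
```

Under these assumptions, a 4-bit component sum stays within 16 bits up to 4369 dimensions, while a 7-bit sum can exceed that range beyond 516 dimensions, which is consistent with the two extra bytes per vector.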