Refactor bulk quantization writing into a unified class #130354

benwtrent · 2025-06-30T19:09:23Z

This is a small refactor, laying the groundwork for more generalized bulk writing.

I did some benchmarking and there was no significant performance difference (as expected).
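
The PR text does not show the new class. What follows is only a minimal sketch of what a unified bulk writer could look like. The names `BulkWriter`, `OneByOneBulkWriter`, `QuantizedVectors` and `Corrections` are hypothetical and do not come from the PR. `java.io.DataOutputStream` stands in for Lucene's `IndexOutput`, so the byte order differs from the real on-disk format. The per-vector record follows the fields in the diff below: the quantized bytes, three floats stored as int bits, and a short for the quantized component sum.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class UnifiedBulkWriterSketch {

    /** Hypothetical per-vector corrective terms, mirroring the fields written in the postings tail. */
    record Corrections(float lowerInterval, float upperInterval, float additionalCorrection, int quantizedComponentSum) {}

    /** Hypothetical read-side view: quantized bytes and corrections addressed by ordinal. */
    interface QuantizedVectors {
        int size();
        byte[] vectorValue(int ord);
        Corrections correctiveTerms(int ord);
    }

    /** Hypothetical unified writer: callers hand over all vectors, the subclass owns the layout. */
    abstract static class BulkWriter {
        protected final DataOutputStream out;

        BulkWriter(DataOutputStream out) {
            this.out = out;
        }

        abstract void writeVectors(QuantizedVectors vectors) throws IOException;
    }

    /** Simplest layout: one record per vector, each written exactly once. */
    static final class OneByOneBulkWriter extends BulkWriter {
        OneByOneBulkWriter(DataOutputStream out) {
            super(out);
        }

        @Override
        void writeVectors(QuantizedVectors vectors) throws IOException {
            for (int ord = 0; ord < vectors.size(); ord++) {
                byte[] value = vectors.vectorValue(ord);
                Corrections c = vectors.correctiveTerms(ord);
                out.write(value, 0, value.length);
                out.writeInt(Float.floatToIntBits(c.lowerInterval()));
                out.writeInt(Float.floatToIntBits(c.upperInterval()));
                out.writeInt(Float.floatToIntBits(c.additionalCorrection()));
                // Assumes the component sum fits in 16 bits, as the assert in the diff requires.
                out.writeShort((short) c.quantizedComponentSum());
            }
        }
    }

    /** Encodes `count` zeroed vectors of `dimBytes` bytes each and returns the number of bytes written. */
    static int encode(int count, int dimBytes) {
        QuantizedVectors source = new QuantizedVectors() {
            public int size() { return count; }
            public byte[] vectorValue(int ord) { return new byte[dimBytes]; }
            public Corrections correctiveTerms(int ord) { return new Corrections(-1f, 1f, 0f, ord); }
        };
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            new OneByOneBulkWriter(out).writeVectors(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println(encode(2, 4)); // 36: two records of 4 + 3 * 4 + 2 bytes
    }
}
```

In this sketch, routing every write through one class keeps the ordinal loop and the record layout in a single place. A different bulk layout would then only need a new subclass, and callers have no second copy of the write sequence that could drift out of sync.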

john-wagster

lgtm

benwtrent · 2025-07-01T19:21:56Z

server/src/main/java/org/elasticsearch/index/codec/vectors/DefaultIVFVectorsWriter.java

-            writeQuantizedValue(postingsOutput, binaryValue, correction);
-            binarizedByteVectorValues.getCorrectiveTerms(ord);
-            postingsOutput.writeBytes(binaryValue, 0, binaryValue.length);
-            postingsOutput.writeInt(Float.floatToIntBits(correction.lowerInterval()));
-            postingsOutput.writeInt(Float.floatToIntBits(correction.upperInterval()));
-            postingsOutput.writeInt(Float.floatToIntBits(correction.additionalCorrection()));
-            assert correction.quantizedComponentSum() >= 0 && correction.quantizedComponentSum() <= 0xffff;
-            postingsOutput.writeShort((short) correction.quantizedComponentSum());


We were actually writing every vector TWICE in the tail of the postings. This negatively impacted recall (just a tad). It is also why the skew index test started failing: its expected value apparently relied on this :/
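
To make the bug concrete: in the layout from the diff, each record takes `binaryValue.length` bytes, plus 3 * 4 bytes of float corrections, plus a 2-byte short. Writing every vector twice therefore exactly doubles the tail. The sketch below is illustrative only. It again uses `DataOutputStream` in place of `IndexOutput` and a hypothetical `Correction` record. Its `writeQuantizedValue` is a stand-in named after the call in the diff, and its body is modeled on the removed inline writes rather than copied from Elasticsearch. The sketch checks an invariant that a test could assert: tail size equals vector count times record size.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PostingsTailCheck {

    // Fixed per-record overhead: three floats stored as int bits plus one short.
    static final int CORRECTION_BYTES = 3 * Integer.BYTES + Short.BYTES;

    /** Hypothetical stand-in for the corrective terms attached to each quantized vector. */
    record Correction(float lowerInterval, float upperInterval, float additionalCorrection, int quantizedComponentSum) {}

    /** Stand-in single-record writer, modeled on the removed inline code. */
    static void writeQuantizedValue(DataOutputStream out, byte[] binaryValue, Correction correction) throws IOException {
        if (correction.quantizedComponentSum() < 0 || correction.quantizedComponentSum() > 0xffff) {
            throw new IllegalArgumentException("component sum must fit in an unsigned short");
        }
        out.write(binaryValue, 0, binaryValue.length);
        out.writeInt(Float.floatToIntBits(correction.lowerInterval()));
        out.writeInt(Float.floatToIntBits(correction.upperInterval()));
        out.writeInt(Float.floatToIntBits(correction.additionalCorrection()));
        out.writeShort((short) correction.quantizedComponentSum());
    }

    /** Writes one record per vector; `duplicate` mimics the pre-fix path that wrote each record twice. */
    static int tailBytes(int vectorCount, int bytesPerVector, boolean duplicate) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] binaryValue = new byte[bytesPerVector];
        Correction correction = new Correction(-1f, 1f, 0.5f, 42);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int ord = 0; ord < vectorCount; ord++) {
                writeQuantizedValue(out, binaryValue, correction);
                if (duplicate) {
                    writeQuantizedValue(out, binaryValue, correction); // the bug: same record again
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Invariant a test could assert: the tail holds exactly one record per vector. */
    static boolean writtenOnce(int tailBytes, int vectorCount, int bytesPerVector) {
        return tailBytes == vectorCount * (bytesPerVector + CORRECTION_BYTES);
    }

    public static void main(String[] args) {
        int fixed = tailBytes(3, 8, false);
        int buggy = tailBytes(3, 8, true);
        System.out.println(fixed + " " + writtenOnce(fixed, 3, 8)); // 66 true
        System.out.println(buggy + " " + writtenOnce(buggy, 3, 8)); // 132 false
    }
}
```

A size check like this only catches duplicated or missing records, not wrong values. It is a cheap guard because the record size is fully determined by the vector byte length.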

elasticsearchmachine · 2025-07-01T19:23:37Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Refactor bulk quantization writing into a unified class

7694446

benwtrent requested review from iverase and john-wagster June 30, 2025 19:09

elasticsearchmachine added the v9.2.0 and needs:triage labels Jun 30, 2025

iverase approved these changes Jul 1, 2025

john-wagster approved these changes Jul 1, 2025

fixing test and writing

4d387b6

benwtrent commented Jul 1, 2025

Merge branch 'main' into ivf-bulk-writer-refactor

ba59768

benwtrent added the auto-merge-without-approval, :Search Relevance/Vectors and >non-issue labels and removed the needs:triage label Jul 1, 2025

elasticsearchmachine added the Team:Search Relevance label Jul 1, 2025

elasticsearchmachine merged commit 044f34b into elastic:main Jul 1, 2025
32 checks passed

benwtrent deleted the ivf-bulk-writer-refactor branch July 1, 2025 20:54

4 participants
