@benwtrent
Member

@benwtrent benwtrent commented Jan 30, 2026

Adds new ES formats that build on the Lucene formats.

This adds scorers & scorer suppliers.

@benwtrent benwtrent requested a review from thecoop January 30, 2026 17:12
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Jan 30, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@benwtrent
Member Author

Here is the difference in recall/etc.

This PR, (1, 2, 4, 7) bits in row order, all using the new format:

index_name                      index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count      QPS  recall  visited  filter_selectivity  filter_cached  oversampling_factor  num_candidates  early_termination
------------------------------  ----------  -------------------  -----------  ----------------  -------------  -------  ------  -------  ------------------  -------------  -------------------  --------------  -----------------
cohere-wikipedia-docs-768d.vec        hnsw                0.000         0.43              0.00           0.00  2325.58    0.67  4103.22                1.00           true                 0.00             250              false
cohere-wikipedia-docs-768d.vec        hnsw                0.000         0.59              0.00           0.00  1694.92    0.79  3889.48                1.00           true                 0.00             250              false
cohere-wikipedia-docs-768d.vec        hnsw                0.000         0.68              0.00           0.00  1470.59    0.90  3796.08                1.00           true                 0.00             250              false
cohere-wikipedia-docs-768d.vec        hnsw                0.000         1.08              0.00           0.00   925.93    0.94  3812.40                1.00           true                 0.00             250              false

Baseline, (1, 4, 7) bits in row order:

index_name                      index_type  visit_percentage(%)  latency(ms)  net_cpu_time(ms)  avg_cpu_count      QPS  recall  visited  filter_selectivity  filter_cached  oversampling_factor  num_candidates  early_termination
------------------------------  ----------  -------------------  -----------  ----------------  -------------  -------  ------  -------  ------------------  -------------  -------------------  --------------  -----------------
cohere-wikipedia-docs-768d.vec        hnsw                0.000         0.41              0.00           0.00  2439.02    0.67  4085.27                1.00           true                 0.00             250              false
cohere-wikipedia-docs-768d.vec        hnsw                0.000         1.69              0.00           0.00   591.72    0.55  4418.59                1.00           true                 0.00             250              false
cohere-wikipedia-docs-768d.vec        hnsw                0.000         0.42              0.00           0.00  2380.95    0.92  3787.26                1.00           true                 0.00             250              false

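Obviously, 4 bits is way better (0.68 ms vs 1.69 ms latency, 0.90 vs 0.55 recall), and 2 bits is new, with no baseline row to compare against. Single bit might be a little slower (0.43 ms vs 0.41 ms). But int7 is significantly slower (1.08 ms vs 0.42 ms) due to lack of native code support.

Recall is equal or better across the board (0.67 vs 0.67 for single bit, 0.90 vs 0.55 for 4 bits, 0.94 vs 0.92 for int7).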

@benwtrent benwtrent changed the title from "Adds initial pass of the new scalar formats from lucene" to "Adds new formats that use the new scalar formats from lucene" Feb 2, 2026
@benwtrent benwtrent requested a review from thecoop February 3, 2026 19:34
@thecoop
Member

thecoop commented Feb 4, 2026

We need some tests on the scorer to check that the native and Lucene implementations produce the same result; see Int7SQVectorScorerFactoryTests.
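
A minimal sketch of such a parity check, in plain Java. Both dot-product methods below are hypothetical stand-ins (a simple reference loop and an unrolled variant playing the role of the optimized path); the class and method names are assumptions for illustration, not the real Lucene or native scorers.

```java
import java.util.Random;

// Hypothetical stand-ins: neither method is the real Lucene or native scorer.
class ScorerParitySketch {

    // Reference implementation: simple loop over int7 values (0..127).
    static int dotReference(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // "Optimized" implementation: 4-way unrolled, standing in for a native kernel.
    static int dotUnrolled(byte[] a, byte[] b) {
        int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int i = 0;
        int bound = a.length & ~3; // largest multiple of 4 that is <= length
        for (; i < bound; i += 4) {
            s0 += a[i] * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        int sum = s0 + s1 + s2 + s3;
        for (; i < a.length; i++) { // remaining tail elements
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Random vector with every component in the int7 range 0..127.
    static byte[] randomInt7(Random random, int dims) {
        byte[] v = new byte[dims];
        for (int i = 0; i < dims; i++) {
            v[i] = (byte) random.nextInt(128);
        }
        return v;
    }

    // Returns true when both implementations agree on every random pair.
    static boolean implementationsAgree(long seed, int iterations) {
        Random random = new Random(seed);
        for (int iter = 0; iter < iterations; iter++) {
            int dims = 1 + random.nextInt(1024); // includes non-multiples of 4
            byte[] a = randomInt7(random, dims);
            byte[] b = randomInt7(random, dims);
            if (dotReference(a, b) != dotUnrolled(a, b)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("implementations agree: " + implementationsAgree(42L, 100));
    }
}
```

Randomizing the dimension count matters here because optimized kernels tend to differ from the reference in how they handle the tail that does not fill a full unrolled step.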

@benwtrent benwtrent requested a review from a team as a code owner February 4, 2026 19:32
@benwtrent
Member Author

@thecoop I am gonna close this and rebase & reopen against the new lucene_10_4 branch

adding tests

[CI] Auto commit changes from spotless

adding exposure via module

iter

fixing things

[CI] Auto commit changes from spotless

iter

iter

iter

[CI] Auto commit changes from spotless

Adding random vector scorer code

iter

iter

fixing scorer supplier

iter

adding more tests
@benwtrent benwtrent force-pushed the add-new-scalar-formats branch from 53d421b to 0d5b790 Compare February 5, 2026 21:31
@benwtrent benwtrent requested review from a team as code owners February 5, 2026 21:31
@benwtrent benwtrent changed the base branch from lucene_snapshot to lucene_snapshot_10_4 February 5, 2026 21:31
@benwtrent benwtrent requested a review from thecoop February 5, 2026 21:32
@benwtrent
Member Author

@thecoop sorry for the force push, but rebased on 10_4 and now merging there.

@tvernum tvernum removed the request for review from a team February 6, 2026 07:21
}

@Override
float applyCorrections(float rawScore, int ord) throws IOException {
Contributor

I like the new name, way better than some variant of score.
Can we have a follow up PR that renames all the others (e.g. in BBQ/DiskBBQ)?
I'm also leaning towards using the same names on native functions. Wdyt? CC @thecoop

Contributor

I also like how you separated corrections into the different distance scorers, like we did in native code.

Contributor

@ldematte ldematte left a comment

Just gave a quick look over; looks good, just a couple of minor comments/questions

public RandomVectorScorerSupplier getRandomVectorScorerSupplier(VectorSimilarityFunction sim, KnnVectorValues values)
throws IOException {
if (values instanceof QuantizedByteVectorValues quantizedValues && quantizedValues.getSlice() != null) {
// TODO: optimize int4, 2, and single bit quantization
Contributor

I'm getting confused with all the formats :)
Maybe we can sync a bit on these?
Are these "striped" (like BBQ/DiskBBQ) or packed? (e.g. 2 Int4 in a byte?)

Member

These are scalar quantized, so packed

Member Author

2 bits is striped, int4 is packed.

I am not convinced that "striped" is the best option for int4 * int4 operations.

Member Author

Lucene just packs for int4 * int4. It stripes for int4 * int1 and double stripes for int4 * int2 :D

Contributor

I am not convinced that "striped" is the best option for int4 * int4 operations.

++
I have (SIMD) implementations for both, I just need some time to test and benchmark them.
My gut feeling is that for int4, packed/normal mul (or madd) is going to be faster.
Give me some time and I'll come back with numbers :)
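
To make the two layouts discussed in this thread concrete, here is a minimal sketch. It assumes "packed" means two 4-bit values per byte and "striped" means one bit plane per bit position of the 4-bit value, so that an int4 * int1 dot product becomes an AND plus popcount per plane. The nibble order, the 64-bit word size, and all names are assumptions for illustration, not Lucene's actual format.

```java
// Illustrative only: the layouts here are assumptions, not Lucene's exact format.
class Int4LayoutSketch {

    // Packed: two 4-bit values per byte (even index in the low nibble, odd in the high nibble).
    static byte[] pack(byte[] values) {
        byte[] packed = new byte[(values.length + 1) / 2];
        for (int i = 0; i < values.length; i++) {
            int shift = (i & 1) == 0 ? 0 : 4;
            packed[i >> 1] |= (byte) ((values[i] & 0x0F) << shift);
        }
        return packed;
    }

    // int4 * int4 dot product over two packed arrays of equal length.
    static int dotPacked(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] & 0x0F) * (b[i] & 0x0F);               // low nibbles
            sum += ((a[i] >> 4) & 0x0F) * ((b[i] >> 4) & 0x0F); // high nibbles
        }
        return sum;
    }

    // Striped: plane j holds bit j of every dimension, 64 dimensions per long.
    static long[][] stripe(byte[] values) {
        int words = (values.length + 63) / 64;
        long[][] planes = new long[4][words];
        for (int i = 0; i < values.length; i++) {
            for (int j = 0; j < 4; j++) {
                if (((values[i] >> j) & 1) != 0) {
                    planes[j][i >> 6] |= 1L << (i & 63);
                }
            }
        }
        return planes;
    }

    // Single-bit vector stored as a bitset, 64 dimensions per long.
    static long[] toBits(byte[] bits) {
        long[] out = new long[(bits.length + 63) / 64];
        for (int i = 0; i < bits.length; i++) {
            if (bits[i] != 0) {
                out[i >> 6] |= 1L << (i & 63);
            }
        }
        return out;
    }

    // int4 * int1 dot product: AND + popcount per plane, weighted by 2^j.
    static int dotStriped(long[][] planes, long[] bits) {
        int sum = 0;
        for (int j = 0; j < 4; j++) {
            int count = 0;
            for (int w = 0; w < bits.length; w++) {
                count += Long.bitCount(planes[j][w] & bits[w]);
            }
            sum += count << j;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] q = {3, 15, 0, 7, 9};  // int4 values
        byte[] d4 = {1, 2, 3, 4, 5};  // int4 values
        byte[] d1 = {1, 0, 1, 1, 0};  // single-bit values
        System.out.println("packed int4 * int4:  " + dotPacked(pack(q), pack(d4)));
        System.out.println("striped int4 * int1: " + dotStriped(stripe(q), toBits(d1)));
    }
}
```

The striped form only needs bitwise AND and popcount, which suits a low-bit operand; with int4 on both sides, every pair of planes would have to be combined, which is one way to read the doubt above about striping int4 * int4.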

@benwtrent benwtrent requested review from ldematte and thecoop February 6, 2026 13:03
Contributor

@tteofili tteofili left a comment

LGTM, but I think we need a few more benchmarks.

@benwtrent benwtrent merged commit 632a640 into elastic:lucene_snapshot_10_4 Feb 10, 2026
32 of 36 checks passed
@benwtrent benwtrent deleted the add-new-scalar-formats branch February 10, 2026 12:41
float y1 = quantizedComponentSum;
float score = ax * ay * values.dimension() + ay * lx * x1 + ax * ly * y1 + lx * ly * rawScore;
score += additionalCorrection + correctiveTerms.additionalCorrection() - values.getCentroidDP();
score = Math.clamp(score, -1, 1);
Contributor

Do you think we can reuse the native code implementations here? Or we can expose a new one, but share the same "kernel"? Besides this clamp, I do not see other differences.

Member Author

We likely could reuse native here.
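
The first line of the score expression in the snippet above matches the expansion of a dot product between two dequantized vectors. A minimal sketch, assuming each component is reconstructed as `a + l * q`, with `ax`, `ay` as offsets, `lx`, `ly` as step sizes, and `x1`, `y1` as the sums of the quantized components. Those meanings are inferred from the snippet, and the class and method names are hypothetical. The additional correction, the centroid term, and the clamp are not modelled.

```java
// Sketch under assumptions: each component dequantizes as a + l * q.
class CorrectionIdentitySketch {

    // Direct dot product of the two dequantized vectors.
    static float dequantizedDot(float ax, float lx, int[] qx, float ay, float ly, int[] qy) {
        float sum = 0f;
        for (int i = 0; i < qx.length; i++) {
            sum += (ax + lx * qx[i]) * (ay + ly * qy[i]);
        }
        return sum;
    }

    // Same quantity from the integer dot product plus corrective terms.
    static float correctedDot(float ax, float lx, int[] qx, float ay, float ly, int[] qy) {
        int rawScore = 0; // integer dot product of the quantized components
        int x1 = 0;       // sum of quantized components of x
        int y1 = 0;       // sum of quantized components of y
        for (int i = 0; i < qx.length; i++) {
            rawScore += qx[i] * qy[i];
            x1 += qx[i];
            y1 += qy[i];
        }
        int dimension = qx.length;
        return ax * ay * dimension + ay * lx * x1 + ax * ly * y1 + lx * ly * rawScore;
    }

    public static void main(String[] args) {
        int[] qx = {0, 3, 7, 15};
        int[] qy = {1, 2, 4, 8};
        System.out.println("direct:    " + dequantizedDot(-0.5f, 0.25f, qx, 0.125f, 0.5f, qy));
        System.out.println("corrected: " + correctedDot(-0.5f, 0.25f, qx, 0.125f, 0.5f, qy));
    }
}
```

Both methods agree up to floating-point rounding, so if the component sums are stored with each vector, the per-pair work reduces to the integer dot product. That is the part a shared native kernel would cover.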

Labels

>non-issue, :Search Relevance/Vectors (Vector search), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch), v9.4.0

5 participants