Make knn quantized tests more reliable #121551

carlosdelest · 2025-02-03T18:25:14Z

Follow up to #121110

This adds the same pattern for fixing knn quantized indices issues, mostly found during cross cluster search related errors:

Force merge to 1 segment
Refresh indices
Force merge to 1 segment

This tries to avoid having a single vector in a shard, which doesn't work well with quantization.

This can be removed when #109416 is resolved.

This is needed to close #120441

elasticsearchmachine · 2025-02-04T07:51:12Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

benwtrent · 2025-02-04T15:04:18Z

...spec/src/yamlRestTest/resources/rest-api-spec/test/search.vectors/41_knn_search_bbq_hnsw.yml

      indices.flush:
        index: bbq_hnsw

+  # For added test reliability, pending the resolution of https://github.com/elastic/elasticsearch/issues/109416.


we don't need to force merge twice. Just once.

That's good - I was keeping the pattern in #121110 . I didn't see the need myself, but given that this was something already reviewed I thought there might be a reason for it.

I will change this PR and the above change with:

- do: indices.refresh: {} - do: indices.forcemerge: max_num_segments: 1 index: int4_flat

Disagree. I have 0% understanding of how and why this is working, but I know from many experiments that in practice it makes a difference. I have spent many hours and 10000 runs of individual tests to confirm that doing two merges reduces the probability for failures.

Carlos, have you done local testing of the failure rate?

@svilen-mihaylov-elastic , I have not been able to reproduce this locally. Is this something you were able to do?

Do you have a theory about why it is necessary to force merge twice? In my mind, doing a refresh to make the docs visible and then doing a force merge to ensure quantization does not act funny should be enough. I can't think of why an additional force merge should be necessary.

@benwtrent , as the solution with two force merges was already merged in #121110, we may add the solution as is to avoid the test flakiness and unmuting, and we can refine afterwards to remove unnecessary operations. WDYT?

@svilen-mihaylov-elastic , I have not been able to reproduce this locally. Is this something you were able to do?

Yes, there is a way to reproduce it locally by repeating the tests many times over (thousands).

Do you have a theory about why it is necessary to force merge twice? In my mind, doing a refresh to make the docs visible and then doing a force merge to ensure quantization does not act funny should be enough. I can't think of why an additional force merge should be necessary.

As I said, no, I do not have a theory as to why refreshing two times lowers the probability of the test failing.

My suggestion here is to leave the double merging, and indicate (as I have done in the other pr) that this code is to be removed pending the resolution of #109416 according to Ben.

Yes, there is a way to reproduce it locally by repeating the tests many times over (thousands).

I've done the same using -Dtests.iters. A couple of thousand tests didn't throw a single error.

My suggestion here is to leave the double merging, and indicate (as I have done in the other pr) that this code is to be removed pending the resolution of #109416 according to Ben.

I'm fine with this as it's a temporary solution to unmute tests. Is there anything else that needs to be done on this PR to be approved?

Yes, there is a way to reproduce it locally by repeating the tests many times over (thousands).

I've done the same using -Dtests.iters. A couple of thousand tests didn't throw a single error.

Awesome! For my reference would you mind sharing the command line you are using? Last time I tried I had problems using Dtests.iters with yml tests.

My suggestion here is to leave the double merging, and indicate (as I have done in the other pr) that this code is to be removed pending the resolution of #109416 according to Ben.

I'm fine with this as it's a temporary solution to unmute tests. Is there anything else that needs to be done on this PR to be approved?

Not from my side.

@svilen-mihaylov-elastic this is the command I used:

./gradlew ":qa:ccs-common-rest:yamlRestTest" --tests "org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT" -Dtests.method="test {p0=search.vectors/41_knn_search_bbq_hnsw/Vector rescoring has same scoring as exact search for kNN section*}" -Druntime.java=23 -Dtests.iters=1000 -Dtests.timeoutSuite=3600000!
Check the * at the end of the tests.method param, as this is what is needed so tests.iters does not skip the created tests.

I'm fine with this as it's a temporary solution to unmute tests. Is there anything else that needs to be done on this PR to be approved?

Not from my side.

Nice, please approve this PR then. I'll wait for @benwtrent concerns for today before merging.

carlosdelest · 2025-02-05T06:59:47Z

@benwtrent this might potentially help with #121475, #121395, #121350 and #115475

benwtrent · 2025-02-05T14:13:24Z

@carlosdelest I doubt it. Those are failing due to filter restriction not handling shard distribution, which is the key problem.

For all the int4 similarity flat failing things, I have addressed more fully in my pr

benwtrent · 2025-02-05T14:21:46Z

For the bbq rescore failure, the real solution is to ensure that rescoring actually rescores all the vectors, then whatever the quantization error, it doesn't matter. @carlosdelest

carlosdelest · 2025-02-05T18:26:15Z

For the bbq rescore failure, the real solution is to ensure that rescoring actually rescores all the vectors, then whatever the quantization error, it doesn't matter

@benwtrent I think it is related to this PR, as it seems that not all docs are visible from the search PoV, that's why it seems to me related to the fix in #121110. Rescoring just takes the results from the knn search and performs the rescoring on them - I don't think it can limit the number of results retrieved.

Are you thinking on something else as the possible reason for the bbq rescore failure?

benwtrent · 2025-02-05T18:56:12Z

@carlosdelest I would much prefer your PR only fix a singular test. Have we any other failures other than the ones I have already addressed?

Additionally, have you tested with simply refreshing after the force merge?

carlosdelest · 2025-02-06T12:38:23Z

I'm double checking this and I can't reproduce locally the failure with the 8.x code.

Given that this test has not been muted in other branches, I'm thinking on unmuting the test to check if this is reproducible on 8.x .

I would much prefer your PR only fix a singular test. Have we any other failures other than the ones I have already addressed?

My goal with this PR was to make a common setup for quantized tests, given that #121110 was already merged and seemed a good solution for this failure. But seeing I can't reproduce this makes me think twice about the need for this.

benwtrent · 2025-02-06T12:42:53Z

@carlosdelest I would rather not spread a pattern that we are not sure why it "works". Besides, I have since removed that pattern and fixed the test more directly by adjusting the similarity thresholds.

carlosdelest · 2025-02-06T12:45:13Z

@carlosdelest I would rather not spread a pattern that we are not sure why it "works". Besides, I have since removed that pattern and fixed the test more directly by adjusting the similarity thresholds.

Got it. Given that, and the lack of reproducibility locally or on other branches, I'm closing this PR and unmuting the test. Thanks!

Make knn quantized tests more reliable

035df7d

carlosdelest mentioned this pull request Feb 3, 2025

[CI] CcsCommonYamlTestSuiteIT test {p0=search.vectors/41_knn_search_bbq_hnsw/Vector rescoring has same scoring as exact search for kNN section} failing #120441

Closed

carlosdelest marked this pull request as ready for review February 4, 2025 07:50

carlosdelest requested a review from a team February 4, 2025 07:51

carlosdelest mentioned this pull request Feb 4, 2025

[PoC] ESQL - Add scoring for full text functions disjunctions #121153

Closed

carlosdelest requested review from benwtrent and svilen-mihaylov-elastic February 4, 2025 08:12

benwtrent reviewed Feb 4, 2025

View reviewed changes

svilen-mihaylov-elastic approved these changes Feb 5, 2025

View reviewed changes

carlosdelest closed this Feb 6, 2025

Make knn quantized tests more reliable #121551

Make knn quantized tests more reliable #121551

Uh oh!

Conversation

carlosdelest commented Feb 3, 2025

Uh oh!

elasticsearchmachine commented Feb 4, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

svilen-mihaylov-elastic Feb 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

svilen-mihaylov-elastic Feb 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carlosdelest commented Feb 5, 2025

Uh oh!

benwtrent commented Feb 5, 2025

Uh oh!

benwtrent commented Feb 5, 2025

Uh oh!

carlosdelest commented Feb 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

benwtrent commented Feb 5, 2025

Uh oh!

carlosdelest commented Feb 6, 2025

Uh oh!

benwtrent commented Feb 6, 2025

Uh oh!

carlosdelest commented Feb 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

svilen-mihaylov-elastic Feb 4, 2025 •

edited

Loading

svilen-mihaylov-elastic Feb 4, 2025 •

edited

Loading

carlosdelest commented Feb 5, 2025 •

edited

Loading