Skip to content

Conversation

@carlosdelest
Copy link
Member

Follow up to #121110

This adds the same pattern for fixing knn quantized indices issues, mostly found during cross cluster search related errors:

  • Force merge to 1 segment
  • Refresh indices
  • Force merge to 1 segment

This tries to avoid having a single vector in a shard, which doesn't work well with quantization.

This can be removed when #109416 is resolved.

This is needed to close #120441

@carlosdelest carlosdelest added >test Issues or PRs that are addressing/adding tests auto-backport Automatically create backport pull requests when merged :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.16.0 v9.0.0 v8.17.0 v8.18.0 v8.19.0 v9.1.0 labels Feb 3, 2025
@carlosdelest carlosdelest marked this pull request as ready for review February 4, 2025 07:50
@carlosdelest carlosdelest requested a review from a team February 4, 2025 07:51
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

indices.flush:
index: bbq_hnsw

# For added test reliability, pending the resolution of https://github.com/elastic/elasticsearch/issues/109416.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to force merge twice. Just once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good - I was keeping the pattern in #121110 . I didn't see the need myself, but given that this was something already reviewed I thought there might be a reason for it.

I will change this PR and the above change with:

  - do:
      indices.refresh: {}
  - do:
      indices.forcemerge:
        max_num_segments: 1
        index: int4_flat

Copy link
Contributor

@svilen-mihaylov-elastic svilen-mihaylov-elastic Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disagree. I have 0% understanding of how and why this is working, but I know from many experiments that in practice it makes a difference. I have spent many hours and 10000 runs of individual tests to confirm that doing two merges reduces the probability for failures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carlos, have you done local testing of the failure rate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svilen-mihaylov-elastic , I have not been able to reproduce this locally. Is this something you were able to do?

Do you have a theory about why it is necessary to force merge twice? In my mind, doing a refresh to make the docs visible and then doing a force merge to ensure quantization does not act funny should be enough. I can't think of why an additional force merge should be necessary.

@benwtrent , as the solution with two force merges was already merged in #121110, we may add the solution as is to avoid the test flakiness and unmuting, and we can refine afterwards to remove unnecessary operations. WDYT?

Copy link
Contributor

@svilen-mihaylov-elastic svilen-mihaylov-elastic Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svilen-mihaylov-elastic , I have not been able to reproduce this locally. Is this something you were able to do?

Yes, there is a way to reproduce it locally by repeating the tests many times over (thousands).

Do you have a theory about why it is necessary to force merge twice? In my mind, doing a refresh to make the docs visible and then doing a force merge to ensure quantization does not act funny should be enough. I can't think of why an additional force merge should be necessary.

As I said, no, I do not have a theory as to why refreshing two times lowers the probability of the test failing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion here is to leave the double merging, and indicate (as I have done in the other pr) that this code is to be removed pending the resolution of #109416 according to Ben.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is a way to reproduce it locally by repeating the tests many times over (thousands).

I've done the same using -Dtests.iters. A couple of thousand tests didn't throw a single error.

My suggestion here is to leave the double merging, and indicate (as I have done in the other pr) that this code is to be removed pending the resolution of #109416 according to Ben.

I'm fine with this as it's a temporary solution to unmute tests. Is there anything else that needs to be done on this PR to be approved?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is a way to reproduce it locally by repeating the tests many times over (thousands).

I've done the same using -Dtests.iters. A couple of thousand tests didn't throw a single error.

Awesome! For my reference would you mind sharing the command line you are using? Last time I tried I had problems using Dtests.iters with yml tests.

My suggestion here is to leave the double merging, and indicate (as I have done in the other pr) that this code is to be removed pending the resolution of #109416 according to Ben.

I'm fine with this as it's a temporary solution to unmute tests. Is there anything else that needs to be done on this PR to be approved?

Not from my side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svilen-mihaylov-elastic this is the command I used:

./gradlew ":qa:ccs-common-rest:yamlRestTest" --tests "org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT" -Dtests.method="test {p0=search.vectors/41_knn_search_bbq_hnsw/Vector rescoring has same scoring as exact search for kNN section*}" -Druntime.java=23 -Dtests.iters=1000 -Dtests.timeoutSuite=3600000!
Check the * at the end of the tests.method param, as this is what is needed so tests.iters does not skip the created tests.

I'm fine with this as it's a temporary solution to unmute tests. Is there anything else that needs to be done on this PR to be approved?

Not from my side.

Nice, please approve this PR then. I'll wait for @benwtrent concerns for today before merging.

@carlosdelest
Copy link
Member Author

@benwtrent this might potentially help with #121475, #121395, #121350 and #115475

@benwtrent
Copy link
Member

@carlosdelest I doubt it. Those are failing due to filter restriction not handling shard distribution, which is the key problem.

For all the int4 similarity flat failing things, I have addressed more fully in my pr

@benwtrent
Copy link
Member

For the bbq rescore failure, the real solution is to ensure that rescoring actually rescores all the vectors, then whatever the quantization error, it doesn't matter. @carlosdelest

@carlosdelest
Copy link
Member Author

carlosdelest commented Feb 5, 2025

For the bbq rescore failure, the real solution is to ensure that rescoring actually rescores all the vectors, then whatever the quantization error, it doesn't matter

@benwtrent I think it is related to this PR, as it seems that not all docs are visible from the search PoV, that's why it seems to me related to the fix in #121110. Rescoring just takes the results from the knn search and performs the rescoring on them - I don't think it can limit the number of results retrieved.

Are you thinking on something else as the possible reason for the bbq rescore failure?

@benwtrent
Copy link
Member

@carlosdelest I would much prefer your PR only fix a singular test. Have we any other failures other than the ones I have already addressed?

Additionally, have you tested with simply refreshing after the force merge?

@carlosdelest
Copy link
Member Author

I'm double checking this and I can't reproduce locally the failure with the 8.x code.

Given that this test has not been muted in other branches, I'm thinking on unmuting the test to check if this is reproducible on 8.x .

I would much prefer your PR only fix a singular test. Have we any other failures other than the ones I have already addressed?

My goal with this PR was to make a common setup for quantized tests, given that #121110 was already merged and seemed a good solution for this failure. But seeing I can't reproduce this makes me think twice about the need for this.

@benwtrent
Copy link
Member

@carlosdelest I would rather not spread a pattern that we are not sure why it "works". Besides, I have since removed that pattern and fixed the test more directly by adjusting the similarity thresholds.

@carlosdelest
Copy link
Member Author

@carlosdelest I would rather not spread a pattern that we are not sure why it "works". Besides, I have since removed that pattern and fixed the test more directly by adjusting the similarity thresholds.

Got it. Given that, and the lack of reproducibility locally or on other branches, I'm closing this PR and unmuting the test. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch >test Issues or PRs that are addressing/adding tests v8.16.0 v8.17.0 v8.18.0 v8.19.0 v9.0.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] CcsCommonYamlTestSuiteIT test {p0=search.vectors/41_knn_search_bbq_hnsw/Vector rescoring has same scoring as exact search for kNN section} failing

4 participants