Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,13 @@ setup:
indices.flush:
index: bbq_hnsw

# For added test reliability, pending the resolution of https://github.com/elastic/elasticsearch/issues/109416.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to force merge twice. Just once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good - I was keeping the pattern in #121110 . I didn't see the need myself, but given that this was something already reviewed I thought there might be a reason for it.

I will change this PR and the above change with:

  - do:
      indices.refresh: {}
  - do:
      indices.forcemerge:
        max_num_segments: 1
        index: int4_flat

Copy link
Contributor

@svilen-mihaylov-elastic svilen-mihaylov-elastic Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disagree. I have 0% understanding of how and why this is working, but I know from many experiments that in practice it makes a difference. I have spent many hours and 10000 runs of individual tests to confirm that doing two merges reduces the probability for failures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carlos, have you done local testing of the failure rate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svilen-mihaylov-elastic , I have not been able to reproduce this locally. Is this something you were able to do?

Do you have a theory about why it is necessary to force merge twice? In my mind, doing a refresh to make the docs visible and then doing a force merge to ensure quantization does not act funny should be enough. I can't think of why an additional force merge should be necessary.

@benwtrent , as the solution with two force merges was already merged in #121110, we may add the solution as is to avoid the test flakiness and unmuting, and we can refine afterwards to remove unnecessary operations. WDYT?

Copy link
Contributor

@svilen-mihaylov-elastic svilen-mihaylov-elastic Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svilen-mihaylov-elastic , I have not been able to reproduce this locally. Is this something you were able to do?

Yes, there is a way to reproduce it locally by repeating the tests many times over (thousands).

Do you have a theory about why it is necessary to force merge twice? In my mind, doing a refresh to make the docs visible and then doing a force merge to ensure quantization does not act funny should be enough. I can't think of why an additional force merge should be necessary.

As I said, no, I do not have a theory as to why refreshing two times lowers the probability of the test failing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion here is to leave the double merging, and indicate (as I have done in the other pr) that this code is to be removed pending the resolution of #109416 according to Ben.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is a way to reproduce it locally by repeating the tests many times over (thousands).

I've done the same using -Dtests.iters. A couple of thousand tests didn't throw a single error.

My suggestion here is to leave the double merging, and indicate (as I have done in the other pr) that this code is to be removed pending the resolution of #109416 according to Ben.

I'm fine with this as it's a temporary solution to unmute tests. Is there anything else that needs to be done on this PR to be approved?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is a way to reproduce it locally by repeating the tests many times over (thousands).

I've done the same using -Dtests.iters. A couple of thousand tests didn't throw a single error.

Awesome! For my reference would you mind sharing the command line you are using? Last time I tried I had problems using Dtests.iters with yml tests.

My suggestion here is to leave the double merging, and indicate (as I have done in the other pr) that this code is to be removed pending the resolution of #109416 according to Ben.

I'm fine with this as it's a temporary solution to unmute tests. Is there anything else that needs to be done on this PR to be approved?

Not from my side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svilen-mihaylov-elastic this is the command I used:

./gradlew ":qa:ccs-common-rest:yamlRestTest" --tests "org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT" -Dtests.method="test {p0=search.vectors/41_knn_search_bbq_hnsw/Vector rescoring has same scoring as exact search for kNN section*}" -Druntime.java=23 -Dtests.iters=1000 -Dtests.timeoutSuite=3600000!
Check the * at the end of the tests.method param, as this is what is needed so tests.iters does not skip the created tests.

I'm fine with this as it's a temporary solution to unmute tests. Is there anything else that needs to be done on this PR to be approved?

Not from my side.

Nice, please approve this PR then. I'll wait for @benwtrent concerns for today before merging.

- do:
indices.forcemerge:
index: bbq_hnsw
max_num_segments: 1
- do:
indices.refresh: {}
- do:
indices.forcemerge:
index: bbq_hnsw
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,13 @@ setup:
indices.flush:
index: hnsw_byte_quantized

# For added test reliability, pending the resolution of https://github.com/elastic/elasticsearch/issues/109416.
- do:
indices.forcemerge:
index: hnsw_byte_quantized
max_num_segments: 1
- do:
indices.refresh: {}
- do:
indices.forcemerge:
index: hnsw_byte_quantized
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ setup:
vector: [0.5, 111.3, -13.0, 14.8]
another_vector: [-0.5, 11.0, 0, 12]

# For added test reliability, pending the resolution of https://github.com/elastic/elasticsearch/issues/109416.
- do:
indices.forcemerge:
index: hnsw_byte_quantized
Expand All @@ -68,6 +69,11 @@ setup:
- do:
indices.refresh: {}

- do:
indices.forcemerge:
index: hnsw_byte_quantized
max_num_segments: 1

---
"kNN search only":
- do:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,13 @@ setup:
indices.flush:
index: bbq_flat

# For added test reliability, pending the resolution of https://github.com/elastic/elasticsearch/issues/109416.
- do:
indices.forcemerge:
index: bbq_flat
max_num_segments: 1
- do:
indices.refresh: {}
- do:
indices.forcemerge:
index: bbq_flat
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -57,8 +57,17 @@ setup:
vector: [0.5, 111.3, -13.0, 14.8, -156.0]
another_vector: [-0.5, 11.0, 0, 12, 111.0]

# For added test reliability, pending the resolution of https://github.com/elastic/elasticsearch/issues/109416.
- do:
indices.forcemerge:
index: int8_flat
max_num_segments: 1
- do:
indices.refresh: {}
- do:
indices.forcemerge:
index: int8_flat
max_num_segments: 1

---
"kNN search only":
Expand Down