Skip to content

Conversation

@andreidan
Copy link
Contributor

This attempts to fix a flay test where the term_freq returned by the multiple terms vectors API was null.
I was not able to reproduce this test but this proposes a fix based on the following running theory:

  • an Elasticsearch cluster comprised of at least 2 nodes
  • we create a couple of indices with 1 primary and 1 replica
  • we index a document that was acknowledged only by the primary (because wait_for_active_shards defaults to 1)
  • the test executes the multiple terms vectors API and it hits the node hosting the replica shard, which hasn't yet received the document we ingested in the primary shard.

This race condition between the document replication and the test running the terms vectors API on the replica shard could yield a null value for the the term's term_freq (as the replica shard contains 0 documents).

This PR proposes we change the wait_for_active_shards value to all so each write is acknowledged by all replicas before the client receives the response.

Fixes #113325

This attempts to fix a flay test where the term_freq returned
by the multiple terms vectors API was `null`.
I was not able to reproduce this test but this proposes a fix
based on the following running theory:
- an Elasticsearch cluster comprised of at least 2 nodes
- we create a couple of indices with 1 primary and 1 replica
- we index a document that was acknowledged only by the primary
(because `wait_for_active_shards` defaults to `1`)
- the test executes the multiple terms vectors API and it hits the
node hosting the replica shard, which hasn't yet received the
document we ingested in the primary shard.

This race condition between the document replication and the test
running the terms vectors API on the replica shard could yield
a `null` value for the the term's `term_freq` (as the replica shard
contains 0 documents).

This PR proposes we change the `wait_for_active_shards` value to
`all` so each write is acknowledged by all replicas before the client
receives the response.
@andreidan andreidan added >test-failure Triaged test failures from CI :Search Foundations/Search Catch all for Search Foundations v9.0.0 v8.19.0 v9.1.0 labels Jan 31, 2025
@elasticsearchmachine elasticsearchmachine added needs:risk Requires assignment of a risk label (low, medium, blocker) Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch labels Jan 31, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Copy link
Member

@piergm piergm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theory make sense to me! More so if it's not always failing but a flaky test 😄 Nice one Andrei! Let's hope the theory is correct 🤞

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan andreidan added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels Feb 3, 2025
@elasticsearchmachine elasticsearchmachine added the needs:risk Requires assignment of a risk label (low, medium, blocker) label Feb 3, 2025
@andreidan andreidan added low-risk An open issue or test failure that is a low risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels Feb 3, 2025
@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@elasticsearchmachine elasticsearchmachine merged commit 44e5104 into elastic:main Feb 3, 2025
17 checks passed
@andreidan andreidan deleted the fix-test-mtermvectors branch February 3, 2025 18:57
@andreidan andreidan added the auto-backport Automatically create backport pull requests when merged label Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) low-risk An open issue or test failure that is a low risk to future releases :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch >test-failure Triaged test failures from CI v8.19.0 v9.0.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] MixedClusterClientYamlTestSuiteIT test {p0=mtermvectors/10_basic/Tests catching other exceptions per item} failing

4 participants