Skip to content

Conversation

@lukewhiting
Copy link
Contributor

@lukewhiting lukewhiting commented Mar 4, 2025

This PR addresses a failing test initially flagged by #109679 by reverting the initial workaround created by #111435. It wraps the request in a busy/wait to deal with the stats temporarily being unavailable during a move of the watcher shards between nodes and the subsequent restarting of the watcher stats service.

Looking into how that's stats are generated, it's not so much that they reset on move but that they just take a short amount of time to repopulate as watches are loaded in again by the newly started instance of the watcher service on the node the shards have been moved to. Wrapping this request in a wait loop allows the tests to cope with this move.

I let the change soak by running the affected tests on a loop for 2 hours on my laptop and no failures occurred during that time.

Fixes ES-9782

@lukewhiting lukewhiting added >test-failure Triaged test failures from CI :Data Management/Watcher auto-backport Automatically create backport pull requests when merged v9.0.0 v8.19.0 v9.1.0 labels Mar 4, 2025
@lukewhiting lukewhiting requested review from Copilot and masseyke March 4, 2025 10:03
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR reinstates a watch count check in integration tests by reintroducing a busy/wait pattern to handle temporary unavailability of watcher stats during shard moves and watcher service restarts.

  • Reintroduces assertBusy checks to ensure the watch count is 1L before further test execution.
  • Updates three test methods in WatchAckTests.java to use WatcherStatsRequestBuilder for polling watcher stats.

Reviewed Changes

File Description
x-pack/plugin/watcher/src/internalClusterTest/java/org/elasticsearch/xpack/watcher/test/integration/WatchAckTests.java Adds busy/wait assertions using WatcherStatsRequestBuilder to ensure proper watch count

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team needs:risk Requires assignment of a risk label (low, medium, blocker) labels Mar 4, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@lukewhiting lukewhiting added >test Issues or PRs that are addressing/adding tests and removed >test-failure Triaged test failures from CI needs:risk Requires assignment of a risk label (low, medium, blocker) labels Mar 4, 2025
@lukewhiting lukewhiting force-pushed the es-9782-watcher-stats-reset branch from d86373c to 6b6a869 Compare March 4, 2025 10:57
Copy link
Member

@masseyke masseyke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We can always revert again if it causes problems.

@lukewhiting lukewhiting merged commit c702eb9 into elastic:main Mar 6, 2025
17 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
9.0
8.x

lukewhiting added a commit to lukewhiting/elasticsearch that referenced this pull request Mar 6, 2025
@lukewhiting lukewhiting deleted the es-9782-watcher-stats-reset branch March 6, 2025 10:08
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Mar 11, 2025
costin pushed a commit to costin/elasticsearch that referenced this pull request Mar 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :Data Management/Watcher Team:Data Management Meta label for data/management team >test Issues or PRs that are addressing/adding tests v8.19.0 v9.0.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants