@valeriy42
Contributor

Backport

This will backport the following commits from main to 8.18:

Questions? Please refer to the Backport tool documentation.

… node is temporarily unavailable (elastic#129391)

During a cluster upgrade, anomaly detection jobs must be reassigned from one ML node to another. During this reassignment, the jobs transition through several states, including "opening" and "opened". If the master node became temporarily unavailable during this transition, e.g., due to reassignment, the new job state was not successfully committed to the cluster state. As a result, once the new master became available, the cluster state was inconsistent: some anomaly detection jobs were open, but their state remained stuck as "opening".

This PR introduces a retryable action for updating the job state to ensure that the job state is successfully updated and the cluster state remains consistent during the upgrade.
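The retry idea can be illustrated with a minimal, standalone sketch. It is not the code from this PR: `MasterUnavailableError`, `FlakyCluster`, and `update_job_state_with_retry` are hypothetical names, and the exponential backoff policy is an assumption made only for illustration.

```python
import time


class MasterUnavailableError(Exception):
    """Hypothetical error: no master node is available to accept the update."""


def update_job_state_with_retry(submit_update, job_id, new_state,
                                max_attempts=5, initial_delay=0.01,
                                sleep=time.sleep):
    """Retry the state update until it is committed or attempts run out.

    The backoff policy (doubling the delay) is an assumption, not taken
    from the PR.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_update(job_id, new_state)
        except MasterUnavailableError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            sleep(delay)
            delay *= 2


class FlakyCluster:
    """Toy stand-in for a cluster whose master is briefly unavailable."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.job_states = {}

    def submit_update(self, job_id, new_state):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise MasterUnavailableError("no master available")
        self.job_states[job_id] = new_state
        return new_state


# The first two attempts fail, the third one commits the new state.
cluster = FlakyCluster(failures_before_success=2)
cluster.job_states["job-1"] = "opening"
result = update_job_state_with_retry(cluster.submit_update, "job-1", "opened")
```

Without the retry loop, the first failed attempt would leave `job-1` recorded as "opening" even though the job itself had opened, which is the inconsistency described above.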

Fixes elastic#126148

(cherry picked from commit d487eb5)
@valeriy42 added the :ml, Team:ML, and auto-merge-without-approval labels on Jun 16, 2025
@elasticsearchmachine merged commit 59903c3 into elastic:8.18 on Jun 16, 2025
16 checks passed
@valeriy42 deleted the backport/8.18/pr-129391 branch on June 16, 2025 09:38

Labels

auto-merge-without-approval, backport, :ml, Team:ML, v8.18.3
