Skip to content

Conversation

joshua-adams-1
Copy link
Contributor

This is the fifth part of a series of PRs fixing how the FailedToCommitClusterStateException is used in ElasticSearch. As per #135017, FailedToCommitClusterStateException is defined as:

Thrown when a cluster state publication fails to commit the new cluster state. If publication fails then a new master is elected but the
update might or might not take effect, depending on whether the newly-elected master accepted the published state that failed to
be committed. This exception should only be used when there is <i>ambiguity</i> whether a state update took effect or not.

Currently, FailedToCommitClusterStateException is used as a 'catch-all' exception thrown at multiple places throughout the Coordinator and MasterService during the publication process. Semantically however, it doesn't make sense to throw this exception before the cluster state update is actually sent over the wire, since at this point, we know for certain that the cluster state update failed. FailedToCommitClusterStateException is intended to display ambiguity.

This work is a pre-requisite to #134213.


Changes

Replaces the FailedToCommitClusterStateException inside MasterService .BatchingTaskQueue.submitTask with a NotMasterException. This exception is thrown when the threadpools are shutting down, which occurs when the node itself is shutting down, hence it is no longer the current master. As a note, since there is an assert false, this cannot happen in practice.


Next Steps

The goal of this work is to fix up all erroneously used FailedToCommitClusterStateException.

Done:

Todo:

  • Any final uses of FailedToCommitClusterStateException that need to be updated

Relates to: ES-13061

Replaces the FailedToCommitClusterStateException inside MasterService
.BatchingTaskQueue.submitTask with a NotMasterException.

Relates to: ES-13061
@joshua-adams-1 joshua-adams-1 self-assigned this Oct 7, 2025
@joshua-adams-1 joshua-adams-1 added >refactoring :Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. labels Oct 7, 2025
@joshua-adams-1 joshua-adams-1 marked this pull request as ready for review October 7, 2025 12:18
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Oct 7, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI doesn't look very happy about this - please request my review again once that's addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. >refactoring Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants