Skip to content

Conversation

joshua-adams-1
Copy link
Contributor

@joshua-adams-1 joshua-adams-1 commented Sep 30, 2025

This is the third part of a series of PRs fixing how the FailedToCommitClusterStateException is used in ElasticSearch. As per #135017, FailedToCommitClusterStateException is defined as:

Thrown when a cluster state publication fails to commit the new cluster state. If publication fails then a new master is elected but the
update might or might not take effect, depending on whether the newly-elected master accepted the published state that failed to
be committed. This exception should only be used when there is <i>ambiguity</i> whether a state update took effect or not.

Currently, FailedToCommitClusterStateException is used as a 'catch-all' exception thrown at multiple places throughout the Coordinator and MasterService during the publication process. Semantically however, it doesn't make sense to throw this exception before the cluster state update is actually sent over the wire, since at this point, we know for certain that the cluster state update failed. FailedToCommitClusterStateException is intended to display ambiguity.

This work is a pre-requisite to #134213.


Changes

I replace three FailedToCommitClusterStateExceptions thrown prior to publishing the cluster state with NotMasterExceptions as per this conversation with David Turner.


Next Steps

The goal of this work is to fix up all erroneously used FailedToCommitClusterStateException.

Done:

Todo:

  • Replace the FailedToCommitClusterStateException exception inside MasterService.BatchingTaskQueue.submitTask, (here) with a NotMasterException.

Relates to: ES-13061

Changes a FailedToCommitClusterStateException incorrectly thrown prior
to cluster state update publication to a NotMasterException
- Introduces a FailedToPublishClusterStateException.
- Changes three FailedToCommitClusterStateExceptions thrown prior to the
 cluster state update publication to
 FailedToPublishClusterStateExceptions.

 Relates to: ES-13061
@joshua-adams-1 joshua-adams-1 self-assigned this Sep 30, 2025
@joshua-adams-1 joshua-adams-1 added >refactoring :Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. labels Sep 30, 2025
Copy link
Contributor Author

@joshua-adams-1 joshua-adams-1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highlighting which changes have been included since I built this PR on top of #135548

super(in);
}

public NotMasterException(String msg, Object... args) {
Copy link
Contributor Author

@joshua-adams-1 joshua-adams-1 Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as part of #135548 and will disappear once I rebase

)
);
throw new FailedToCommitClusterStateException(
throw new NotMasterException(
Copy link
Contributor Author

@joshua-adams-1 joshua-adams-1 Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as part of #135548 and will disappear when I rebase

Comment on lines 1018 to 1022
if (e instanceof FailedToCommitClusterStateException) {
failure = new FailedToCommitClusterStateException(e.getMessage(), e);
} else {
failure = new NotMasterException(e.getMessage(), e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also be handling FailedToPublishClusterStateException here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Good catch - I realised I'm missing code here, and in a few other places too

@joshua-adams-1 joshua-adams-1 changed the title Replace pre publication failed to commit cluster state exceptions [WIP] Replace pre publication failed to commit cluster state exceptions Oct 2, 2025
@joshua-adams-1
Copy link
Contributor Author

As a note to anyone wanting to review, I need to push a second revision, adding the FailedToPublishClusterStateException to a few more files, and also extending a number of test suites to randomly generate this error during tests to increase the coverage

@joshua-adams-1 joshua-adams-1 changed the title [WIP] Replace pre publication failed to commit cluster state exceptions Replace pre publication failed to commit cluster state exceptions Oct 2, 2025
@joshua-adams-1 joshua-adams-1 marked this pull request as ready for review October 2, 2025 13:09
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Oct 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

* <p>
* This is a retryable exception inside {@link TransportMasterNodeAction}
*/
public class FailedToPublishClusterStateException extends ElasticsearchException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meaningfully different from NotMasterException? I know the name isn't ideal, but introducing a new ElasticsearchException subclass carries substantial costs too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by costs? I proposed adding a new exception to make the code easier to understand, especially since NotMasterException implies the error occurs because the node is no longer the master which isn't true in these cases

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to think about BwC concerns - what happens if you're in a mixed-version cluster and this exception gets thrown to an older node which doesn't know that it's retryable?

Can you point to a case where NotMasterException doesn't imply that the node has (or will very shortly have) stopped being the master?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to think about BwC concerns - what happens if you're in a mixed-version cluster and this exception gets thrown to an older node which doesn't know that it's retryable?

Would something like this protect against mixed version clusters?

if (getVersion().onOrAfter(TransportVersion.THE_VERSION_I_ADDED_ABOVE)) {
    throw new FailedToPublishClusterStateException();
} else {
    throw new FailedToCommitClusterStateException()
}

Can you point to a case where NotMasterException doesn't imply that the node has (or will very shortly have) stopped being the master?

My proposed solution changed the three FailedToCommitClusterStateExceptions below into FailedToPublishClusterStateExceptions:

@Override
    public void publish(
        ClusterStatePublicationEvent clusterStatePublicationEvent,
        ActionListener<Void> publishListener,
        AckListener ackListener
    ) {
        try {
            synchronized (mutex) {
                if (mode != Mode.LEADER || getCurrentTerm() != clusterStatePublicationEvent.getNewState().term()) {
                    logger.debug(
                        () -> format(
                            "[%s] failed publication as node is no longer master for term %s",
                            clusterStatePublicationEvent.getSummary(),
                            clusterStatePublicationEvent.getNewState().term()
                        )
                    );

                    // === Changed in #135548 === //
                    throw new NotMasterException(
                        "node is no longer master for term "
                            + clusterStatePublicationEvent.getNewState().term()
                            + " while handling publication"
                    );
                }

                if (currentPublication.isPresent()) {
                    assert false : "[" + currentPublication.get() + "] in progress, cannot start new publication";
                    logger.error(
                        () -> format(
                            "[%s] failed publication as already publication in progress",
                            clusterStatePublicationEvent.getSummary()
                        )
                    );

                    // === Exception 1 === //
                    throw new FailedToCommitClusterStateException("publication " + currentPublication.get() + " already in progress");
                }

                assert assertPreviousStateConsistency(clusterStatePublicationEvent);

                final ClusterState clusterState;
                final long publicationContextConstructionStartMillis;
                final PublicationTransportHandler.PublicationContext publicationContext;
                final PublishRequest publishRequest;

                try {
                    clusterState = clusterStatePublicationEvent.getNewState();
                    assert getLocalNode().equals(clusterState.getNodes().get(getLocalNode().getId()))
                        : getLocalNode() + " should be in published " + clusterState;
                    publicationContextConstructionStartMillis = transportService.getThreadPool().rawRelativeTimeInMillis();
                    publicationContext = publicationHandler.newPublicationContext(clusterStatePublicationEvent);
                } catch (Exception e) {
                    logger.debug(() -> "[" + clusterStatePublicationEvent.getSummary() + "] publishing failed during context creation", e);
                    becomeCandidate("publication context creation");

                    // === Exception 2 === //
                    throw new FailedToCommitClusterStateException("publishing failed during context creation", e);
                }

                try (Releasable ignored = publicationContext::decRef) {
                    try {
                        clusterStatePublicationEvent.setPublicationContextConstructionElapsedMillis(
                            transportService.getThreadPool().rawRelativeTimeInMillis() - publicationContextConstructionStartMillis
                        );
                        publishRequest = coordinationState.get().handleClientValue(clusterState);
                    } catch (Exception e) {
                        logger.warn(
                            "failed to start publication of state version ["
                                + clusterState.version()
                                + "] in term ["
                                + clusterState.term()
                                + "] for ["
                                + clusterStatePublicationEvent.getSummary()
                                + "]",
                            e
                        );
                        becomeCandidate("publication creation");

                        // === Exception 3 === //
                        throw new FailedToCommitClusterStateException("publishing failed while starting", e);
                    }

                    ....
  1. If a publication is already in progress, AFAIU this implies the current node is not the master, because only master nodes can initiate cluster state updates. But what if this is on the new master running at the same time as the old? Can this occur? Because in this case, a NotMasterException would not make sense.
  2. I'm not sure this implies the node will not be the master anymore, since I followed the code through and we can throw an ElasticsearchException here if serialization fails, and that's independent of a node being master.
  3. AFAICT this can be safely converted to a NotMasterException. Digging into the .handleClientValue(clusterState) function I see code throwing exceptions such as:
throw new CoordinationStateRejectedException("election not won");
throw new CoordinationStateRejectedException("cannot start publishing next value before accepting previous one");
throw new CoordinationStateRejectedException(
        "incoming term " + clusterState.term() + " does not match current term " + getCurrentTerm()
);
...

which all imply the current node is not the master anymore since there are term mismatches, and so a NotMasterException is correct

@DaveCTurner
Copy link
Contributor

Would something like this protect against mixed version clusters?

What do you mean by getVersion()? But no not really, we'd still be in a position where we're misusing FailedToCommitClusterStateException when really we mean NotMasterException.

changed the three FailedToCommitClusterStateExceptions below into FailedToPublishClusterStateExceptions:

Case 1 has an assert false - this can't happen in practice. Cases 2 and 3 have a becomeCandidate() call so the node is indeed not the master any more at this point. I think we should use NotMasterException in these cases.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (just comment nits but no need for another round)

*
* This exception is retryable within {@link TransportMasterNodeAction}.
*
* Exception indicating a cluster state update was published but not committed to all nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptual nit: "committed" is a global property rather than something that happens on one or more nodes.

Suggested change
* Exception indicating a cluster state update was published but not committed to all nodes.
* Exception indicating a cluster state update was published and may or may not have been committed.

This exception indicates the publishing master doesn't think the update was committed, but it cannot tell for sure. It depends on which other master nodes accepted it and the winner of the next election.

* Exception indicating a cluster state update was published but not committed to all nodes.
* <p>
* If this exception is thrown, then the cluster state update was published, but is not guaranteed
* to be committed on any nodes, including the next master node. This exception should only be thrown when there is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* to be committed on any nodes, including the next master node. This exception should only be thrown when there is
* to be committed, including the next master node. This exception should only be thrown when there is

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah also could you adjust the top-level comment in the PR to look more like a commit message? I generally recommend always having this reflect the eventual commit message, and if you need to give more context to reviewers etc then do so in a reply. Otherwise you'll forget to adjust it on merge sometimes and it'll make a mess of the git log.

Also maybe (if you think it's not obvious) should we comment on why NotMasterException is appropriate in these cases because of the preceding becomeCandidate()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. >refactoring Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants