-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Changes FailedToCommitClusterStateException to NotMasterException #135548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes FailedToCommitClusterStateException to NotMasterException #135548
Conversation
Changes a FailedToCommitClusterStateException incorrectly thrown prior to cluster state update publication to a NotMasterException
…ter-exception-before-publishing
|
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
Hey @justzh , I'm not sure what you mean by this. Is there anything I can do to help? |
…ter-exception-before-publishing
| logger.debug( | ||
| () -> format( | ||
| "node is no longer the master prior to publication of cluster state version [%s]: [%s]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any loss in understanding of the log message moving the summary to the end after the colon, without the 'failing' prefix, for the NotMasterException case?
| void onPublishFailure(FailedToCommitClusterStateException e) { | ||
| void onPublishFailure(Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this is only called with FailedToCommitClusterStateException or NotMasterException from above, but is it worth trying to be as specific as possible here instead of Exception? Maybe the common base class ElasticsearchException, and also adding an assertion that it is one of the two expected types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the parameter e is passed in from an onfailure method here, and is of type Exception, so I can't narrow this down.
But, since this method is currently called from within this IF statement:
if (exception instanceof FailedToCommitClusterStateException || exception instanceof NotMasterException) {
....
I know that currently the type must be either FailedToCommitClusterStateException or NotMasterException so it's a good call to add an assertion in case this method gets used anywhere in future, and that doesn't hold
…ter-exception-before-publishing
…on-before-publishing' of https://github.com/joshua-adams-1/elasticsearch into change-failed-to-commit-exception-to-not-master-exception-before-publishing
JeremyDahlgren
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
server/src/main/java/org/elasticsearch/cluster/service/MasterService.java
Outdated
Show resolved
Hide resolved
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2
Co-authored-by: David Turner <[email protected]>
…ter-exception-before-publishing
This is the second of a series of PRs fixing how the
FailedToCommitClusterStateExceptionis used in ElasticSearch. As per #135017,FailedToCommitClusterStateExceptionis defined as:Currently,
FailedToCommitClusterStateExceptionis used as a 'catch-all' exception thrown at multiple places throughout theCoordinatorandMasterServiceduring the publication process. Semantically however, it doesn't make sense to throw this exception before the cluster state update is actually sent over the wire, since at this point, we know for certain that the cluster state update failed.FailedToCommitClusterStateExceptionis intended to display ambiguity.This work is a pre-requisite to #134213.
Changes
As explained above, any exception thrown prior to the
publish(...)call means that the cluster state update definitely failed. Hence, with no ambiguity, it should not be aFailedToCommitClusterStateException. In this instance, I changed the code to throw aNotMasterExceptionsince this is semantically correct.Testing
MasterServicetest suite 100 times throughNext Steps
The goal of this work is to fix up all erroneously used
FailedToCommitClusterStateException.Done:
FailedToCommitClusterStateExceptionthrown insideMasterService.Batch.onResponse()when draining the queue after the threadpool has shut down - Change FailedToCommitClusterStateException to NotMasterException #135008Todo:
FailedToCommitClusterStateExceptions withFailedToPublishClusterStateException, a new exception designed to capture any failures before the cluster state update is sent over the wire. This exception will capture guaranteed cluster state update failure, withFailedToCommitClusterStateExceptionused once the update has been sent over the wire, representing ambiguityFailedToCommitClusterStateExceptionexception insideMasterService.BatchingTaskQueue.submitTask, (here) with aFailedToPublishClusterStateException.Relates to ES-13061