-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Changes FailedToCommitClusterStateException to NotMasterException #135548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes FailedToCommitClusterStateException to NotMasterException #135548
Conversation
Changes a FailedToCommitClusterStateException incorrectly thrown prior to cluster state update publication to a NotMasterException
…ter-exception-before-publishing
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
Hey @justzh , I'm not sure what you mean by this. Is there anything I can do to help? |
…ter-exception-before-publishing
logger.debug( | ||
() -> format( | ||
"node is no longer the master prior to publication of cluster state version [%s]: [%s]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any loss in understanding of the log message moving the summary to the end after the colon, without the 'failing' prefix, for the NotMasterException
case?
void onPublishFailure(FailedToCommitClusterStateException e) { | ||
void onPublishFailure(Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this is only called with FailedToCommitClusterStateException
or NotMasterException
from above, but is it worth trying to be as specific as possible here instead of Exception
? Maybe the common base class ElasticsearchException
, and also adding an assertion that it is one of the two expected types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the parameter e
is passed in from an onfailure
method here, and is of type Exception
, so I can't narrow this down.
But, since this method is currently called from within this IF
statement:
if (exception instanceof FailedToCommitClusterStateException || exception instanceof NotMasterException) {
....
I know that currently the type must be either FailedToCommitClusterStateException
or NotMasterException
so it's a good call to add an assertion in case this method gets used anywhere in future, and that doesn't hold
…ter-exception-before-publishing
…on-before-publishing' of https://github.com/joshua-adams-1/elasticsearch into change-failed-to-commit-exception-to-not-master-exception-before-publishing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
server/src/main/java/org/elasticsearch/cluster/service/MasterService.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM2
Co-authored-by: David Turner <[email protected]>
…ter-exception-before-publishing
This is the second of a series of PRs fixing how the
FailedToCommitClusterStateException
is used in ElasticSearch. As per #135017,FailedToCommitClusterStateException
is defined as:Currently,
FailedToCommitClusterStateException
is used as a 'catch-all' exception thrown at multiple places throughout theCoordinator
andMasterService
during the publication process. Semantically however, it doesn't make sense to throw this exception before the cluster state update is actually sent over the wire, since at this point, we know for certain that the cluster state update failed.FailedToCommitClusterStateException
is intended to display ambiguity.This work is a pre-requisite to #134213.
Changes
As explained above, any exception thrown prior to the
publish(...)
call means that the cluster state update definitely failed. Hence, with no ambiguity, it should not be aFailedToCommitClusterStateException
. In this instance, I changed the code to throw aNotMasterException
since this is semantically correct.Testing
MasterService
test suite 100 times throughNext Steps
The goal of this work is to fix up all erroneously used
FailedToCommitClusterStateException
.Done:
FailedToCommitClusterStateException
thrown insideMasterService.Batch.onResponse()
when draining the queue after the threadpool has shut down - Change FailedToCommitClusterStateException to NotMasterException #135008Todo:
FailedToCommitClusterStateException
s withFailedToPublishClusterStateException
, a new exception designed to capture any failures before the cluster state update is sent over the wire. This exception will capture guaranteed cluster state update failure, withFailedToCommitClusterStateException
used once the update has been sent over the wire, representing ambiguityFailedToCommitClusterStateException
exception insideMasterService.BatchingTaskQueue.submitTask
, (here) with aFailedToPublishClusterStateException
.Relates to ES-13061