Conversation

@joshua-adams-1 joshua-adams-1 commented Jul 28, 2025

Extends the `Coordinator` so that we don't prematurely close the connection to a joining node. This prevents a `node-join: [{}] with reason [{}]; for troubleshooting guidance, see {}` WARN log from being emitted unnecessarily.

Closes #126192

Jira Ticket - ES-11449

Extends the `onFailure` method of the callback in
`Coordinator.handleJoinRequest` so that it only fails if the node is not
already in the `ClusterState`. This avoids a rare corner case where a
master node's `ClusterState` already contains a node that is attempting
to join the cluster, causing an incorrect error message to be logged.
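
A minimal sketch of that guard, reusing `joinRequest` and `joinListener` from the surrounding callback (`connectionReleasable` is a hypothetical name; the merged change is more involved and also defers closing the connection, as discussed in the review below):

```java
@Override
public void onFailure(Exception e) {
    if (clusterService.state().nodes().nodeExists(joinRequest.getSourceNode().getId())) {
        // The joining node is already in the applied cluster state, so the join
        // effectively succeeded: keep the connection open and skip the WARN log.
        joinListener.onResponse(null);
    } else {
        // Genuine failure: release the connection and propagate the exception.
        Releasables.close(connectionReleasable);
        joinListener.onFailure(e);
    }
}
```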
@joshua-adams-1 joshua-adams-1 self-assigned this Jul 28, 2025
@joshua-adams-1 joshua-adams-1 added the >non-issue, :Distributed Coordination/Cluster Coordination (Cluster formation and cluster state publication, including cluster membership and fault detection), and v9.2.0 labels Jul 28, 2025
@joshua-adams-1 joshua-adams-1 changed the title from Master node disconnect to [WIP] Master node disconnect Jul 28, 2025
@joshua-adams-1 joshua-adams-1 force-pushed the master-node-disconnect branch from fbc0026 to 962b3b2 Compare July 28, 2025 17:31
Adds two new integration test suites, `NodeJoiningIT` and
`NodeJoiningMasterElectionIT`. These include tests covering the
coordinator logic that runs when a node joins the cluster.

There is also a proposed solution in the `Coordinator` class, currently
commented out. This will be uncommented in a follow-up commit.
@joshua-adams-1 joshua-adams-1 force-pushed the master-node-disconnect branch from dcfb910 to 3e501b4 Compare August 7, 2025 16:19

try {
ensureSufficientMasterEligibleNodes();
DiscoveryNode masterNode = internalCluster().clusterService().state().nodes().getMasterNode();
Member

Regarding test flakiness, one contributor could be where you get the info that you use in your assertions. Using `internalCluster().clusterService()` doesn't necessarily give you the most up-to-date value, since it might not fetch the instance from the current master node. This is documented on that method, but I want to make sure you have considered this random behaviour that this utility can have. You can fetch the master node instance by providing the master node name, or, similar to the test in `TransportMasterNodeActionIT` (which I assume this is partially based on), use a master client.
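
For example, a sketch of pinning the lookup to the elected master using the internal test cluster utilities:

```java
String masterName = internalCluster().getMasterName();
// Fetch the ClusterService instance belonging to the elected master node,
// rather than one from a randomly-chosen node.
ClusterService masterClusterService = internalCluster().getInstance(ClusterService.class, masterName);
DiscoveryNode masterNode = masterClusterService.state().nodes().getMasterNode();
```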

Member

Having tried that out briefly, I think the test failure is no longer due to the latch you mentioned but rather to the log expectation assertions. Hope that helps!

Member

(at least based on 20-something runs that used to get 8-10 failures in the latch check, which now only fails once later at the log expectation assertion)

@joshua-adams-1 joshua-adams-1 force-pushed the master-node-disconnect branch from d967904 to dac102a Compare August 11, 2025 14:30
Contributor

@DaveCTurner DaveCTurner left a comment

Unclear if this is ready for review or not - it's still marked as a draft but you've requested reviews.

I note that you're using force-push rather than merge:

Please don't do that once other folks have started to look at a PR unless absolutely unavoidable. It makes it much harder to track the history of changes against the comments.

* @param cleanupTasks The list of cleanup tasks
* @return A latch that will be released when the old master acknowledges the new master's election
*/
protected CountDownLatch configureElectionLatchForNewMaster(String newMaster, List<Releasable> cleanupTasks) {
Contributor

I think this'd be much simpler implemented with ClusterServiceUtils.addTemporaryStateListener. Not sure why we didn't do so when first written - I think it must have evolved to this from something more complex.
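
For instance, something like this sketch (where `newMasterName` is a hypothetical variable naming the node we expect to win the election):

```java
// Completes once this node applies a state in which newMasterName is master
SubscribableListener<Void> electionListener = ClusterServiceUtils.addTemporaryStateListener(
    internalCluster().clusterService(),
    state -> {
        DiscoveryNode master = state.nodes().getMasterNode();
        return master != null && newMasterName.equals(master.getName());
    }
);
// ... trigger the election ...
safeAwait(electionListener);
```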

Contributor Author

Good catch, have updated in a668ab1

* @param cleanupTasks The list of cleanup tasks
* @return A cyclic barrier which when awaited on will un-block the applier
*/
protected static CyclicBarrier blockClusterStateApplier(String nodeName, ArrayList<Releasable> cleanupTasks) {
Contributor

Hmm I don't think you need to pull this one up to the base class, it only has one caller.

Contributor Author

Updated in a668ab1

* @param cleanupTasks The list of cleanup tasks
* @return A latch that will be released when the master acknowledges its re-election
*/
protected CountDownLatch configureElectionLatchForReElectedMaster(
Contributor

Do you need a separate method here? In the new tests I think we can just wait for the term to increase, as observed by any node - no need to worry about which node is master or anything so fiddly.

Also this one is only called from one place. I think it's a premature optimization to generalize these two test suites like this when they have so little in common.
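
Waiting for the term to increase might look something like this sketch, reusing the `addTemporaryStateListener` utility suggested above (`originalTerm` is a hypothetical local captured before the disruption):

```java
long originalTerm = internalCluster().clusterService().state().metadata().coordinationMetadata().term();
// Any node will eventually observe the increased term once a new election completes
SubscribableListener<Void> termBumpedListener = ClusterServiceUtils.addTemporaryStateListener(
    internalCluster().clusterService(),
    state -> state.metadata().coordinationMetadata().term() > originalTerm
);
// ... trigger the re-election ...
safeAwait(termBumpedListener);
```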

Contributor Author

Updated in a668ab1

* master accepts its own failed state update before standing down, we can still
* establish a quorum without its (or our own) join.
*/
protected static String ensureSufficientMasterEligibleNodes() {
Contributor

The new tests don't need 5 master nodes, and indeed having 5 (rather than 3) makes the situation unnecessarily complex.

Contributor Author

Updated in a668ab1

Comment on lines 688 to 693
1. (T, V+1) is accepted -> NodeConnectionsService now stores an open connection to N. It can be closed.
2. (T, V+1) is rejected -> A new cluster state is published without N in it.
It is right to close the connection and retry.
3. The above scenario occurs. We do not close the connection after (T, V+1) fails and keep it open:
3.1 (T+1, V+2) is accepted -> By waiting, we did not close the connection to N unnecessarily
3.2 (T+1, V+2) is rejected -> A new cluster state is published without N in it. Closing is correct here.
Contributor

Nice, I think that makes sense and is much simpler than I anticipated.

Comment on lines 697 to 701
public void clusterChanged(ClusterChangedEvent event) {
// Now it's safe to close the connection
Releasables.close(response);
// Remove this listener to avoid memory leaks
clusterService.removeListener(this);
Contributor

This doesn't quite work though, because we apply a cluster state when the master fails too. The state we apply in that case differs from the last committed state only in the values of things which are local properties of the state, i.e. which do not have strong consistency guarantees, such as DiscoveryNodes#masterNodeId. I think we need to keep the connection open until we've applied the next committed state (i.e. one with a non-null master).

I did see the new tests fail in this way when run in a loop, but unfortunately I didn't have things configured to collect a build scan. I'm trying again.

Also we should only insert this listener when the update fails with a FailedToCommitClusterStateException - for all other outcomes there's no need to wait for another state to be committed, indeed we cannot expect another state to be committed any time soon if the cluster is stable.
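
Something along these lines, I think (a sketch of the suggestion rather than the actual diff; `response` is the connection `Releasable` from the surrounding code):

```java
if (e instanceof FailedToCommitClusterStateException) {
    // Defer closing the connection until the next *committed* state is applied,
    // i.e. one with a non-null master. The locally-applied no-master state that
    // follows a failed publication must not trigger the close.
    clusterService.addListener(new ClusterStateListener() {
        @Override
        public void clusterChanged(ClusterChangedEvent event) {
            if (event.state().nodes().getMasterNode() != null) {
                Releasables.close(response);
                clusterService.removeListener(this);
            }
        }
    });
} else {
    // For all other outcomes, close immediately as before.
    Releasables.close(response);
}
```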

Contributor Author

This makes sense. I've changed this logic to only close the connection when a committed state is seen, but how would I know whether a FailedToCommitClusterStateException is thrown?

Contributor

Rather than using ActionListener#runBefore, you'd typically capture the exception with ActionListener#delegateResponse. But in this case we're already capturing the success response with ActionListener#delegateFailure, so it's probably best to just create a completely new ActionListener that does the right thing with both success and exceptional responses.
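
Roughly this shape, I'd guess (a sketch; `waitForNextCommittedState` is a hypothetical helper standing in for the cluster-state-listener logic above):

```java
ActionListener<Void> publicationListener = new ActionListener<>() {
    @Override
    public void onResponse(Void ignored) {
        // The join was committed: keep the connection open and complete the join.
        joinListener.onResponse(null);
    }

    @Override
    public void onFailure(Exception e) {
        if (e instanceof FailedToCommitClusterStateException) {
            // Defer: the cluster-state listener decides when it is safe to
            // close the connection and how to complete the join listener.
            waitForNextCommittedState(e);
        } else {
            Releasables.close(response);
            joinListener.onFailure(e);
        }
    }
};
```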

Contributor Author

Updated in a668ab1


// exposed for tests
boolean missingJoinVoteFrom(DiscoveryNode node) {
logger.info("Missing vote from: {}", node.getName());
Contributor

Stray logging leftover from debugging? I don't think we want an INFO log here, it will raise support cases asking what it means.

Contributor Author

Updated in a668ab1

* @param name of the node whose existence should be verified
* @return <code>true</code> if the node exists. Otherwise <code>false</code>
*/
public boolean nodeExistsWithName(String name) {
Contributor

This is only used in tests, I don't think there's a need to have it in the production codebase.

Contributor Author

Updated in a668ab1

@joshua-adams-1
Contributor Author

Unclear if this is ready for review or not - it's still marked as a draft but you've requested reviews.

I note that you're using force-push rather than merge:

Please don't do that once other folks have started to look at a PR unless absolutely unavoidable. It makes it much harder to track the history of changes against the comments.

Hey! Sorry, no I wasn't expecting a review yet. I added you as a reviewer, but I didn't realise that action sent it for review; there's a button at the bottom of my PR saying Ready for Review, and I was under the impression that clicking that was what sent it for review. Still learning, sorry! Thank you for the comments, I'll get it cleaned up. And I shall stop force-pushing!

@joshua-adams-1 joshua-adams-1 changed the title from [WIP] Master node disconnect to Master node disconnect Aug 12, 2025
private String generateNodeDescriptionForNewDiscoveryNode(int numberOfNodesOriginallyInCluster, DiscoveryNode masterNode) {
// Nodes are named `node_s0`, `node_s1` etc ...
// Therefore, if there are N nodes in the cluster, named `node_s0` ... `node_sN-1`, N+1 will be named `node_sN`
String newNodeName = "node_s" + numberOfNodesOriginallyInCluster;
Contributor

This isn't correct, sometimes the new node will be of the form node_tN. There may be other possibilities too even today, I'm not sure, and we certainly don't know what'll happen in future. More generally, this is heavily overspecifying the conditions we actually want, and coupling this test to the implementation details of how nodes are named in the cluster. If we changed the format of these messages at all, we'd have to update this code too.

I think we should be asserting that we expect to see no messages at all from NodeJoinExecutor at level WARN. The content isn't really relevant.
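
With the `MockLog` test utility, that could look something like this sketch (the `startNode` call stands in for whatever action triggers the join):

```java
MockLog.assertThatLogger(
    () -> internalCluster().startNode(),  // action expected to emit no warnings
    NodeJoinExecutor.class,
    new MockLog.UnseenEventExpectation(
        "no node-join warnings",
        NodeJoinExecutor.class.getCanonicalName(),
        Level.WARN,
        "*"  // match any message: any WARN from this logger fails the test
    )
);
```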

Contributor Author

Good point, I have updated in cc67155

- Removes MasterElectionTestCase
- Reverts TransportMasterNodeActionIT
- Modifies NodeJoiningIT to use ClusterServiceUtils
rather than CountDownLatches
- Modifies the Coordinator to keep the connection open
only when there is a FailedToCommitClusterStateException,
and until the next committed cluster state update
@joshua-adams-1
Contributor Author

Thank you David for your help on this! Appreciate there's been a bit of churn.

Releasables.close(response);
joinListener.onFailure(e);
// Immediate condition check in case another node is elected master
if (clusterService.state().nodes().nodeExists(joinRequest.getSourceNode().getId())) {
Contributor

We also need to fail the join listener and remove the cluster-state listener if the last-applied state is committed (i.e. has a master node) and doesn't include the joining node.

Does it work to call clusterStateListener.clusterChanged(new ClusterChangedEvent("", clusterService.state(), clusterService.state()))?

Contributor

Does it work

Hmm no, not quite: that might complete `ll` twice, which is generally something we should try to avoid. It turns out that, as things are implemented today, `ll` will be a `SubscribableListener`, which has well-defined semantics when completed multiple times:

* If this listener is completed more than once then all results other than the first (whether successful or otherwise) are silently
* discarded. All subscribed listeners will be notified of the same result, exactly once, even if several completions occur concurrently.

However, we can't rely on that being true in future; it's not guaranteed that `ll` will always be a `SubscribableListener` in this context. I'd prefer we explicitly deduplicate this work, e.g. by creating another `SubscribableListener`.
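
i.e. something like this sketch of the deduplication idea:

```java
// A fresh SubscribableListener makes the at-most-once semantics explicit:
// however many times the two code paths complete it, ll is notified exactly once.
SubscribableListener<Void> completedOnce = new SubscribableListener<>();
completedOnce.addListener(ll);
// Both the cluster-state listener and the immediate check can now safely call
// completedOnce.onResponse(...) or completedOnce.onFailure(...).
```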

Contributor

However we can't rely on that being true in future

I contemplated making it not be true by adding a check that these listeners are not completed multiple times:

diff --git a/server/src/main/java/org/elasticsearch/action/support/SubscribableListener.java b/server/src/main/java/org/elasticsearch/action/support/SubscribableListener.java
index 9eddbb55b776..b7c8c4a28279 100644
--- a/server/src/main/java/org/elasticsearch/action/support/SubscribableListener.java
+++ b/server/src/main/java/org/elasticsearch/action/support/SubscribableListener.java
@@ -135,7 +135,7 @@ public class SubscribableListener<T> implements ActionListener<T> {
      */
     public static <T> SubscribableListener<T> newForked(CheckedConsumer<ActionListener<T>, ? extends Exception> fork) {
         final var listener = new SubscribableListener<T>();
-        ActionListener.run(listener, fork::accept);
+        ActionListener.run(ActionListener.assertOnce(listener), fork::accept);
         return listener;
     }

However, on reflection this seems unnecessarily strict, and indeed it causes the `SubscribableListener` test suite to fail because we actually already assert that `newForked` and `andThen` receive the returned listener instance, which is therefore safe to complete more than once. I think it's best to document this fact (see #133391), and then we can rely on it here too (so disregard my previous message).

safeAwait(publishingBanRemovedListener);
logger.info("Master publishing ban removed");
// Assert the master was re-elected
assertTrue(masterNodeName.equals(internalCluster().getMasterName()) && originalTerm < getTerm(masterNodeName));
Contributor

nit: slight preference for this being two assertThat calls: if this fails, we get no feedback about why, whereas if we use assertThat twice then we'll be able to see whether it was the master that changed (and it'll identify the new master) or whether the term didn't increase.
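
i.e. something like this (with Hamcrest's `equalTo` and `greaterThan` statically imported, and reusing the test's `getTerm` helper):

```java
// Failure output identifies the unexpected master or the non-increasing term
assertThat(internalCluster().getMasterName(), equalTo(masterNodeName));
assertThat(getTerm(masterNodeName), greaterThan(originalTerm));
```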

Comment on lines 708 to 714
// Another node was elected, and doesn't have the node in it
if (clusterService.state().nodes().getMasterNode() != null
&& clusterService.state().nodes().nodeExists(joinRequest.getSourceNode().getId()) == false) {
// Remove this listener to avoid memory leaks
clusterService.removeListener(clusterStateListener);
ll.onFailure(e);
}
Contributor

It seems like it'd be simpler to call clusterStateListener.clusterChanged directly here, constructing a "fake" ClusterChangedEvent to carry the current cluster state. I think we want the same logic both ways: particularly if the node is for some reason in the current cluster state at this point then we can complete ll successfully too.

Node N should join the cluster, but it should not be disconnected (#ES-11449)
*/
@TestLogging(reason = "test includes assertions about logging", value = "org.elasticsearch.cluster.coordination:INFO")
public void testNodeTriesToJoinClusterAndThenSameMasterIsElected() {
Contributor

Btw this test is a really good discriminator of the improvement, it failed for me 9 out of 10 times I ran it having reverted the behaviour change in Coordinator, and passed 10 out of 10 with the change in place. Great stuff.


if (discoveryNodes.nodeExists(joinRequest.getSourceNode().getId())) {
// Remove this listener to avoid memory leaks
clusterService.removeListener(this);
Contributor

Why have this within the if branches? It's on both branches so we call it either way, and I don't see a need to call it after calling nodeExists or anything like that?

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM

@joshua-adams-1 joshua-adams-1 merged commit 656a7a9 into elastic:main Sep 3, 2025
33 checks passed
@joshua-adams-1 joshua-adams-1 deleted the master-node-disconnect branch September 3, 2025 08:06
joshua-adams-1 added a commit to joshua-adams-1/elasticsearch that referenced this pull request Sep 3, 2025
DaveCTurner added a commit that referenced this pull request Sep 3, 2025
joshua-adams-1 added a commit to joshua-adams-1/elasticsearch that referenced this pull request Sep 3, 2025
Unmutes `DataTierAllocationDeciderIT.testDesiredNodesAreTakenIntoAccountInAutoExpandReplicas`
and `DataTierAllocationDeciderIT.testShardsAreKeptInPreferredTierUntilTheNextTierIsInItsFinalState`
since the problematic commit elastic#132023 has been reverted
@joshua-adams-1 joshua-adams-1 mentioned this pull request Sep 3, 2025
joshua-adams-1 added a commit that referenced this pull request Sep 4, 2025
Unmutes `DataTierAllocationDeciderIT.testDesiredNodesAreTakenIntoAccountInAutoExpandReplicas`
and `DataTierAllocationDeciderIT.testShardsAreKeptInPreferredTierUntilTheNextTierIsInItsFinalState`
since the problematic commit #132023 has been reverted

Labels

:Distributed Coordination/Cluster Coordination (Cluster formation and cluster state publication, including cluster membership and fault detection), >non-issue, Team:Distributed Coordination (Meta label for Distributed Coordination team), v9.2.0

Development

Successfully merging this pull request may close these issues.

Master node disconnects from joining node too early during re-election