@@ -179,8 +175,8 @@ public void testNodeTriesToJoinClusterAndThenDifferentMasterIsElected() {
     // Tests whether a WARN log is thrown when a node attempts to join a cluster, and then the same master node is re-elected (#126192)
     @TestLogging(
-        reason = "test includes assertions about logging",
-        value = "org.elasticsearch.cluster.coordination.NodeJoinExecutor:WARN,org.elasticsearch.cluster.coordination.NodeJoinExecutor:INFO,org.elasticsearch.cluster.coordination.MasterService:WARN,org.elasticsearch.cluster.coordination.MasterService:INFO,org.elasticsearch.cluster.coordination.ClusterApplierService:WARN"
+        reason = "test includes assertions about logging",
+        value = "org.elasticsearch.cluster.coordination.NodeJoinExecutor:WARN,org.elasticsearch.cluster.coordination.NodeJoinExecutor:INFO,org.elasticsearch.cluster.coordination.MasterService:WARN,org.elasticsearch.cluster.coordination.MasterService:INFO,org.elasticsearch.cluster.coordination.ClusterApplierService:WARN"
This prevents a corner case, explained in #ES-11449, occurring as follows:
-    - Master M is in term T and has cluster state (T, V).
-    - Node N tries to join the cluster.
-    - M proposes cluster state (T, V+1) with N in the cluster.
-    - M accepts its own proposal and commits it to disk.
-    - M receives no responses. M doesn't know whether the state was accepted by a majority of nodes, rejected, or did not reach any nodes.
-    - There is a re-election and M wins. M publishes cluster state (T+1, V+2).
-      Since it's built from the cluster state on disk, N is still in the cluster.
-    - Since (T, V+1) failed, N's connection is dropped, even though its inclusion in the cluster may have been committed on a majority of master nodes.
-    - It can rejoin, but this throws a WARN log since it did not restart.
-
-    To mitigate this, we listen for any cluster state update:
-    1. (T, V+1) is accepted -> NodeConnectionsService now stores an open connection to N. It can be closed.
-    2. (T, V+1) is rejected -> A new cluster state is published without N in it. It is right to close the connection and retry.
-    3. The above scenario occurs. We do not close the connection after (T, V+1) fails and keep it open:
-       3.1 (T+1, V+2) is accepted -> By waiting, we did not close the connection to N unnecessarily
-       3.2 (T+1, V+2) is rejected -> A new cluster state is published without N in it. Closing is correct here.
-    */
-    logger.info("inside callback, node is is {}, source node is {}", clusterService.state().nodes().getLocalNode().getName(), joinRequest.getSourceNode().getName());
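
The decision described in the removed comment can be illustrated with a small standalone sketch. It is not the Elasticsearch implementation: the class `JoinConnectionSketch`, the `State` type, the `Action` enum and the `decide` method are all hypothetical names introduced here. It also assumes one reading of the comment, namely that "It can be closed" in case 1 refers to the temporary connection opened for the join, while N stays connected through NodeConnectionsService. Under that assumption, the outcome depends only on whether N appears in the next cluster state that is applied.

```java
import java.util.Set;

// Hypothetical sketch; none of these names come from Elasticsearch.
final class JoinConnectionSketch {

    /** What to do with the temporary connection opened for the joining node. */
    enum Action { RELEASE_TEMPORARY_REFERENCE, CLOSE_AND_RETRY }

    /** Minimal stand-in for a cluster state: (term, version) plus node names. */
    static final class State {
        final long term;
        final long version;
        final Set<String> nodes;

        State(long term, long version, Set<String> nodes) {
            this.term = term;
            this.version = version;
            this.nodes = nodes;
        }
    }

    /**
     * Decides what to do once the next cluster state has been applied after
     * the join was proposed, whether or not publishing (T, V+1) succeeded.
     */
    static Action decide(State applied, String joiningNode) {
        if (applied.nodes.contains(joiningNode)) {
            // Cases 1 and 3.1: N is part of the applied state, so the cluster
            // keeps its own connection to N (assumed reading of the comment)
            // and only the temporary join reference is released.
            return Action.RELEASE_TEMPORARY_REFERENCE;
        }
        // Cases 2 and 3.2: a state without N was published, so closing the
        // connection and letting N retry the join is correct.
        return Action.CLOSE_AND_RETRY;
    }

    public static void main(String[] args) {
        // With T = 1 and V = 5, the re-elected master publishes (T+1, V+2) = (2, 7).
        State reelectedWithN = new State(2, 7, Set.of("M", "N"));
        State reelectedWithoutN = new State(2, 7, Set.of("M"));
        System.out.println(decide(reelectedWithN, "N"));
        System.out.println(decide(reelectedWithoutN, "N"));
    }
}
```

The point of deferring the decision is that a failed publication of (T, V+1) alone does not say whether N ended up in the cluster; only the next applied state does.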