Merged
muted-tests.yml (0 additions, 3 deletions)
@@ -564,9 +564,6 @@ tests:
- class: org.elasticsearch.cluster.ClusterStateSerializationTests
method: testSerializationPreMultiProject
issue: https://github.com/elastic/elasticsearch/issues/130872
- - class: org.elasticsearch.cluster.coordination.votingonly.VotingOnlyNodePluginTests
-   method: testPreferFullMasterOverVotingOnlyNodes
-   issue: https://github.com/elastic/elasticsearch/issues/130883

# Examples:
#
@@ -965,12 +965,13 @@ public void awaitMasterNode(String viaNode, String masterNodeName) {
}

/**
- * Waits for all nodes in the cluster to have a consistent view of which node is currently the master.
+ * Waits for all nodes forming the publish quorum in the cluster to have a consistent view of which node is currently the master.
Contributor

This isn't correct. A publish quorum is some subset of the master-eligible nodes in the cluster (e.g. 2 of the 3 masters), which is much weaker than what this does. Here we're waiting for all the nodes in the cluster (i.e. in ClusterState#nodes).
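
To make the distinction concrete, here is a small, self-contained Java sketch (illustrative only, not Elasticsearch code; the node names are invented): a publish quorum is a majority of the voting configuration, which can be much smaller than the full set of nodes in ClusterState#nodes.

import java.util.Set;

// Illustrative sketch: contrasts the size of a publish quorum (a majority of the voting
// configuration) with the size of the full node set recorded in the cluster state.
public class PublishQuorumSketch {

    // A publish quorum is any majority of the voting configuration, e.g. 2 of 3 voters.
    static int publishQuorumSize(Set<String> votingConfiguration) {
        return votingConfiguration.size() / 2 + 1;
    }

    public static void main(String[] args) {
        Set<String> votingConfiguration = Set.of("master-0", "master-1", "master-2");
        Set<String> allNodesInClusterState = Set.of("master-0", "master-1", "master-2", "data-0", "data-1");

        System.out.println("publish quorum size: " + publishQuorumSize(votingConfiguration)); // 2
        System.out.println("nodes in cluster state: " + allNodesInClusterState.size());       // 5
    }
}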

Member Author

Thanks! I meant that the nodes could still be on different cluster state versions after this method returns; is that true? But you are right that the nodes should all have the same master. The trouble here is that there could still be nodes trying to join the cluster when this returns. In that case it does not guarantee that "all nodes" (which is more than ClusterState#nodes) see the same master. Technically that is the current behaviour, but it kind of defeats its purpose in such a situation?

Contributor

The setWaitForEvents(Priority.LANGUID) means that there was a point in time during the execution of this method where there was a committed cluster state and all the nodes in that cluster state were on the same cluster state version (or stuff was timing out, but that doesn't happen in these tests). It can't guarantee that another publication hasn't started before the method returns ofc.

You're right that there may also be nodes still trying to join the cluster at this point, but they're not in the cluster (they haven't joined it yet). In practice, it's up to the caller to account for this.
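
As a hedged sketch of how a caller might account for that, assuming the test helpers used elsewhere on this page (awaitMasterNode() and ensureStableCluster(int)) and an illustrative node count:

// Hedged sketch, not part of this PR's diff: awaitMasterNode() only covers nodes already in the
// cluster state, so the caller additionally waits until every expected node has actually joined.
final int expectedNodes = 3 + numDataNodes; // illustrative: 3 master-eligible nodes plus the data-only nodes
awaitMasterNode();                          // nodes currently in the cluster agree on who the master is
ensureStableCluster(expectedNodes);         // then wait for any late joiners to be part of the cluster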

Member Author

I reverted changes to the comment in 5445ef6. We can update it separately (if at all). Thanks!

*/
public void awaitMasterNode() {
// The cluster health API always runs on the master node, and the master only completes cluster state publication when all nodes
- // in the cluster have accepted the new cluster state. By waiting for all events to have finished on the master node, we ensure
- // that the whole cluster has a consistent view of which node is the master.
+ // forming the publish quorum have accepted the new cluster state. By waiting for all events to have finished on the
+ // master node, we ensure that all nodes that are part of the publish quorum have a consistent view of which node is the master.
+ // But nodes that are _not_ part of the quorum might still not have the same view yet.
clusterAdmin().prepareHealth(TEST_REQUEST_TIMEOUT).setTimeout(TEST_REQUEST_TIMEOUT).setWaitForEvents(Priority.LANGUID).get();
}

@@ -96,7 +96,8 @@ public void testPreferFullMasterOverVotingOnlyNodes() throws Exception {
internalCluster().setBootstrapMasterNodeIndex(0);
internalCluster().startNodes(2);
internalCluster().startNode(addRoles(Set.of(DiscoveryNodeRole.VOTING_ONLY_NODE_ROLE)));
- internalCluster().startDataOnlyNodes(randomInt(2));
+ final int numDataNodes = randomInt(2);
+ internalCluster().startDataOnlyNodes(numDataNodes);
assertBusy(
() -> assertThat(
clusterAdmin().prepareState(TEST_REQUEST_TIMEOUT).get().getState().getLastCommittedConfiguration().getNodeIds().size(),
@@ -108,6 +109,7 @@
internalCluster().stopCurrentMasterNode();
awaitMasterNode();
assertNotEquals(originalMaster, internalCluster().getMasterName());
+ ensureStableCluster(2 + numDataNodes); // wait for all nodes to join
Contributor

This weakens the test slightly more than I'd like. The trouble here is in fact that these tests run with autoManageMasterNodes = false so they skip over the call to validateClusterFormed() when starting nodes here:

if (autoManageMasterNodes) {
    validateClusterFormed();
}

That makes sense in general because if we're not auto-bootstrapping the test cluster we cannot expect it to be fully-formed after each node starts. But in these tests we can expect the cluster to be fully formed after we've finished starting all the nodes, so we should call that method explicitly ourselves there. Importantly, I think we should do that before stopping the original master node, since the new master node should not spuriously drop any of the other nodes from the cluster when it takes over.

I think I'd like us to rework these tests slightly so that, rather than an assertBusy() on the current state having the right size of configuration, we use ESIntegTestCase#awaitClusterState(Predicate<ClusterState>) to wait for a state which has the right voting configuration and the right number of nodes.
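
A hedged sketch of what that rework might look like, assuming the ESIntegTestCase#awaitClusterState(Predicate<ClusterState>) helper named above and the ClusterState accessors already used in this test; the expected voting-configuration size of 3 is an assumption (2 full master nodes plus 1 voting-only node), since the asserted value is truncated in this diff:

// Hedged sketch, not necessarily the actual change made in ce1452c: wait for a single
// cluster state that has both the expected voting configuration and the expected node count.
final int expectedVotingNodes = 3;               // assumption: 2 full masters + 1 voting-only node
final int expectedTotalNodes = 3 + numDataNodes; // all master-eligible nodes plus the data-only nodes
awaitClusterState(
    state -> state.getLastCommittedConfiguration().getNodeIds().size() == expectedVotingNodes
        && state.nodes().getSize() == expectedTotalNodes
);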

Member Author

Updated as suggested in ce1452c. Please let me know if it looks right to you. Thanks!

Member Author

The update also includes a fix for a new failure of a similar nature, #130979.

assertThat(
VotingOnlyNodePlugin.isVotingOnlyNode(
clusterAdmin().prepareState(TEST_REQUEST_TIMEOUT).get().getState().nodes().getMasterNode()