Merged
Changes from 11 commits
Commits
70 commits
21d7b30
Don't fail on first node-join publication failure
joshua-adams-1 Jul 28, 2025
829c69f
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Jul 28, 2025
fb05a2a
Checkstyle
joshua-adams-1 Jul 28, 2025
962b3b2
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Jul 28, 2025
4b97f07
Adds Node Joining Integration Tests
joshua-adams-1 Aug 7, 2025
2a541b0
Adds nodeExistsWithName method
joshua-adams-1 Aug 7, 2025
eb572d4
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Aug 7, 2025
3e501b4
Minor tweaks
joshua-adams-1 Aug 7, 2025
b6ddfba
Uncomments out solution, and makes NodeJoiningIT
joshua-adams-1 Aug 11, 2025
dac102a
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Aug 11, 2025
d269e47
Ran ./gradlew spotlessApply precommit
joshua-adams-1 Aug 11, 2025
a668ab1
Modify Coordinator Logic
joshua-adams-1 Aug 12, 2025
17e6bd6
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Aug 14, 2025
cc67155
Generalise logging expectations
joshua-adams-1 Aug 14, 2025
d47915b
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Aug 14, 2025
f7cbcbd
[CI] Auto commit changes from spotless
Aug 14, 2025
c989fd0
Add YAML test for "missing lookup key" scenario (#132870)
smalyshev Aug 14, 2025
c1b1ef2
Add memory accounting to exponential histogram library. (#132580)
JonasKunz Aug 14, 2025
6708bb4
Vectorize BQVectorUtils#packAsBinary (#132923)
iverase Aug 14, 2025
4150643
Mute org.elasticsearch.xpack.esql.qa.mixed.EsqlClientYamlIT test {p0=…
elasticsearchmachine Aug 14, 2025
170a518
Remove mutes for resolved CsvTests issues (#132924)
idegtiarenko Aug 14, 2025
2ee0871
ESQL - Allow null values in vector similarity functions (#132919)
carlosdelest Aug 14, 2025
5eec40f
Vectorize BQSpaceUtils#transposeHalfByte (#132935)
iverase Aug 14, 2025
c27fe6a
Send max of two types of max queue latency to ClusterInfo (#132675)
DiannaHohensee Aug 14, 2025
15d6693
[DiskBBQ] Replace n_probe, related to the number of centroids with v…
iverase Aug 14, 2025
6c3fadc
[ML] Add spec files for Llama and AI21 (#132724)
jonathan-buttner Aug 14, 2025
f45e1ad
Remove awaits for closed issues (#132306)
smalyshev Aug 14, 2025
b238367
Suppport per-project behavior in ESQL extra verifiers (#131884)
mark-vieira Aug 14, 2025
36c9f02
Add random tests with match_only_text multi-field (#132380)
parkertimmins Aug 14, 2025
ba8c5e6
Store ignored source in unique stored fields per entry (#132142)
jordan-powers Aug 14, 2025
4a8f3f7
Rename skipping logic to remove hard link to skip_unavailable (#132861)
smalyshev Aug 14, 2025
cdfdb5e
Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetchMan…
elasticsearchmachine Aug 14, 2025
a4ac7fb
Adding simulate ingest effective mapping (#132833)
masseyke Aug 14, 2025
cbbacc0
Precompute the BitsetCacheKey hashCode (#132875)
joegallo Aug 14, 2025
c0a079f
Fix failing UT by adding a required capability (#132947)
julian-elastic Aug 14, 2025
3337cf2
Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetch #1…
elasticsearchmachine Aug 14, 2025
57e5887
Remove CrossClusterCancellationIT.createLocalIndex() (#132952)
JeremyDahlgren Aug 14, 2025
25333ca
Unmuting simulate index data stream mapping overrides yaml rest test …
masseyke Aug 14, 2025
433b827
Mute org.elasticsearch.cluster.ClusterInfoServiceIT testMaxQueueLaten…
elasticsearchmachine Aug 14, 2025
9203679
Introduce execution location marker for better handling of remote/loc…
smalyshev Aug 14, 2025
e6b86ef
Implement v_magnitude function (#132765)
svilen-mihaylov-elastic Aug 14, 2025
a180eaf
Breakdown undesired allocations by shard routing role (#132235)
nicktindall Aug 14, 2025
57db61c
Switch to PR-based benchmark pipeline defined in ES repo (#132941)
gbanasiak Aug 15, 2025
af0c58e
Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {…
elasticsearchmachine Aug 15, 2025
b7922ff
Implement WriteLoadConstraintDecider#canAllocate (#132041)
DiannaHohensee Aug 15, 2025
6485e97
Simplify EsqlSession (#132848)
idegtiarenko Aug 15, 2025
1d35bd3
Mute org.elasticsearch.index.mapper.LongFieldMapperTests testSyntheti…
elasticsearchmachine Aug 15, 2025
332a86c
Speed up loading keyword fields with index sorts (#132950)
dnhatn Aug 15, 2025
98be4ad
Merge remote-tracking branch 'upstream/main' into master-node-disconnect
joshua-adams-1 Aug 15, 2025
0e227f7
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Aug 15, 2025
00b27f3
Remove logger.info
joshua-adams-1 Aug 15, 2025
8c4de89
David Turner Comments
joshua-adams-1 Aug 19, 2025
d5141bc
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Aug 19, 2025
ed73225
Fix unit tests
joshua-adams-1 Aug 20, 2025
a7b2f0e
Merge branch 'master-node-disconnect' of github.com:joshua-adams-1/el…
joshua-adams-1 Aug 20, 2025
ad45d19
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Aug 20, 2025
0b4560f
Comments
joshua-adams-1 Aug 22, 2025
aa82afe
Unused method
joshua-adams-1 Aug 22, 2025
8323493
Merge branch 'master-node-disconnect' of github.com:joshua-adams-1/el…
joshua-adams-1 Aug 22, 2025
ff5b4ec
[CI] Auto commit changes from spotless
Aug 22, 2025
8ad70a2
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Aug 22, 2025
8316b56
Changes
joshua-adams-1 Aug 22, 2025
8bf17e5
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Aug 26, 2025
3f46e13
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Aug 27, 2025
75b118a
David Comments
joshua-adams-1 Sep 1, 2025
c9f608f
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Sep 1, 2025
2f493f5
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Sep 2, 2025
dc7eb96
Merge branch 'master-node-disconnect' of github.com:joshua-adams-1/el…
joshua-adams-1 Sep 2, 2025
3683676
David Comments
joshua-adams-1 Sep 2, 2025
cd8502a
Merge branch 'main' into master-node-disconnect
joshua-adams-1 Sep 2, 2025
@@ -17,14 +17,13 @@
import org.elasticsearch.action.support.ActionFilters;
import org.elasticsearch.action.support.PlainActionFuture;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.ClusterStateApplier;
import org.elasticsearch.cluster.ClusterStateUpdateTask;
import org.elasticsearch.cluster.block.ClusterBlockException;
import org.elasticsearch.cluster.coordination.LeaderChecker;
import org.elasticsearch.cluster.coordination.MasterElectionTestCase;
import org.elasticsearch.cluster.coordination.PublicationTransportHandler;
import org.elasticsearch.cluster.coordination.StatefulPreVoteCollector;
import org.elasticsearch.cluster.metadata.IndexNameExpressionResolver;
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.cluster.service.ClusterService;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.settings.Settings;
@@ -35,8 +34,6 @@
import org.elasticsearch.plugins.ActionPlugin;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.tasks.Task;
import org.elasticsearch.test.ClusterServiceUtils;
import org.elasticsearch.test.ESIntegTestCase;
import org.elasticsearch.test.transport.MockTransportService;
import org.elasticsearch.threadpool.ThreadPool;
import org.elasticsearch.transport.TransportService;
@@ -45,13 +42,11 @@
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;

import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.greaterThan;

public class TransportMasterNodeActionIT extends ESIntegTestCase {
public class TransportMasterNodeActionIT extends MasterElectionTestCase {

@SuppressWarnings("unchecked")
@Override
@@ -86,7 +81,7 @@ public void testRoutingLoopProtection() {
.get()
.getState()
.term();
final var previousMasterKnowsNewMasterIsElectedLatch = configureElectionLatch(newMaster, cleanupTasks);
final var previousMasterKnowsNewMasterIsElectedLatch = configureElectionLatchForNewMaster(newMaster, cleanupTasks);

final var newMasterReceivedReroutedMessageFuture = new PlainActionFuture<>();
final var newMasterReceivedReroutedMessageListener = ActionListener.assertOnce(newMasterReceivedReroutedMessageFuture);
@@ -158,73 +153,6 @@ public void onFailure(Exception e) {
}
}

/**
* Block the cluster state applier on a node. Returns only when applier is blocked.
*
* @param nodeName The name of the node on which to block the applier
* @param cleanupTasks The list of clean up tasks
* @return A cyclic barrier which when awaited on will un-block the applier
*/
private static CyclicBarrier blockClusterStateApplier(String nodeName, ArrayList<Releasable> cleanupTasks) {
final var stateApplierBarrier = new CyclicBarrier(2);
internalCluster().getInstance(ClusterService.class, nodeName).getClusterApplierService().onNewClusterState("test", () -> {
// Meet to signify application is blocked
safeAwait(stateApplierBarrier);
// Wait for the signal to unblock
safeAwait(stateApplierBarrier);
return null;
}, ActionListener.noop());
cleanupTasks.add(stateApplierBarrier::reset);

// Wait until state application is blocked
safeAwait(stateApplierBarrier);
return stateApplierBarrier;
}

/**
* Configure a latch that will be released when the existing master knows of the new master's election
*
* @param newMaster The name of the newMaster node
* @param cleanupTasks The list of cleanup tasks
* @return A latch that will be released when the old master acknowledges the new master's election
*/
private CountDownLatch configureElectionLatch(String newMaster, List<Releasable> cleanupTasks) {
final String originalMasterName = internalCluster().getMasterName();
logger.info("Original master was {}, new master will be {}", originalMasterName, newMaster);
final var previousMasterKnowsNewMasterIsElectedLatch = new CountDownLatch(1);
ClusterStateApplier newMasterMonitor = event -> {
DiscoveryNode masterNode = event.state().nodes().getMasterNode();
if (masterNode != null && masterNode.getName().equals(newMaster)) {
previousMasterKnowsNewMasterIsElectedLatch.countDown();
}
};
ClusterService originalMasterClusterService = internalCluster().getInstance(ClusterService.class, originalMasterName);
originalMasterClusterService.addStateApplier(newMasterMonitor);
cleanupTasks.add(() -> originalMasterClusterService.removeApplier(newMasterMonitor));
return previousMasterKnowsNewMasterIsElectedLatch;
}

/**
* Add some master-only nodes and block until they've joined the cluster
* <p>
* Ensure that we've got 5 voting nodes in the cluster, this means even if the original
* master accepts its own failed state update before standing down, we can still
* establish a quorum without its (or our own) join.
*/
private static String ensureSufficientMasterEligibleNodes() {
final var votingConfigSizeListener = ClusterServiceUtils.addTemporaryStateListener(
cs -> 5 <= cs.coordinationMetadata().getLastCommittedConfiguration().getNodeIds().size()
);

try {
final var newNodeNames = internalCluster().startMasterOnlyNodes(Math.max(1, 5 - internalCluster().numMasterNodes()));
safeAwait(votingConfigSizeListener);
return newNodeNames.get(0);
} finally {
votingConfigSizeListener.onResponse(null);
}
}

private static final ActionType<ActionResponse.Empty> TEST_ACTION_TYPE = new ActionType<>("internal:test");

public static final class TestActionPlugin extends Plugin implements ActionPlugin {
@@ -0,0 +1,124 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the "Elastic License
* 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
* Public License v 1"; you may not use this file except in compliance with, at
* your election, the "Elastic License 2.0", the "GNU Affero General Public
* License v3.0 only", or the "Server Side Public License, v 1".
*/

package org.elasticsearch.cluster.coordination;

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.cluster.ClusterStateApplier;
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.cluster.service.ClusterService;
import org.elasticsearch.core.Releasable;
import org.elasticsearch.test.ClusterServiceUtils;
import org.elasticsearch.test.ESIntegTestCase;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;

/**
* An integration test base class to be used when a test requires a master re-election
*/
public abstract class MasterElectionTestCase extends ESIntegTestCase {

/**
* Block the cluster state applier on a node. Returns only when applier is blocked.
*
* @param nodeName The name of the node on which to block the applier
* @param cleanupTasks The list of clean up tasks
* @return A cyclic barrier which when awaited on will un-block the applier
*/
protected static CyclicBarrier blockClusterStateApplier(String nodeName, ArrayList<Releasable> cleanupTasks) {
Contributor
Hmm, I don't think you need to pull this one up to the base class; it only has one caller.

Contributor Author
Updated in a668ab1

final var stateApplierBarrier = new CyclicBarrier(2);
internalCluster().getInstance(ClusterService.class, nodeName).getClusterApplierService().onNewClusterState("test", () -> {
// Meet to signify application is blocked
safeAwait(stateApplierBarrier);
// Wait for the signal to unblock
safeAwait(stateApplierBarrier);
return null;
}, ActionListener.noop());
cleanupTasks.add(stateApplierBarrier::reset);

// Wait until state application is blocked
safeAwait(stateApplierBarrier);
return stateApplierBarrier;
}
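
The two `safeAwait` calls in `blockClusterStateApplier` form a two-phase rendezvous on a `CyclicBarrier(2)`: the first meeting tells the caller that the applier thread is parked, and the second releases it. A minimal standalone sketch of that pattern follows, using only `java.util.concurrent`. The class name `BarrierBlockSketch`, the helpers `await`, `blockWorker` and `demo`, and the single-thread executor standing in for the applier thread are all assumptions made for illustration; none of them are Elasticsearch APIs.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class BarrierBlockSketch {

    /** Hypothetical helper: await with a timeout so a mistake fails fast instead of hanging. */
    public static void await(CyclicBarrier barrier) {
        try {
            barrier.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException | BrokenBarrierException | TimeoutException e) {
            throw new AssertionError("barrier wait failed", e);
        }
    }

    /**
     * Submits a task that parks the worker thread and returns only once it is parked.
     * Awaiting the returned barrier one more time releases the worker.
     */
    public static CyclicBarrier blockWorker(ExecutorService worker, AtomicBoolean taskFinished) {
        final CyclicBarrier barrier = new CyclicBarrier(2);
        worker.execute(() -> {
            await(barrier); // first meeting: signal that the worker is now blocked
            await(barrier); // second meeting: wait for the signal to unblock
            taskFinished.set(true);
        });
        await(barrier); // caller's half of the first meeting
        return barrier;
    }

    /** Returns whether the task had finished {while blocked, after release}. */
    public static boolean[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        AtomicBoolean taskFinished = new AtomicBoolean(false);
        try {
            CyclicBarrier barrier = blockWorker(worker, taskFinished);
            boolean whileBlocked = taskFinished.get();
            await(barrier); // caller's half of the second meeting: release the worker
            worker.shutdown();
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new AssertionError("worker did not stop");
            }
            return new boolean[] { whileBlocked, taskFinished.get() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError(e);
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("finished while blocked: " + result[0]);
        System.out.println("finished after release: " + result[1]);
    }
}
```

Because the barrier has exactly two parties, each meeting needs one arrival from the caller and one from the worker, which is why the caller in the sketch awaits once to observe the block and once more to lift it.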

/**
* Configure a latch that will be released when the existing master knows of the new master's election
*
* @param newMaster The name of the newMaster node
* @param cleanupTasks The list of cleanup tasks
* @return A latch that will be released when the old master acknowledges the new master's election
*/
protected CountDownLatch configureElectionLatchForNewMaster(String newMaster, List<Releasable> cleanupTasks) {
Contributor
I think this'd be much simpler implemented with ClusterServiceUtils.addTemporaryStateListener. Not sure why we didn't do so when first written - I think it must have evolved to this from something more complex.

Contributor Author
Good catch, have updated in a668ab1

final String originalMasterName = internalCluster().getMasterName();
logger.info("Original master was {}, new master will be {}", originalMasterName, newMaster);
final var previousMasterKnowsNewMasterIsElectedLatch = new CountDownLatch(1);
ClusterStateApplier newMasterMonitor = event -> {
DiscoveryNode masterNode = event.state().nodes().getMasterNode();
if (masterNode != null && masterNode.getName().equals(newMaster)) {
previousMasterKnowsNewMasterIsElectedLatch.countDown();
}
};
ClusterService originalMasterClusterService = internalCluster().getInstance(ClusterService.class, originalMasterName);
originalMasterClusterService.addStateApplier(newMasterMonitor);
cleanupTasks.add(() -> originalMasterClusterService.removeApplier(newMasterMonitor));
return previousMasterKnowsNewMasterIsElectedLatch;
}
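
The review suggestion on this method is to replace the hand-rolled latch, applier registration and cleanup task with `ClusterServiceUtils.addTemporaryStateListener`, which elsewhere in this diff is given a predicate over the cluster state and awaited with `safeAwait`. The sketch below illustrates the general idea of such a one-shot, predicate-driven listener in plain Java. It is not the Elasticsearch implementation: `TemporaryListenerSketch`, `StateService`, `addTemporaryListener` and the use of a bare master name as the "state" are hypothetical stand-ins.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class TemporaryListenerSketch {

    /** Hypothetical stand-in for a service that notifies listeners of each new state. */
    public static final class StateService<S> {
        private final List<Consumer<S>> listeners = new CopyOnWriteArrayList<>();

        /** Delivers a new state to every currently registered listener. */
        public void publish(S state) {
            listeners.forEach(listener -> listener.accept(state));
        }

        public int listenerCount() {
            return listeners.size();
        }

        /** Completes the returned future the first time the predicate matches, then unregisters itself. */
        public CompletableFuture<Void> addTemporaryListener(Predicate<S> predicate) {
            CompletableFuture<Void> future = new CompletableFuture<>();
            Consumer<S> listener = new Consumer<>() {
                @Override
                public void accept(S state) {
                    if (predicate.test(state)) {
                        listeners.remove(this); // no separate cleanup task needed
                        future.complete(null);
                    }
                }
            };
            listeners.add(listener);
            return future;
        }
    }

    public static void main(String[] args) {
        // The "state" here is just the current master's name (null when there is no master).
        StateService<String> service = new StateService<>();
        CompletableFuture<Void> newMasterSeen = service.addTemporaryListener("node-2"::equals);

        service.publish("node-1");
        System.out.println("after node-1: done=" + newMasterSeen.isDone());
        service.publish(null);
        service.publish("node-2");
        System.out.println("after node-2: done=" + newMasterSeen.isDone() + ", listeners=" + service.listenerCount());
    }
}
```

The appeal of this shape is that the caller only states the condition it is waiting for; registration and removal of the listener are handled in one place.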

/**
* Configure a latch that will be released when the existing master knows it has been re-elected
*
* @param masterNodeName The name of the current master node
* @param electedTerm The term in which the current master node was elected
* @param cleanupTasks The list of cleanup tasks
* @return A latch that will be released when the master acknowledges its re-election
*/
protected CountDownLatch configureElectionLatchForReElectedMaster(
Contributor
Do you need a separate method here? In the new tests I think we can just wait for the term to increase, as observed by any node - no need to worry about which node is master or anything so fiddly.

Also this one is only called from one place. I think it's a premature optimization to generalize these two test suites like this when they have so little in common.

Contributor Author
Updated in a668ab1

String masterNodeName,
long electedTerm,
List<Releasable> cleanupTasks
) {
final var masterKnowsItIsReElectedLatch = new CountDownLatch(1);
ClusterStateApplier newMasterMonitor = event -> {
DiscoveryNode masterNode = event.state().nodes().getMasterNode();
long currentTerm = event.state().coordinationMetadata().term();
if (masterNode != null && masterNode.getName().equals(masterNodeName) && currentTerm > electedTerm) {
logger.info("Master knows it's re-elected");
masterKnowsItIsReElectedLatch.countDown();
}
};
ClusterService masterClusterService = internalCluster().getInstance(ClusterService.class, masterNodeName);
masterClusterService.addStateApplier(newMasterMonitor);
cleanupTasks.add(() -> masterClusterService.removeApplier(newMasterMonitor));
return masterKnowsItIsReElectedLatch;
}
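
The reviewer's alternative for this method is to wait only for the term to increase, as observed by any node, without checking which node is master. A rough standalone sketch of that simpler condition is shown below. `TermIncreaseSketch`, `FakeNode`, `addTermListener`, `applyStateWithTerm` and `latchForTermAbove` are hypothetical names invented for this illustration and do not correspond to Elasticsearch classes or methods.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.function.LongConsumer;

public class TermIncreaseSketch {

    /** Hypothetical stand-in for a node that reports the term of each state it applies. */
    public static final class FakeNode {
        private final List<LongConsumer> termListeners = new CopyOnWriteArrayList<>();

        public void addTermListener(LongConsumer listener) {
            termListeners.add(listener);
        }

        public void applyStateWithTerm(long term) {
            termListeners.forEach(listener -> listener.accept(term));
        }
    }

    /** Returns a latch released as soon as any of the nodes observes a term above electedTerm. */
    public static CountDownLatch latchForTermAbove(long electedTerm, List<FakeNode> nodes) {
        CountDownLatch latch = new CountDownLatch(1);
        for (FakeNode node : nodes) {
            node.addTermListener(term -> {
                if (term > electedTerm) {
                    latch.countDown(); // extra calls after the count reaches zero are harmless
                }
            });
        }
        return latch;
    }

    public static void main(String[] args) {
        FakeNode a = new FakeNode();
        FakeNode b = new FakeNode();
        CountDownLatch latch = latchForTermAbove(3, List.of(a, b));

        a.applyStateWithTerm(3); // same term: nothing has been re-elected yet
        System.out.println("after term 3: remaining=" + latch.getCount());
        b.applyStateWithTerm(4); // higher term seen by any node is enough
        System.out.println("after term 4: remaining=" + latch.getCount());
    }
}
```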

/**
* Add some master-only nodes and block until they've joined the cluster
* <p>
* Ensure that we've got 5 voting nodes in the cluster, this means even if the original
* master accepts its own failed state update before standing down, we can still
* establish a quorum without its (or our own) join.
*/
protected static String ensureSufficientMasterEligibleNodes() {
Contributor
The new tests don't need 5 master nodes, and indeed having 5 (rather than 3) makes the situation unnecessarily complex.

Contributor Author
Updated in a668ab1

final var votingConfigSizeListener = ClusterServiceUtils.addTemporaryStateListener(
cs -> 5 <= cs.coordinationMetadata().getLastCommittedConfiguration().getNodeIds().size()
);

try {
final var newNodeNames = internalCluster().startMasterOnlyNodes(Math.max(1, 5 - internalCluster().numMasterNodes()));
safeAwait(votingConfigSizeListener);
return newNodeNames.get(0);
} finally {
votingConfigSizeListener.onResponse(null);
}
}
}
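
The Javadoc for `ensureSufficientMasterEligibleNodes` says that 5 voting nodes allow a quorum to form without the original master's join (or the new master's own). The arithmetic behind that can be sketched as below, assuming that a quorum is a strict majority of the voting configuration and reading the Javadoc as "up to two votes may be missing"; both the reading and the names `QuorumSketch`, `majority` and `canElectWithout` are assumptions made for this example.

```java
public class QuorumSketch {

    /** Smallest number of votes that is strictly more than half of the voting configuration. */
    public static int majority(int votingNodes) {
        return votingNodes / 2 + 1;
    }

    /** True if the remaining voters can still form a majority after excluding some nodes' votes. */
    public static boolean canElectWithout(int votingNodes, int excluded) {
        return votingNodes - excluded >= majority(votingNodes);
    }

    public static void main(String[] args) {
        for (int n : new int[] { 3, 5 }) {
            System.out.println(
                n + " voters: majority=" + majority(n) + ", electable without 2 nodes=" + canElectWithout(n, 2)
            );
        }
    }
}
```

With 3 voters the majority is 2, so losing two votes leaves only 1 and no election can succeed; with 5 voters the majority is 3, which the remaining 3 nodes can supply on their own.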