Skip to content

Conversation

DiannaHohensee
Copy link
Contributor

@DiannaHohensee DiannaHohensee commented Aug 25, 2025

Closes ES-12620


@DiannaHohensee DiannaHohensee self-assigned this Aug 25, 2025
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.2.0 labels Aug 25, 2025
@DiannaHohensee DiannaHohensee added :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Coordination Meta label for Distributed Coordination team and removed needs:triage Requires assignment of a team area label labels Aug 25, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)


public NodeUsageStatsForThreadPools(StreamInput in) throws IOException {
this(in.readString(), in.readMap(ThreadPoolUsageStats::new));
this(in.readString(), in.readImmutableMap(ThreadPoolUsageStats::new));
Copy link
Contributor Author

@DiannaHohensee DiannaHohensee Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to change the NodeUsageStatsForThreadPool constructor to read into an immutable map because the ShardMovements*Simulator expects an immutable map.

@DiannaHohensee DiannaHohensee added the >test Issues or PRs that are addressing/adding tests label Aug 25, 2025
Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some nits

protected Collection<Class<? extends Plugin>> getMockPlugins() {
final Collection<Class<? extends Plugin>> plugins = new ArrayList<>(super.getMockPlugins());
plugins.add(MockTransportService.TestPlugin.class);
return plugins;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe

return CollectionUtils.appendToCopy(super.nodePlugins(), MockTransportService.TestPlugin.class);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done 👍 bb30938

indexName,
Settings.builder().put(SETTING_NUMBER_OF_SHARDS, randomNumberOfShards).put(SETTING_NUMBER_OF_REPLICAS, 0).build()
);
index(indexName, Integer.toString(randomInt(10)), Collections.singletonMap("foo", "bar"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe

indexRandom(false, indexName, 1);

to make it clear the ID and doc is not important?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact is this even necessary? or does createIndex create all the shards?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be necessary, good point. Removed and it works fine 👍

MockTransportService.getInstance(secondDataNodeName)
.addRequestHandlingBehavior(IndicesStatsAction.NAME + "[n]", (handler, request, channel, task) -> {
// Return no stats for the index because none are assigned to this node.
TransportIndicesStatsAction instance = internalCluster().getInstance(TransportIndicesStatsAction.class, firstDataNodeName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be

internalCluster().getInstance(TransportIndicesStatsAction.class, secondDataNodeName);

Also below on the third one (maybe copy-paste error)? It probably doesn't matter because it doesn't look like NodeResponse references anything from the enclosing class when used this way, but it's a bit confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it should be! Thanks!

*/

logger.info("---> Refreshing the cluster info to pull in the dummy thread pool stats with a hot-spotting node");
final InternalClusterInfoService clusterInfoService = (InternalClusterInfoService) internalCluster().getInstance(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe

final InternalClusterInfoService clusterInfoService = asInstanceOf(InternalClusterService.class, internalCluster.getInstance(...

Just to make it fail with an assertion error if it's not what we expect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, done 👍

DiscoveryNode discoveryNode,
int totalWriteThreadPoolThreads,
float averageWriteThreadPoolUtilization,
long averageWriteThreadPoolQueueLatencyMillis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it max write thread pool queue latency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste takes another victim. Fixed. Thanks!

averageWriteThreadPoolQueueLatencyMillis
);
var threadPoolUsageMap = new HashMap<String, NodeUsageStatsForThreadPools.ThreadPoolUsageStats>();
threadPoolUsageMap.put(ThreadPool.Names.WRITE, writeThreadPoolUsageStats);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe this is more succinct (and also results in an immutable singleton map)?

var threadPoolUsageMap = Map.of(ThreadPool.Names.WRITE, new NodeUsageStatsForThreadPools.ThreadPoolUsageStats(
            totalWriteThreadPoolThreads,
            averageWriteThreadPoolUtilization,
            averageWriteThreadPoolQueueLatencyMillis
        );

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, immutable sounds like a win too 👍


/**
* Helper to create a dummy {@link ShardStats} for the given index shard with the supplied {@code peekWriteLoad} value.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: peak rather than peek?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👍

.put(
WriteLoadConstraintSettings.WRITE_LOAD_DECIDER_ENABLED_SETTING.getKey(),
WriteLoadConstraintSettings.WriteLoadDeciderStatus.ENABLED
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set the write load threshold explicitly (and randomly) and calculate the above/below thresholds based on that below? We're relying on defaults as it stands and they might change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. IIUC, you're also proposing randomizing the node's reported thread pool utilization to somewhere above the random threshold? Updated as such. 5d3dd54

Copy link
Contributor Author

@DiannaHohensee DiannaHohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed up 👍 Thanks for the review Nick.


/**
* Helper to create a dummy {@link ShardStats} for the given index shard with the supplied {@code peekWriteLoad} value.
*/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👍

averageWriteThreadPoolQueueLatencyMillis
);
var threadPoolUsageMap = new HashMap<String, NodeUsageStatsForThreadPools.ThreadPoolUsageStats>();
threadPoolUsageMap.put(ThreadPool.Names.WRITE, writeThreadPoolUsageStats);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, immutable sounds like a win too 👍

DiscoveryNode discoveryNode,
int totalWriteThreadPoolThreads,
float averageWriteThreadPoolUtilization,
long averageWriteThreadPoolQueueLatencyMillis
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste takes another victim. Fixed. Thanks!

*/

logger.info("---> Refreshing the cluster info to pull in the dummy thread pool stats with a hot-spotting node");
final InternalClusterInfoService clusterInfoService = (InternalClusterInfoService) internalCluster().getInstance(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, done 👍

MockTransportService.getInstance(secondDataNodeName)
.addRequestHandlingBehavior(IndicesStatsAction.NAME + "[n]", (handler, request, channel, task) -> {
// Return no stats for the index because none are assigned to this node.
TransportIndicesStatsAction instance = internalCluster().getInstance(TransportIndicesStatsAction.class, firstDataNodeName);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it should be! Thanks!

indexName,
Settings.builder().put(SETTING_NUMBER_OF_SHARDS, randomNumberOfShards).put(SETTING_NUMBER_OF_REPLICAS, 0).build()
);
index(indexName, Integer.toString(randomInt(10)), Collections.singletonMap("foo", "bar"));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be necessary, good point. Removed and it works fine 👍

protected Collection<Class<? extends Plugin>> getMockPlugins() {
final Collection<Class<? extends Plugin>> plugins = new ArrayList<>(super.getMockPlugins());
plugins.add(MockTransportService.TestPlugin.class);
return plugins;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done 👍 bb30938

.put(
WriteLoadConstraintSettings.WRITE_LOAD_DECIDER_ENABLED_SETTING.getKey(),
WriteLoadConstraintSettings.WriteLoadDeciderStatus.ENABLED
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. IIUC, you're also proposing randomizing the node's reported thread pool utilization to somewhere above the random threshold? Updated as such. 5d3dd54

@DiannaHohensee DiannaHohensee added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Aug 29, 2025
@elasticsearchmachine elasticsearchmachine merged commit ae3134d into elastic:main Aug 29, 2025
33 checks passed
@DiannaHohensee DiannaHohensee deleted the 2025/08/21/ES-12620-IT-Test branch August 29, 2025 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Coordination Meta label for Distributed Coordination team >test Issues or PRs that are addressing/adding tests v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants