Conversation

@DiannaHohensee DiannaHohensee commented Jul 28, 2025

This is the initial version of the write load decider, with #canAllocate
implemented. It checks whether assigning a shard to a new node would
exceed the node's simulated utilization threshold.

Closes ES-12564


Is this direction alright with folks? I'd like to get this working end-to-end (so we don't block each other), and then we can improve pieces of the system in parallel. I still need to add the testing, but want to check in first.

Update: Ready for review now 👍 I've filed ES-12620 as a follow-up for IT testing. The Monitor (ES-11992) may be needed for more thorough testing; I haven't thought through how to write the tests yet, but I expect we should be able to get something working without it.

@DiannaHohensee DiannaHohensee self-assigned this Jul 28, 2025
@DiannaHohensee DiannaHohensee added >non-issue :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Coordination Meta label for Distributed Coordination team v9.2.0 labels Jul 28, 2025
@nicktindall nicktindall left a comment

Yep this seems reasonable to me

@DiannaHohensee DiannaHohensee changed the title WriteLoadConstraintDecider PoC Implement WriteLoadConstraintDecider#canAllocate Aug 11, 2025
@DiannaHohensee DiannaHohensee marked this pull request as ready for review August 11, 2025 23:00
@elasticsearchmachine

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

assert nodeUsageStatsForThreadPools.threadPoolUsageStatsMap().get(ThreadPool.Names.WRITE) != null;
var nodeWriteThreadPoolStats = nodeUsageStatsForThreadPools.threadPoolUsageStatsMap().get(ThreadPool.Names.WRITE);
var nodeWriteThreadPoolLoadThreshold = writeLoadConstraintSettings.getWriteThreadPoolHighUtilizationThresholdSetting();
if (nodeWriteThreadPoolStats.averageThreadPoolUtilization() >= nodeWriteThreadPoolLoadThreshold) {
Contributor


if (nodeWriteThreadPoolStats.averageThreadPoolUtilization() >= nodeWriteThreadPoolLoadThreshold) {
This one looks redundant after calculateShardMovementChange: if the simulation fails, this check will fail too. I don't think the overhead of calculateShardMovementChange will be noticeable anywhere.

Contributor Author


Yes, the next check would also catch this. At this point it's really a matter of the explanation message for the NO decision. They could be combined; kept separate, I think the messages are clearer for the user to understand.


@mhl-b mhl-b Aug 12, 2025


calculateShardMovementChange already contains information about the current thread pool utilization, so it's not hard to see that the node is above the high threshold before the movement attempt.

Contributor Author


I'm not sure what you're suggesting. Are you proposing to add logic to calculateShardMovementChange? If the threshold is already exceeded, the calculation just adds on top of that value (exceeding the threshold further), so nothing needs to change there.

I like the clarity of the separate explain messages. Do you feel strongly about merging the two if statements?

Contributor


I don't feel strongly
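For illustration, here is a toy sketch of the two-check structure discussed above, with a distinct explain message for each NO case. The threshold value, method names, and message text are invented for the example; the real decider reads the threshold from WriteLoadConstraintSettings and simulates the move via calculateShardMovementChange.

```java
public class WriteLoadCheckSketch {
    // Invented threshold; in the PR this comes from
    // writeLoadConstraintSettings.getWriteThreadPoolHighUtilizationThresholdSetting().
    static final double HIGH_UTILIZATION_THRESHOLD = 0.9;

    // Stand-in for calculateShardMovementChange: current utilization plus the
    // shard's estimated share of the write thread pool.
    static double simulatedUtilizationAfterMove(double currentUtilization, double shardWriteLoadDelta) {
        return currentUtilization + shardWriteLoadDelta;
    }

    // Returns an explain message for a NO decision, or null for YES.
    static String canAllocate(double currentUtilization, double shardWriteLoadDelta) {
        if (currentUtilization >= HIGH_UTILIZATION_THRESHOLD) {
            // First check: the node is already hot, before simulating the move.
            return "NO: node is already above the write thread pool high utilization threshold";
        }
        if (simulatedUtilizationAfterMove(currentUtilization, shardWriteLoadDelta) >= HIGH_UTILIZATION_THRESHOLD) {
            // Second check: the move itself would push the node over the threshold.
            return "NO: assigning the shard would push the node above the threshold";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(canAllocate(0.95, 0.01)); // caught by the first check
        System.out.println(canAllocate(0.85, 0.10)); // caught by the second check
        System.out.println(canAllocate(0.50, 0.10)); // null: allocation allowed
    }
}
```

As the thread notes, the second check alone would reject both hot cases; keeping both only changes which explanation the user sees.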

Comment on lines 46 to 49
ClusterState clusterState = ClusterStateCreationUtils.stateWithAssignedPrimariesAndReplicas(new String[] { indexName }, 3, 1);
// The number of data nodes the util method above creates is numberOfReplicas+1.
assertEquals(3, clusterState.nodes().size());
assertEquals(1, clusterState.metadata().getTotalNumberOfIndices());
Contributor


I find these assertions very distracting from the actual change. A unit test should assert the behaviour of the unit in question, preferably one or two assertions per test; we should not assert our setup infrastructure.

If these methods are not trusted, we'd better make them trustworthy, or put the assertions inside them.

Contributor Author


The helper method is not documented and goes through several method layers before selecting the number of nodes as a side effect of the input. Properly improving the method, in my mind, would require method renames, a new method parameter through the stack, and documentation. That would make a lot of noise in this PR.

The original intent of the helper method was pretty clearly index focused, not node focused. But it's very helpful in setting up the ClusterState, so I don't have to roll it all by hand myself. Perhaps I can do a follow-up patch to address this, rather than making the noise in this PR?

Contributor


Or don't use assertions for these. IMHO it will be fine: less code, easier to read.

Contributor Author


Uh, I'm not really comfortable relying on hidden method behavior without asserting that it continues to hold. If the behavior were changed by someone unaware of this dependency, this test would fail in unclear ways.

Would a follow-up refactor be satisfactory? The asserts would no longer be necessary if the contract were changed.
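A toy illustration of the pattern being debated (the helper here is invented; in the PR it is ClusterStateCreationUtils.stateWithAssignedPrimariesAndReplicas, whose data node count is numberOfReplicas + 1 as a side effect of its input):

```java
public class HiddenContractSketch {
    // Stand-in helper whose node count is an undocumented side effect of its
    // input, like the ClusterState helper under discussion.
    static int createClusterStateNodeCount(int numberOfReplicas) {
        return numberOfReplicas + 1; // number of data nodes, derived implicitly
    }

    public static void main(String[] args) {
        int nodes = createClusterStateNodeCount(2);
        // Pinning the hidden contract up front makes a change to the helper
        // fail here with a clear message, rather than deep inside test logic.
        if (nodes != 3) {
            throw new AssertionError("expected 3 data nodes, got " + nodes);
        }
        System.out.println("setup holds: " + nodes + " data nodes");
    }
}
```

The alternatives discussed are exactly these two: pin the behavior where the test depends on it, or document/assert it inside the helper so callers can trust it.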


@DiannaHohensee DiannaHohensee left a comment


Thanks for the feedback, updated and ready for another round.




try (var ignoredRefs = fetchRefs) {
maybeFetchIndicesStats(diskThresholdEnabled || writeLoadConstraintEnabled == WriteLoadDeciderStatus.ENABLED);
maybeFetchIndicesStats(diskThresholdEnabled || writeLoadConstraintEnabled.atLeastLowThresholdEnabled());
Contributor


Nit: alternatively we could have a method WriteLoadDeciderStatus#requiresShardLevelWriteLoads() (and requiresNodeLevelWriteLoads()) which would return true for LOW_THRESHOLD_ONLY and ENABLED but false for DISABLED.

It would read more nicely if the writeLoadConstraintEnabled field were called writeLoadDeciderStatus, if we went that way.

Don't feel strongly about this naming thing though.

Contributor Author


requiresShardLevelWriteLoads and requiresNodeLevelWriteLoads don't seem like the right split, as I understand it. I was imagining LOW as best-effort hot-spot prevention (canAllocate) without hot-spot correction (canRemain), and fully enabled as including hot-spot correction. Both node- and shard-level stats are needed for prevention, to compare the shard move's write load change with the node's overall write load.

I'll leave this as is until a follow-up.
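As a sketch of the status levels being described (the names mirror the discussion above, but this is a hypothetical stand-in; the real WriteLoadDeciderStatus and atLeastLowThresholdEnabled live in the Elasticsearch codebase and may differ):

```java
public class WriteLoadStatusSketch {
    // Hypothetical mirror of the WriteLoadDeciderStatus enum discussed above.
    enum Status {
        DISABLED,            // decider off entirely
        LOW_THRESHOLD_ONLY,  // best-effort hot-spot prevention (canAllocate) only
        ENABLED;             // prevention plus hot-spot correction (canRemain)

        // Both LOW_THRESHOLD_ONLY and ENABLED need the write load stats, so
        // stats fetching is gated on this rather than on "== ENABLED".
        boolean atLeastLowThresholdEnabled() {
            return this != DISABLED;
        }
    }

    public static void main(String[] args) {
        for (Status s : Status.values()) {
            System.out.println(s + " -> " + s.atLeastLowThresholdEnabled());
        }
    }
}
```

This is why the diff above replaces the `== WriteLoadDeciderStatus.ENABLED` comparison with `atLeastLowThresholdEnabled()`: the low-threshold mode also needs the stats.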


@nicktindall nicktindall left a comment


LGTM, with some nits


@DiannaHohensee DiannaHohensee left a comment


Applied updates per Nick's review.




@mhl-b mhl-b left a comment


LGTM

@DiannaHohensee DiannaHohensee merged commit c1aadc4 into elastic:main Aug 15, 2025
32 of 33 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 15, 2025
* upstream/main: (32 commits)
  Speed up loading keyword fields with index sorts (elastic#132950)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testSyntheticSourceWithTranslogSnapshot elastic#132964
  Simplify EsqlSession (elastic#132848)
  Implement WriteLoadConstraintDecider#canAllocate (elastic#132041)
  Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/400_synthetic_source/_doc_count} elastic#132965
  Switch to PR-based benchmark pipeline defined in ES repo (elastic#132941)
  Breakdown undesired allocations by shard routing role (elastic#132235)
  Implement v_magnitude function (elastic#132765)
  Introduce execution location marker for better handling of remote/local compatibility (elastic#132205)
  Mute org.elasticsearch.cluster.ClusterInfoServiceIT testMaxQueueLatenciesInClusterInfo elastic#132957
  Unmuting simulate index data stream mapping overrides yaml rest test (elastic#132946)
  Remove CrossClusterCancellationIT.createLocalIndex() (elastic#132952)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetch elastic#132956
  Fix failing UT by adding a required capability (elastic#132947)
  Precompute the BitsetCacheKey hashCode (elastic#132875)
  Adding simulate ingest effective mapping (elastic#132833)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetchMany elastic#132948
  Rename skipping logic to remove hard link to skip_unavailable (elastic#132861)
  Store ignored source in unique stored fields per entry (elastic#132142)
  Add random tests with match_only_text multi-field (elastic#132380)
  ...
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Aug 15, 2025
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 15, 2025
shardRouting.shardId()
);
logger.debug(explain);
return Decision.single(Decision.Type.NO, NAME, explain);
Contributor


I think this should respond "not-preferred" instead?

Contributor Author


There was disagreement about implementing Decision#NOT_PREFERRED, so we're getting the basics in with NO, and I plan to explore the balancer and decision code in ES-11998 this sprint.
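A toy illustration of the distinction under discussion (the NOT_PREFERRED level here is hypothetical; the PR as merged returns Decision.single(Decision.Type.NO, ...), and a not-preferred decision type was explicitly not implemented):

```java
public class DecisionSketch {
    // Toy decision levels. A hypothetical NOT_PREFERRED would sit between YES
    // and NO: the balancer would avoid the node when alternatives exist, but
    // could still use it, whereas NO rules the node out entirely.
    enum Type { YES, NOT_PREFERRED, NO }

    // Picks the first node whose decision is YES, falling back to the first
    // NOT_PREFERRED node, and never to a NO node.
    static int pickNode(Type[] decisions) {
        int fallback = -1;
        for (int i = 0; i < decisions.length; i++) {
            if (decisions[i] == Type.YES) {
                return i;
            }
            if (decisions[i] == Type.NOT_PREFERRED && fallback == -1) {
                fallback = i;
            }
        }
        return fallback; // -1 if every node said NO
    }

    public static void main(String[] args) {
        System.out.println(pickNode(new Type[] { Type.NO, Type.NOT_PREFERRED, Type.YES })); // 2
        System.out.println(pickNode(new Type[] { Type.NO, Type.NOT_PREFERRED, Type.NO }));  // 1
        System.out.println(pickNode(new Type[] { Type.NO, Type.NO }));                      // -1
    }
}
```

With a hard NO, a hot node can never receive the shard even when every other node says NO too, which is the trade-off the reviewer is pointing at.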
