Implement WriteLoadConstraintDecider#canAllocate #132041
Conversation
nicktindall
left a comment
Yep this seems reasonable to me
improvements and *Tests are complete
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
assert nodeUsageStatsForThreadPools.threadPoolUsageStatsMap().get(ThreadPool.Names.WRITE) != null;
var nodeWriteThreadPoolStats = nodeUsageStatsForThreadPools.threadPoolUsageStatsMap().get(ThreadPool.Names.WRITE);
var nodeWriteThreadPoolLoadThreshold = writeLoadConstraintSettings.getWriteThreadPoolHighUtilizationThresholdSetting();
if (nodeWriteThreadPoolStats.averageThreadPoolUtilization() >= nodeWriteThreadPoolLoadThreshold) {
if (nodeWriteThreadPoolStats.averageThreadPoolUtilization() >= nodeWriteThreadPoolLoadThreshold) {
This one looks redundant after calculateShardMovementChange: whenever this check fails, the simulation check will fail too. I don't think the overhead of calculateShardMovementChange will be noticeable anywhere.
Yes, the next check would also catch this. At this point it's really a matter of the explanation message for the NO decision. They could be combined, but kept separate the messages can be clearer for the user to understand, I think.
calculateShardMovementChange already includes information about the current thread pool utilization, so it's not hard to see that the node was at the high threshold before the movement attempt.
I'm not sure what you're suggesting. Are you proposing to add logic to calculateShardMovementChange? If the threshold is already exceeded, the calculation just adds on top of that value (exceeding the threshold further); nothing needs to change there.
I like the clarity of the separate explain messages. Do you feel strongly about merging the two if statements?
I don't feel strongly
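The two checks under discussion can be sketched as follows — a minimal, self-contained illustration of why the first check is technically redundant yet yields a clearer explain message. All names and the threshold value here are hypothetical stand-ins, not the actual WriteLoadConstraintDecider code:

```java
// Hypothetical sketch of the two-check flow; not the real Elasticsearch implementation.
public class WriteLoadCheckSketch {
    static final double HIGH_UTILIZATION_THRESHOLD = 0.9; // assumed setting value

    record Decision(boolean allowed, String explain) {}

    // Stand-in for calculateShardMovementChange: projected utilization after the move.
    static double simulatedUtilizationAfterMove(double currentUtilization, double shardWriteLoad) {
        return currentUtilization + shardWriteLoad;
    }

    static Decision canAllocate(double currentUtilization, double shardWriteLoad) {
        // Check 1: the node is already hot before the move. Redundant with check 2
        // (any already-hot node also fails the simulation), but its message is clearer.
        if (currentUtilization >= HIGH_UTILIZATION_THRESHOLD) {
            return new Decision(false, "node write thread pool is already above the high utilization threshold");
        }
        // Check 2: the simulated shard movement would push the node over the threshold.
        if (simulatedUtilizationAfterMove(currentUtilization, shardWriteLoad) >= HIGH_UTILIZATION_THRESHOLD) {
            return new Decision(false, "moving the shard here would exceed the high utilization threshold");
        }
        return new Decision(true, "node has write thread pool capacity");
    }
}
```

Merging the two `if` statements would preserve the NO outcome but collapse the two explanations into one, which is the trade-off the thread is weighing.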
ClusterState clusterState = ClusterStateCreationUtils.stateWithAssignedPrimariesAndReplicas(new String[] { indexName }, 3, 1);
// The number of data nodes the util method above creates is numberOfReplicas+1.
assertEquals(3, clusterState.nodes().size());
assertEquals(1, clusterState.metadata().getTotalNumberOfIndices());
I find these assertions very distracting from the actual change. A unit test should assert the behaviour of the unit in question, preferably one or two assertions per test; we should not be asserting our setup infrastructure.
If these methods are not trusted, we'd better make them trusted, or put the assertions inside them.
The helper method is not documented and goes through several method layers before selecting the number of nodes as a side effect of its input. Properly improving the method would, in my mind, require method renames, a new method parameter through the stack, and documentation. That would make a lot of noise in this PR.
The original intent of the helper method was pretty clearly index-focused, not node-focused. But it's very helpful for setting up the ClusterState, so I don't have to roll it all by hand myself. Perhaps I can do a follow-up patch to address this, rather than making the noise in this PR?
Or don't use assertions for these; IMHO it will be fine — less code, easier to read.
Uh, I'm not really comfortable relying on hidden method behavior without asserting that it continues to hold. If the behavior were changed by someone unaware of this dependency, this test would fail in unclear ways.
Would a follow-up refactor be satisfactory? The asserts would no longer be necessary if the contract were changed.
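The disagreement here is about guard assertions on a helper's implicit contract. A toy, self-contained sketch of the pattern — the helper below is hypothetical and not the real ClusterStateCreationUtils API:

```java
import java.util.List;
import java.util.stream.IntStream;

// Toy stand-in for a test setup helper whose node count is an undocumented
// side effect of its replica argument (numberOfReplicas + 1 data nodes).
class ClusterFixtureSketch {
    final List<String> nodeIds;

    private ClusterFixtureSketch(List<String> nodeIds) {
        this.nodeIds = nodeIds;
    }

    static ClusterFixtureSketch withAssignedPrimariesAndReplicas(int numberOfShards, int numberOfReplicas) {
        // Implicit contract under discussion: node count derives from the replica
        // count, not the shard count, and nothing in the signature says so.
        return new ClusterFixtureSketch(
            IntStream.rangeClosed(0, numberOfReplicas).mapToObj(i -> "node_" + i).toList()
        );
    }
}
```

A test that depends on a specific node count can then either guard the hidden contract with an explicit assertion right after calling the helper (failing fast and clearly if the helper changes), or trust the helper and keep the test body minimal — which is exactly the trade-off debated above.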
… the allocation explain response
DiannaHohensee
left a comment
Thanks for the feedback, updated and ready for another round.
  try (var ignoredRefs = fetchRefs) {
-     maybeFetchIndicesStats(diskThresholdEnabled || writeLoadConstraintEnabled == WriteLoadDeciderStatus.ENABLED);
+     maybeFetchIndicesStats(diskThresholdEnabled || writeLoadConstraintEnabled.atLeastLowThresholdEnabled());
Nit: alternatively we could have methods WriteLoadDeciderStatus#requiresShardLevelWriteLoads() (and requiresNodeLevelWriteLoads()) which would return true for LOW_THRESHOLD_ONLY and ENABLED but false for DISABLED.
It would also read nicer if the writeLoadConstraintEnabled field were called writeLoadDeciderStatus, if we went that way.
I don't feel strongly about this naming though.
requiresShardLevelWriteLoads and requiresNodeLevelWriteLoads don't seem like the right split, as I understand it. I was imagining LOW as best-effort hot-spot prevention (canAllocate) without hot-spot correction (canRemain), and fully ENABLED as including hot-spot correction. Both node- and shard-level stats are needed for prevention, to compare the shard move's write load change against the node's overall write load.
I'll leave this as is until some follow-up.
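A minimal sketch of the two naming options for the status enum; the member names mirror the discussion (DISABLED, LOW_THRESHOLD_ONLY, ENABLED), but the real WriteLoadDeciderStatus may differ in shape:

```java
// Hypothetical enum sketch; not the actual WriteLoadDeciderStatus source.
enum WriteLoadDeciderStatusSketch {
    DISABLED,
    LOW_THRESHOLD_ONLY, // best-effort hot-spot prevention (canAllocate) only
    ENABLED;            // prevention plus hot-spot correction (canRemain)

    // Current style: callers ask "is at least the low threshold on?"
    boolean atLeastLowThresholdEnabled() {
        return this != DISABLED;
    }

    // Suggested alternative: name the data requirement directly. Per the reply
    // above, prevention already needs both node- and shard-level stats, so both
    // suggested methods would collapse to the same condition as the current one.
    boolean requiresShardLevelWriteLoads() {
        return this != DISABLED;
    }

    boolean requiresNodeLevelWriteLoads() {
        return this != DISABLED;
    }
}
```

That collapse is why the reviewer's split was declined for now: the two "requires…" methods would not carve the states differently from atLeastLowThresholdEnabled().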
nicktindall
left a comment
LGTM, with some nits
DiannaHohensee
left a comment
Applied updates per Nick's review.
mhl-b
left a comment
LGTM
* upstream/main: (32 commits)
  - Speed up loading keyword fields with index sorts (elastic#132950)
  - Mute org.elasticsearch.index.mapper.LongFieldMapperTests testSyntheticSourceWithTranslogSnapshot elastic#132964
  - Simplify EsqlSession (elastic#132848)
  - Implement WriteLoadConstraintDecider#canAllocate (elastic#132041)
  - Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/400_synthetic_source/_doc_count} elastic#132965
  - Switch to PR-based benchmark pipeline defined in ES repo (elastic#132941)
  - Breakdown undesired allocations by shard routing role (elastic#132235)
  - Implement v_magnitude function (elastic#132765)
  - Introduce execution location marker for better handling of remote/local compatibility (elastic#132205)
  - Mute org.elasticsearch.cluster.ClusterInfoServiceIT testMaxQueueLatenciesInClusterInfo elastic#132957
  - Unmuting simulate index data stream mapping overrides yaml rest test (elastic#132946)
  - Remove CrossClusterCancellationIT.createLocalIndex() (elastic#132952)
  - Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetch elastic#132956
  - Fix failing UT by adding a required capability (elastic#132947)
  - Precompute the BitsetCacheKey hashCode (elastic#132875)
  - Adding simulate ingest effective mapping (elastic#132833)
  - Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetchMany elastic#132948
  - Rename skipping logic to remove hard link to skip_unavailable (elastic#132861)
  - Store ignored source in unique stored fields per entry (elastic#132142)
  - Add random tests with match_only_text multi-field (elastic#132380)
  - ...
The initial version of the write load decider, with #canAllocate implemented. It checks whether the new node assignment for a shard would exceed the node's simulated utilization threshold. Closes ES-12564
…-stats
* upstream/main: (36 commits)
  - Fix reproducability of builds against Java EA versions (elastic#132847)
  - Speed up loading keyword fields with index sorts (elastic#132950)
  - Mute org.elasticsearch.index.mapper.LongFieldMapperTests testSyntheticSourceWithTranslogSnapshot elastic#132964
  - Simplify EsqlSession (elastic#132848)
  - Implement WriteLoadConstraintDecider#canAllocate (elastic#132041)
  - Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/400_synthetic_source/_doc_count} elastic#132965
  - Switch to PR-based benchmark pipeline defined in ES repo (elastic#132941)
  - Breakdown undesired allocations by shard routing role (elastic#132235)
  - Implement v_magnitude function (elastic#132765)
  - Introduce execution location marker for better handling of remote/local compatibility (elastic#132205)
  - Mute org.elasticsearch.cluster.ClusterInfoServiceIT testMaxQueueLatenciesInClusterInfo elastic#132957
  - Unmuting simulate index data stream mapping overrides yaml rest test (elastic#132946)
  - Remove CrossClusterCancellationIT.createLocalIndex() (elastic#132952)
  - Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetch elastic#132956
  - Fix failing UT by adding a required capability (elastic#132947)
  - Precompute the BitsetCacheKey hashCode (elastic#132875)
  - Adding simulate ingest effective mapping (elastic#132833)
  - Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetchMany elastic#132948
  - Rename skipping logic to remove hard link to skip_unavailable (elastic#132861)
  - Store ignored source in unique stored fields per entry (elastic#132142)
  - ...
        shardRouting.shardId()
    );
    logger.debug(explain);
    return Decision.single(Decision.Type.NO, NAME, explain);
I think this should respond "not-preferred" instead?
There was disagreement on implementing Decision#NOT_PREFERRED. So we're getting the basics in with NO, and I plan to explore the balancer and decision code in ES-11998 this sprint.
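The trade-off being discussed — a hard NO now versus a possible NOT_PREFERRED later — can be sketched as follows. The decision types and names below are illustrative only; Elasticsearch's real Decision class is more involved:

```java
// Illustrative sketch only; not the real Elasticsearch Decision/Decider types.
public class HotSpotDecisionSketch {
    enum Type { YES, NO } // NOT_PREFERRED was debated but is not implemented in this PR

    record Decision(Type type, String label, String explain) {}

    static Decision canAllocate(boolean nodeWouldExceedThreshold, String shardId) {
        if (nodeWouldExceedThreshold) {
            String explain = "shard [" + shardId + "] would push the node over the write load threshold";
            // A NO hard-forbids this assignment for the shard. A NOT_PREFERRED type
            // would instead let the balancer fall back to this node when no better
            // candidate exists, avoiding unassignable shards under cluster-wide load.
            return new Decision(Type.NO, "write_load", explain);
        }
        return new Decision(Type.YES, "write_load", "node has write thread pool capacity");
    }
}
```

Starting with NO keeps the decider simple; softening it later (per ES-11998) would be a behavior change in the balancer's candidate selection rather than in this check itself.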
Is this direction alright with folks? I'd like to get this working end-to-end (so we don't block each other), then we can improve pieces of the system in parallel. I need to add the testing, but want to check in first.
Update: Ready for review now 👍 I've filed ES-12620 as a follow-up for IT testing. The Monitor (ES-11992) may be needed for more thorough testing; I haven't thought through how to write the testing yet, but I expect we should be able to get something working without it.