-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Extract metric handling from the desired balancer reconciler #120085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
DiannaHohensee
wants to merge
7
commits into
elastic:main
from
DiannaHohensee:2025/01/10/ES-10341-extract-metrics-handling-from-reconciler
Closed
Changes from 2 commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
defdd38
Extract metric handling from the desired balancer reconciler
DiannaHohensee 7f24a2b
fix link
DiannaHohensee c9ff4f1
Merge branch 'main' into 2025/01/10/ES-10341-extract-metrics-handling…
DiannaHohensee c57565e
revamp, purely a refactor
DiannaHohensee 133ea3a
[CI] Auto commit changes from spotless
7fcf9fe
touchups
DiannaHohensee 68b2f97
Merge branch 'main' into 2025/01/10/ES-10341-extract-metrics-handling…
DiannaHohensee File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
60 changes: 60 additions & 0 deletions
60
...main/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancingRoundStats.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,60 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the "Elastic License | ||
| * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side | ||
| * Public License v 1"; you may not use this file except in compliance with, at | ||
| * your election, the "Elastic License 2.0", the "GNU Affero General Public | ||
| * License v3.0 only", or the "Server Side Public License, v 1". | ||
| */ | ||
|
|
||
| package org.elasticsearch.cluster.routing.allocation.allocator; | ||
|
|
||
| /** | ||
| * Data structure to pass allocation statistics between the desired balance classes. | ||
| */ | ||
| public record BalancingRoundStats( | ||
| long unassignedShards, | ||
| long totalAllocations, | ||
| long undesiredAllocationsExcludingShuttingDownNodes, | ||
| boolean rebalancing | ||
| ) { | ||
|
|
||
| /** | ||
| * Tracks in-progress balancing round statistics. | ||
| */ | ||
| public static class Builder { | ||
| long unassignedShards = 0; | ||
| long totalAllocations = 0; | ||
| long undesiredAllocationsExcludingShuttingDownNodes = 0; | ||
| boolean rebalancing = false; | ||
|
|
||
| public BalancingRoundStats build() { | ||
| return new BalancingRoundStats(unassignedShards, totalAllocations, undesiredAllocationsExcludingShuttingDownNodes, rebalancing); | ||
| } | ||
|
|
||
| public void setUnassignedShards(long numUnassignedShards) { | ||
| this.unassignedShards = numUnassignedShards; | ||
| } | ||
|
|
||
| public void incTotalAllocations() { | ||
| ++this.totalAllocations; | ||
| } | ||
|
|
||
| public void incUndesiredAllocationsExcludingShuttingDownNodes() { | ||
| ++this.undesiredAllocationsExcludingShuttingDownNodes; | ||
| } | ||
|
|
||
| public long getTotalAllocations() { | ||
| return this.totalAllocations; | ||
| } | ||
|
|
||
| public long getUndesiredAllocationsExcludingShuttingDownNodes() { | ||
| return this.undesiredAllocationsExcludingShuttingDownNodes; | ||
| } | ||
|
|
||
| public void setNoRebalancing() { | ||
| this.rebalancing = false; | ||
| } | ||
|
|
||
| } | ||
| }; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it's a bit confusing how there's a stats builder on the input, but we only use it for the eager reconciliation (assuming I've understood the code right). Could we instead just create one specifically for the eager reconciliation to make it clear it's local to that process?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand. What part are you referring to as eager reconciliation? Is it this balance() method?
My intention with passing around a stats data structure is to collect additional statistics throughout the balancing round -- for example, I'll want balancing round start and end times, so I'll need stats tracking as early as this line before computation begins. A balancing round begins with a DesiredBalanceInput submitted to the ContinuousComputation object, which queues requests for a balancing round: the latest DesiredBalanceInput submission will be run async.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean in
allocate()we callreconcile(...)withdesiredBalanceInput.clusterAllocationStatsBuilder()but then in the continuous computation we create separateStatsBuildersto pass tosubmitReconcileTask, even thoughdesiredBalanceInputis in scope.I just wonder whether the statsBuilder belongs in desiredBalanceInput? but I find the code a bit hard to follow so maybe there's an obvious reason for it being there.
I think I'm struggling to understand the lifecycle of the builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh. I think you found something buggy. So allocate will queue a DesiredBalanceInput for computation, which gets picked up asynchronously. But then reconcile() is run immediately and updates the metrics. Computation happens eventually.
I thought I was being thorough by passing the stats builder everywhere, but reconciliation always runs whereas computation occasionally runs (queues, so an input will get skipped if there's a new input). And then computation will normally lead to reconciliation being called.
Okay... So
allocate()needs two separate StatsBuilders, one for immediate reconciliation, and the other passed to computation but ultimately reconciliation. Or alternatively we skip instrumenting computation and create fresh StatsBuilders for each reconciliation event.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there will be some use of the
StatsBuilderin the computation phase (is that what// TODO: this will be expanded upon shortly in ES-10341.refers to?)If so, is is as simple as just
reconcile(...)inallocate(...)StatsBuilderinprocessInput?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've updated this patch to be purely refactor work. It doesn't lead into new work anymore. I think it's cleaner without pushing desired metrics objects into the Reconciler to handle: this cleans up the constructor and purpose of the Reconciler.