@DiannaHohensee
Contributor

@DiannaHohensee DiannaHohensee commented Feb 10, 2025

The life cycle of the node weights during the balancer process:

  1. BalancedShardsAllocator runs a round and places the node weights on the RoutingNodes copy
  2. DesiredBalanceComputer then moves the node weights from the RoutingNodes to the DesiredBalance
  3. Reconciliation then passes the node weights from the DesiredBalance to the DesiredBalanceMetrics#update*() method.

So I'm grabbing the node weights from the DesiredBalance: when a balancing round finishes and is about to update the current DesiredBalance reference, I compare the previous and new DesiredBalance instances.

I settled on summarizing the changes by saving the old DesiredBalance weights per node along with a weights diff to reach the new DesiredBalance's weights per node. This allows me to combine multiple summaries by using the oldest summary's base node weights, and summing the diffs across summaries to reach the final node weight diffs.
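
A minimal sketch of that combining scheme, to make the arithmetic concrete. The record name `NodeWeightsSummary`, its fields, and the `between`/`combine` helpers are hypothetical and are not taken from the PR; weights are simplified to one `double` per node ID, and nodes that disappear between balances are ignored.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-round summary: node weights before the round, plus the change the round applied. */
record NodeWeightsSummary(Map<String, Double> baseWeights, Map<String, Double> weightDiffs) {

    /** Builds a summary from the old and new per-node weights (node ID -> weight). */
    static NodeWeightsSummary between(Map<String, Double> oldWeights, Map<String, Double> newWeights) {
        Map<String, Double> diffs = new HashMap<>();
        for (Map.Entry<String, Double> entry : newWeights.entrySet()) {
            // Assumption: a node missing from the old balance is treated as having weight 0.
            double oldWeight = oldWeights.getOrDefault(entry.getKey(), 0.0);
            diffs.put(entry.getKey(), entry.getValue() - oldWeight);
        }
        return new NodeWeightsSummary(Map.copyOf(oldWeights), Map.copyOf(diffs));
    }

    /** Combines summaries ordered oldest first: keep the oldest base weights and sum the diffs per node. */
    static NodeWeightsSummary combine(List<NodeWeightsSummary> oldestFirst) {
        if (oldestFirst.isEmpty()) {
            throw new IllegalArgumentException("need at least one summary");
        }
        Map<String, Double> summedDiffs = new HashMap<>();
        for (NodeWeightsSummary summary : oldestFirst) {
            summary.weightDiffs().forEach((node, diff) -> summedDiffs.merge(node, diff, Double::sum));
        }
        return new NodeWeightsSummary(oldestFirst.get(0).baseWeights(), Map.copyOf(summedDiffs));
    }
}
```

With this shape, the final weight of a node after all combined rounds is the oldest base weight plus the summed diff, so intermediate balances never need to be stored.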

@DiannaHohensee DiannaHohensee added the >non-issue, :Distributed Coordination/Allocation, and Team:Distributed Coordination labels Feb 10, 2025
@DiannaHohensee DiannaHohensee self-assigned this Feb 10, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

* @param numberOfBalancingRounds How many balancing round summaries are combined in this report.
* @param numberOfShardMoves The sum of shard moves for each balancing round being combined into a single summary.
*/
public record CombinedBalancingRoundSummary(int numberOfBalancingRounds, long numberOfShardMoves) {
Contributor Author

I moved this into the BalancingRoundSummary class because I wanted to make it more obvious that both should be updated at the same time. I have not yet thought of a way to enforce the relationship.
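
As a rough illustration of that nesting, here is a simplified sketch. Only the `CombinedBalancingRoundSummary` record signature comes from the snippet above; the single-field `BalancingRoundSummary` and the `combine` helper are assumptions made for the example.

```java
import java.util.List;

/** Hypothetical, simplified summary of a single balancing round. */
record BalancingRoundSummary(long numberOfShardMoves) {

    /** Nested next to the per-round fields so a change to one record is seen alongside the other. */
    record CombinedBalancingRoundSummary(int numberOfBalancingRounds, long numberOfShardMoves) {

        /** Assumed helper: counts the rounds and sums their shard moves. */
        static CombinedBalancingRoundSummary combine(List<BalancingRoundSummary> summaries) {
            long totalShardMoves = 0;
            for (BalancingRoundSummary summary : summaries) {
                totalShardMoves += summary.numberOfShardMoves();
            }
            return new CombinedBalancingRoundSummary(summaries.size(), totalShardMoves);
        }
    }
}
```

Nesting does not enforce the relationship; it only keeps the two definitions in one file so a reviewer is more likely to notice when one changes without the other.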

/**
* Summarizes the work required to move from an old to new desired balance shard allocation.
*/
private BalancingRoundSummary calculateBalancingRoundSummary(DesiredBalance oldDesiredBalance, DesiredBalance newDesiredBalance) {
Contributor Author

I've moved this logic into the AllocationBalancingRoundSummaryService to make it more unit testable. The summaries are also going to become more complex as I add more summary stats/metrics: that logic seems more appropriate in the summary service.

Contributor

@nicktindall nicktindall left a comment

Looks good, just some minor comments. The biggest concern is probably that delta in assertDoubleEquals.

Again, sorry for the delay

- balancerRoundSummaryService.addBalancerRoundSummary(calculateBalancingRoundSummary(oldDesiredBalance, newDesiredBalance));
+ balancerRoundSummaryService.addBalancerRoundSummary(
+     AllocationBalancingRoundSummaryService.createBalancerRoundSummary(oldDesiredBalance, newDesiredBalance)
+ );
Contributor

Nit: I wonder if it's worth changing the signature of addBalancerRoundSummary (and perhaps the name, to reflect the change) to just take the old and new desired balances? That would reduce some of the coupling between DesiredBalanceShardsAllocator and the AllocationBalancingRoundSummaryService.

Contributor

I think you could do that and keep the testability (it could just call a static method on itself that could be exposed for testing)

Contributor Author

I'm not sure I understand your suggestion. If I change the method to

public void addBalancerRoundSummary(DesiredBalance oldDesired, DesiredBalance newDesired)

Then for testing addBalancerRoundSummary I'll have to create DesiredBalance instances and reason about the BalancingRoundSummary that results. Right now it's simpler to test addBalancerRoundSummary.

The createBalancerRoundSummary method could still be tested the same way even if it was internal (exposed for testing, as you say). But addBalancerRoundSummary seems more complicated, unless I'm missing something 🤔

Contributor

@nicktindall nicktindall Feb 25, 2025

I guess I just meant maybe it's easier, for the caller, to provide an overload e.g.

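class AllocationBalancingRoundSummaryService {

   // Convenience overload: the caller passes the two balances and never builds the summary itself.
   public void addBalancerRoundSummary(DesiredBalance oldDesiredBalance, DesiredBalance newDesiredBalance) {
      addBalancerRoundSummary(createBalancerRoundSummary(oldDesiredBalance, newDesiredBalance));
   }

   public void addBalancerRoundSummary(BalancingRoundSummary summary) {
      // ...
   }

   // Static so it can still be unit tested directly.
   public static BalancingRoundSummary createBalancerRoundSummary(DesiredBalance oldDesiredBalance, DesiredBalance newDesiredBalance) {
      // ...
   }
}

That way DesiredBalanceShardsAllocator doesn't need to know about createBalancerRoundSummary?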

The overload is just a convenience that I'd argue doesn't need to be explicitly tested itself, because it only delegates to things that are tested.

But no strong feelings about this one. Up to you :)

Contributor Author

@DiannaHohensee DiannaHohensee Feb 26, 2025

Oh I see, thanks for explaining. Sure, done in 3df5904

Contributor Author

@DiannaHohensee DiannaHohensee left a comment

Updated with the feedback: e99d3b9

I realized there was a little more cleanup I could do, per this comment. I also don't think I understand your suggestion in this other comment: could you take another look? @nicktindall

Contributor

@nicktindall nicktindall left a comment

LGTM

@DiannaHohensee DiannaHohensee merged commit b1e6908 into elastic:main Feb 26, 2025
17 checks passed
Labels

:Distributed Coordination/Allocation, >non-issue, Team:Distributed Coordination, v9.1.0

3 participants