-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Balancer skips disk related computation when disk weight factor is zero #136352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Balancer skips disk related computation when disk weight factor is zero #136352
Conversation
Flamegraph shows Balancer instantiation takes consider amount of time in an allocate call. More than 1/4 of the instantiation time is for computing disk related stats which is wasteful when the disk weight factor is zero. This PR skips these computations in such case.
avgDiskUsageInBytesPerNode = skipDiskUsageCalculation | ||
? 0 | ||
: WeightFunction.avgDiskUsageInBytesPerNode(allocation.clusterInfo(), metadata, routingNodes); | ||
nodes = Collections.unmodifiableMap(buildModelFromAssigned(skipDiskUsageCalculation)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cost saving is realistic. My main question is whether the approach is considered hacky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about if rather than having the additional flag, we passed the weighting around, and the expensive parts could only perform the calculation if the weighting was non-zero?
I'm not sure if that's better, but it's a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could refer to this.balancingWeights
perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We synced offline and agreed to change the boolean flag to be a method on BalancingWeights
.
…or is zero" This reverts commit e6eef9e.
…tation-conditionally
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Some comments, but nothing worth holding the change up for.
Map.of(), | ||
Map.of(), | ||
Map.of() | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could use ClusterInfo.builder().shardSizes(...).build()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot there is builder. Pushed b7e22fc looks much nicer! Thanks!
private float maxShardSizeBytes(ProjectIndex index) { | ||
if (balancingWeights.diskUsageIgnored()) { | ||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit I have minor apprehensions about. But it seems like it's not an easy one to skip on the caller side. And we do have that information available to us here via the balancingWeights.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove this change if you prefer. It does not really show up in the flamegraph. So I could be over-zealous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah maybe that would be nicer. The behaviour seems a little surprising potentially. It would seem safer if the caller was skipping the call rather than the callee just returning zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that we can check it at the call site. Not sure why I initially thought it was not feasible ... Pushed 0affb99
Let me know if this works for you. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, ship it!
Flamegraph shows Balancer instantiation takes consider amount of time in an allocate call. More than 1/4 of the instantiation time is for computing disk related stats which is wasteful when the disk weight factor is zero. This PR skips these computations in such case.