allocation: add balancer round summary as metrics #136043

schase-es · 2025-10-06T17:26:30Z

This commit publishes the BalancerRoundSummary as a collection of APM/OpenTelemetry metrics. These values are already logged. The summary, collected roughly every ten seconds, is stored as the current state in the telemetry metrics class (AllocationBalancingRoundMetrics). Whenever telemetry collection runs, each metric reads its current view.
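The "store the latest summary, read it on collection" pattern described above can be sketched as follows. This is a minimal illustration, not the PR's code: `LatestRoundGaugeSketch`, `RoundSummary`, `setLatest` and `readShardMoves` are hypothetical names, and a plain `LongSupplier` stands in for whatever callback the metrics registry polls.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongSupplier;

public class LatestRoundGaugeSketch {

    // Hypothetical stand-in for the balancer's per-round summary.
    record RoundSummary(long shardMoves) {}

    // Latest summary: written by the balancer thread, read by the telemetry callback.
    private final AtomicReference<RoundSummary> latest = new AtomicReference<>();

    // Called each time a balancing round summary is produced (roughly every ten seconds).
    public void setLatest(RoundSummary summary) {
        if (summary == null) {
            throw new IllegalArgumentException("balancing round summary cannot be null");
        }
        latest.set(summary);
    }

    // Gauge callback: invoked whenever telemetry collection runs; reports the latest view only.
    public long readShardMoves() {
        RoundSummary summary = latest.get();
        return summary == null ? 0L : summary.shardMoves();
    }

    public static void main(String[] args) {
        LatestRoundGaugeSketch metrics = new LatestRoundGaugeSketch();
        LongSupplier gauge = metrics::readShardMoves; // what a metrics registry would poll
        System.out.println(gauge.getAsLong()); // 0
        metrics.setLatest(new RoundSummary(5));
        metrics.setLatest(new RoundSummary(3));
        System.out.println(gauge.getAsLong()); // 3
    }
}
```

Because a gauge only reports the most recent value, a summary that is replaced before the next collection (the `5` above) is never observed.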

Fixes: ES-10343


elasticsearchmachine · 2025-10-08T20:15:52Z

Hi @schase-es, I've created a changelog YAML for you.

nicktindall

Just some comments. I think we should discuss with Dianna what we're aiming for with the node deltas.

nicktindall · 2025-10-14T22:44:33Z

server/src/main/java/org/elasticsearch/cluster/ClusterModule.java

        bind(AllocationStatsService.class).toInstance(allocationStatsService);
        bind(TelemetryProvider.class).toInstance(telemetryProvider);
        bind(DesiredBalanceMetrics.class).toInstance(desiredBalanceMetrics);
+        bind(AllocationBalancingRoundMetrics.class).toInstance(balancingRoundMetrics);


Binding like this is only necessary if we use the instance in a class annotated with @Inject, and I'm not sure we do that with AllocationBalancingRoundMetrics. It probably doesn't matter, but in general I think we don't bind unless we need to.

nicktindall · 2025-10-14T22:46:22Z

.../org/elasticsearch/cluster/routing/allocation/allocator/AllocationBalancingRoundMetrics.java

+        assert summary != null : "balancing round metrics cannot be null";
+
+        nodeNameToWeightChangesRef.set(summary.nodeNameToWeightChanges());
+        if (enableSending) {


Do we need to set nodeNameToWeightChangesRef if enableSending is false?
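One way to act on this question is to return early when sending is disabled, so no per-node state is retained. The sketch below assumes the stored map is only read by the telemetry callback; `GuardedUpdateSketch`, `RoundSummary`, `update` and `currentNodeChanges` are hypothetical names, not the PR's API.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class GuardedUpdateSketch {

    // Hypothetical stand-in for a summary carrying per-node shard count changes.
    record RoundSummary(Map<String, Long> nodeNameToShardCountDiff) {}

    private final boolean enableSending;
    // Read by a telemetry callback; starts empty.
    private final AtomicReference<Map<String, Long>> nodeChangesRef = new AtomicReference<>(Map.of());

    public GuardedUpdateSketch(boolean enableSending) {
        this.enableSending = enableSending;
    }

    public void update(RoundSummary summary) {
        assert summary != null : "balancing round summary cannot be null";
        if (enableSending == false) {
            // Nothing will be published, so skip retaining per-node state.
            return;
        }
        nodeChangesRef.set(summary.nodeNameToShardCountDiff());
    }

    public Map<String, Long> currentNodeChanges() {
        return nodeChangesRef.get();
    }

    public static void main(String[] args) {
        RoundSummary summary = new RoundSummary(Map.of("node-1", 2L, "node-2", -2L));

        GuardedUpdateSketch disabled = new GuardedUpdateSketch(false);
        disabled.update(summary);
        System.out.println(disabled.currentNodeChanges().isEmpty()); // true

        GuardedUpdateSketch enabled = new GuardedUpdateSketch(true);
        enabled.update(summary);
        System.out.println(enabled.currentNodeChanges().size()); // 2
    }
}
```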

nicktindall · 2025-10-14T23:22:41Z

.../org/elasticsearch/cluster/routing/allocation/allocator/AllocationBalancingRoundMetrics.java

+            long shardCount = nodeWeightChanges.baseWeights().shardCount() + nodeWeightChanges.weightsDiff().shardCountDiff();
+            metrics.add(new LongWithAttributes(shardCount, getNodeAttributes(nodeWeights.getKey())));
+        }
+        return metrics;


I don't think we want the current values of shard count, disk usage and write load, because we already have those in the cluster balance dashboard.

Perhaps the deltas are more interesting? I think the number of balancing rounds and the shard movements are definitely valuable as you publish them now, but I'm less clear on the value of the specific per-node shard/weight/disk usage deltas/values that we don't already get from existing metrics.

Maybe the absolute amount of change in those values would be interesting? For example, if one node loses X weight and another gains X, the sum would be 2X.

If we plotted any of these values for the cluster by simply adding them together, I think they'd sum to zero, because for every node gaining X shards, other node(s) lose X shards. @DiannaHohensee might have more clarity on the direction here.
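The arithmetic behind the "sums to zero" and "2X" points can be checked with a small example. This assumes the round only moves shards between existing nodes (no shards created or removed), and uses shard counts; `DeltaSumSketch`, `netChange` and `absoluteChange` are hypothetical helpers for illustration.

```java
import java.util.Map;

public class DeltaSumSketch {

    // Net change across nodes: shards moved within the cluster cancel out.
    static long netChange(Map<String, Long> perNodeDelta) {
        long sum = 0;
        for (long d : perNodeDelta.values()) {
            sum += d;
        }
        return sum;
    }

    // Absolute amount of change: each moved shard counts once on the losing node and once on the gaining node.
    static long absoluteChange(Map<String, Long> perNodeDelta) {
        long sum = 0;
        for (long d : perNodeDelta.values()) {
            sum += Math.abs(d);
        }
        return sum;
    }

    public static void main(String[] args) {
        // One node loses 4 shards (X = 4); two others gain 3 and 1.
        Map<String, Long> deltas = Map.of("node-a", -4L, "node-b", 3L, "node-c", 1L);
        System.out.println(netChange(deltas));      // 0
        System.out.println(absoluteChange(deltas)); // 8, i.e. 2X
    }
}
```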

I think for serverless as well, disk usage will always be zero after this change?

As discussed, I think we should reduce the scope of this PR to just the two metrics we know how to publish, and do a second PR once we've discussed how we'd like to publish write load, shard count and disk usage.
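For the two metrics the reviewer considers clearly valuable (number of balancing rounds and shard movements), one possible shape is a pair of monotonic totals that a telemetry counter could report. This is a sketch under assumptions: `RoundCountersSketch` and `RoundSummary(balancingRounds, shardMoves)` are hypothetical, and it assumes a summary may combine several rounds.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RoundCountersSketch {

    // Hypothetical stand-in for a summary that may combine several balancing rounds.
    record RoundSummary(long balancingRounds, long shardMoves) {}

    // Monotonic totals; counters must never decrease.
    private final AtomicLong totalBalancingRounds = new AtomicLong();
    private final AtomicLong totalShardMoves = new AtomicLong();

    public void onSummary(RoundSummary summary) {
        if (summary.balancingRounds() < 0 || summary.shardMoves() < 0) {
            throw new IllegalArgumentException("counter increments must be non-negative");
        }
        totalBalancingRounds.addAndGet(summary.balancingRounds());
        totalShardMoves.addAndGet(summary.shardMoves());
    }

    public long balancingRounds() {
        return totalBalancingRounds.get();
    }

    public long shardMoves() {
        return totalShardMoves.get();
    }

    public static void main(String[] args) {
        RoundCountersSketch counters = new RoundCountersSketch();
        counters.onSummary(new RoundSummary(1, 5));
        counters.onSummary(new RoundSummary(2, 3));
        System.out.println(counters.balancingRounds() + " rounds, " + counters.shardMoves() + " moves"); // 3 rounds, 8 moves
    }
}
```

Unlike a gauge holding only the latest summary, a cumulative counter loses nothing when several summaries arrive between collections, and a metrics backend can derive rates from it.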

nicktindall · 2025-10-14T23:35:29Z

.../org/elasticsearch/cluster/routing/allocation/allocator/AllocationBalancingRoundMetrics.java

+
+    private Map<String, Object> getNodeAttributes(String nodeId) {
+        return Map.of("node_id", nodeId);
+    }


I think metrics by default get a node_id and a corresponding node_name for the node they are emitted from. Perhaps we should use an attribute name other than node_id here? Or also set node_name?
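A sketch of the reviewer's suggestion: give the per-node attributes their own keys so they cannot be confused with the emitting node's default node_id/node_name. It assumes the emitter's default attributes and the metric's own attributes end up in one attribute set. The keys `balanced_node_id`/`balanced_node_name` and the `combine` helper are hypothetical, used only to show why a collision matters.

```java
import java.util.HashMap;
import java.util.Map;

public class NodeAttributesSketch {

    // Hypothetical attribute keys that cannot collide with the emitter's default node_id/node_name.
    static final String BALANCED_NODE_ID = "balanced_node_id";
    static final String BALANCED_NODE_NAME = "balanced_node_name";

    static Map<String, Object> nodeAttributes(String nodeId, String nodeName) {
        return Map.of(BALANCED_NODE_ID, nodeId, BALANCED_NODE_NAME, nodeName);
    }

    // Combines default (emitter) attributes with per-metric ones, rejecting any key clash.
    static Map<String, Object> combine(Map<String, Object> defaults, Map<String, Object> perMetric) {
        Map<String, Object> combined = new HashMap<>(defaults);
        for (Map.Entry<String, Object> e : perMetric.entrySet()) {
            if (combined.putIfAbsent(e.getKey(), e.getValue()) != null) {
                throw new IllegalArgumentException("attribute key clash: " + e.getKey());
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = Map.of("node_id", "master-id", "node_name", "master-0");
        System.out.println(combine(defaults, nodeAttributes("data-id", "data-1")).size()); // 4
        try {
            combine(defaults, Map.of("node_id", "data-id"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // attribute key clash: node_id
        }
    }
}
```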

elasticsearchmachine added the needs:triage and v9.3.0 labels Oct 6, 2025

[CI] Auto commit changes from spotless

99844ad

schase-es added the :Distributed Coordination/Allocation and >enhancement labels and removed the needs:triage label Oct 8, 2025

schase-es marked this pull request as draft October 8, 2025 20:15

schase-es and others added 4 commits October 8, 2025 13:15

Update docs/changelog/136043.yaml

dbce2bd

Added enableSending flag in, and some renames

840b003

Added metrics consolidation, correct diff calculation, and some tests.

5a94657

[CI] Auto commit changes from spotless

c7195fc

nicktindall reviewed Oct 14, 2025


3 participants
