[xds] Implement A114: WRR support for custom backend metrics#12645

Open
sauravzg wants to merge 3 commits into grpc:master from sauravzg:wrr-custom-metrics
@sauravzg commented Feb 4, 2026

Description

This PR implements gRFC A114: WRR Support for Custom Backend Metrics.

It updates the weighted_round_robin policy to allow users to configure which backend metrics drive the load balancing weights.

Key Changes

  • Configuration: Supports the new metric_names_for_computing_utilization field in WeightedRoundRobinLbConfig.
  • Weight Calculation: Implements logic to resolve custom metrics (including map lookups like named_metrics.foo) when application_utilization is absent.
  • Refactor: Centralizes the complex metric lookup and validation logic (checking for NaN, <= 0, etc.) into a new internal utility MetricReportUtils.
  • Testing: Verifies correct precedence: application_utilization > custom_metrics (max valid value) > cpu_utilization.
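The precedence in the last bullet can be illustrated with a minimal sketch. This is not the PR's `MetricReportUtils`: the class name, method names, and the flat `Map` of custom metric values are hypothetical stand-ins, and the validity check covers only the NaN and `<= 0` cases mentioned above.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

/** Hypothetical sketch of the utilization precedence; not the PR's actual code. */
final class WrrUtilizationSketch {
  private WrrUtilizationSketch() {}

  /** Assumption: a value is usable only if it is not NaN and is greater than zero. */
  static boolean isValid(double value) {
    return !Double.isNaN(value) && value > 0;
  }

  /**
   * Picks the utilization used for the weight:
   * application_utilization, else the max valid configured custom metric,
   * else cpu_utilization.
   */
  static double resolveUtilization(
      double applicationUtilization,
      double cpuUtilization,
      Map<String, Double> customMetrics, // hypothetical flattened view of the report
      List<String> metricNames) {        // metric_names_for_computing_utilization
    if (isValid(applicationUtilization)) {
      return applicationUtilization;
    }
    OptionalDouble maxCustom = metricNames.stream()
        .map(customMetrics::get)                 // null when the metric is absent
        .filter(v -> v != null && isValid(v))    // drop missing and invalid values
        .mapToDouble(Double::doubleValue)
        .max();
    if (maxCustom.isPresent()) {
      return maxCustom.getAsDouble();
    }
    return cpuUtilization; // final fallback
  }
}
```

With `application_utilization` unset and two configured metrics reporting 0.3 and 0.7, this sketch would pick 0.7; with no valid custom metric it falls back to `cpu_utilization`.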

Updates the Weighted Round Robin (WRR) load balancing policy to support
customizable utilization metrics via the `metric_names_for_computing_utilization` configuration.
This allows endpoint weights to be driven by arbitrary named metrics (e.g. `named_metrics.foo`)
or other standard metrics (e.g. `memory_utilization`) instead of solely `application_utilization`
or the `cpu_utilization` fallback.
Refactors metric resolution logic into `io.grpc.xds.internal.MetricReportUtils`
to handle the new map lookup and validation requirements.
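As a rough illustration of the name-based lookup described above, the sketch below resolves a metric name either to a standard field or to an entry in the named-metrics map. The `Report` class is a hypothetical stand-in for the real load report type, and `getMetric` here is an assumed shape, not the signature used in `MetricReportUtils`.

```java
import java.util.Map;
import java.util.OptionalDouble;

/** Hypothetical sketch of metric lookup by name; not the PR's actual code. */
final class MetricLookupSketch {
  private MetricLookupSketch() {}

  /** Hypothetical stand-in for a backend load report. */
  static final class Report {
    final double cpuUtilization;
    final double memoryUtilization;
    final Map<String, Double> namedMetrics;

    Report(double cpuUtilization, double memoryUtilization, Map<String, Double> namedMetrics) {
      this.cpuUtilization = cpuUtilization;
      this.memoryUtilization = memoryUtilization;
      this.namedMetrics = namedMetrics;
    }
  }

  private static final String NAMED_METRICS_PREFIX = "named_metrics.";

  /** Returns the raw value for metricName, or empty if the name is unknown or absent. */
  static OptionalDouble getMetric(Report report, String metricName) {
    if (metricName.equals("cpu_utilization")) {
      return OptionalDouble.of(report.cpuUtilization);
    } else if (metricName.equals("memory_utilization")) {
      return OptionalDouble.of(report.memoryUtilization);
    } else if (metricName.startsWith(NAMED_METRICS_PREFIX)) {
      // "named_metrics.foo" looks up key "foo" in the map.
      String key = metricName.substring(NAMED_METRICS_PREFIX.length());
      Double val = report.namedMetrics.get(key);
      if (val != null) {
        return OptionalDouble.of(val);
      }
    }
    return OptionalDouble.empty();
  }
}
```

Returning `OptionalDouble.empty()` for unknown or missing names keeps "metric absent" distinct from "metric present but invalid", so validation (NaN, `<= 0`) can stay a separate step.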
@sauravzg
Contributor Author

cc: @danielzhaotongliu to take a look (TAL) at the PR.

    if (val != null) {
      return OptionalDouble.of(val);
    }
  } else if (metricName.startsWith("named_metrics.")) {
Collaborator

For my education: this should be orthogonal to gRFC A85, but I would like to double-check whether the values here should be "in sync" with what is propagated from the ORCA load report to LRS. For example, if lrs_report_endpoint_metrics only allows named_metrics.foo, would calling getMetric(report, "bar") still be fine?

Contributor Author

That was my initial idea, based on Envoy's current implementation. However, this is still under discussion, and A114 may end up restricted to supporting only named metrics.

https://github.com/grpc/proposal/pull/536/changes#r2831008956

In that case, the implementation will be changed accordingly. Keeping this open for now.

  if (orcaReportListener != null
-     && orcaReportListener.errorUtilizationPenalty == errorUtilizationPenalty) {
+     && orcaReportListener.errorUtilizationPenalty == errorUtilizationPenalty
+     && Objects.equals(orcaReportListener.metricNamesForComputingUtilization,
Collaborator

Is the reason for using Objects.equals to handle null cases?

Contributor Author

@sauravzg commented Mar 3, 2026

Yep. Changed to a simple equals. I initially kept Objects.equals as defensive programming in case someone later makes these fields nullable, but it is probably moot to worry about that, since they are more or less guaranteed to be non-null currently.
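For readers following the thread, the difference between the two comparisons can be shown with a small sketch. The class and method names are hypothetical and the lists stand in for the configured metric names; this is not code from the PR.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch contrasting null-safe and plain equality checks. */
final class NullSafeEqualsSketch {
  private NullSafeEqualsSketch() {}

  /** Null-safe: two nulls compare equal, and null vs non-null compares unequal. */
  static boolean nullSafeSame(List<String> a, List<String> b) {
    return Objects.equals(a, b);
  }

  /** Plain equals: throws NullPointerException if the receiver a is null. */
  static boolean plainSame(List<String> a, List<String> b) {
    return a.equals(b);
  }
}
```

When the field is never null, both forms behave identically, so the plain call is sufficient; the null-safe form only matters if the field later becomes nullable.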

@sauravzg force-pushed the wrr-custom-metrics branch from 785c5f9 to 63c5bf3 on March 3, 2026 10:16