Skip to content

Conversation

@anuragagarwal561994
Copy link

@anuragagarwal561994 anuragagarwal561994 commented Nov 2, 2025

What type of PR is this?

What this PR does / why we need it:
This PR provides addition of new load balancer type client side weighted round robin. This is a new load balancing extension introduced since envoy 1.32

https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/load_balancing_policies/client_side_weighted_round_robin/v3/client_side_weighted_round_robin.proto

Which issue(s) this PR fixes:

Fixes #7305

Release Notes: Yes/No

@anuragagarwal561994
Copy link
Author

@jukie I have added the implementation and also tested it on my local setup

PS: the repo is so easy to contribute everything just works with the docs given on the site :)

@anuragagarwal561994
Copy link
Author

Also I wanted to know should I include slow start in client wrr?

So the thing is that I have submitted the proposal in grpc-xds grpc/proposal#498 and also in envoy I have got the proto updated.

It is not implemented yet, but I am trying to pick it up this month if my time allows

@anuragagarwal561994
Copy link
Author

anuragagarwal561994 commented Nov 2, 2025

Also I am unsure of how to test this e2e, so I have just included an AI generated e2e test suite.

The challenge here is that we need multiple replicas with each server respond with a specific header containing rps and cpu_utilisation and then the traffic is distributed by calculating the weight (rps / cpu)

I don't know what the current e2e tests allow and if this type of test case is feasible to write

@codecov
Copy link

codecov bot commented Nov 4, 2025

Codecov Report

❌ Patch coverage is 72.85714% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.38%. Comparing base (aa3ad43) to head (777a1c1).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/xds/translator/cluster.go 58.53% 16 Missing and 1 partial ⚠️
internal/gatewayapi/clustersettings.go 92.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7407      +/-   ##
==========================================
+ Coverage   72.36%   72.38%   +0.02%     
==========================================
  Files         233      233              
  Lines       34343    34410      +67     
==========================================
+ Hits        24851    24908      +57     
- Misses       7712     7723      +11     
+ Partials     1780     1779       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@anuragagarwal561994
Copy link
Author

anuragagarwal561994 commented Nov 4, 2025

Also I wanted to know should I include slow start in client wrr?

So the thing is that I have submitted the proposal in grpc-xds grpc/proposal#498 and also in envoy I have got the proto updated.

It is not implemented yet, but I am trying to pick it up this month if my time allows

I have started the implementation of slow_start_config and locality lb config with WRR as well, if possible I would also want to include them in the gatway implementation.

envoyproxy/envoy#41841

@jukie
Copy link
Contributor

jukie commented Nov 13, 2025

We wait to add features here until they've made it into a full envoy release. The flow would be getting this lb support added for 1.7 and if your envoy changes get merged we can add that support to gateway in 1.8.

Let's keep the scope of this PR to what's currently available and we can always include additional features in a follow-up. Are you able to make the suggested changes or can you join the contributors call next week to discuss?

@anuragagarwal561994
Copy link
Author

Sure, I just paused because the other changes were also approved, but I understand I will make the respective changes as suggested. Will try to complete them by today / tomorrow @jukie

@anuragagarwal561994
Copy link
Author

@jukie I have made the respective changes

…Gateway CRDs, ensuring configurable parameters and validation rules are integrated. Includes e2e test for validation.

Signed-off-by: anurag.ag <[email protected]>
…eway CRDs and related configurations. Update associated test data and documentation.

Signed-off-by: anurag.ag <[email protected]>
…entSideWeightedRoundRobin configuration, update affected tests and CRDs.

Signed-off-by: anurag.ag <[email protected]>
…cross Gateway CRDs, configuration files, and related tests. Adjust documentation to reflect percentage-based representation.

Signed-off-by: anurag.ag <[email protected]>
// ClientSideWeightedRoundRobin defines configuration for Envoy's Client-Side Weighted Round Robin policy.
// See Envoy proto: envoy.extensions.load_balancing_policies.client_side_weighted_round_robin.v3.ClientSideWeightedRoundRobin
// Note: SlowStart is not supported for this policy in Envoy Gateway at this time.
type ClientSideWeightedRoundRobin struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw your pr adding slow_start was merged in as well so feel free to include that now if you'd like and we'll get this into v1.7.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the existing slow_start field be reused ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep it uses extensions.load_balancing_policies.common.v3.SlowStartConfig which is the same as used in other lb policies.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also support the min_weight_percent and aggression now in this MR or in a separate MR?

@jukie
Copy link
Contributor

jukie commented Nov 27, 2025

Overall looks good, just a few more comments @anuragagarwal561994! Thanks for adding this and make sure to run gen-check before pushing.

Sorry for the delayed review on this. I'll prioritize helping you with this next week.

// ClientSideWeightedRoundRobin defines configuration for Envoy's Client-Side Weighted Round Robin policy.
// See Envoy proto: envoy.extensions.load_balancing_policies.client_side_weighted_round_robin.v3.ClientSideWeightedRoundRobin
// Note: SlowStart is not supported for this policy in Envoy Gateway at this time.
type ClientSideWeightedRoundRobin struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the first iteration, do we need to add all these fields, or can be rely on the defaults that are preset ?
can you share the motivation for adding these fields

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have removed the other fields which are not currently supported by envoy, but at present I guess the fields being published are important to properly configure the system.

For example black out period can be different for different applications based on how much time they take to get the metrics state. We have some apps which does heavy data loading in the beginning and hence their blackout periods are high.

I think all of these metrics are application specific and we have in our sytems already configured / tuned them for different services. So since the implementation is already done are there any concerns of why we should not include them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah as a project, we want to limit the scope of API fields to the ones majority users will configure, trying to simplify the complexity and only add it when needed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that compromises with the configurability as it depends lot on the service being served

}

// ClientSideWeightedRoundRobin defines configuration for Envoy's Client-Side Weighted Round Robin policy.
// See Envoy proto: envoy.extensions.load_balancing_policies.client_side_weighted_round_robin.v3.ClientSideWeightedRoundRobin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/load_balancing_policies/client_side_weighted_round_robin/v3/client_side_weighted_round_robin.proto is not descriptive enough, and doesnt mention how this needs to be instrumented in the upstream, are there other links than can be used

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the docs aren't great... for example out-of-band reporting isn't supported at all yet reading the docs would indicate otherwise.

We could contribute some docs changes upstream but agreed that including a reference to how endpoint-load-metrics and endpoint-load-metrics-bin are instrumented would be useful here.

Copy link
Author

@anuragagarwal561994 anuragagarwal561994 Dec 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arkodg what changes would you suggest me to add here?

This is a bit more elaborative document https://docs.cloud.google.com/load-balancing/docs/https/applb-custom-metrics although this can also be confusing and gcp specific.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do any docs exist that explain how a backend / server should be enhanced to send the appropriate trailors in the response ?
looking for a more end user facing doc for the feature ( which is described in https://www.youtube.com/watch?v=lfv_Oj1BLn0)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a google doc I will have to find it though but nothing stated officially as far I am aware of

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should consider adding one if it doesnt exist
cc @AndresGuedez

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, this is good feedback on the docs gaps.

I'll work with @efimki on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @AndresGuedez ! Happy to be the beta tester

// RoundRobinLoadBalancerType load balancer policy.
RoundRobinLoadBalancerType LoadBalancerType = "RoundRobin"
// ClientSideWeightedRoundRobinLoadBalancerType load balancer policy.
ClientSideWeightedRoundRobinLoadBalancerType LoadBalancerType = "ClientSideWeightedRoundRobin"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on other names involving BackendMetrics or Utilization or ORCA ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If length is your concern ClientSideWRR is an abbreviation I've seen used in a few places.

I agree the name is confusing but it's hard to rename this as the exclusive ORCA or Utilization lb type because similar lb policies exist such as WrrLocality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also client side doesnt convey any useful meaning to the user, we refer to downstream as clients in EG

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change the name, also is there a discussion group or slack channel where we can disucss and close this?

…ercentage-based representation in Gateway CRDs, configurations, and tests. Adjust related documentation and ensure backward compatibility.

Signed-off-by: anurag.ag <[email protected]>
…r` across gateway configurations and tests for cleaner and consistent duration handling. Remove unused `metav1` imports where applicable.

Signed-off-by: anurag.ag <[email protected]>
…dRoundRobin load balancer in Gateway CRDs, configurations, and tests. Adjust validation rules and documentation accordingly.

Signed-off-by: anurag.ag <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for Client Load Balancing Weighted Round Robin

5 participants