Description
Problem
go-control-plane's ADS consistency check rejects RDS configurations containing clusters with weight=0 when used with gRPC clients (grpc-go, grpc-c++), breaking Blue-Green deployments during pod cold starts.
This issue was originally reported to grpc-go (grpc/grpc-go#8865, PR grpc/grpc-go#8866), but grpc-go maintainers indicated this may be a go-control-plane issue.
Reproduction Scenario
Setup: the control plane sends RDS with a Blue-Green split at a 100:0 ratio (simplified):
```
RouteConfiguration {
  clusters: [
    { name: "cluster-blue", weight: { value: 100 } },
    { name: "cluster-green", weight: { value: 0 } }
  ]
}
```
Failure flow:
- The gRPC client receives RDS with `[blue: 100, green: 0]`.
- The client determines it only needs `blue` (since `green` won't receive traffic).
- The client sends a CDS request: `resourceNames = [blue]`.
- go-control-plane rejects it with `"cluster-green" not listed`, because the requested resources are not a superset of the snapshot's resources.
- The client enters `TRANSIENT_FAILURE`.
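The client-side step of the flow above (skipping clusters that cannot receive traffic) can be modeled with a minimal Go sketch. This is not grpc-go's actual code: `ClusterWeight` and `cdsNamesFromWeights` are hypothetical names, and Envoy's real `WeightedCluster.ClusterWeight` proto is replaced by a plain struct.

```go
package main

import "fmt"

// ClusterWeight is a hypothetical, simplified stand-in for one entry in an
// RDS weighted_clusters list.
type ClusterWeight struct {
	Name   string
	Weight uint32
}

// cdsNamesFromWeights models the client behavior described above: only
// clusters that can receive traffic (weight > 0) are added to the CDS
// subscription.
func cdsNamesFromWeights(clusters []ClusterWeight) []string {
	var names []string
	for _, c := range clusters {
		if c.Weight == 0 {
			continue // weight=0: no traffic, so the client does not subscribe
		}
		names = append(names, c.Name)
	}
	return names
}

func main() {
	route := []ClusterWeight{
		{Name: "cluster-blue", Weight: 100},
		{Name: "cluster-green", Weight: 0},
	}
	fmt.Println(cdsNamesFromWeights(route)) // [cluster-blue]
}
```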
Logs:
```
I0129 01:06:50 [OnStreamResponse] RouteConfiguration:
clusters:{name:"cluster-green" weight:{}}   ← weight=0
clusters:{name:"cluster-blue" weight:{value:100}}
I0129 01:06:50 CDS Request: resourceNames=[cluster-blue]   ← green not included
W0129 01:06:50 [go-control-plane] not responding to request: "cluster-green" not listed
```
Root Cause
gRPC clients use explicit CDS subscriptions and request only the clusters they intend to use, whereas Envoy uses wildcard (`*`) subscriptions. go-control-plane's ADS consistency check requires an explicit request to name every cluster in the snapshot, including weight=0 clusters that will never receive traffic.
| Client | CDS Subscription | Subscribes to weight=0? | Affected? |
|---|---|---|---|
| Envoy | Wildcard `*` | Yes (all clusters) | ❌ No |
| grpc-go | Explicit names | No | ✅ Yes |
| grpc-c++ | Explicit names | No | ✅ Yes |
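To make the table concrete, here is a simplified model of the ADS consistency check as described in this issue. `checkADSConsistency` and `isWildcard` are hypothetical names and go-control-plane's real implementation differs in detail; treating an empty name list or `"*"` as a wildcard request is an assumption that roughly follows xDS wildcard semantics.

```go
package main

import (
	"fmt"
	"sort"
)

// isWildcard reports whether a CDS request is a wildcard subscription.
// Assumption: an empty name list or an explicit "*" both count as wildcard.
func isWildcard(requested []string) bool {
	if len(requested) == 0 {
		return true
	}
	for _, n := range requested {
		if n == "*" {
			return true
		}
	}
	return false
}

// checkADSConsistency is a simplified model of the ADS check: every resource
// in the snapshot must appear in the requested names, otherwise the control
// plane does not respond.
func checkADSConsistency(requested, snapshot []string) error {
	if isWildcard(requested) {
		return nil // Envoy-style wildcard: the check does not apply
	}
	want := make(map[string]bool, len(requested))
	for _, n := range requested {
		want[n] = true
	}
	names := append([]string(nil), snapshot...)
	sort.Strings(names) // deterministic error message
	for _, n := range names {
		if !want[n] {
			return fmt.Errorf("%q not listed", n)
		}
	}
	return nil
}

func main() {
	snapshot := []string{"cluster-blue", "cluster-green"}
	fmt.Println(checkADSConsistency(nil, snapshot))                      // Envoy (wildcard): <nil>
	fmt.Println(checkADSConsistency([]string{"cluster-blue"}, snapshot)) // gRPC: "cluster-green" not listed
}
```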
grpc-go Maintainer Feedback
From grpc/grpc-go#8865:
"After internal discussion, we believe this is likely a bug in go-control-plane; specifically, the consistency check appears overly strict for gRPC clients. gRPC shouldn't have to subscribe to CDS resources it doesn't intend to use, and doing so would be inefficient.
If the control plane sets a cluster weight to 0, it shouldn't require a subscription for that cluster before responding. Envoy avoids this by using wildcard CDS subscriptions, whereas gRPC does not. We recommend investigating this as a go-control-plane issue regarding its handling of gRPC.
Note that gRPC C++ also behaves similarly to gRPC Go in this case."
Question
Should go-control-plane relax the ADS consistency check for clusters with weight=0?
The current check assumes all clients use wildcard subscriptions (like Envoy), but gRPC's explicit subscription model is a deliberate efficiency choice. Requiring subscriptions to weight=0 clusters that will never receive traffic seems inefficient.
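One possible direction, sketched under assumptions: the control plane computes which clusters are referenced only with weight=0 in its current routes and exempts them from the superset check. `checkRelaxed` and `zeroWeightOnly` are hypothetical names; this is not an existing go-control-plane option.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterWeight is a hypothetical, simplified stand-in for one weighted-cluster
// reference in the control plane's current RDS snapshot.
type ClusterWeight struct {
	Name   string
	Weight uint32
}

// zeroWeightOnly returns the clusters whose every reference has weight 0.
// A cluster with a positive weight in any route is never exempt.
func zeroWeightOnly(refs []ClusterWeight) map[string]bool {
	live := map[string]bool{}
	zero := map[string]bool{}
	for _, r := range refs {
		if r.Weight > 0 {
			live[r.Name] = true
		} else {
			zero[r.Name] = true
		}
	}
	for name := range live {
		delete(zero, name)
	}
	return zero
}

// checkRelaxed is a hypothetical relaxed ADS consistency check: every snapshot
// cluster must be requested unless it is exempt (referenced only with weight 0).
func checkRelaxed(requested, snapshot []string, exempt map[string]bool) error {
	if len(requested) == 0 {
		return nil // wildcard subscription (empty name list)
	}
	want := make(map[string]bool, len(requested))
	for _, n := range requested {
		if n == "*" {
			return nil // explicit wildcard subscription
		}
		want[n] = true
	}
	names := append([]string(nil), snapshot...)
	sort.Strings(names) // deterministic error message
	for _, n := range names {
		if !want[n] && !exempt[n] {
			return fmt.Errorf("%q not listed", n)
		}
	}
	return nil
}

func main() {
	refs := []ClusterWeight{
		{Name: "cluster-blue", Weight: 100},
		{Name: "cluster-green", Weight: 0},
	}
	snapshot := []string{"cluster-blue", "cluster-green"}
	// A gRPC-style request that omits the weight=0 cluster is now answered.
	fmt.Println(checkRelaxed([]string{"cluster-blue"}, snapshot, zeroWeightOnly(refs))) // <nil>
}
```

A cluster that has weight=0 in one route but a positive weight in another is still required, so the relaxation only covers clusters that cannot receive any traffic under the current snapshot.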
Impact
This blocks:
- Blue-Green deployments with instant rollback (100:0 ↔ 0:100)
- Pod cold starts while traffic is at 100:0 or 0:100
- Emergency traffic switches requiring weight=0 standby clusters
Current Workaround
We're using a vendor patch to grpc-go that subscribes to weight=0 clusters, but this goes against the grpc-go maintainers' design philosophy.
Related Issues
- xDS: RDS unmarshal skips weight=0 clusters causing ADS validation failures grpc/grpc-go#8865 - Original issue report
- xds: include weight=0 clusters in RDS unmarshal for ADS consistency grpc/grpc-go#8866 - PR closed, redirected here
- xDS spec allows `weight=0` (meaning 0% traffic)
We're happy to contribute a PR if there's consensus on whether this should be fixed in go-control-plane. Thank you!