ADS consistency check rejects weight=0 clusters with gRPC clients (non-Envoy) #1389

@flyingyang

Description

Problem

go-control-plane's ADS consistency check rejects RDS configurations containing clusters with weight=0 when used with gRPC clients (grpc-go, grpc-c++), breaking Blue-Green deployments during pod cold starts.

This issue was originally reported to grpc-go (grpc/grpc-go#8865, PR grpc/grpc-go#8866), but grpc-go maintainers indicated this may be a go-control-plane issue.

Reproduction Scenario

Setup: Control plane sends RDS with Blue-Green at 100:0 ratio

RouteConfiguration {
  clusters: [
    { name: "cluster-blue",  weight: { value: 100 } },
    { name: "cluster-green", weight: { value: 0 } }
  ]
}

Failure flow:

  1. gRPC client receives RDS with [blue: 100, green: 0]
  2. Client determines it only needs blue (since green won't receive traffic)
  3. Client sends CDS request: resourceNames = [blue]
  4. go-control-plane does not respond and logs: "cluster-green" not listed, because the requested names are not a superset of the clusters in the snapshot
  5. Client enters TRANSIENT_FAILURE

Logs:

I0129 01:06:50 [OnStreamResponse] RouteConfiguration:
  clusters:{name:"cluster-green" weight:{}}              ← weight=0
  clusters:{name:"cluster-blue" weight:{value:100}}

I0129 01:06:50 CDS Request: resourceNames=[cluster-blue]  ← green not included

W0129 01:06:50 [go-control-plane] not responding to request:
  "cluster-green" not listed

Root Cause

gRPC uses explicit CDS subscriptions (unlike Envoy's wildcard * subscriptions), only requesting clusters it intends to use. go-control-plane's ADS validation requires clients to subscribe to all clusters in the snapshot, even weight=0 clusters that will never receive traffic.
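The behavior described above can be sketched in a few lines of Go. This is a simplified, standalone stand-in written for illustration, not go-control-plane's actual code: the function name `superset`, its signature, and the slice-based snapshot are assumptions chosen to mirror the "not listed" message in the logs.

```go
package main

import "fmt"

// superset returns an error if the snapshot holds a resource that the
// client did not name in its request. Hypothetical stand-in for the ADS
// consistency check; the real implementation may differ.
func superset(requested map[string]bool, snapshot []string) error {
	for _, name := range snapshot {
		if !requested[name] {
			return fmt.Errorf("%q not listed", name)
		}
	}
	return nil
}

func main() {
	// The snapshot carries both clusters, including the weight=0 one.
	snapshot := []string{"cluster-blue", "cluster-green"}

	// gRPC-style request: only the cluster that will carry traffic.
	grpcRequest := map[string]bool{"cluster-blue": true}
	fmt.Println(superset(grpcRequest, snapshot))

	// A client that names every cluster passes the check.
	fullRequest := map[string]bool{"cluster-blue": true, "cluster-green": true}
	fmt.Println(superset(fullRequest, snapshot))
}
```

With the 100:0 snapshot, the first call reports cluster-green as not listed and the second returns nil, which matches the failure flow above.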

| Client | CDS subscription | Subscribes to weight=0 clusters? | Affected? |
|---|---|---|---|
| Envoy | Wildcard * | Yes (all clusters) | ❌ No |
| grpc-go | Explicit names | No | ✅ Yes |
| grpc-c++ | Explicit names | No | ✅ Yes |

grpc-go Maintainer Feedback

From grpc/grpc-go#8865:

"After internal discussion, we believe this is likely a bug in go-control-plane; specifically, the consistency check appears overly strict for gRPC clients. gRPC shouldn't have to subscribe to CDS resources it doesn't intend to use, and doing so would be inefficient.

If the control plane sets a cluster weight to 0, it shouldn't require a subscription for that cluster before responding. Envoy avoids this by using wildcard CDS subscriptions, whereas gRPC does not. We recommend investigating this as a go-control-plane issue regarding its handling of gRPC.

Note that gRPC C++ also behaves similarly to gRPC Go in this case."

Question

Should go-control-plane relax the ADS consistency check for clusters with weight=0?

The current check assumes all clients use wildcard subscriptions (like Envoy), but gRPC's explicit subscription model is a deliberate efficiency choice. Requiring subscriptions to weight=0 clusters that will never receive traffic seems inefficient.
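One possible shape for a relaxed check, as a rough sketch only: derive the set of clusters that never carry a non-zero weight in the route configuration, then skip those names during validation. Everything here (`weightedCluster`, `optionalClusters`, `relaxedSuperset`) is hypothetical and written for discussion; none of it is an existing go-control-plane API, and a real change would need to decide how the cache learns about route weights.

```go
package main

import "fmt"

// weightedCluster is a hypothetical, simplified view of one weighted
// cluster entry in a route.
type weightedCluster struct {
	Name   string
	Weight uint32
}

// optionalClusters marks a cluster as optional only if every entry that
// references it has weight 0.
func optionalClusters(entries []weightedCluster) map[string]bool {
	optional := map[string]bool{}
	for _, c := range entries {
		if c.Weight > 0 {
			optional[c.Name] = false
		} else if _, seen := optional[c.Name]; !seen {
			optional[c.Name] = true
		}
	}
	return optional
}

// relaxedSuperset is like a strict superset check, except that clusters
// marked optional do not need to appear in the request.
func relaxedSuperset(requested map[string]bool, snapshot []string, optional map[string]bool) error {
	for _, name := range snapshot {
		if requested[name] || optional[name] {
			continue
		}
		return fmt.Errorf("%q not listed", name)
	}
	return nil
}

func main() {
	entries := []weightedCluster{
		{Name: "cluster-blue", Weight: 100},
		{Name: "cluster-green", Weight: 0},
	}
	snapshot := []string{"cluster-blue", "cluster-green"}
	requested := map[string]bool{"cluster-blue": true}

	// The gRPC-style request is accepted because green has weight 0.
	fmt.Println(relaxedSuperset(requested, snapshot, optionalClusters(entries)))
}
```

A cluster that is weight=0 in one route but non-zero in another stays mandatory, so the relaxation applies only to clusters no route would send traffic to.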

Impact

This blocks:

  • Blue-Green deployments with instant rollback (100:0 ↔ 0:100)
  • Pod cold starts during 100:0 or 0:100 states
  • Emergency traffic switches requiring weight=0 standby clusters

Current Workaround

We're using a vendor patch to grpc-go that subscribes to weight=0 clusters, but this goes against the grpc-go maintainers' design philosophy.

We're happy to contribute a PR if there's consensus that this should be fixed in go-control-plane. Thank you!
