
Reset configSelector on realChannel while entering IDLE state #12832

Open
kkalin68 wants to merge 2 commits into grpc:master from kkalin68:xds-fix-channel-idle-limbo



kkalin68 commented May 27, 2026

A ManagedChannel can get stuck in the IDLE state when the xDS control plane stops serving a resource.
The scenario is as follows:

  1. A channel is opened for an xDS resource.
  2. XdsNameResolver subscribes to the resource on the xDS control plane.
  3. The resource is removed from the xDS control plane for an extended period (it stays unhealthy for longer than the channel's idle timeout).
  4. The channel enters TRANSIENT_FAILURE.
  5. The idle timeout fires, and the channel shuts down XdsNameResolver and other resources. The xDS watchers are removed.
  6. The resource comes back online on the xDS control plane.
  7. A new gRPC call is made on the channel.
  8. The channel stays in the IDLE state and reports "io.grpc.StatusRuntimeException: UNAVAILABLE: LDS resource xxxx does not exist nodeID: yyyy", because realChannel.configSelector still points to the old state. Since the selector is not INITIAL_PENDING_SELECTOR, newCall() takes the early return below instead of restarting name resolution:
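```java
public <ReqT, RespT> ClientCall<ReqT, RespT> newCall(
    MethodDescriptor<ReqT, RespT> method, CallOptions callOptions) {
  // Any non-pending selector, including a stale one left over from before
  // the channel went IDLE, is used directly here.
  if (configSelector.get() != INITIAL_PENDING_SELECTOR) {
    return newClientCall(method, callOptions);
  }
...
```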
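The failure mode can be reproduced in miniature. The sketch below is a hypothetical toy model, not grpc-java code: `IdleLimboDemo`, `ToyChannel`, and the string "selectors" are invented for illustration, step 4 (TRANSIENT_FAILURE) is not modeled, and the early return in `newCall()` is assumed to mirror the `RealChannel.newCall()` excerpt above. The `resetSelectorOnIdle` flag toggles the behavior this PR adds.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical driver for the toy model; walks through the scenario steps.
class IdleLimboDemo {
  static String run(boolean resetSelectorOnIdle) {
    ToyChannel ch = new ToyChannel(resetSelectorOnIdle);
    ch.newCall();                                           // steps 1-2: resolve and subscribe
    ch.setControlPlaneState("LDS resource does not exist"); // step 3: resource removed
    ch.enterIdle();                                         // step 5: idle timeout, watchers gone
    ch.setControlPlaneState("LDS resource exists");         // step 6: nobody is watching
    return ch.newCall();                                    // steps 7-8: new call
  }

  public static void main(String[] args) {
    System.out.println("without reset: " + run(false)); // selector: LDS resource does not exist
    System.out.println("with reset:    " + run(true));  // selector: LDS resource exists
  }
}

// Hypothetical stand-in for ManagedChannelImpl/RealChannel; not grpc-java code.
final class ToyChannel {
  // Plays the role of INITIAL_PENDING_SELECTOR.
  static final String INITIAL_PENDING = "INITIAL_PENDING";

  private final AtomicReference<String> configSelector =
      new AtomicReference<>(INITIAL_PENDING);
  private final boolean resetSelectorOnIdle; // true models the fix in this PR
  private boolean resolverRunning = false;
  private String controlPlaneState = "LDS resource exists";

  ToyChannel(boolean resetSelectorOnIdle) {
    this.resetSelectorOnIdle = resetSelectorOnIdle;
  }

  void setControlPlaneState(String state) {
    controlPlaneState = state;
    if (resolverRunning) {
      configSelector.set(state); // only a running resolver pushes updates
    }
  }

  void exitIdle() {
    resolverRunning = true;
    configSelector.set(controlPlaneState); // fresh resolution result
  }

  void enterIdle() {
    resolverRunning = false; // resolver and xDS watchers are shut down
    if (resetSelectorOnIdle) {
      configSelector.set(INITIAL_PENDING);
    }
  }

  String newCall() {
    if (!configSelector.get().equals(INITIAL_PENDING)) {
      // Early return: the current selector is used as-is, even if stale.
      return "selector: " + configSelector.get();
    }
    exitIdle(); // reached only while the selector is still pending
    return "selector: " + configSelector.get();
  }
}
```

Without the reset, the last call reuses the "does not exist" selector captured before IDLE, because no watcher saw the resource return; with the reset, the call falls through to `exitIdle()` and picks up the current state.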


linux-foundation-easycla Bot commented May 27, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: kkalin68 / name: Konstantin Kalin (f09b7d9)

AgraVator (Contributor) commented Jun 2, 2026

Hey,
Please fix the failing test cases. Thanks.

Copilot AI (Contributor) left a comment


Pull request overview

Fixes a ManagedChannel idle-mode edge case where a stale InternalConfigSelector can prevent the channel from properly exiting IDLE after xDS resources disappear and later reappear.

Changes:

  • Reset realChannel’s config selector to INITIAL_PENDING_SELECTOR when entering IDLE, forcing the next call to trigger resolver/LB restart.
  • Prevent pending-call reprocessing when updateConfigSelector() is called with INITIAL_PENDING_SELECTOR (only reprocess when transitioning away from the initial pending selector).
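The second change can be sketched the same way. This is a hypothetical model (`SelectorGuardDemo` and `ToySelectorHolder` are invented names, and queued calls are plain strings). It assumes only what the overview above states: queued calls are reprocessed when the selector moves away from the initial pending sentinel, and the fix skips reprocessing when the new value is the sentinel itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical driver; not grpc-java code.
class SelectorGuardDemo {
  public static void main(String[] args) {
    ToySelectorHolder holder = new ToySelectorHolder();
    holder.queuePendingCall("call-1");
    // For example, entering IDLE before any resolution result arrived:
    // pending -> pending, so nothing is reprocessed.
    holder.updateConfigSelector(ToySelectorHolder.INITIAL_PENDING);
    // A real resolution result: pending -> selector, so queued calls run.
    holder.updateConfigSelector("xds-selector");
    System.out.println(holder.reprocessed()); // [call-1]
  }
}

// Hypothetical stand-in for RealChannel's selector bookkeeping.
final class ToySelectorHolder {
  // Identity-compared sentinel, like INITIAL_PENDING_SELECTOR.
  static final Object INITIAL_PENDING = new Object();

  private final AtomicReference<Object> configSelector =
      new AtomicReference<>(INITIAL_PENDING);
  private final List<String> pendingCalls = new ArrayList<>();
  private final List<String> reprocessed = new ArrayList<>();

  void queuePendingCall(String call) {
    pendingCalls.add(call);
  }

  void updateConfigSelector(Object selector) {
    Object previous = configSelector.getAndSet(selector);
    // Reprocess only on a real transition away from the pending sentinel.
    if (previous == INITIAL_PENDING && selector != INITIAL_PENDING) {
      reprocessed.addAll(pendingCalls);
      pendingCalls.clear();
    }
  }

  List<String> reprocessed() {
    return reprocessed;
  }
}
```

Without the `selector != INITIAL_PENDING` check, the first update would reprocess `call-1` while the channel still has no usable selector.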


Comment on lines 425 to 429
```java
shutdownNameResolverAndLoadBalancer(true);
delayedTransport.reprocess(null);
// Reset to the pending sentinel so the next newCall() exits idle mode.
realChannel.updateConfigSelector(INITIAL_PENDING_SELECTOR);
channelLogger.log(ChannelLogLevel.INFO, "Entering IDLE state");
channelStateManager.gotoState(IDLE);
```
kkalin68 (Author) commented Jun 2, 2026

> Hey, please fix the failing test cases. Thanks.

The tests pass locally. The failures in the tests(11) and tests(17) runs are unrelated to my changes, and the same tests passed in the tests(8) and tests(21) runs. I also don't see an option to re-trigger the failed tests.


Labels

None yet

Projects

None yet


3 participants