@easwars easwars commented Oct 3, 2025

Fixes #8125

The original race in the xDS client:

  • A resource watch is cancelled by the user of the xdsClient (e.g. xdsResolver).
  • xdsClient removes the resource from its cache and queues an unsubscribe request to the ADS stream.
  • A watch for the same resource is registered immediately, and the xdsClient instructs the ADS stream to subscribe (as it's not in cache).
  • The ADS stream sends a redundant request (same resources, version, nonce) which the management server ignores.
  • The new resource watch sees a "resource-not-found" error once the watch timer fires.
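
The sequence above can be sketched with a toy model. This is only an illustration, not the xdsClient code: `toyStream`, `request`, and `simulateRace` are hypothetical names, and the model assumes a stream that coalesces pending subscription changes and sends only the resulting state.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// request is a simplified stand-in for an ADS discovery request.
type request struct {
	resources string // sorted, comma-separated resource names
	version   string
	nonce     string
}

// toyStream is a hypothetical model of an ADS stream that batches
// subscription changes and sends only the latest subscription state.
type toyStream struct {
	subscribed map[string]bool
	version    string
	nonce      string
	lastSent   request
}

// snapshot builds the request that would be sent for the current state.
func (s *toyStream) snapshot() request {
	names := make([]string, 0, len(s.subscribed))
	for n := range s.subscribed {
		names = append(names, n)
	}
	sort.Strings(names)
	return request{resources: strings.Join(names, ","), version: s.version, nonce: s.nonce}
}

// flush "sends" the batched state and reports whether the request is
// identical to the previous one (same resources, version, and nonce).
func (s *toyStream) flush() (req request, redundant bool) {
	req = s.snapshot()
	redundant = req == s.lastSent
	s.lastSent = req
	return req, redundant
}

// simulateRace replays the steps listed above and reports whether the
// request that finally goes out is redundant.
func simulateRace() bool {
	// Assumed starting point: resource A is subscribed and already ACKed
	// at version "1" with nonce "n1".
	s := &toyStream{subscribed: map[string]bool{"A": true}, version: "1", nonce: "n1"}
	s.flush()
	cache := map[string]string{"A": "cached-A"}

	// The watch is cancelled: the resource leaves the cache right away and
	// the unsubscribe is only queued on the stream.
	delete(cache, "A")
	delete(s.subscribed, "A")

	// A new watch arrives before the stream has sent anything. The cache
	// misses, so the client asks the stream to subscribe again.
	if _, ok := cache["A"]; !ok {
		s.subscribed["A"] = true
	}

	// The stream sends the coalesced state, which matches the last request.
	_, redundant := s.flush()
	return redundant
}

func main() {
	fmt.Println("redundant request:", simulateRace()) // prints "redundant request: true"
}
```

In this model the management server has nothing new to respond to, so the new watch would receive no update until its timer expires.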

The original fix was to delay the resource's removal from the cache until the unsubscribe request was transmitted over the wire, a change implemented in #8369. However, this solution introduced new complications:

  • The resource's removal from the xdsClient's cache became an asynchronous operation, occurring while the unsubscribe request was being sent.
  • This asynchronous behavior meant the state maintained within the ADS stream could still diverge from the cache's state.
  • A critical section was absent between the ADS stream's message transmission logic and the xdsClient's cache access, which is performed during subscription/unsubscription by its users.

This PR simplifies the implementation of the ADS stream by removing two pieces of functionality:

  • Stop batching writes on the ADS stream.
    • If the user registers multiple watches, e.g. for resources A, B, and C, the stream now sends three requests: [A], [A B], [A B C].
  • Don't buffer writes while waiting for flow control.
    • Flow control already blocks reads from the stream. Blocking writes as well during this period might provide some additional flow control, but not much, and removing this logic simplifies the stream implementation quite a bit.

RELEASE NOTES:

  • xdsclient: fix a race in the xdsClient that could lead to resource-not-found errors

@easwars added the Type: Bug and Area: xDS labels Oct 3, 2025
@easwars added this to the 1.77 Release milestone Oct 3, 2025

codecov bot commented Oct 3, 2025

Codecov Report

❌ Patch coverage is 79.41176% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.10%. Comparing base (8389ddb) to head (54c1664).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/xds/clients/xdsclient/ads_stream.go | 81.81% | 2 Missing and 2 partials ⚠️ |
| internal/xds/clients/internal/buffer/unbounded.go | 75.00% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8627      +/-   ##
==========================================
- Coverage   82.13%   82.10%   -0.03%     
==========================================
  Files         415      415              
  Lines       40711    40677      -34     
==========================================
- Hits        33437    33397      -40     
- Misses       5897     5906       +9     
+ Partials     1377     1374       -3     
| Files with missing lines | Coverage Δ |
|---|---|
| internal/xds/clients/internal/buffer/unbounded.go | 94.23% <75.00%> (-5.77%) ⬇️ |
| internal/xds/clients/xdsclient/ads_stream.go | 82.83% <81.81%> (-1.32%) ⬇️ |

... and 26 files with indirect coverage changes


@easwars requested review from arjan-bal and dfawley October 7, 2025 23:32

easwars commented Oct 7, 2025

@danielzhaotongliu

Successfully merging this pull request may close these issues.

xdsclient: race around resource subscriptions and unsubscriptions