fix: goroutine leaks #2206
Open
NadavLevi wants to merge 3 commits into main from fix-goroutine-leaks
d5b223b to 079d3e6
The busy-wait goroutine that polls connection state did not check for context cancellation, causing it to run forever when connections timed out before becoming Ready. This was the primary cause of goroutine leaks observed in production when all providers were blocked during startup.

Changes:
- Add context.Done() check in the goroutine's polling loop
- Use buffered channel to prevent goroutine blocking on send
- Add non-blocking send to handle race between Ready state and context cancel
- Add regression tests for goroutine cleanup behavior

The fix ensures that when ConnectRawClientWithTimeout returns due to context timeout/cancellation, the internal goroutine exits promptly instead of running indefinitely.
…ic epoch checks

The second chance goroutine previously slept for the full 3-minute retrySecondChanceAfter duration before checking if the epoch changed. This caused goroutine accumulation when many providers were blocked in quick succession during startup.

Changes:
- Replace single time.After(3min) with ticker checking every 10 seconds
- Exit early if epoch changes, reducing goroutine lifetime from 3min to ~10s
- Add trace logging for early exit to aid debugging
- Add test for early exit on epoch change behavior

This fix reduces goroutine accumulation when providers are repeatedly blocked during startup before they become available. Instead of 251 goroutines living for 3 minutes each, they now exit within ~10 seconds of an epoch change.
Add a sync.RWMutex to the Endpoint struct to protect concurrent access to the Connections, ConnectionRefusals, and Enabled fields. This fixes a pre-existing race condition that occurred when multiple goroutines (from probeProviders) accessed the same Endpoint object simultaneously.

The race was detected in fetchEndpointConnectionFromConsumerSessionWithProvider, where multiple providers could share the same Endpoint object and concurrently modify its fields without synchronization.

The fix:
- Added `mu sync.RWMutex` field to the Endpoint struct
- Wrapped modifications to Connections, ConnectionRefusals, and Enabled with Lock/Unlock
- Used RLock/RUnlock for read-only access to the Enabled field
- Released the lock before blocking network calls and re-acquired it afterwards
d82099a to ea5a923
Description
Closes: #XXXX
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
! in the type prefix if API or client breaking change
main branch
Reviewers Checklist
All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.
I have...