-
Notifications
You must be signed in to change notification settings - Fork 204
fix: memory leak on account of pending dials #790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: memory leak on account of pending dials #790
Conversation
82146cd to
1ccdfbe
Compare
pkg/server/tunnel.go
Outdated
| klog.ErrorS(err, "no tunnels available") | ||
| conn.Write([]byte(fmt.Sprintf("HTTP/1.1 500 Internal Server Error\r\nContent-Type: text/plain\r\n\r\ncurrently no tunnels available: %v", err))) | ||
| conn.Close() | ||
| // conn.Close() is handled by the top-level defer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(non blocking) Can we fix the comment to indicate is should be closed by the closeOnce above?
| established := false | ||
| defer func() { | ||
| if !established { | ||
| if t.Server.PendingDial.Remove(random) != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little concerned that established does not guarantee that random is in PendingDial. If we got a response (DIAL_CLS or DIAL_RSP) but the select below goes to a different case than connection.connected, established will be false but random will be absent from PendingDial and I do not think Remove() is resilient to that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cheftako I hear your concern but Remove is using delete which treats removal of a non-existent entry as a no-op
|
/test pull-apiserver-network-proxy-test-master |
This commit fixes a goroutine leak in the HTTP-Connect tunnel (tunnel.go) that could occur during connection setup. The leak happened when a backend agent disconnected at a very specific time: after the server sent a DIAL_REQ but before the connection was fully established. In this scenario, the cleanup logic was never called, and the handler goroutine would hang forever. I've refactored ServeHTTP to make it more robust and prevent this leak: Added a deferred cleanup function: A defer block now acts as a safety net. It uses a flag (established) to track whether the connection succeeded. If the function exits for any reason before the connection is established, this deferred code runs and guarantees that the pending dial is removed from our tracking map. Fixed a race condition with a single select: The old code had separate, racy checks for different failure modes. I've replaced this with a single, atomic select block that waits for all possible outcomes at once: a successful connection, the client disconnecting, or the agent's context being cancelled. This makes the logic much safer and easier to follow. Signed-off-by: Imran Pochi <[email protected]>
61b8654 to
e12c96f
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cheftako, ipochi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…k-pending-dials fix: memory leak on account of pending dials
…k-pending-dials fix: memory leak on account of pending dials
…k-pending-dials fix: memory leak on account of pending dials
Backporting #790 to release-0.32 from kinvolk/imran/fix-mem-leak-pending-dials
Backporting #790 to release-0.31 from kinvolk/imran/fix-mem-leak-pending-dials
Backporting #790 to release-0.33 from kinvolk/imran/fix-mem-leak-pending-dials
Fixes: #789