Set TIMEOUT and CONNECTTIMEOUT on pycurl handles in CurlClient #2488
Open

jlmorton wants to merge 1 commit into celery:main from
Conversation
The CurlClient._setup_request() method configures many pycurl options but never sets TIMEOUT or CONNECTTIMEOUT on the curl handle. The Request class already defines request_timeout (30s) and connect_timeout (30s), but these values are never applied to the underlying curl transfer.

Without TIMEOUT, pycurl will wait indefinitely for a response on a socket that remains open but stops delivering data. This happens during network partitions, proxy restarts, or sidecar terminations: the TCP connection stays established (no RST, no FIN) but the remote end stops sending. The in-flight HTTP request hangs forever, which permanently breaks the SQS async polling loop: the promise callback from _get_from_sqs never fires, so _loop1 is never re-invoked and the consumer stops polling for messages. The worker process stays alive (event loop keeps running, liveness probes pass) but never consumes another message.

With TIMEOUT set, pycurl aborts the stalled transfer after request_timeout seconds and returns an error. The existing error handling in _process/_process_pending_requests picks this up, the transport reconnects, and the polling loop resumes normally.
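As a runnable sketch of what the patch does (FakeCurl, apply_timeouts, and the string option names below are stand-ins for pycurl.Curl and the real integer TIMEOUT/CONNECTTIMEOUT constants, so the snippet runs without libcurl; this is illustrative, not the kombu code itself):

```python
class FakeCurl:
    """Stand-in for pycurl.Curl that records setopt() calls."""
    def __init__(self):
        self.options = {}

    def setopt(self, option, value):
        self.options[option] = value

# String stand-ins for pycurl's integer option constants.
TIMEOUT, CONNECTTIMEOUT = 'TIMEOUT', 'CONNECTTIMEOUT'

def apply_timeouts(curl, request_timeout=30, connect_timeout=30):
    # Mirrors the lines the patch adds to _setup_request(): without
    # TIMEOUT, a socket that stays open but stops delivering data
    # hangs the transfer forever.
    if request_timeout:
        curl.setopt(TIMEOUT, int(request_timeout))
    if connect_timeout:
        curl.setopt(CONNECTTIMEOUT, int(connect_timeout))

curl = FakeCurl()
apply_timeouts(curl)
print(curl.options)  # {'TIMEOUT': 30, 'CONNECTTIMEOUT': 30}
```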
Contributor
Pull request overview
Adds explicit per-request timeout configuration to the pycurl-based async HTTP client to prevent indefinitely hanging sockets (notably impacting long-running async SQS polling).
Changes:
- Apply request_timeout to the curl handle via pycurl.TIMEOUT.
- Apply connect_timeout to the curl handle via pycurl.CONNECTTIMEOUT.
- Document rationale in-code (network partitions / proxy termination leading to hangs).
Comment on lines +205 to +209
```python
if request.request_timeout:
    setopt(_pycurl.TIMEOUT, int(request.request_timeout))
if request.connect_timeout:
    setopt(_pycurl.CONNECTTIMEOUT, int(request.connect_timeout))
```
Two issues with the new timeout handling:

1. int() truncates float seconds and can disable timeouts. Example: request_timeout=0.5 passes the if check, but int(0.5) becomes 0; in libcurl/pycurl a TIMEOUT/CONNECTTIMEOUT of 0 means "no timeout", reintroducing indefinite hangs.
2. Curl handles are reused from a pool; when the if condition is false (e.g. request_timeout=None/0), the option is left unchanged and can leak the previous request's timeout.

Consider always setting a per-request value and using TIMEOUT_MS/CONNECTTIMEOUT_MS with rounding (and >= 1 ms), or explicitly unsetting/setting 0 when the timeout is disabled.
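The truncation pitfall is easy to reproduce in plain Python (the values here are illustrative):

```python
request_timeout = 0.5            # truthy, so it passes the `if` guard
seconds = int(request_timeout)   # int() truncates toward zero -> 0
# In libcurl/pycurl, a TIMEOUT of 0 means "no timeout at all", so the
# sub-second timeout silently becomes an infinite one.
millis = max(1, int(round(request_timeout * 1000.0)))  # 500 ms, safe

print(seconds, millis)  # 0 500
```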
Suggested change

```diff
-if request.request_timeout:
-    setopt(_pycurl.TIMEOUT, int(request.request_timeout))
-if request.connect_timeout:
-    setopt(_pycurl.CONNECTTIMEOUT, int(request.connect_timeout))
+request_timeout = request.request_timeout
+connect_timeout = request.connect_timeout
+# Always set per-request timeout values on the handle to avoid
+# leaking settings between pooled Curl instances. Use millisecond
+# resolution when supported, rounding and ensuring that any
+# positive timeout is at least 1ms. A value of 0 disables the
+# timeout in libcurl/pycurl.
+if hasattr(_pycurl, 'TIMEOUT_MS'):
+    if request_timeout:
+        timeout_ms = max(1, int(round(request_timeout * 1000.0)))
+    else:
+        timeout_ms = 0
+    setopt(_pycurl.TIMEOUT_MS, timeout_ms)
+else:
+    if request_timeout:
+        # Fall back to second granularity; ensure at least 1s for
+        # any positive timeout.
+        timeout_s = max(1, int(round(request_timeout)))
+    else:
+        timeout_s = 0
+    setopt(_pycurl.TIMEOUT, timeout_s)
+if hasattr(_pycurl, 'CONNECTTIMEOUT_MS'):
+    if connect_timeout:
+        conn_timeout_ms = max(1, int(round(connect_timeout * 1000.0)))
+    else:
+        conn_timeout_ms = 0
+    setopt(_pycurl.CONNECTTIMEOUT_MS, conn_timeout_ms)
+else:
+    if connect_timeout:
+        conn_timeout_s = max(1, int(round(connect_timeout)))
+    else:
+        conn_timeout_s = 0
+    setopt(_pycurl.CONNECTTIMEOUT, conn_timeout_s)
```
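For illustration, the millisecond conversion from the suggestion can be factored into a small helper (the name timeout_ms is hypothetical; the suggestion inlines this logic):

```python
def timeout_ms(value):
    """Convert a timeout in seconds (float, int, or None) to libcurl
    milliseconds: None/0 disables the timeout (0), and any positive
    value is rounded but clamped to at least 1 ms so it can never be
    silently truncated into "no timeout"."""
    if value:
        return max(1, int(round(value * 1000.0)))
    return 0

assert timeout_ms(0.5) == 500
assert timeout_ms(0.0004) == 1   # clamped to 1 ms, never disabled
assert timeout_ms(None) == 0     # 0 means "no timeout" in libcurl
assert timeout_ms(30) == 30000
```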