Fix test_cursor_execute_timeout failure on Windows Python 3.9/3.10#2822
Merged
sfc-gh-fpawlowski merged 10 commits intomainfrom Mar 25, 2026
Merged
Fix test_cursor_execute_timeout failure on Windows Python 3.9/3.10#2822sfc-gh-fpawlowski merged 10 commits intomainfrom
sfc-gh-fpawlowski merged 10 commits intomainfrom
Conversation
Replace time.sleep(10) with threading.Event synchronization in test_cursor_execute_timeout. On Windows Python <3.11, time.sleep() uses alertable I/O (WaitForSingleObjectEx) which can return early when APCs are triggered by --dist worksteal socket communication, causing the timebomb to be cancelled before it fires. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sfc-gh-mborins
approved these changes
Mar 24, 2026
…endent Replace the threading.Event-based approach with a synchronous timer mock: - Patch _TrackedQueryCancellationTimer to fire its callback immediately on start() - This eliminates all background threads and blocking waits from the test - The previous Event-based fix was still flaky: the timer thread could crash before setting the event (e.g. due to platform-specific threading behavior on Windows Python <3.11 under --dist worksteal), leaving cancel_called permanently unset and causing a 10s timeout followed by assertion failure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace threading.Event polling loop with threading.Lock.acquire() (no timeout). Lock.acquire() with no timeout maps to WaitForSingleObjectEx with INFINITE, which cannot return early. The previous approach used Event.wait(N) with finite N, which on Windows Python <3.11 under --dist worksteal can return before N seconds due to heavy socket I/O. The Lock is pre-acquired before mock_cmd_query runs; the mock's side_effect releases it; mock_cmd_query blocks on the second acquire until the real timer fires. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Lock.acquire() (INFINITE) approach correctly prevents spurious early returns on Windows Python <3.11, but removed the 10-second safety bound that caused the old test to fail cleanly when __cancel_query was never called. Without a bound, a timer that fails to fire causes an indefinite hang rather than a clear assertion failure. pytest-timeout is already a declared dependency with a global 1200s default. The per-test @pytest.mark.timeout(30) tightens this to 30s and produces a named TIMEOUT failure instead of an opaque job-level kill. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Lock approach deadlocked: cursor.execute() calls cmd_query in the same thread as the test, so cancel_lock.acquire() (pre-held by the test thread) blocked indefinitely when called again from mock_cmd_query. Python's Lock documents that a locked acquire() waits for "another thread" to release it — same-thread re-entry deadlocks. Semaphore(0) has the same INFINITE wait property (WaitForSingleObjectEx with INFINITE, immune to Windows WAIT_FAILED under socket load) but requires no pre-acquisition. The semaphore starts at 0; mock_cmd_query blocks on acquire(); the timer fires after 1s, calls release(), and mock_cmd_query unblocks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…check Two independent failure modes observed on Windows Python <3.11 under --dist worksteal: 1. Event.wait(N) with a finite timeout returns early due to WaitForSingleObjectEx returning WAIT_FAILED under heavy socket I/O, causing mock_cmd_query to exit before the timer fires. The timer is then cancelled in the finally block and the assertion fails. 2. INFINITE waits (Lock/Semaphore with no timeout) hang when the timer thread is CPU-starved on an overloaded CI runner and never gets scheduled within the pytest-timeout window. Fix: poll with 50ms Event.wait() slices and check Event.is_set() between each slice. Event.is_set() reads self._flag (a Python bool under the GIL) with no OS call — always reliable regardless of WaitForSingleObjectEx state. Event.set() writes self._flag reliably even if notify fails to wake sleepers. 15s deadline + @pytest.mark.timeout(60) provide a clean failure path without infinite hangs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-dist load (#2826) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Removed timeout decorator and refactored test logic to handle cancellation more reliably.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
test_cursor_execute_timeoutconsistently failing on Windows Python 3.9 and 3.10 since--dist workstealwas added in Fix AWS integration regressions without matrix changes #2819time.sleep(10)withthreading.Eventsynchronization somock_cmd_queryblocks until the timebomb actually fires, rather than relying on sleep durationRoot cause
On Windows Python <3.11,
time.sleep()usesWaitForSingleObjectExwith alertable I/O. The--dist workstealxdist mode causes frequent inter-worker socket communication, which triggers APCs (Asynchronous Procedure Calls) that waketime.sleep()early without raising an exception (CPython issue, fixed in 3.11). When sleep returns early, thefinallyblock in_execute_helpercancels the timebomb before it fires, so__cancel_queryis never called.Test plan
🤖 Generated with Claude Code