
Fix test_cursor_execute_timeout failure on Windows Python 3.9/3.10 #2822

Merged
sfc-gh-fpawlowski merged 10 commits into main from fix-test-cursor-execute-timeout-windows
Mar 25, 2026

@sfc-gh-fpawlowski
Contributor

Summary

  • Fix test_cursor_execute_timeout consistently failing on Windows Python 3.9 and 3.10 since --dist worksteal was added in #2819 (Fix AWS integration regressions without matrix changes)
  • Replace time.sleep(10) with threading.Event synchronization so mock_cmd_query blocks until the timebomb actually fires, rather than relying on the sleep running for its full duration

Root cause

On Windows Python <3.11, time.sleep() uses WaitForSingleObjectEx with alertable I/O. The --dist worksteal xdist mode causes frequent inter-worker socket communication, which triggers APCs (Asynchronous Procedure Calls) that wake time.sleep() early without raising an exception (CPython issue, fixed in 3.11). When sleep returns early, the finally block in _execute_helper cancels the timebomb before it fires, so __cancel_query is never called.
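The race can be modelled in a few lines. This is a minimal sketch, assuming a simplified _execute_helper that starts a threading.Timer as the timebomb and cancels it in a finally block; execute_helper and demo are hypothetical names, and time.sleep stands in for mock_cmd_query. A sleep that outlasts the timer lets the cancel callback fire; a sleep that returns early lets the finally block cancel the timer first.

```python
import threading
import time


def execute_helper(run_query, on_timeout, timeout_s):
    """Hypothetical, simplified model of the timebomb pattern (not the connector's code).

    A timer thread calls on_timeout after timeout_s seconds. The finally block
    always cancels the timer, so if run_query returns before the timer fires,
    on_timeout is never called.
    """
    timebomb = threading.Timer(timeout_s, on_timeout)
    timebomb.start()
    try:
        return run_query()
    finally:
        timebomb.cancel()  # no effect if the timer has already fired


def demo(query_duration_s, timeout_s=0.2):
    """Return True if the cancel callback ran during a query of the given length."""
    cancel_called = threading.Event()
    execute_helper(
        run_query=lambda: time.sleep(query_duration_s),
        on_timeout=cancel_called.set,
        timeout_s=timeout_s,
    )
    return cancel_called.is_set()
```

Here demo(0.5) returns True, while demo(0.0) plays the role of a sleep that was woken early: the finally block wins the race and the cancel callback never runs.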

Test plan

  • Verify Windows Python 3.9 and 3.10 unit tests pass in CI
  • Verify no regressions on other platforms

🤖 Generated with Claude Code

Replace time.sleep(10) with threading.Event synchronization in
test_cursor_execute_timeout. On Windows Python <3.11, time.sleep()
uses alertable I/O (WaitForSingleObjectEx) which can return early
when APCs are triggered by --dist worksteal socket communication,
causing the timebomb to be cancelled before it fires.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
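A minimal sketch of the Event-based synchronization this commit describes, under the same simplified timebomb model. execute_with_timebomb and run_event_synchronized are hypothetical names; the real test patches the cursor rather than calling a helper like this. The bounded wait stands in for the old 10 s sleep as a safety limit.

```python
import threading
from unittest import mock


def execute_with_timebomb(cmd_query, cancel_query, timeout_s):
    """Hypothetical stand-in for the cursor: start a cancel timer, run the
    query, and cancel the timer in finally."""
    timer = threading.Timer(timeout_s, cancel_query)
    timer.start()
    try:
        return cmd_query()
    finally:
        timer.cancel()


def run_event_synchronized(timeout_s=0.1, max_wait_s=10.0):
    cancel_called = threading.Event()
    # The cancel mock records the call, then signals the waiting query mock.
    mock_cancel_query = mock.Mock(side_effect=cancel_called.set)
    # Instead of time.sleep(10), the query mock blocks until the timebomb has
    # actually fired (bounded by max_wait_s so a broken timer cannot hang).
    mock_cmd_query = mock.Mock(side_effect=lambda: cancel_called.wait(max_wait_s))
    execute_with_timebomb(mock_cmd_query, mock_cancel_query, timeout_s)
    return mock_cancel_query.call_count
```

Because the query returns only after the cancel callback has run, the finally block can no longer cancel the timer before it fires, regardless of how long a sleep would have lasted.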
sfc-gh-fpawlowski requested a review from a team as a code owner on March 24, 2026 22:35
sfc-gh-fpawlowski and others added 2 commits March 24, 2026 22:36
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sfc-gh-fpawlowski added the NO-CHANGELOG-UPDATES (This pull request does not need to update CHANGELOG.md) and NO_ASYNC_CHANGES labels on Mar 24, 2026
sfc-gh-fpawlowski and others added 7 commits March 24, 2026 23:49
…endent

Replace the threading.Event-based approach with a synchronous timer mock:
- Patch _TrackedQueryCancellationTimer to fire its callback immediately on start()
- This eliminates all background threads and blocking waits from the test
- The previous Event-based fix was still flaky: the timer thread could crash
  before setting the event (e.g. due to platform-specific threading behavior
  on Windows Python <3.11 under --dist worksteal), leaving cancel_called
  permanently unset and causing a 10s timeout followed by assertion failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
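A minimal sketch of a synchronous timer double. The commit patches _TrackedQueryCancellationTimer, whose constructor is not shown here, so this sketch assumes a threading.Timer-compatible signature (interval, function, args, kwargs). It also injects the class through a hypothetical timer_cls parameter instead of using unittest.mock.patch.

```python
import threading


class SynchronousTimer:
    """Hypothetical test double: start() runs the callback immediately on the
    calling thread, so no background thread or blocking wait is involved."""

    def __init__(self, interval, function, args=None, kwargs=None):
        self.interval = interval  # ignored: the callback fires immediately
        self.function = function
        self.args = args or ()
        self.kwargs = kwargs or {}
        self.cancelled = False

    def start(self):
        self.function(*self.args, **self.kwargs)

    def cancel(self):
        self.cancelled = True  # too late to matter: the callback already ran


def execute_with_timebomb(cmd_query, cancel_query, timeout_s, timer_cls=threading.Timer):
    """Hypothetical stand-in for the cursor's timebomb logic."""
    timer = timer_cls(timeout_s, cancel_query)
    timer.start()
    try:
        return cmd_query()
    finally:
        timer.cancel()
```

With timer_cls=SynchronousTimer, the cancel callback has already run before cmd_query starts, so the outcome no longer depends on thread scheduling or on how long any wait lasts.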
Replace threading.Event polling loop with threading.Lock.acquire() (no
timeout). Lock.acquire() with no timeout maps to WaitForSingleObjectEx
with INFINITE, which cannot return early. The previous approach used
Event.wait(N) with finite N, which on Windows Python <3.11 under
--dist worksteal can return before N seconds due to heavy socket I/O.

The test pre-acquires the Lock before mock_cmd_query runs; mock_cmd_query
then blocks on a second acquire() until the real timer fires and the
side_effect of the mocked __cancel_query releases the lock.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Lock.acquire() (INFINITE) approach correctly prevents spurious early
returns on Windows Python <3.11, but removed the 10-second safety bound
that caused the old test to fail cleanly when __cancel_query was never
called. Without a bound, a timer that fails to fire causes an indefinite
hang rather than a clear assertion failure.

pytest-timeout is already a declared dependency with a global 1200s
default. The per-test @pytest.mark.timeout(30) tightens this to 30s and
produces a named TIMEOUT failure instead of an opaque job-level kill.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Lock approach deadlocked: cursor.execute() calls cmd_query in the
same thread as the test, so cancel_lock.acquire() (pre-held by the test
thread) blocked indefinitely when called again from mock_cmd_query.
Python's Lock documentation states that acquire() on a locked lock blocks
until another thread releases it, so re-entry from the same thread deadlocks.

Semaphore(0) has the same INFINITE wait property (WaitForSingleObjectEx
with INFINITE, immune to Windows WAIT_FAILED under socket load) but
requires no pre-acquisition. The semaphore starts at 0; mock_cmd_query
blocks on acquire(); the timer fires after 1s, calls release(), and
mock_cmd_query unblocks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
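Both behaviours can be reproduced in isolation with plain threading primitives instead of the test's mocks. This is a sketch: the 0.2 s timeout on the Lock re-acquire is added only so the deadlock is observable instead of hanging, and the timer interval is shortened from the 1 s described in the commit message.

```python
import threading

# 1. Lock: the test thread pre-acquires it, and the code that waits runs on the
#    SAME thread, so a second acquire() can never succeed. Without a timeout
#    this line would block forever.
lock = threading.Lock()
lock.acquire()
lock_reacquired = lock.acquire(timeout=0.2)
lock.release()

# 2. Semaphore(0): nothing is pre-acquired. The waiter blocks on an untimed
#    acquire(), and a different thread (the timer) supplies the release().
TIMER_INTERVAL_S = 0.2  # shortened for the demo; the commit describes 1 s
cancel_signal = threading.Semaphore(0)
# The timer callback stands in for the mocked __cancel_query.
timer = threading.Timer(TIMER_INTERVAL_S, cancel_signal.release)
timer.start()
semaphore_acquired = cancel_signal.acquire()  # returns once the timer releases
timer.join()

print(lock_reacquired, semaphore_acquired)  # False True
```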
…check

Two independent failure modes observed on Windows Python <3.11 under
--dist worksteal:

1. Event.wait(N) with a finite timeout returns early due to
   WaitForSingleObjectEx returning WAIT_FAILED under heavy socket I/O,
   causing mock_cmd_query to exit before the timer fires. The timer is
   then cancelled in the finally block and the assertion fails.

2. INFINITE waits (Lock/Semaphore with no timeout) hang when the timer
   thread is CPU-starved on an overloaded CI runner and never gets
   scheduled within the pytest-timeout window.

Fix: poll with 50ms Event.wait() slices and check Event.is_set() between
each slice. Event.is_set() reads self._flag (a Python bool under the GIL)
with no OS call — always reliable regardless of WaitForSingleObjectEx
state. Event.set() writes self._flag reliably even if notify fails to
wake sleepers. 15s deadline + @pytest.mark.timeout(60) provide a clean
failure path without infinite hangs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
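A minimal sketch of the polling approach described in this commit. wait_for_flag is a hypothetical helper name; the 50 ms slice and 15 s deadline are taken from the commit message. Each Event.wait() slice is allowed to return early, because the loop re-checks Event.is_set() and the deadline before waiting again.

```python
import threading
import time


def wait_for_flag(event, deadline_s=15.0, slice_s=0.05):
    """Poll an Event in short wait() slices until it is set or the deadline passes.

    An early return from wait(), or a set() whose wake-up is missed, costs at
    most one extra iteration instead of ending the wait prematurely.
    Returns True if the event was set, False if the deadline expired.
    """
    deadline = time.monotonic() + deadline_s
    while not event.is_set():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        event.wait(min(slice_s, remaining))
    return True
```

For example, with cancel_called = threading.Event() and threading.Timer(0.2, cancel_called.set).start(), wait_for_flag(cancel_called) returns True shortly after the timer fires, while an event that is never set yields False once the deadline expires, giving a clean assertion failure instead of a hang.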
…-dist load (#2826)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Removed timeout decorator and refactored test logic to handle cancellation more reliably.
sfc-gh-fpawlowski merged commit 1f6635d into main on Mar 25, 2026
47 of 49 checks passed
sfc-gh-fpawlowski deleted the fix-test-cursor-execute-timeout-windows branch on March 25, 2026 11:44
The github-actions bot locked and limited conversation to collaborators on Mar 25, 2026
