fix: extract handle value in Event.query() call to match other driver calls #747

ccam80 wants to merge 14 commits into NVIDIA:main
Conversation
It looks as if the `query` method missed out on the update in 20a2e3b. The method includes a try/except statement to catch CUDA_ERROR_NOT_READY, which fell through to an `else` statement that returned True. This `else` swallowed the exception raised by providing a non-integer handle and returned True regardless of the progress of the stream being queried.
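A minimal, CUDA-free sketch of that failure mode (the `fake_cu_event_query` stub and both query wrappers are invented for illustration; this is not numba-cuda's actual code): a driver binding that rejects non-integer handles raises `TypeError`, and an overly broad except intended for CUDA_ERROR_NOT_READY swallows it, so the method reports completion regardless of stream state.

```python
import ctypes


class CudaNotReady(Exception):
    """Stands in for CUDA_ERROR_NOT_READY."""


def fake_cu_event_query(handle):
    # Hypothetical driver binding: accepts only plain integer handles.
    if not isinstance(handle, int):
        raise TypeError("handle must be an int")
    raise CudaNotReady  # pretend the event is still pending


def buggy_query(handle):
    # Mirrors the bug: the ctypes object is passed straight through,
    # and the broad except meant for CUDA_ERROR_NOT_READY also
    # swallows the resulting TypeError.
    try:
        fake_cu_event_query(handle)
    except Exception:
        pass
    return True  # reached no matter the progress of the stream


def fixed_query(handle):
    # The fix: extract .value so the binding sees an int, and catch
    # only the not-ready error.
    try:
        fake_cu_event_query(handle.value)
    except CudaNotReady:
        return False
    return True


handle = ctypes.c_void_p(1234)
print(buggy_query(handle))  # True even though the event is pending
print(fixed_query(handle))  # False: correctly reports "not ready"
```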
Pull request overview
This PR fixes a bug in the Event.query() method where it was incorrectly passing a ctypes pointer object instead of its integer value to the driver function. This caused TypeErrors instead of the intended CUDA_ERROR_NOT_READY exception handling. The fix makes Event.query() consistent with other Event methods like record(), synchronize(), and wait().
Changes:
- Fixed `Event.query()` to extract `handle.value` before passing to `driver.cuEventQuery()`, matching the pattern used in other Event methods
- Added regression test `test_event_query()` to verify the query method works correctly with asynchronous stream operations
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| numba_cuda/numba/cuda/cudadrv/driver.py | Fixed Event.query() to pass integer handle value instead of c_void_p object to driver function, and restructured try/except for clarity |
| numba_cuda/numba/cuda/tests/cudadrv/test_events.py | Added regression test that exercises Event.query() in a realistic scenario with delayed stream operations |
/ok to test b698a5d
```python
assert sync_time * 1000 > spin_ms * 0.9  # nanosleep isn't reliable

# Give a few ms overhead for the synchronize call to complete
assert sync_time - event_time < 2e-3
```

Suggested change:

```diff
- assert sync_time - event_time < 2e-3
+ assert sync_time - event_time < 10e-3
```
Picking through the test results, it looks as if most are build failures or server timeouts - do these indicate a problem with the patch, or are they commonplace? Two tests failed due to an AssertionError: the 2ms overhead limit for the synchronize call was arbitrary and looks too small. I've added a suggestion to change it to 10ms, which still allows >200ms for the kernel call to return, so it shouldn't hide the bug. I don't want to interfere with your test/review process - let me know if the suggested changes (Copilot, Greptile, me) look good and I can update the PR.
```python
# Give a few ms overhead for the synchronize call to complete
assert sync_time - event_time < 2e-3
```
In general, we've had a very poor experience with trying to find healthy timing tolerances for things like this. What I would suggest is that we ensure the time is a positive number, but beyond that I don't think we should worry about actually trying to capture the timing in a test.
That makes sense - on a second look, this assertion is redundant anyway. If the event query time is >0.9 of the requested sleep time (250ms) then we can be pretty confident that the query didn't fall straight through and this extra test is pointless nitpicking. I'll reorder and clarify the two other assertions.
I'm going to apply some of the AI commentary to get rid of all the noise.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@cpcloud Thanks for the de-noising, I've disabled auto-review for future PRs. Greptile seems very capable of finding real errors, and I'll do a typos pass before committing.

@kkraus14 Thanks for the guidance on the absolute time epsilon. The assertion wasn't really doing anything, so I've removed it and reorganised the other two to better indicate the source of failure if the test doesn't pass.
```python
event_time = perf_counter() - t0
while not evt.query():
    event_time = perf_counter() - t0
```
Instead of busy looping the CPU here we can create two events with timing=True, record the start event before launching the kernel, record the end event after launching the kernel, synchronize the stream after recording the end event, and then retrieve the timing from the events using numba.cuda.event_elapsed_time: https://nvidia.github.io/numba-cuda/reference/host.html#numba.cuda.event_elapsed_time
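The event-timing pattern described here would look roughly like the sketch below (assuming a CUDA device and an existing compiled `kernel`; this is an illustration of the suggestion, not code from the PR):

```python
from numba import cuda

stream = cuda.stream()
start = cuda.event(timing=True)
end = cuda.event(timing=True)

start.record(stream)           # record before launching the kernel
kernel[1, 1, stream]()         # the kernel under test
end.record(stream)             # record after launching the kernel
stream.synchronize()           # wait for both events to complete

elapsed_ms = cuda.event_elapsed_time(start, end)
assert elapsed_ms > 0          # no tolerance, just a sanity check
```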
I see that this would be a better design for a test of the Event object in general, but the specific regression we're testing for is that the Event.query() method either raises TypeError or returns a True before the event is synchronized. The timing methods look as if they call different driver functions; I can't find any usages of query or the cuEventQuery driver function anywhere else in the repo. Perhaps I'm patching dead/unused code here?
It's not dead/unused, just used externally to numba-cuda primarily as opposed to internally. We do hope people use cuda.core directly instead of this API shim in the future to eventually deprecate this.
Based on your point about wanting to test event.query specifically, we could synchronize the event before querying (in which case it should always return True). This is nitpicking at this point where I'm fine if we want to do this polling.
```python
# If this assertion fails, it was nanosleep inaccuracy that caused it
assert sync_time * 1000 > spin_ms * 0.9

# If this assertion fails, the event query returned early
assert event_time * 1000 > spin_ms * 0.9
```
I'm not confident in the 0.9 tolerance here. Can we instead just assert that the elapsed time between the events (see above comment) is >0?
Yeah, I take your point - if an absolute time delta was low-confidence before, this is no better. Have you found that any hard time threshold is flaky in this area? The early-return-True path should return in a few scant milliseconds, whereas sync_time should land somewhere in the ballpark of spin_ms (250ms as currently parameterised). I think sync_time > event_time is guaranteed by statement order in the current test design, so a gt/lt comparison won't cut it. Am I misinterpreting your suggestion of event timing?
> Have you found that any hard time threshold is flaky in this area?
Generally yes because we test against different Windows driver modes and they have different scheduling behaviors. Additionally the machines that run CI are subject to noisy neighbors, so in general any kind of tolerance that isn't egregious ends up eventually flaking.
Given we aren't actually trying to test the implementation of the event time tracking and rely on the CUDA driver having a valid implementation for that, I think it's okay for us to just do the basic assertion that the time captured is a non-zero number.
@kkraus14 thanks for the review time. I'm unsure if I'm misinterpreting your event timing suggestion (see comment replies).
Automatic reviews are disabled for this repository.
@kkraus14 thanks again for the notes, all make sense. I've removed the busy wait and time-based checks. There are two bugs I want to test for, both in the numba-cuda implementation rather than the CUDA driver: Event.query() swallowing a TypeError and reporting True prematurely, and Event.query() raising a TypeError. The new test exercises this more transparently: query immediately after kernel invocation, sync, then call query again. If the first query is True, it returned prematurely. If the second is not True, there's an implementation error. If either call raises, the test will fail.
```python
evt.synchronize()
synced_query = evt.query()

assert immediate_query is False, "Query returned True prematurely"
```
I unfortunately suspect this will be flaky as we could end up in a situation where in between launching the kernel and recording the event, or recording the event and querying it, we could have something like Python GC or some kind of OS level interrupt that could easily cause the recording and subsequent querying of the event to take longer than the 200ms spin kernel.
If we really wanted something more deterministic, we could use something like the AI generated kernel:

```python
def test_event_query(self):
    stream = cuda.stream()
    evt = cuda.event()

    # Pinned memory for host-device synchronization
    started = cuda.pinned_array(1, dtype=np.int32)
    release = cuda.pinned_array(1, dtype=np.int32)

    @cuda.jit
    def gated_kernel(started_flag, release_flag):
        # Signal that kernel has started
        started_flag[0] = 1
        # Spin until host releases us
        while release_flag[0] == 0:
            cuda.nanosleep(int32(1_000))

    # Compile first
    started[0] = 0
    release[0] = 1  # Don't block during warmup
    gated_kernel[1, 1, stream](started, release)
    stream.synchronize()

    # Reset for actual test
    started[0] = 0
    release[0] = 0

    # Launch - kernel will spin until we release it
    gated_kernel[1, 1, stream](started, release)
    evt.record(stream)

    # Wait until kernel confirms it's running
    while started[0] == 0:
        pass  # Busy-wait (or use time.sleep with timeout)

    # NOW we have a guarantee:
    # - Kernel is running (started == 1)
    # - Kernel won't finish (release == 0)
    # - Event is recorded after kernel, so event cannot be complete
    immediate_query = evt.query()
    assert immediate_query is False, "Query returned True prematurely"

    # Release the kernel
    release[0] = 1

    # Wait for completion
    evt.synchronize()
    synced_query = evt.query()
    assert synced_query is True, "Query returned False after sync"
```

This uses pinned host memory as a way for us to explicitly control the lifetime of the kernel from Python host code so that we aren't subject to any kind of timing issues.
@kkraus14 Thanks again for the time you've spent on this. I like that approach - in classic AI-generated fashion it hangs indefinitely if used verbatim, as pinned arrays don't manage their own transfer h<->d, but swapping for mapped arrays it works a charm. I've implemented this suggestion with minor comment edits and the array type swap, and confirmed passing on my machine.
The proposed test would fail given a ~200ms hang between kernel launch and the first query() call, which is possible in Windows/CI environments and which would cause the test to fail. Instead, mapped arrays track when the kernel starts and allow the host to release the kernel from an infinite spin. This gives two guarantees - the kernel has started, and the kernel hasn't finished, which are what's needed to verify that query() doesn't return prematurely. A final assertion after synchronisation checks that query() does return True when called after stream sync.
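The swap described above, from pinned to mapped allocations, would look something like this sketch (assumes a CUDA device; variable names mirror the test above):

```python
import numpy as np
from numba import cuda

# Mapped arrays are page-locked host memory that the device can read
# and write directly, so host stores to these flags are visible to
# the spinning kernel mid-execution without an explicit h<->d copy.
started = cuda.mapped_array(1, dtype=np.int32)
release = cuda.mapped_array(1, dtype=np.int32)
```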
Implemented the host-controlled busy loop as suggested by @kkraus14; now Windows could take a 10s holiday mid-test if it was so inclined and the test would still verify that `Event.query()` behaves correctly.
/ok to test
@kkraus14, there was an error processing your request. See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

/ok to test 8138622

/ok to test 138411e

/ok to test 1175800
PR Description

This PR extracts `value` from the Event's handle and passes it to the driver function, mirroring surrounding code.

It looks as if the `Event.query` method missed out on the update in 20a2e3b. The method includes a try/except statement to catch CUDA_ERROR_NOT_READY, but with a non-integer argument the method raises a `TypeError`. In my library on 0.23, no TypeErrors were raised for some reason; the call just returned True. On the current main branch, `evt.query()` raises the appropriate TypeError, as shown by the regression test in the PR. Using `handle.value` instead fixes this and follows the pattern from the surrounding code.

MWE
Event.querymethod missed out on the update in 20a2e3b. The method includes a try/except statement to catch CUDA_ERROR_NOT_READY, but with a non-integer argument the method raises aTypeError. In my library on 0.23, no TypeErrors were raised for some reason, the call just returned True. On the current main branch, evt.query() raises the appropriate TypeError, as shown by the regression test in the PR. Using the handle.value instead fixes this and follows the pattern from the surrounding code.MWE