Skip to content
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions numba_cuda/numba/cuda/cudadrv/driver.py
Original file line number Diff line number Diff line change
Expand Up @@ -2117,14 +2117,14 @@ def query(self):
otherwise, returns False.
"""
try:
driver.cuEventQuery(self.handle)
handle = self.handle.value
driver.cuEventQuery(handle)
return True
except CudaAPIError as e:
if e.code == enums.CUDA_ERROR_NOT_READY:
return False
else:
raise
else:
return True

def record(self, stream=0):
"""
Expand Down
38 changes: 37 additions & 1 deletion numba_cuda/numba/cuda/tests/cudadrv/test_events.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
# SPDX-License-Identifier: BSD-2-Clause

import numpy as np
from numba import cuda
from numba import cuda, int32
from numba.cuda.testing import unittest, CUDATestCase
from numba.cuda._compat import Device
from numba.cuda.testing import skip_on_cudasim
Expand Down Expand Up @@ -48,6 +48,42 @@ def event_elapsed_inner(self, stream):
# Exercise the code path
evtstart.elapsed_time(evtend)

def test_event_query(self):
from time import perf_counter

@cuda.jit
def spin(ms):
# Sleep for ms
for i in range(ms):
cuda.nanosleep(int32(1_000_000)) # 1 ms

stream = cuda.stream()
evt = cuda.event()

# Run once to compile
spin[1, 1, stream](1)

t0 = perf_counter()
spin_ms = 250
spin[1, 1, stream](250)
evt.record(stream)

# Query immediately.
# Query immediately.
event_time = perf_counter() - t0
while not evt.query():
event_time = perf_counter() - t0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of busy looping the CPU here we can create two events with timing=True, record the start event before launching the kernel, record the end event after launching the kernel, synchronize the stream after recording the end event, and then retrieve the timing from the events using numba.cuda.event_elapsed_time: https://nvidia.github.io/numba-cuda/reference/host.html#numba.cuda.event_elapsed_time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this would be a better design for a test of the Event object in general, but the specific regression we're testing for is that the Event.query() method either raises TypeError or returns a True before the event is synchronized. The timing methods look as if they call different driver functions; I can't find any usages of query or the cuEventQuery driver function anywhere else in the repo. Perhaps I'm patching dead/unused code here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not dead/unused, just used externally to numba-cuda primarily as opposed to internally. We do hope people use cuda.core directly instead of this API shim in the future to eventually deprecate this.

Based on your point about wanting to test event.query specifically, we could synchronize the event before querying (in which case it should always return True). This is nitpicking at this point where I'm fine if we want to do this polling.


# Synchronize and capture stream-finished time.
evt.synchronize()
sync_time = perf_counter() - t0

assert event_time * 1000 > spin_ms * 0.9 # nanosleep isn't reliable
assert sync_time * 1000 > spin_ms * 0.9 # nanosleep isn't reliable

# Give a few ms overhead for the synchronize call to complete
assert sync_time - event_time < 2e-3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert sync_time - event_time < 2e-3
assert sync_time - event_time < 10e-3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we've had a very poor experience with trying to find healthy timing tolerances for things like this. What I would suggest is that we ensure the time is a positive number, but beyond that I don't think we should worry about actually trying to capture the timing in a test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - on a second look, this assertion is redundant anyway. If the event query time is >0.9 of the requested sleep time (250ms) then we can be pretty confident that the query didn't fall straight through and this extra test is pointless nitpicking. I'll reorder and clarify the two other assertions.



if __name__ == "__main__":
unittest.main()