CUDA: add config tests\n\nAdds kernel-based tests for device-side rea… #736
CodersAcademy006 wants to merge 1 commit into NVIDIA:main
…d-only access to CUDA config values: WARP_SIZE, MAX_THREADS_PER_BLOCK, and their use in kernel control flow. The tests are skipped under cudasim, and NumPy is used as the reference. Scope is intentionally limited to safe, well-defined CUDA config semantics.
Greptile Summary
This PR adds tests for device-side CUDA config constants.
Critical issues:
What needs to happen:
Confidence Score: 0/5
    def warp_size_kernel(out):
        i = cuda.grid(1)
        if i < out.size:
            out[i] = cuda.config.WARP_SIZE
logic: cuda.config.WARP_SIZE is not implemented in the codebase. Searched the entire repository and found no device-side constant definition for this attribute. The cuda module's CudaModuleTemplate in cudadecl.py has no resolve_config method, and there's no mechanism to expose config values inside kernels.
    def max_threads_kernel(out):
        i = cuda.grid(1)
        if i < out.size:
            out[i] = cuda.config.MAX_THREADS_PER_BLOCK
logic: cuda.config.MAX_THREADS_PER_BLOCK is not implemented. While driver.get_device().MAX_THREADS_PER_BLOCK exists for host-side access (see kernels/transpose.py:35), there's no device-side constant accessible within kernels.
        (d_inp, d_out),
        inp.size,
    )
    expected = inp * 2 if cuda.config.WARP_SIZE >= 32 else inp
logic: This line attempts host-side access to cuda.config.WARP_SIZE, but this also doesn't exist. The test references a non-existent API on both device and host sides.
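For comparison, the sketch below reads the same two values through interfaces that Numba's CUDA target already documents: the device-side `cuda.warpsize` attribute inside a kernel, and the host-side `cuda.get_current_device()` attributes. It is not the approach taken in this PR and says nothing about how a future `cuda.config` might be exposed. The helper name `read_warp_size` and the array length are assumptions made for illustration, and the function returns `None` when NumPy, Numba, or a usable GPU is missing.

```python
def read_warp_size(n=64):
    """Return (device_values, host_value), or None if no CUDA GPU is usable.

    Hypothetical helper for illustration only; not part of the PR.
    """
    try:
        import numpy as np
        from numba import cuda
    except ImportError:
        return None
    if not cuda.is_available():
        return None

    @cuda.jit
    def warp_size_kernel(out):
        i = cuda.grid(1)
        if i < out.size:
            # cuda.warpsize is the existing device-side warp size attribute.
            out[i] = cuda.warpsize

    d_out = cuda.device_array(n, dtype=np.int32)
    warp_size_kernel[1, n](d_out)  # one block of n threads

    # Host-side device attribute, read from the current device.
    host_value = cuda.get_current_device().WARP_SIZE
    return d_out.copy_to_host(), host_value


if __name__ == "__main__":
    result = read_warp_size()
    if result is None:
        print("CUDA not available; nothing to check")
    else:
        values, host_value = result
        print("device and host agree:", bool((values == host_value).all()))
```

A test written this way compares the value seen by every thread against the host-side attribute instead of hard-coding a number, so it needs no new API surface.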
@atmnp Could you please advise on the preferred approach for exposing these constants to CUDA kernels, or provide guidance on the implementation plan? Once the config attributes are available, I can update and validate these tests accordingly.
CUDA: add config tests
This PR adds kernel-based tests for device-side read-only access to CUDA config values in Numba-CUDA:
- cuda.config.WARP_SIZE
- cuda.config.MAX_THREADS_PER_BLOCK

Key features:
This continues the systematic porting of CPU-side tests to CUDA, directly contributing to issue #515.