
Commit 87d7ddb

committed: update from lucas
Signed-off-by: youkaichao <[email protected]>
1 parent 37edd83 commit 87d7ddb

File tree

1 file changed: +1 -1 lines changed


_posts/2025-08-11-cuda-debugging.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ For debugging consider passing CUDA_LAUNCH_BLOCKING=1
 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
 ```

-The error message suggests adding `CUDA_LAUNCH_BLOCKING=1` when running the code. However, there are still two problems:
+The challenging bit here is: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. In our experience, the Python stack traces for these types of exceptions are basically **always incorrect and pretty worthless**. To resolve this, the error message suggests adding `CUDA_LAUNCH_BLOCKING=1` when running the code. However, there are still two problems:

 1. Many people launch CUDA kernels using the `kernel<<<>>>` syntax without adding error checking for the kernel launch status, for example, this [code](https://github.com/pytorch/pytorch/blob/5e320eea665f773b78f6d3bfdbb1898b8e09e051/aten/src/ATen/native/cuda/SortStable.cu#L117). In such cases, even with `CUDA_LAUNCH_BLOCKING=1`, it’s still impossible to locate the faulty kernel.
 2. If the illegal memory access occurs inside a kernel within a CUDA graph, then even with `CUDA_LAUNCH_BLOCKING=1`, we can only see that there’s an issue when launching the CUDA graph, but still cannot pinpoint the exact kernel that failed.
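For point 1, a minimal illustrative sketch of the kind of launch-status check that is missing in such code (the kernel, the `CUDA_CHECK` macro, and all names below are hypothetical, not taken from the commit or the linked PyTorch file):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel that deliberately writes past the end of its buffer.
__global__ void faulty_kernel(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i + n] = 1;  // out-of-bounds write -> illegal memory access
}

// Check the result of a CUDA runtime call and report where it failed.
#define CUDA_CHECK(expr)                                                 \
    do {                                                                 \
        cudaError_t err_ = (expr);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s failed: %s (%s:%d)\n", #expr,       \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(1);                                                \
        }                                                                \
    } while (0)

int main() {
    const int n = 256;
    int *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(int)));

    faulty_kernel<<<1, n>>>(d_out, n);
    // Without these two checks the launch silently "succeeds" and the error
    // surfaces later at some unrelated CUDA API call; with them (and with
    // CUDA_LAUNCH_BLOCKING=1) the failure is reported at this call site.
    CUDA_CHECK(cudaGetLastError());       // launch/configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // asynchronous execution errors

    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

For point 2, the same hypothetical kernel can illustrate the CUDA graph case: if the direct launch in `main()` above is replaced by the capture-and-launch sequence below, the illegal memory access only surfaces when the instantiated graph is launched and synchronized, so it is attributed to the graph as a whole rather than to the failing kernel node (assuming the CUDA 12 `cudaGraphInstantiate` signature):

```cpp
    // Capture the same kernel into a CUDA graph instead of launching it directly.
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    cudaGraph_t graph;
    cudaGraphExec_t graph_exec;
    CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    faulty_kernel<<<1, n, 0, stream>>>(d_out, n);   // recorded, not executed
    CUDA_CHECK(cudaStreamEndCapture(stream, &graph));
    CUDA_CHECK(cudaGraphInstantiate(&graph_exec, graph, 0));  // CUDA 12 signature

    CUDA_CHECK(cudaGraphLaunch(graph_exec, stream));
    // The illegal memory access is reported here, attributed to the graph
    // launch as a whole; the specific kernel node cannot be identified.
    CUDA_CHECK(cudaStreamSynchronize(stream));
```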
