Commit e9f7829

explain how to add debug information (#91)
Signed-off-by: youkaichao <[email protected]>
1 parent 059c7a8 commit e9f7829

1 file changed: +1, -0 lines changed

_posts/2025-08-11-cuda-debugging.md

Lines changed: 1 addition & 0 deletions
@@ -334,6 +334,7 @@ On some GPUs, this line might cause `invalid argument` error instead of `illegal
3. Errors caused by improper use of the driver API are considered [non-sticky errors](https://forums.developer.nvidia.com/t/difference-in-error-handling-between-driver-api-and-runtime-api/336389) and are unrelated to the GPU itself. These errors are reported at the driver API level and do not trigger CUDA core dumps. A common example is an out-of-memory error during `cudaMalloc`, which will not result in a CUDA core dump.
4. For distributed programs involving multi-GPU communication, memory mapping is often used to map the memory of other GPUs to the current GPU. If the program on another GPU exits, the mapped memory becomes invalid, and accessing it will trigger an `illegal memory access`. However, this does not fall under the typical `illegal memory access` issues. Such problems are common during the shutdown process of distributed programs. If GPUs are communicating during shutdown, the order of shutdown may cause some GPUs to report `illegal memory access`. When using CUDA core dump for such programs, it is important to distinguish these false positives.
5. Enabling CUDA core dump does have some performance impact on CUDA kernels (since it needs to check for errors and attribute them when GPU threads exit). Therefore, it is not advisable to enable CUDA core dump in production environments. It is recommended to enable CUDA core dump only after errors like `illegal memory access` can be reliably reproduced for debugging purposes.
+ 6. To get the most out of CUDA core dump, recompile vLLM with debug symbols, or at least embed line information during compilation. The default vLLM build omits this information because of binary size limits, so users have to [compile vLLM from source](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#full-build-with-compilation) with the environment variable set via `export NVCC_PREPEND_FLAGS='-lineinfo'` or `export NVCC_PREPEND_FLAGS='-G'`. Start with `-lineinfo`, and switch to `-G` only when `-lineinfo` is not enough. With this debug information, CUDA core dump can trace back to the exact line of code that caused the exception.
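As a rough illustration of the rebuild step in item 6, the sketch below sets `NVCC_PREPEND_FLAGS` for a single build process instead of exporting it in the shell. The helper names `build_env` and `build_from_source` are hypothetical, and the `pip install -e .` command is an assumption about how the source build is invoked; follow the linked installation guide for the authoritative steps.

```python
import os
import subprocess
import sys

# The two flags named in the text: '-lineinfo' embeds line information only,
# '-G' generates full device debug information.
ALLOWED_FLAGS = ("-lineinfo", "-G")


def build_env(flag="-lineinfo", base_env=None):
    """Return a copy of the environment with NVCC_PREPEND_FLAGS set to `flag`.

    Hypothetical helper. Any existing NVCC_PREPEND_FLAGS value is replaced.
    """
    if flag not in ALLOWED_FLAGS:
        raise ValueError(f"expected one of {ALLOWED_FLAGS}, got {flag!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["NVCC_PREPEND_FLAGS"] = flag
    return env


def build_from_source(source_dir, flag="-lineinfo"):
    """Run an editable install inside a vLLM checkout.

    Assumption: the source build is started with `pip install -e .`;
    adjust this to match the official installation guide.
    """
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-e", "."],
        cwd=source_dir,
        env=build_env(flag),
        check=True,
    )
```

Calling `build_from_source("/path/to/vllm")` would try `-lineinfo` first, and passing `flag="-G"` covers the fallback case described above. The path here is a placeholder.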

# Conclusion
