diff --git a/_posts/2025-08-11-cuda-debugging.md b/_posts/2025-08-11-cuda-debugging.md
index 66956a4..a45e9a3 100644
--- a/_posts/2025-08-11-cuda-debugging.md
+++ b/_posts/2025-08-11-cuda-debugging.md
@@ -334,6 +334,7 @@ On some GPUs, this line might cause `invalid argument` error instead of `illegal
 3. Errors caused by improper use of the driver API are considered [non-sticky errors](https://forums.developer.nvidia.com/t/difference-in-error-handling-between-driver-api-and-runtime-api/336389) and are unrelated to the GPU itself. These errors are reported at the driver API level and do not trigger CUDA core dumps. A common example is an out-of-memory error during `cudaMalloc`, which will not result in a CUDA core dump.
 4. For distributed programs involving multi-GPU communication, memory mapping is often used to map the memory of other GPUs to the current GPU. If the program on another GPU exits, the mapped memory becomes invalid, and accessing it will trigger an `illegal memory access`. However, this does not fall under the typical `illegal memory access` issues. Such problems are common during the shutdown process of distributed programs. If GPUs are communicating during shutdown, the order of shutdown may cause some GPUs to report `illegal memory access`. When using CUDA core dump for such programs, it is important to distinguish these false positives.
 5. Enabling CUDA core dump does have some performance impact on CUDA kernels (since it needs to check for errors and attribute them when GPU threads exit). Therefore, it is not advisable to enable CUDA core dump in production environments. It is recommended to enable CUDA core dump only after errors like `illegal memory access` can be reliably reproduced for debugging purposes.
+6. To get the maximum benefit from CUDA core dump, it is recommended to recompile vLLM with debug symbols, or at least to embed line information during compilation. Unfortunately, the default build of vLLM does not contain such information due to binary size limits. To get this benefit, users have to [compile vLLM from source](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#full-build-with-compilation) with the environment variable `export NVCC_PREPEND_FLAGS='-lineinfo'` or `export NVCC_PREPEND_FLAGS='-G'`. It is recommended to start with `-lineinfo` and switch to `-G` only when `-lineinfo` is not enough. With rich debug information, CUDA core dump can trace back to the exact line of code that caused the exception.
 
 # Conclusion
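The from-source build described in the added line can be sketched as a short shell session. The repository URL and the editable-install step follow the vLLM installation docs linked in the diff, but treat the exact commands as an assumption and defer to those docs for your platform:

```shell
# Sketch: rebuild vLLM from source so CUDA kernels carry line information
# for CUDA core dump attribution. Commands are illustrative; see the vLLM
# installation docs for the authoritative steps.
export NVCC_PREPEND_FLAGS='-lineinfo'   # start with -lineinfo; use '-G' only if it is not enough

# Full build from source (compiles the CUDA kernels with the flag above).
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .
```

Note that `-G` disables most device-code optimizations, so expect a noticeable slowdown compared to `-lineinfo`, which only embeds source-line mappings.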