Commit e77d1bb

add Compute Sanitizer
Signed-off-by: youkaichao <[email protected]>
1 parent dcd3538 commit e77d1bb

1 file changed, +1 -1 lines changed

_posts/2025-08-11-cuda-debugging.md

Lines changed: 1 addition & 1 deletion
@@ -339,7 +339,7 @@ On some GPUs, this line might cause `invalid argument` error instead of `illegal

This blogpost analyzed the principles and use cases of CUDA core dump. This debugging method is effective for issues like improper kernel launches and kernel exceptions within CUDA graphs, making it a powerful tool for debugging `illegal memory access` issues and beyond.

-As an example, we recently use this technique to debug a complex `illegal memory access` issue in vLLM, see [this PR](https://github.com/vllm-project/vllm/pull/22593) for more details. Basically, we add a [triton kernel](https://github.com/vllm-project/vllm/pull/22375) for MRope, but that kernel has an implicit assumption that `head_size==rotary_dim` (i.e. it's a full Rope). When `head_size!=rotary_dim` (i.e. it's a partial Rope), the kernel will trigger an `illegal memory access`, which is the case for the new [GLM-4.5V](https://huggingface.co/zai-org/GLM-4.5V) model. Without CUDA core dump, the error is reported as `Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:453 'an illegal memory access was encountered'`, which is very misleading. With CUDA core dump, we can easily pinpoint the error to the MRope kernel, and then fix it.
+As an example, we recently use this technique to debug a complex `illegal memory access` issue in vLLM, see [this PR](https://github.com/vllm-project/vllm/pull/22593) for more details. Basically, we add a [triton kernel](https://github.com/vllm-project/vllm/pull/22375) for MRope, but that kernel has an implicit assumption that `head_size==rotary_dim` (i.e. it's a full Rope). When `head_size!=rotary_dim` (i.e. it's a partial Rope), the kernel will trigger an `illegal memory access`, which is the case for the new [GLM-4.5V](https://huggingface.co/zai-org/GLM-4.5V) model. Without CUDA core dump, the error is reported as `Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:453 'an illegal memory access was encountered'`, which is very misleading. With CUDA core dump, we can easily pinpoint the error to the MRope kernel, and then fix it. Note that this example is caused by mis-configuration of the cuda kernel parameters, and finding the kernel that caused the issue is pretty enough for debugging. For more complicated `illegal memory access` issues, we still need to isolate the kernel and reproduce the issue in a minimal example instead of an end-to-end example, and then use more dedicated tools like [Compute Sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html#memcheck-tool) to further investigate the issue.

The vLLM project aims to provide easy, fast, and cheap LLM serving for everyone, and easy debugging is also an important aspect. We will continue to share more debugging tips and techniques in the future, to build a strong LLM inference ecosystem together. To share your story or usage with vLLM, please submit a PR at [the blogpost repository](https://github.com/vllm-project/vllm-project.github.io).
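
Reviewer sketch (not part of the commit): the partial-Rope failure described in the changed paragraph can be pictured as an indexing mismatch. The Python below is a minimal, purely illustrative model, not the vLLM Triton kernel. The function name `out_of_bounds_reads`, the cache layout, and the stride are all assumptions made for illustration. It lists the flat indices that a kernel would read past the end of a buffer if it assumed `head_size == rotary_dim`, which is roughly the kind of per-access bounds check a memcheck-style tool applies to a minimal reproduction.

```python
def out_of_bounds_reads(num_tokens: int, head_size: int, rotary_dim: int) -> list[int]:
    """Return the flat indices that fall outside a hypothetical cos/sin cache.

    Assumed layout (illustrative only): the cache holds rotary_dim // 2
    values per token, but the kernel strides by head_size // 2 because it
    implicitly assumes head_size == rotary_dim.
    """
    cache_len = num_tokens * (rotary_dim // 2)  # elements actually allocated
    assumed_half = head_size // 2               # stride the kernel wrongly uses

    bad_indices = []
    for token in range(num_tokens):
        for i in range(assumed_half):
            idx = token * assumed_half + i
            if idx >= cache_len:                # this read would be illegal
                bad_indices.append(idx)
    return bad_indices


# Full Rope: the assumption holds, every read stays in bounds.
full_rope = out_of_bounds_reads(num_tokens=2, head_size=8, rotary_dim=8)

# Partial Rope: the kernel walks past the end of the smaller cache.
partial_rope = out_of_bounds_reads(num_tokens=2, head_size=8, rotary_dim=4)
```

Under these assumed shapes, the full-Rope call yields no out-of-range index, while the partial-Rope call does. On a real GPU the equivalent check would come from running the isolated kernel under Compute Sanitizer's memcheck tool rather than from a host-side model like this one.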
