+As an example, we recently use this technique to debug a complex `illegal memory access` issue in vLLM, see [this PR](https://github.com/vllm-project/vllm/pull/22593) for more details. Basically, we add a [triton kernel](https://github.com/vllm-project/vllm/pull/22375) for MRope, but that kernel has an implicit assumption that `head_size==rotary_dim` (i.e. it's a full Rope). When `head_size!=rotary_dim` (i.e. it's a partial Rope), the kernel will trigger an `illegal memory access`, which is the case for the new [GLM-4.5V](https://huggingface.co/zai-org/GLM-4.5V) model. Without CUDA core dump, the error is reported as `Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:453 'an illegal memory access was encountered'`, which is very misleading. With CUDA core dump, we can easily pinpoint the error to the MRope kernel, and then fix it. Note that this example is caused by mis-configuration of the cuda kernel parameters, and finding the kernel that caused the issue is pretty enough for debugging. For more complicated `illegal memory access` issues, we still need to isolate the kernel and reproduce the issue in a minimal example instead of an end-to-end example, and then use more dedicated tools like [Compute Sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html#memcheck-tool) to further investigate the issue.
0 commit comments