Commit 312ae9a

committed
update
Signed-off-by: youkaichao <[email protected]>
1 parent 8bbd332 commit 312ae9a

File tree

1 file changed

+8
-8
lines changed


_posts/2025-11-27-improved-cuda-debugging.md

Lines changed: 8 additions & 8 deletions
@@ -17,7 +17,7 @@ When a GPU kernel hangs, the program typically freezes or becomes unresponsive
 
 Fortunately, there is a better way. The CUDA driver includes a feature called `user induced GPU core dump generation`: the driver opens pipes in the operating system that allow users to trigger a core dump by writing to them. When triggered, the CUDA driver dumps the GPU state to core dump files, enabling inspection of what's happening inside the GPU and, most importantly, identifying which GPU kernel is hanging.
 
-Here is a simple example of a conditional hanging kernel:
+Consider a simple example of a conditional hanging kernel:
 
 ```python
 # save as conditional_hang.py
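The diff elides most of `conditional_hang.py`; the idea is a kernel that spins forever only when some input condition holds. As a rough CPU-side illustration of that control flow (the function and its parameters are illustrative assumptions, not the post's actual Triton kernel):

```python
def conditional_hang(x, should_hang):
    # Illustrative only: in the post's real example this spin loop runs
    # inside a Triton GPU kernel, so the host process blocks in
    # torch.cuda.synchronize() rather than in Python.
    while should_hang(x):
        pass  # spin forever when the condition holds
    return x + 2  # mirrors the `x = x + 2` visible in the diff context
```

Because the hang is data-dependent, it may appear only on some inputs, which is exactly the situation where a user-triggered core dump is valuable.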
@@ -88,7 +88,7 @@ x = x + 2
 torch.cuda.synchronize()
 ```
 
-Directly executing the code will hang forever. We can enable the user induced GPU core dump generation to debug the issue:
+Executing this code will hang indefinitely. To debug the issue, we can enable user-induced GPU core dump generation:
 
 ```bash
 CUDA_ENABLE_USER_TRIGGERED_COREDUMP=1 \
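The same environment can be assembled from Python when launching the workload via `subprocess`; a small sketch (the helper name is an assumption, but the three variables and the `%h`/`%p`/`%t` templates are the ones the post uses):

```python
import os

def coredump_env(base=None):
    """Build an environment with user-triggered CUDA core dumps enabled."""
    env = dict(os.environ if base is None else base)
    env["CUDA_ENABLE_USER_TRIGGERED_COREDUMP"] = "1"
    # %h, %p, %t expand to hostname, PID, and timestamp in the templates.
    env["CUDA_COREDUMP_PIPE"] = "/tmp/cuda_coredump_pipe_%h.%p.%t"
    env["CUDA_COREDUMP_FILE"] = "/tmp/cuda_coredump_%h.%p.%t"
    return env

# e.g. subprocess.Popen(["python", "conditional_hang.py"], env=coredump_env())
```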
@@ -100,15 +100,15 @@ CUDA_COREDUMP_FILE="/tmp/cuda_coredump_%h.%p.%t" \
 python conditional_hang.py
 ```
 
-While the code is running forever, and we suspect it is hanging somewhere, we can trigger the CUDA core dump by writing to the pipe:
+While the code is running indefinitely, we can trigger a CUDA core dump by writing to the pipe:
 
 ```bash
 dd if=/dev/zero bs=1M count=1 > /tmp/cuda_coredump_pipe_hostname.3000837.1764236276
 ```
 
-Here we write 1MB of zeros to the pipe, which will trigger the CUDA core dump. Simple `echo aaa > /tmp/cuda_coredump_pipe_hostname.3000837.1764236276` might not work due to the buffering of the pipe.
+We write 1MB of zeros to the pipe to trigger the CUDA core dump. Note that a simple `echo` command might not work due to pipe buffering.
 
-After we trigger the core dump, in the original terminal where we run the `python conditional_hang.py`, we will see the progress of the core dump:
+After triggering the core dump, the original terminal running `python conditional_hang.py` will display the core dump progress:
 
 ```text
 [01:39:15.256278] coredump: Writing ELF file to /tmp/cuda_coredump_hostname.3000837.1764236276
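The `dd` trigger can also be issued from Python, e.g. by a watchdog script; a sketch (the function name is an assumption; the 1 MB size mirrors `bs=1M count=1`):

```python
def trigger_core_dump(pipe_path, size=1024 * 1024):
    # Write 1 MB of zeros to the core dump pipe, mirroring
    # `dd if=/dev/zero bs=1M count=1 > <pipe>`. A large write is used
    # because a short `echo` can be absorbed by pipe buffering without
    # reliably waking the driver's reader.
    with open(pipe_path, "wb") as f:
        return f.write(b"\x00" * size)
```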
@@ -120,7 +120,7 @@ After we trigger the core dump, in the original terminal where we run the `pytho
 [01:39:15.292128] coredump: All done (took 00s)
 ```
 
-Then we can use `cuda-gdb` to open the core dump file, and see exactly where the kernel is hanging:
+We can then use `cuda-gdb` to open the core dump file and see exactly where the kernel is hanging:
 
 ```text
 Opening GPU coredump: /tmp/cuda_coredump_hostname.3000837.1764236276
@@ -129,9 +129,9 @@ Opening GPU coredump: /tmp/cuda_coredump_hostname.3000837.1764236276
 31 tl.store(x_ptr + offs, x, mask=mask)
 ```
 
-Excitingly, we can not only exactly locate the kernel `conditional_hang_kernel`, but also the exact line of code that the kernel is hanging at. This is a huge improvement over the previous situation where we have no idea which kernel is hanging, not to mention the exact line of code that caused the hanging.
+This approach allows us to not only identify the hanging kernel (`conditional_hang_kernel`) but also pinpoint the exact line of code where it hangs. This represents a significant improvement over the previous situation, where identifying the problematic kernel was impossible, let alone the specific line causing the hang.
 
-One slightly annoying thing is that the core dump pipe's path is dynamically generated by the cuda driver, and it is not easy to find out. We can properly use `CUDA_COREDUMP_PIPE` environment variable to specify the template path of the core dump pipe, so that we can find it easily by looking at the file descriptors of the process:
+One minor inconvenience is that the core dump pipe's path is dynamically generated by the CUDA driver, making it difficult to locate. We can address this by using the `CUDA_COREDUMP_PIPE` environment variable to specify a template path for the core dump pipe, allowing us to find it easily by inspecting the process's file descriptors:
 
 ```bash
 $ ls /proc/3037675/fd/ -alth | grep /tmp/cuda_coredump_pipe_
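The `/proc/<pid>/fd` inspection can also be scripted; a small Linux-only sketch of finding the pipe from Python (the helper name and `prefix` parameter are assumptions):

```python
import os

def find_coredump_pipes(pid, prefix="/tmp/cuda_coredump_pipe_"):
    # Resolve each fd symlink under /proc/<pid>/fd and keep those whose
    # target matches the CUDA_COREDUMP_PIPE template path.
    fd_dir = f"/proc/{pid}/fd"
    hits = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd closed between listdir and readlink
        if target.startswith(prefix):
            hits.append((int(fd), target))
    return hits
```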
