_posts/2025-11-27-improved-cuda-debugging.md (8 additions, 8 deletions)
@@ -17,7 +17,7 @@ When a GPU kernel hangs, the program typically freezes or becomes unresponsive
Fortunately, there is a better way. The CUDA driver includes a feature called `user induced GPU core dump generation`: the driver opens pipes in the operating system that allow users to trigger a core dump by writing to them. When triggered, the CUDA driver dumps the GPU state to core dump files, enabling inspection of what's happening inside the GPU and, most importantly, identifying which GPU kernel is hanging.
- Here is a simple example of a conditional hanging kernel:
+ Consider a simple example of a conditional hanging kernel:
```python
# save as conditional_hang.py
@@ -88,7 +88,7 @@ x = x + 2
torch.cuda.synchronize()
```
- Directly executing the code will hang forever. We can enable the userinduced GPU core dump generation to debug the issue:
+ Executing this code will hang indefinitely. To debug the issue, we can enable user-induced GPU core dump generation:
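As a rough sketch of what enabling this looks like (the actual commands sit outside this hunk), the feature is switched on through the CUDA driver's environment variables; `CUDA_ENABLE_USER_TRIGGERED_COREDUMP` is the switch documented in the CUDA-GDB core dump docs:

```bash
# Sketch: turn on user-induced GPU core dumps, then launch the script.
# CUDA_ENABLE_USER_TRIGGERED_COREDUMP is the driver switch from the CUDA-GDB
# core dump docs; the pipe the driver opens is what we write to later.
export CUDA_ENABLE_USER_TRIGGERED_COREDUMP=1
python conditional_hang.py
```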
- Here we write 1MB of zeros to the pipe, which will trigger the CUDA core dump. Simple `echo aaa > /tmp/cuda_coredump_pipe_hostname.3000837.1764236276` might not work due to the buffering of the pipe.
+ We write 1MB of zeros to the pipe to trigger the CUDA core dump. Note that a simple `echo` command might not work due to pipe buffering.
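One way to produce that 1MB write, as a sketch (the pipe path here is the example path from the post; the real path is whatever the driver created for your process):

```bash
# Sketch: push 1 MB of zeros into the trigger pipe; a large write gets past
# the pipe buffering that can swallow a short `echo`.
dd if=/dev/zero of=/tmp/cuda_coredump_pipe_hostname.3000837.1764236276 bs=1M count=1
```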
- After we trigger the core dump, in the original terminal where we run the `python conditional_hang.py`, we will see the progress of the core dump:
+ After triggering the core dump, the original terminal running `python conditional_hang.py` will display the core dump progress:
```text
[01:39:15.256278] coredump: Writing ELF file to /tmp/cuda_coredump_hostname.3000837.1764236276
@@ -120,7 +120,7 @@ After we trigger the core dump, in the original terminal where we run the `pytho
[01:39:15.292128] coredump: All done (took 00s)
```
- Then we can use `cuda-gdb` to open the core dump file, and see exactly where the kernel is hanging:
+ We can then use `cuda-gdb` to open the core dump file and see exactly where the kernel is hanging:
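As a sketch of that step (the core dump file name is taken from the log above; `target cudacore` is the cuda-gdb command for loading GPU core dumps, and `info cuda kernels` lists the kernels found in the dump):

```bash
# Sketch: open the GPU core dump non-interactively, list the resident kernels,
# and print the backtrace of the focused GPU thread.
cuda-gdb --batch \
  -ex 'target cudacore /tmp/cuda_coredump_hostname.3000837.1764236276' \
  -ex 'info cuda kernels' \
  -ex 'bt'
```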
- Excitingly, we can not only exactly locate the kernel `conditional_hang_kernel`, but also the exact line of code that the kernel is hanging at. This is a huge improvement over the previous situation where we have no idea which kernel is hanging, not to mention the exact line of code that caused the hanging.
+ This approach allows us to not only identify the hanging kernel (`conditional_hang_kernel`) but also pinpoint the exact line of code where it hangs. This represents a significant improvement over the previous situation, where identifying the problematic kernel was impossible, let alone the specific line causing the hang.
- One slightly annoying thing is that the core dump pipe's path is dynamically generated by the cuda driver, and it is not easy to find out. We can properly use `CUDA_COREDUMP_PIPE` environment variable to specify the template path of the core dump pipe, so that we can find it easily by looking at the file descriptors of the process:
+ One minor inconvenience is that the core dump pipe's path is dynamically generated by the CUDA driver, making it difficult to locate. We can address this by using the `CUDA_COREDUMP_PIPE` environment variable to specify a template path for the core dump pipe, allowing us to find it easily by inspecting the process's file descriptors:
```bash
$ ls /proc/3037675/fd/ -alth | grep /tmp/cuda_coredump_pipe_
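# Sketch (assumption): the pipe template above would have been set before
# launching the program, for example along these lines; format specifiers such
# as %h (hostname) and %p (pid) are described in the CUDA-GDB core dump docs.
$ export CUDA_COREDUMP_PIPE=/tmp/cuda_coredump_pipe_%h.%p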