[Profiler] Follow vLLM pattern for diffusion profiler integration
Use vLLM's CudaProfilerWrapper/TorchProfilerWrapper in DiffusionWorker
instead of custom implementation. This unifies the profiler approach
between omni models and diffusion models.
- Import and use vLLM's profiler wrappers based on profiler_config
- VLLM_TORCH_CUDA_PROFILE=1 enables CudaProfilerWrapper for nsys
- VLLM_TORCH_PROFILER_DIR enables TorchProfilerWrapper for traces
- Remove dependency on CurrentProfiler from diffusion profiler module
- Update docs with vLLM-style nsys usage
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
docs/contributing/profiling.md
5 additions & 2 deletions
@@ -135,11 +135,14 @@ python image_to_video.py \
### 4. Nsight Systems Profiling (Diffusion)
-For deeper GPU-level analysis of diffusion workloads, use NVIDIA Nsight Systems (`nsys`). The diffusion worker integrates with nsys via `torch.cuda.profiler.start()/stop()` when profiling is triggered.
+For deeper GPU-level analysis of diffusion workloads, use NVIDIA Nsight Systems (`nsys`). Diffusion workers follow the same profiler pattern as vLLM: set `VLLM_TORCH_CUDA_PROFILE=1` to enable the CUDA profiler, which signals nsys via `torch.cuda.profiler.start()/stop()`.
Set `VLLM_TORCH_PROFILER_DIR` to trigger profiling, which also opens nsys capture regions in diffusion worker processes.
+The `VLLM_TORCH_CUDA_PROFILE=1` environment variable configures diffusion workers to use vLLM's `CudaProfilerWrapper`, which brackets GPU work with `torch.cuda.profiler.start()/stop()` calls that nsys captures.
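
For readers who want a picture of the selection and bracketing pattern described in this change, here is a minimal Python sketch. It does not import vLLM or PyTorch: `StubCudaProfiler`, `StubTorchProfiler`, `select_profiler`, and `profiled` are hypothetical names invented for illustration, and the precedence between the two environment variables is an assumption, not something this commit states. The real `CudaProfilerWrapper`/`TorchProfilerWrapper` constructors and the actual `DiffusionWorker` wiring may differ.

```python
import os
from contextlib import contextmanager


class StubCudaProfiler:
    """Hypothetical stand-in for a wrapper around torch.cuda.profiler."""

    def __init__(self):
        self.events = []

    def start(self):
        # A real wrapper would call torch.cuda.profiler.start() here,
        # which is what opens an nsys capture range.
        self.events.append("start")

    def stop(self):
        # A real wrapper would call torch.cuda.profiler.stop() here.
        self.events.append("stop")


class StubTorchProfiler(StubCudaProfiler):
    """Hypothetical stand-in for a trace-writing profiler wrapper."""

    def __init__(self, trace_dir):
        super().__init__()
        self.trace_dir = trace_dir


def select_profiler(env=None):
    """Pick a profiler from environment variables (precedence is assumed)."""
    env = os.environ if env is None else env
    if env.get("VLLM_TORCH_CUDA_PROFILE") == "1":
        return StubCudaProfiler()
    trace_dir = env.get("VLLM_TORCH_PROFILER_DIR")
    if trace_dir:
        return StubTorchProfiler(trace_dir)
    return None  # Profiling disabled.


@contextmanager
def profiled(profiler):
    """Bracket a block of work with start()/stop(), even if it raises."""
    if profiler is not None:
        profiler.start()
    try:
        yield
    finally:
        if profiler is not None:
            profiler.stop()
```

In this sketch a worker would call `select_profiler()` once at startup and wrap each profiled step in `with profiled(profiler): ...`, so the same call site works whether profiling is enabled or not.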