# Performance Profiling Guidelines

Profiling capabilities in vLLM-Omni are reserved for development and maintenance tasks aimed at temporal analysis of the codebase. Production use is **strongly discouraged**: enabling the profiler incurs substantial overhead that negatively impacts inference latency.
> **Warning:** Profiling incurs significant overhead. Use only for development and debugging, never in production.
**Mechanism**: vLLM-Omni implements cross-stage profiling via the PyTorch Profiler. Because each stage runs as a distinct engine instance in a separate process, the profiling interface supports both holistic capturing (all stages) and targeted capturing (specific stages).
vLLM-Omni uses the PyTorch Profiler to analyze performance across both **Multi-Stage LLMs** and **Diffusion Models**.
### 1. Set the Output Directory
Before running any script, set the `VLLM_TORCH_PROFILER_DIR` environment variable. The system detects it and automatically saves traces to that directory.
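For example (the path below is a placeholder; use any writable directory):

```bash
# Placeholder path: point this at any writable directory.
export VLLM_TORCH_PROFILER_DIR=/path/to/traces
```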
**Highly Recommended: Limit Profiling to a Single Iteration**
For most use cases (especially when profiling audio stages), you should limit the profiler to just **one iteration** to keep trace files small and readable.
```bash
export VLLM_PROFILER_MAX_ITERS=1
```

### 2. Start Profiling

**Offline Inference**
For offline processing using `OmniLLM`, you can wrap your `generate` calls with `start_profile()` and `stop_profile()`. By default, the profiler runs across all stages, but it is highly recommended to profile specific stages by passing a stages list; this keeps trace files from growing too large.
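For example, a minimal offline sketch. The import path, model name, stage names, and the exact `stages` keyword below are illustrative assumptions, not verified vLLM-Omni API; consult the actual `OmniLLM` signatures before use:

```python
# Illustrative sketch: the import path, model name, and stage names
# are assumptions for this example, not verified vLLM-Omni identifiers.
from vllm_omni import OmniLLM

llm = OmniLLM(model="Qwen/Qwen2.5-Omni-7B")

# Profile only one stage so the trace file stays small;
# omitting the stages list would capture every stage.
llm.start_profile(stages=["llm"])
outputs = llm.generate("Describe this audio clip.")
llm.stop_profile()
```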
**Online Serving**

For online serving using `AsyncOmni`, the same methods are asynchronous, which lets you toggle profiling dynamically without restarting the server.
```python
await async_omni.start_profile()
async for output in async_omni.generate(prompt, sampling_params, request_id):
    ...
await async_omni.stop_profile()
```
### 3. Analyzing Omni Traces
After `stop_profile()` completes (and the file write wait time has passed), the directory specified in `VLLM_TORCH_PROFILER_DIR` will contain the trace files.
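PyTorch Profiler traces are standard Chrome trace JSON, so you can open them in `chrome://tracing` or Perfetto. For a quick command-line sanity check, a small stdlib-only script can rank ops by total duration. This is a sketch: it assumes the trace is plain (not gzipped) JSON; decompress first if your version writes `.json.gz` files.

```python
import json
from collections import defaultdict


def top_ops(trace_path, n=5):
    """Summarize the longest-running ops in a Chrome trace file.

    Chrome-format traces hold a ``traceEvents`` list; complete events
    (``"ph": "X"``) carry their duration in microseconds under ``dur``.
    """
    with open(trace_path) as f:
        trace = json.load(f)
    totals = defaultdict(float)
    for ev in trace.get("traceEvents", []):
        if ev.get("ph") == "X" and "dur" in ev:
            totals[ev["name"]] += ev["dur"]
    # Highest cumulative duration first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

This only aggregates by op name; for per-stage timelines, the interactive viewers above remain the better tool.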
**Note**: vLLM-Omni reuses the PyTorch Profiler infrastructure from vLLM. For more advanced configuration options (memory profiling, custom activities, etc.), see the official vLLM profiler documentation: [vLLM Profiling Guide](https://docs.vllm.ai/en/latest/dev/profiling.html)