Commit 5007adc

Merge branch 'main' into config_ref
2 parents 5595b93 + a8cec43 commit 5007adc

1 file changed

+26
-18
lines changed

docs/contributing/metrics.md

Lines changed: 26 additions & 18 deletions
@@ -1,25 +1,27 @@
11

2-
# Metrics vLLM-Omni:
2+
# Metrics
33

44
You can use these metrics in production to monitor the health and performance of the vLLM-Omni system. Typical scenarios include:
5+
56
- **Performance Monitoring**: Track throughput (e.g., `e2e_avg_tokens_per_s`), latency (e.g., `e2e_total_ms`), and resource utilization to verify that the system meets expected standards.
7+
68
- **Debugging and Troubleshooting**: Use detailed per-request metrics to diagnose issues, such as high transfer times or unexpected token counts.
79

810
## How to Enable and View Metrics
911

10-
### 1. Start the Service with Metrics Logging
12+
### Start the Service with Metrics Logging
1113

1214
```bash
1315
vllm serve /workspace/models/Qwen3-Omni-30B-A3B-Instruct --omni --port 8014 --log-stats
1416
```
1517

16-
### 2. Send a Request
18+
### Send a Request
1719

1820
```bash
1921
python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image
2022
```
2123

22-
### 3. What You Will See
24+
### What You Will See
2325

2426
With `--log-stats` enabled, the server will output detailed metrics logs after each request. Example output:
2527

@@ -69,9 +71,13 @@ With `--log-stats` enabled, the server will output detailed metrics logs after e
6971

7072

7173
These logs include:
74+
7275
- **Overall summary**: total requests, wall time, average tokens/sec, etc.
76+
7377
- **E2E table**: per-request latency and token counts.
78+
7479
- **Stage table**: per-stage batch and timing details.
80+
7581
- **Transfer table**: data transfer and timing for each edge.
7682

7783
You can use these logs to monitor system health, debug performance, and analyze request-level metrics as described above.
@@ -87,6 +93,8 @@ For **online inference** (serving mode), the summary is always per-request. `e2e
8793

8894
## Parameter Details
8995

96+
### Summary Metrics
97+
9098
| Field | Meaning |
9199
|---------------------------|----------------------------------------------------------------------------------------------|
92100
| `e2e_requests` | Number of completed requests. |
@@ -98,7 +106,7 @@ For **online inference** (serving mode), the summary is always per-request. `e2e
98106

99107
---
100108

101-
## E2E Table (per request)
109+
### E2E Table (per request)
102110

103111
| Field | Meaning |
104112
|---------------------------|-----------------------------------------------------------------------|
@@ -110,7 +118,7 @@ For **online inference** (serving mode), the summary is always per-request. `e2e
110118

111119
---
112120

113-
## Stage Table (per stage event / request)
121+
### Stage Table (per stage event / request)
114122

115123
| Field | Meaning |
116124
|---------------------------|-------------------------------------------------------------------------------------------------|
@@ -125,7 +133,7 @@ For **online inference** (serving mode), the summary is always per-request. `e2e
125133

126134
---
127135

128-
## Transfer Table (per edge / request)
136+
### Transfer Table (per edge / request)
129137

130138
| Field | Meaning |
131139
|----------------------|---------------------------------------------------------------------------|
@@ -135,31 +143,31 @@ For **online inference** (serving mode), the summary is always per-request. `e2e
135143
| `in_flight_time_ms` | In-flight time in ms. |
136144

137145

138-
## Expectation of the Numbers (Verification)
146+
### Expectation of the Numbers (Verification)
139147

140148
**Formulas:**
149+
141150
- `e2e_total_tokens = Stage0's num_tokens_in + sum(all stages' num_tokens_out)`
151+
142152
- `transfers_total_time_ms = sum(tx_time_ms + rx_decode_time_ms + in_flight_time_ms)` for every edge
143153

144154
**Using the example above:**
145155

146-
### e2e_total_tokens
156+
**e2e_total_tokens**
157+
147158
- Stage0's `num_tokens_in`: **4,860**
148159
- Stage0's `num_tokens_out`: **67**
149160
- Stage1's `num_tokens_out`: **275**
150161
- Stage2's `num_tokens_out`: **0**
151162

152-
So,
153-
```
154-
e2e_total_tokens = 4,860 + 67 + 275 + 0 = 5,202
155-
```
156-
This matches the table value: `e2e_total_tokens = 5,202`.
163+
So `e2e_total_tokens = 4,860 + 67 + 275 + 0 = 5,202`, which matches the table value `e2e_total_tokens = 5,202`.
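
The token formula can also be checked with a few lines of Python. This is a minimal sketch that uses the per-stage values from the example above; the `stages` layout and the `expected_total_tokens` helper are hypothetical names chosen for illustration and are not part of vLLM-Omni.

```python
# Per-stage token counts copied from the example above.
# The list-of-dicts layout is an assumption made for this sketch only.
stages = [
    {"stage": 0, "num_tokens_in": 4860, "num_tokens_out": 67},
    {"stage": 1, "num_tokens_out": 275},
    {"stage": 2, "num_tokens_out": 0},
]


def expected_total_tokens(stages):
    """Stage 0's input tokens plus the output tokens of every stage."""
    return stages[0]["num_tokens_in"] + sum(s["num_tokens_out"] for s in stages)


print(expected_total_tokens(stages))  # 5202
```

Only Stage 0 contributes `num_tokens_in`, so the input counts of later stages are deliberately left out of the sketch.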
164+
165+
**transfers_total_time_ms**
157166

158-
### transfers_total_time_ms
159167
For each edge:
168+
160169
- 0->1: tx_time_ms (**78.701**) + rx_decode_time_ms (**111.865**) + in_flight_time_ms (**2.015**) = **192.581**
161-
- 1->2: tx_time_ms (**18.790**) + rx_decode_time_ms (**31.706**) + in_flight_time_ms (**2.819**) = **53.315**
162170

163-
Sum: 192.581 + 53.315 = **245.896**
171+
- 1->2: tx_time_ms (**18.790**) + rx_decode_time_ms (**31.706**) + in_flight_time_ms (**2.819**) = **53.315**
164172

165-
The table shows `transfers_total_time_ms = 245.895`, which matches the calculation (difference is due to rounding).
173+
Sum: 192.581 + 53.315 = **245.896**. The table shows `transfers_total_time_ms = 245.895`; the 0.001 ms difference is due to rounding.
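
The transfer check can be reproduced the same way. The sketch below is an illustration under stated assumptions: the `edges` dictionary and the `edge_total_ms` helper are hypothetical, and the numbers are the per-edge values from the example above. `Decimal` is used so that the three-decimal log values add up exactly, without binary floating-point noise.

```python
from decimal import Decimal

# Per-edge timings (ms) copied from the example above, kept as strings
# so Decimal parses them exactly. The dict layout is an assumption.
edges = {
    "0->1": {"tx_time_ms": "78.701", "rx_decode_time_ms": "111.865", "in_flight_time_ms": "2.015"},
    "1->2": {"tx_time_ms": "18.790", "rx_decode_time_ms": "31.706", "in_flight_time_ms": "2.819"},
}


def edge_total_ms(fields):
    """tx_time_ms + rx_decode_time_ms + in_flight_time_ms for one edge."""
    return sum(Decimal(value) for value in fields.values())


per_edge = {edge: edge_total_ms(fields) for edge, fields in edges.items()}
total = sum(per_edge.values())

reported = Decimal("245.895")  # transfers_total_time_ms shown in the table
print(per_edge["0->1"], per_edge["1->2"], total)  # 192.581 53.315 245.896
print(abs(total - reported))  # 0.001
```

Because each logged value is already rounded to three decimals, a recomputed sum can differ from the reported total by about 0.001 ms, so a tolerance-based comparison is more appropriate than strict equality.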
