[Feature] Opt metrics structure#891
Conversation
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
…lm-omni into opt_metrics_structure
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 443022b0a4
vllm_omni/metrics/stats.py
Outdated
batch_id=metrics.get("batch_id", -1),
batch_size=metrics.get("batch_size"),
stage_gen_time_ms=self.accumulated_gen_time_ms.pop(req_id, 0.0),
Preserve stage generation time for sync orchestrator metrics
In _as_stage_request_stats, stage_gen_time_ms is always taken from accumulated_gen_time_ms.pop(req_id, 0.0) and the value provided in the metrics dict is ignored. That accumulator is only updated in the async pipeline; the synchronous Omni path never adds to it, so per-request stage timing (and any derived rates) become zero in non-async runs. This is a regression in metrics accuracy for synchronous serving; consider falling back to metrics.get("stage_gen_time_ms") when the accumulator is empty or populating the accumulator in the sync path.
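One way to implement the suggested fallback, as a minimal sketch. The function name and the flat dict shapes are illustrative, not the actual `vllm_omni` API; the idea is simply to prefer the async accumulator when it has an entry and otherwise use the value carried in the metrics dict, so the synchronous path is not reported as zero:

```python
def resolve_stage_gen_time_ms(
    accumulated_gen_time_ms: dict[str, float],
    req_id: str,
    metrics: dict,
) -> float:
    """Prefer the async accumulator; fall back to the metrics dict value."""
    if req_id in accumulated_gen_time_ms:
        # Async pipeline populated the accumulator; consume it.
        return accumulated_gen_time_ms.pop(req_id)
    # Sync path: the accumulator is empty, so use the value the caller
    # provided in the metrics dict (0.0 only if neither source has data).
    return float(metrics.get("stage_gen_time_ms", 0.0))
```

An alternative, also mentioned in the comment, is to have the sync path populate `accumulated_gen_time_ms` itself so both paths share one code path.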
vllm_omni/entrypoints/async_omni.py
# Derive inputs for the next stage, record preprocess time
_prep_t0 = time.perf_counter()
next_inputs = next_stage.process_engine_inputs(self.stage_list, prompt)
_prep_ms = (time.perf_counter() - _prep_t0) * 1000.0
metrics.record_stage_preprocess_time(next_stage_id, req_id, _prep_ms)
Avoid dropping preprocess timing before stage stats exist
The preprocess time is recorded immediately after process_engine_inputs but before the next stage has produced any metrics. record_stage_preprocess_time only updates existing stage_events entries, so at this point there is no entry for next_stage_id, causing the value to be dropped and leaving preprocess_time_ms at 0 for all requests in async multi-stage runs. To make this metric usable, buffer it until on_stage_metrics creates the stage entry or move the recording to after metrics are emitted.
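The buffering option suggested above can be sketched as follows. The class and dict shapes here are hypothetical (the real `stage_events` structure in `vllm_omni/metrics/stats.py` may differ); the point is to hold the timing until the stage's metrics entry exists, then attach it:

```python
class StagePreprocessBuffer:
    """Defer preprocess timings until the stage's metrics entry exists."""

    def __init__(self) -> None:
        self._pending: dict[tuple[int, str], float] = {}

    def record(self, stage_events: dict, stage_id: int, req_id: str, ms: float) -> None:
        entry = stage_events.get(stage_id)
        if entry is not None:
            # Stage entry already exists: attach the timing directly.
            entry["preprocess_time_ms"] = ms
        else:
            # No entry yet (the stage has not emitted metrics): buffer it
            # instead of silently dropping the value.
            self._pending[(stage_id, req_id)] = ms

    def flush(self, stage_events: dict, stage_id: int, req_id: str) -> None:
        # Called once on_stage_metrics has created the stage entry.
        ms = self._pending.pop((stage_id, req_id), None)
        if ms is not None:
            stage_events[stage_id]["preprocess_time_ms"] = ms
```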
Does this also apply to DiT models?
| `size_kbytes` | Total kbytes transferred. |
| `tx_time_ms` | Sender transfer time in ms. |
| `rx_decode_time_ms` | Receiver decode time in ms. |
| `in_flight_time_ms` | In-flight time in ms. |
Yes; about 90% of the cost comes from deserialize/serialize and shm_write/shm_read.
OVERALL_FIELDS: list[str] | None = None
STAGE_FIELDS = _build_field_defs(StageRequestStats, STAGE_EXCLUDE, FIELD_TRANSFORMS)
TRANSFER_FIELDS = _build_field_defs(TransferEdgeStats, TRANSFER_EXCLUDE, FIELD_TRANSFORMS)
E2E_FIELDS = _build_field_defs(RequestE2EStats, E2E_EXCLUDE, FIELD_TRANSFORMS)
Should the definitions above be maintained inside StageRequestStats/TransferEdgeStats/RequestE2EStats instead?

I put it in vllm_omni/metrics/utils.py, because this function is not related to the XXXStats classes.
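For context, a helper like `_build_field_defs` can be written generically against any dataclass, which is why it need not live inside the stats classes. The sketch below is an assumption about its behavior (enumerate fields, drop excluded ones, attach an optional display transform), not the actual implementation in `vllm_omni/metrics/utils.py`:

```python
from dataclasses import dataclass, fields
from typing import Callable

def build_field_defs(
    cls: type,
    exclude: set[str],
    transforms: dict[str, Callable],
) -> list[tuple[str, Callable]]:
    """Return (field_name, transform) pairs for a dataclass, skipping excludes."""
    defs = []
    for f in fields(cls):
        if f.name in exclude:
            continue
        # Identity transform when no formatter is registered for the field.
        defs.append((f.name, transforms.get(f.name, lambda v: v)))
    return defs

# Illustrative reduced stats dataclass (the real one has more fields).
@dataclass
class TransferEdgeStats:
    size_kbytes: float
    tx_time_ms: float
    internal_id: int  # excluded from display

FIELD_DEFS = build_field_defs(TransferEdgeStats, {"internal_id"}, {})
```

Because the helper only touches `dataclasses.fields`, it works unchanged for stage, transfer, and e2e stats alike.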
vllm_omni/metrics/stats.py
Outdated
E2E_FIELDS = _build_field_defs(RequestE2EStats, E2E_EXCLUDE, FIELD_TRANSFORMS)

def _get_or_create_transfer_event(
Should this be put into OrchestratorAggregator?
vllm_omni/metrics/stats.py
Outdated
if self.log_stats:
    self.log_request_stats(stats, "stage_stats")
if stats.stage_stats is not None:
    self.log_request_stats(stats, "stage_running_avg")
I don't see any explanation about stage_running_avg.

Deleted; it is now only logged in the summary, so log_request_stats is no longer needed there.
vllm_omni/metrics/stats.py
Outdated
tx_time_ms=tx_ms,
used_shm=used_shm,
)
if self.log_stats and evt is not None:
Why is self.log_request_stats needed here? Isn't it already logged in build_and_log_summary?
…lm-omni into opt_metrics_structure
Fix comments from @yenuo26
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
@Bounty-hunter PTAL for final check
Pull request overview
Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
vllm_omni/entrypoints/omni_stage.py:1401
make_request_stats constructs StageRequestStats with stage_stats=None, but StageRequestStats.stage_stats is currently a required (non-Optional) field. This can lead to runtime errors if any downstream code expects a StageStats instance. Either make stage_stats optional in the dataclass (with a default), or always pass a StageStats value here (e.g., a zero/default instance) and update it later if needed.
from vllm_omni.metrics import StageRequestStats
num_tokens_in = count_prompt_tokens_from_outputs(req_output)
num_tokens_out = count_tokens_from_outputs(req_output)
return StageRequestStats(
num_tokens_in=num_tokens_in,
num_tokens_out=num_tokens_out,
stage_gen_time_ms=stage_gen_time_ms,
batch_id=batch_id,
batch_size=batch_size,
rx_decode_time_ms=rx_decode_time_ms,
rx_transfer_bytes=rx_transfer_bytes,
rx_in_flight_time_ms=rx_in_flight_time_ms,
stage_stats=None,
)
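The first fix suggested above (make `stage_stats` optional with a default) can be sketched as follows. These are illustrative reduced versions of the dataclasses; the real `StageRequestStats` has more fields:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StageStats:
    wall_time_ms: float = 0.0

@dataclass
class StageRequestStats:
    num_tokens_in: int
    num_tokens_out: int
    # Default of None lets make_request_stats omit stage_stats safely;
    # downstream code must then guard with an explicit None check.
    stage_stats: Optional[StageStats] = None
```

Note that in a dataclass, fields with defaults must come after all required fields, so `stage_stats` would need to sit at the end of the declaration (or the other fields would need defaults too).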
vllm_omni/diffusion/diffusion_engine.py:177
The metrics dict is created once and then passed to multiple OmniRequestOutput.from_diffusion(...) results. Since the dict is mutable and not copied, multiple outputs will share the same metrics object, so mutations on one output's metrics will affect the others. Pass a per-output copy (or construct per-request metrics inside the loop) to keep outputs isolated.
metrics = {
"image_num": int(request.sampling_params.num_outputs_per_prompt),
"resolution": int(request.sampling_params.resolution),
"postprocess_time_ms": postprocess_time * 1000,
}
if self.pre_process_func is not None:
metrics["preprocessing_time_ms"] = preprocess_time * 1000
if output.trajectory_timesteps is not None:
metrics["trajectory_timesteps"] = output.trajectory_timesteps
# Handle single request or multiple requests
if len(request.prompts) == 1:
# Single request: return single OmniRequestOutput
prompt = request.prompts[0]
request_id = request.request_ids[0] if request.request_ids else ""
if supports_audio_output(self.od_config.model_class_name):
audio_payload = outputs[0] if len(outputs) == 1 else outputs
return [
OmniRequestOutput.from_diffusion(
request_id=request_id,
images=[],
prompt=prompt,
metrics=metrics,
latents=output.trajectory_latents,
multimodal_output={"audio": audio_payload},
final_output_type="audio",
),
]
else:
return [
OmniRequestOutput.from_diffusion(
request_id=request_id,
images=outputs,
prompt=prompt,
metrics=metrics,
latents=output.trajectory_latents,
),
]
else:
# Multiple requests: return list of OmniRequestOutput
# Split images based on num_outputs_per_prompt for each request
results = []
output_idx = 0
for i, prompt in enumerate(request.prompts):
request_id = request.request_ids[i] if i < len(request.request_ids) else ""
# Get images for this request
num_outputs = request.sampling_params.num_outputs_per_prompt
request_outputs = outputs[output_idx : output_idx + num_outputs] if output_idx < len(outputs) else []
output_idx += num_outputs
if supports_audio_output(self.od_config.model_class_name):
audio_payload = request_outputs[0] if len(request_outputs) == 1 else request_outputs
results.append(
OmniRequestOutput.from_diffusion(
request_id=request_id,
images=[],
prompt=prompt,
metrics=metrics,
latents=output.trajectory_latents,
multimodal_output={"audio": audio_payload},
final_output_type="audio",
)
)
else:
results.append(
OmniRequestOutput.from_diffusion(
request_id=request_id,
images=request_outputs,
prompt=prompt,
metrics=metrics,
latents=output.trajectory_latents,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
#### Overall Summary

| Field | Value |
|-----------------------------|------------|
| e2e_requests | 1 |
| e2e_wall_time_ms | 41,299.190 |
| e2e_total_tokens | 5,202 |
| e2e_avg_time_per_request_ms | 41,299.190 |
| e2e_avg_tokens_per_s | 125.959 |
| e2e_stage_0_wall_time_ms | 10,192.289 |
| e2e_stage_1_wall_time_ms | 30,541.409 |
| e2e_stage_2_wall_time_ms | 207.496 |
The tables in this doc are written with a double leading pipe (|| ... |), which renders as an extra empty first column in Markdown. Use the standard single-pipe table syntax (| Field | Value |) for the examples and parameter tables so they render correctly in GitHub/Docs builds.
LGTM
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Signed-off-by: Junhong Liu <ljh_lbj@163.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: gerayking <399geray@gmail.com>
Purpose
Resolves: #533
Make metrics clearer and optimize the metrics format.
design doc:
https://docs.google.com/document/d/1St1tHMyp1kPwbYzHUFJYQHBGoWcQJA_dcGb9pemUZGI/edit?tab=t.0
Test Plan
Test 1
Omni online inference
Test 2
Omni offline inference
Note: --log-stats needs to be added in run_multiple_prompts.sh.
Test 3
Test Result
Test result 1
Test result 2
Test result 3
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.