
[Feature] Opt metrics structure #891

Merged
hsliuustc0106 merged 143 commits into vllm-project:main from LJH-LBJ:opt_metrics_structure
Feb 9, 2026

Conversation


@LJH-LBJ LJH-LBJ commented Jan 22, 2026


Purpose

Resolves: #533
Make the metrics clearer and optimize the metrics output format.

design doc:
https://docs.google.com/document/d/1St1tHMyp1kPwbYzHUFJYQHBGoWcQJA_dcGb9pemUZGI/edit?tab=t.0

Test Plan

Test 1
Omni online inference

vllm serve /workspace/models/Qwen3-Omni-30B-A3B-Instruct --omni --port 8014 --log-stats
python openai_chat_completion_client_for_multimodal_generation.py \
  --query-type use_video \
  --video-path t2v_out_1.mp4 \
  --model /workspace/models/Qwen3-Omni-30B-A3B-Instruct \
  --prompt "What are the main activities shown in this video?" 

Test 2
Omni offline inference
Note: `--log-stats` needs to be added in `run_multiple_prompts.sh`.

python end2end.py --output-wav output_audio \
                  --query-type text \
                  --txt-prompts text_prompts_10.txt \
                  --py-generator \
                  --log-stats
cd examples/offline_inference/qwen3_omni
bash run_multiple_prompts.sh

Test 3

vllm serve /workspace/models/Qwen-Image --omni --port 8014 --log-stats

curl -s http://localhost:8014/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "a cup of coffee on the table"}
    ],
    "extra_body": {
      "height": 1024,
      "width": 1024,
      "num_inference_steps": 50,
      "guidance_scale": 4.0,
      "seed": 42
    }
  }' \
  | jq -r '.choices[0].message.content[0].image_url.url' \
  | cut -d',' -f2 | base64 -d > coffee.png
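The jq/cut/base64 pipeline above can be brittle to quote in scripts. As a sketch, the same extraction can be done in Python; the response shape (`.choices[0].message.content[0].image_url.url` holding a `data:image/png;base64,...` URL) is assumed from the jq path above.

```python
# Hypothetical helper mirroring the jq/cut/base64 pipeline: pull the base64
# payload out of the data URL in the chat-completions response and return
# the decoded image bytes.
import base64


def extract_image_bytes(response: dict) -> bytes:
    data_url = response["choices"][0]["message"]["content"][0]["image_url"]["url"]
    # A data URL looks like "data:image/png;base64,<payload>"; keep the payload.
    _, b64_payload = data_url.split(",", 1)
    return base64.b64decode(b64_payload)
```

With the parsed JSON response in hand, `open("coffee.png", "wb").write(extract_image_bytes(body))` reproduces the shell pipeline.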

Test Result

Test result 1

(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] [Overall Summary]
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] +-----------------------------+------------+
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] | Field                       |      Value |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] +-----------------------------+------------+
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] | e2e_requests                |          1 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] | e2e_wall_time_ms            | 40,828.324 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] | e2e_total_tokens            |      5,105 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] | e2e_avg_time_per_request_ms | 40,828.324 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] | e2e_avg_tokens_per_s        |    125.036 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] | e2e_stage_0_wall_time_ms    | 10,659.139 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] | e2e_stage_1_wall_time_ms    | 24,827.949 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] | e2e_stage_2_wall_time_ms    |    625.227 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:439] +-----------------------------+------------+
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:465] 
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:465] [RequestE2EStats [request_id=chatcmpl-bd653d4b6bcdc00e]]
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:465] +-------------------------+-------------+
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:465] | Field                   |       Value |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:465] +-------------------------+-------------+
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:465] | e2e_total_ms            |  40,827.682 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:465] | e2e_total_tokens        |       5,105 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:465] | transfers_total_kbytes  | 137,606.358 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:465] | transfers_total_time_ms |     349.074 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:465] +-------------------------+-------------+
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] 
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] [StageRequestStats [request_id=chatcmpl-bd653d4b6bcdc00e]]
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] +------------------------+-----------+-----------+---------+
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] | Field                  |         0 |         1 |       2 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] +------------------------+-----------+-----------+---------+
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] | audio_generated_frames |         0 |         0 | 362,325 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] | batch_id               |        53 |       189 |       0 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] | batch_size             |         1 |         1 |       1 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] | num_tokens_in          |     4,860 |     4,826 |   3,024 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] | num_tokens_out         |        55 |       190 |       0 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] | postprocess_time_ms    | 4,523.629 |     0.533 |   0.000 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] | stage_gen_time_ms      |   120.209 | 1,010.551 | 582.322 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:518] +------------------------+-----------+-----------+---------+
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:558] 
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:558] [TransferEdgeStats [request_id=chatcmpl-bd653d4b6bcdc00e]]
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:558] +-------------------+-------------+------------+
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:558] | Field             |        0->1 |       1->2 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:558] +-------------------+-------------+------------+
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:558] | in_flight_time_ms |       2.096 |      2.588 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:558] | rx_decode_time_ms |     125.193 |     30.728 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:558] | size_kbytes       | 108,797.315 | 28,809.043 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:558] | tx_time_ms        |     158.411 |     30.057 |
(APIServer pid=656024) INFO 02-06 15:01:49 [stats.py:558] +-------------------+-------------+------------+

Test result 2

INFO 02-06 15:29:11 [stats.py:454] [Overall Summary]
INFO 02-06 15:29:11 [stats.py:454] +-----------------------------+------------+
INFO 02-06 15:29:11 [stats.py:454] | Field                       |      Value |
INFO 02-06 15:29:11 [stats.py:454] +-----------------------------+------------+
INFO 02-06 15:29:11 [stats.py:454] | e2e_requests                |         10 |
INFO 02-06 15:29:11 [stats.py:454] | e2e_wall_time_ms            | 81,430.702 |
INFO 02-06 15:29:11 [stats.py:454] | e2e_total_tokens            |      3,347 |
INFO 02-06 15:29:11 [stats.py:454] | e2e_avg_time_per_request_ms |  8,143.070 |
INFO 02-06 15:29:11 [stats.py:454] | e2e_avg_tokens_per_s        |     41.102 |
INFO 02-06 15:29:11 [stats.py:454] | e2e_stage_0_wall_time_ms    | 19,442.824 |
INFO 02-06 15:29:11 [stats.py:454] | e2e_stage_1_wall_time_ms    | 59,771.091 |
INFO 02-06 15:29:11 [stats.py:454] | e2e_stage_2_wall_time_ms    |  3,015.388 |
INFO 02-06 15:29:11 [stats.py:454] +-----------------------------+------------+
INFO 02-06 15:29:11 [stats.py:480] 
INFO 02-06 15:29:11 [stats.py:480] [RequestE2EStats [request_id=0_72e5beab-aa6d-447d-9727-a3ca66667ac0]]
INFO 02-06 15:29:11 [stats.py:480] +-------------------------+------------+
INFO 02-06 15:29:11 [stats.py:480] | Field                   |      Value |
INFO 02-06 15:29:11 [stats.py:480] +-------------------------+------------+
INFO 02-06 15:29:11 [stats.py:480] | e2e_total_ms            | 78,705.673 |
INFO 02-06 15:29:11 [stats.py:480] | e2e_total_tokens        |         89 |
INFO 02-06 15:29:11 [stats.py:480] | transfers_total_kbytes  |  1,187.154 |
INFO 02-06 15:29:11 [stats.py:480] | transfers_total_time_ms |     10.231 |
INFO 02-06 15:29:11 [stats.py:480] +-------------------------+------------+
INFO 02-06 15:29:11 [stats.py:533] 
INFO 02-06 15:29:11 [stats.py:533] [StageRequestStats [request_id=0_72e5beab-aa6d-447d-9727-a3ca66667ac0]]
INFO 02-06 15:29:11 [stats.py:533] +---------------------+-----------+------------+---------+
INFO 02-06 15:29:11 [stats.py:533] | Field               |         0 |          1 |       2 |
INFO 02-06 15:29:11 [stats.py:533] +---------------------+-----------+------------+---------+
INFO 02-06 15:29:11 [stats.py:533] | batch_id            |         1 |          1 |       1 |
INFO 02-06 15:29:11 [stats.py:533] | batch_size          |        10 |         10 |       1 |
INFO 02-06 15:29:11 [stats.py:533] | num_tokens_in       |        55 |         21 |     400 |
INFO 02-06 15:29:11 [stats.py:533] | num_tokens_out      |         8 |         26 |       0 |
INFO 02-06 15:29:11 [stats.py:533] | postprocess_time_ms | 1,451.138 |      0.481 |   0.000 |
INFO 02-06 15:29:11 [stats.py:533] | stage_gen_time_ms   | 7,121.535 | 49,647.185 | 285.141 |
INFO 02-06 15:29:11 [stats.py:533] +---------------------+-----------+------------+---------+
INFO 02-06 15:29:11 [stats.py:573] 
INFO 02-06 15:29:11 [stats.py:573] [TransferEdgeStats [request_id=0_72e5beab-aa6d-447d-9727-a3ca66667ac0]]
INFO 02-06 15:29:11 [stats.py:573] +-------------------+-----------+-------+
INFO 02-06 15:29:11 [stats.py:573] | Field             |      0->1 |  1->2 |
INFO 02-06 15:29:11 [stats.py:573] +-------------------+-----------+-------+
INFO 02-06 15:29:11 [stats.py:573] | in_flight_time_ms |     1.047 | 1.672 |
INFO 02-06 15:29:11 [stats.py:573] | rx_decode_time_ms |     2.749 | 1.676 |
INFO 02-06 15:29:11 [stats.py:573] | size_kbytes       | 1,185.429 | 1.726 |
INFO 02-06 15:29:11 [stats.py:573] | tx_time_ms        |     2.280 | 0.806 |
INFO 02-06 15:29:11 [stats.py:573] +-------------------+-----------+-------+
INFO 02-06 15:29:11 [stats.py:480] 
INFO 02-06 15:29:11 [stats.py:480] [RequestE2EStats [request_id=1_0184b448-9ab3-40a5-85a7-06b2aa1ffcfe]]
INFO 02-06 15:29:11 [stats.py:480] +-------------------------+------------+
INFO 02-06 15:29:11 [stats.py:480] | Field                   |      Value |
INFO 02-06 15:29:11 [stats.py:480] +-------------------------+------------+
INFO 02-06 15:29:11 [stats.py:480] | e2e_total_ms            | 79,877.905 |
INFO 02-06 15:29:11 [stats.py:480] | e2e_total_tokens        |        434 |
INFO 02-06 15:29:11 [stats.py:480] | transfers_total_kbytes  |  4,630.604 |
INFO 02-06 15:29:11 [stats.py:480] | transfers_total_time_ms |    298.846 |
INFO 02-06 15:29:11 [stats.py:480] +-------------------------+------------+
INFO 02-06 15:29:11 [stats.py:533] 
INFO 02-06 15:29:11 [stats.py:533] [StageRequestStats [request_id=1_0184b448-9ab3-40a5-85a7-06b2aa1ffcfe]]
INFO 02-06 15:29:11 [stats.py:533] +---------------------+-----------+------------+-----------+
INFO 02-06 15:29:11 [stats.py:533] | Field               |         0 |          1 |         2 |
INFO 02-06 15:29:11 [stats.py:533] +---------------------+-----------+------------+-----------+
INFO 02-06 15:29:11 [stats.py:533] | batch_id            |         1 |          1 |         2 |
INFO 02-06 15:29:11 [stats.py:533] | batch_size          |        10 |         10 |         1 |
INFO 02-06 15:29:11 [stats.py:533] | num_tokens_in       |        57 |         23 |     4,528 |
INFO 02-06 15:29:11 [stats.py:533] | num_tokens_out      |        93 |        284 |         0 |
INFO 02-06 15:29:11 [stats.py:533] | postprocess_time_ms |     8.152 |      0.377 |     0.000 |
INFO 02-06 15:29:11 [stats.py:533] | stage_gen_time_ms   | 7,121.535 | 49,647.185 | 1,161.031 |
INFO 02-06 15:29:11 [stats.py:533] +---------------------+-----------+------------+-----------+
INFO 02-06 15:29:11 [stats.py:573] 
INFO 02-06 15:29:11 [stats.py:573] [TransferEdgeStats [request_id=1_0184b448-9ab3-40a5-85a7-06b2aa1ffcfe]]
INFO 02-06 15:29:11 [stats.py:573] +-------------------+-----------+---------+
INFO 02-06 15:29:11 [stats.py:573] | Field             |      0->1 |    1->2 |
INFO 02-06 15:29:11 [stats.py:573] +-------------------+-----------+---------+
INFO 02-06 15:29:11 [stats.py:573] | in_flight_time_ms |     0.000 | 285.377 |
INFO 02-06 15:29:11 [stats.py:573] | rx_decode_time_ms |     3.297 |   3.658 |
INFO 02-06 15:29:11 [stats.py:573] | size_kbytes       | 4,617.656 |  12.948 |
INFO 02-06 15:29:11 [stats.py:573] | tx_time_ms        |     6.226 |   0.288 |
INFO 02-06 15:29:11 [stats.py:573] +-------------------+-----------+---------+
...

Test result 3

(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:454] [Overall Summary]
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:454] +-----------------------------+------------+
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:454] | Field                       |      Value |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:454] +-----------------------------+------------+
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:454] | e2e_requests                |          1 |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:454] | e2e_wall_time_ms            | 19,773.057 |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:454] | e2e_avg_time_per_request_ms | 19,773.057 |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:454] | e2e_stage_0_wall_time_ms    | 19,772.584 |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:454] +-----------------------------+------------+
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:480] 
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:480] [RequestE2EStats [request_id=chatcmpl-eac6b12cee4f45c4]]
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:480] +--------------+------------+
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:480] | Field        |      Value |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:480] +--------------+------------+
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:480] | e2e_total_ms | 19,772.583 |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:480] +--------------+------------+
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:533] 
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:533] [StageRequestStats [request_id=chatcmpl-eac6b12cee4f45c4]]
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:533] +---------------------+------------+
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:533] | Field               |          0 |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:533] +---------------------+------------+
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:533] | batch_size          |          1 |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:533] | image_num           |          1 |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:533] | postprocess_time_ms |      1,726 |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:533] | resolution          |        640 |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:533] | stage_gen_time_ms   | 19,742.906 |
(APIServer pid=665125) INFO 02-06 15:34:59 [stats.py:533] +---------------------+------------+
(APIServer pid=665125) INFO 02-06 15:35:00 [serving_chat.py:2086] Diffusion chat completed for request chatcmpl-eac6b12cee4f45c4: 1 images


LJH-LBJ and others added 23 commits January 15, 2026 18:25
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 443022b0a4


Comment on lines 318 to 320
batch_id=metrics.get("batch_id", -1),
batch_size=metrics.get("batch_size"),
stage_gen_time_ms=self.accumulated_gen_time_ms.pop(req_id, 0.0),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Preserve stage generation time for sync orchestrator metrics

In _as_stage_request_stats, stage_gen_time_ms is always taken from accumulated_gen_time_ms.pop(req_id, 0.0) and the value provided in the metrics dict is ignored. That accumulator is only updated in the async pipeline; the synchronous Omni path never adds to it, so per-request stage timing (and any derived rates) become zero in non-async runs. This is a regression in metrics accuracy for synchronous serving; consider falling back to metrics.get("stage_gen_time_ms") when the accumulator is empty or populating the accumulator in the sync path.
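A minimal sketch of the fallback suggested here (the accumulator and the metrics-dict key are assumptions based on the snippet above, not the actual vllm-omni API):

```python
# Hypothetical fix sketch: prefer the async accumulator when it has an
# entry, but fall back to the value the sync path reports in the metrics
# dict so stage_gen_time_ms is not silently zeroed for synchronous runs.
def resolve_stage_gen_time_ms(accumulated, metrics, req_id):
    acc = accumulated.pop(req_id, None)
    if acc is not None:
        return acc  # async pipeline populated the accumulator
    # Sync orchestrator path: no accumulator entry, use the reported value.
    return float(metrics.get("stage_gen_time_ms", 0.0))
```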


Comment on lines 426 to 430
# Derive inputs for the next stage, record preprocess time
_prep_t0 = time.perf_counter()
next_inputs = next_stage.process_engine_inputs(self.stage_list, prompt)
_prep_ms = (time.perf_counter() - _prep_t0) * 1000.0
metrics.record_stage_preprocess_time(next_stage_id, req_id, _prep_ms)


P2: Avoid dropping preprocess timing before stage stats exist

The preprocess time is recorded immediately after process_engine_inputs but before the next stage has produced any metrics. record_stage_preprocess_time only updates existing stage_events entries, so at this point there is no entry for next_stage_id, causing the value to be dropped and leaving preprocess_time_ms at 0 for all requests in async multi-stage runs. To make this metric usable, buffer it until on_stage_metrics creates the stage entry or move the recording to after metrics are emitted.
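The buffering option could look roughly like the sketch below; class and key names are illustrative, not the actual vllm-omni API.

```python
# Hypothetical buffer: hold preprocess timings recorded before a stage
# entry exists, and flush them once on_stage_metrics creates the entry.
from collections import defaultdict


class PreprocessTimeBuffer:
    def __init__(self):
        self._pending = defaultdict(dict)  # stage_id -> {req_id: ms}

    def record(self, stage_events, stage_id, req_id, ms):
        entry = stage_events.get((stage_id, req_id))
        if entry is not None:
            entry["preprocess_time_ms"] = ms  # entry exists: write directly
        else:
            self._pending[stage_id][req_id] = ms  # buffer until it exists

    def flush(self, stage_events, stage_id, req_id):
        ms = self._pending.get(stage_id, {}).pop(req_id, None)
        if ms is not None:
            stage_events[(stage_id, req_id)]["preprocess_time_ms"] = ms
```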


@hsliuustc0106
Collaborator

does this also apply to dit models?

Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
| `size_kbytes` | Total kbytes transferred. |
| `tx_time_ms` | Sender transfer time in ms. |
| `rx_decode_time_ms` | Receiver decode time in ms. |
| `in_flight_time_ms` | In-flight time in ms. |
Contributor



I am confused about this result. Does `in_flight_time_ms` refer to the network transmission time, and are `tx_time_ms` and `rx_decode_time_ms` serialize/deserialize times? They seem to take a lot of time.

Please check!

Contributor Author

@LJH-LBJ LJH-LBJ Feb 6, 2026


Yes, about 90% of the cost comes from deserialize/serialize and shm_write/shm_read.
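One hypothetical way to sanity-check that split is to time the codec round-trip separately from the transfer itself; `pickle` below stands in for whatever codec the transfer path actually uses, and the field names only mirror the stats tables above.

```python
# Hypothetical micro-benchmark: measure serialize (tx) and deserialize
# (rx) time and payload size for an object, averaged over several runs.
import pickle
import time


def time_roundtrip(obj, repeats=10):
    t0 = time.perf_counter()
    for _ in range(repeats):
        blob = pickle.dumps(obj)
    tx_ms = (time.perf_counter() - t0) * 1000 / repeats

    t0 = time.perf_counter()
    for _ in range(repeats):
        pickle.loads(blob)
    rx_ms = (time.perf_counter() - t0) * 1000 / repeats

    return {"tx_time_ms": tx_ms,
            "rx_decode_time_ms": rx_ms,
            "size_kbytes": len(blob) / 1024}
```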

Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
OVERALL_FIELDS: list[str] | None = None
STAGE_FIELDS = _build_field_defs(StageRequestStats, STAGE_EXCLUDE, FIELD_TRANSFORMS)
TRANSFER_FIELDS = _build_field_defs(TransferEdgeStats, TRANSFER_EXCLUDE, FIELD_TRANSFORMS)
E2E_FIELDS = _build_field_defs(RequestE2EStats, E2E_EXCLUDE, FIELD_TRANSFORMS)
Contributor


Could the definitions above just be moved into StageRequestStats/TransferEdgeStats/RequestE2EStats so they are maintained there?

Contributor Author


I put it in vllm_omni/metrics/utils.py, because this function is not related to the XXXStats classes.

E2E_FIELDS = _build_field_defs(RequestE2EStats, E2E_EXCLUDE, FIELD_TRANSFORMS)


def _get_or_create_transfer_event(
Contributor


Should this be put into OrchestratorAggregator?

Contributor Author


done

if self.log_stats:
self.log_request_stats(stats, "stage_stats")
if stats.stage_stats is not None:
self.log_request_stats(stats, "stage_running_avg")
Contributor


I don't see any explanation of stage_running_avg.

Contributor Author


Deleted; it is now only logged in the summary, so log_request_stats is no longer needed.

tx_time_ms=tx_ms,
used_shm=used_shm,
)
if self.log_stats and evt is not None:
Contributor


Why is self.log_request_stats needed here? Isn't it already logged in build_and_log_summary?

Contributor Author


deleted

LJH-LBJ and others added 4 commits February 6, 2026 23:43
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
@hsliuustc0106
Collaborator

fix comments from @yenuo26

@LJH-LBJ LJH-LBJ requested a review from yenuo26 February 6, 2026 23:33
@LJH-LBJ
Contributor Author

LJH-LBJ commented Feb 6, 2026

fix comments from @yenuo26

Already fixed; just one comment remains.

Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Contributor

@yenuo26 yenuo26 left a comment


LGTM

@hsliuustc0106
Collaborator

@Bounty-hunter PTAL for final check

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

vllm_omni/entrypoints/omni_stage.py:1401

  • make_request_stats constructs StageRequestStats with stage_stats=None, but StageRequestStats.stage_stats is currently a required (non-Optional) field. This can lead to runtime errors if any downstream code expects a StageStats instance. Either make stage_stats optional in the dataclass (with a default), or always pass a StageStats value here (e.g., a zero/default instance) and update later if needed.
    from vllm_omni.metrics import StageRequestStats

    num_tokens_in = count_prompt_tokens_from_outputs(req_output)
    num_tokens_out = count_tokens_from_outputs(req_output)
    return StageRequestStats(
        num_tokens_in=num_tokens_in,
        num_tokens_out=num_tokens_out,
        stage_gen_time_ms=stage_gen_time_ms,
        batch_id=batch_id,
        batch_size=batch_size,
        rx_decode_time_ms=rx_decode_time_ms,
        rx_transfer_bytes=rx_transfer_bytes,
        rx_in_flight_time_ms=rx_in_flight_time_ms,
        stage_stats=None,
    )
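A minimal sketch of the first option (making `stage_stats` optional with a default); the field list follows the snippet above, and `StageStats` is a placeholder for the real per-stage container.

```python
# Hypothetical fix sketch: give stage_stats a None default so callers like
# make_request_stats can legally construct StageRequestStats without it.
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageStats:
    """Placeholder for the real per-stage detail container."""
    pass


@dataclass
class StageRequestStats:
    num_tokens_in: int = 0
    num_tokens_out: int = 0
    stage_gen_time_ms: float = 0.0
    batch_id: int = -1
    batch_size: Optional[int] = None
    stage_stats: Optional[StageStats] = None  # now optional, defaults to None
```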

vllm_omni/diffusion/diffusion_engine.py:177

  • The metrics dict is created once and then passed to multiple OmniRequestOutput.from_diffusion(...) results. Since the dict is mutable and not copied, multiple outputs will share the same metrics object, so mutations on one output’s metrics will affect the others. Pass a per-output copy (or construct per-request metrics inside the loop) to keep outputs isolated.
        metrics = {
            "image_num": int(request.sampling_params.num_outputs_per_prompt),
            "resolution": int(request.sampling_params.resolution),
            "postprocess_time_ms": postprocess_time * 1000,
        }
        if self.pre_process_func is not None:
            metrics["preprocessing_time_ms"] = preprocess_time * 1000
        if output.trajectory_timesteps is not None:
            metrics["trajectory_timesteps"] = output.trajectory_timesteps
        # Handle single request or multiple requests
        if len(request.prompts) == 1:
            # Single request: return single OmniRequestOutput
            prompt = request.prompts[0]
            request_id = request.request_ids[0] if request.request_ids else ""

            if supports_audio_output(self.od_config.model_class_name):
                audio_payload = outputs[0] if len(outputs) == 1 else outputs
                return [
                    OmniRequestOutput.from_diffusion(
                        request_id=request_id,
                        images=[],
                        prompt=prompt,
                        metrics=metrics,
                        latents=output.trajectory_latents,
                        multimodal_output={"audio": audio_payload},
                        final_output_type="audio",
                    ),
                ]
            else:
                return [
                    OmniRequestOutput.from_diffusion(
                        request_id=request_id,
                        images=outputs,
                        prompt=prompt,
                        metrics=metrics,
                        latents=output.trajectory_latents,
                    ),
                ]
        else:
            # Multiple requests: return list of OmniRequestOutput
            # Split images based on num_outputs_per_prompt for each request
            results = []
            output_idx = 0

            for i, prompt in enumerate(request.prompts):
                request_id = request.request_ids[i] if i < len(request.request_ids) else ""

                # Get images for this request
                num_outputs = request.sampling_params.num_outputs_per_prompt
                request_outputs = outputs[output_idx : output_idx + num_outputs] if output_idx < len(outputs) else []
                output_idx += num_outputs

                if supports_audio_output(self.od_config.model_class_name):
                    audio_payload = request_outputs[0] if len(request_outputs) == 1 else request_outputs
                    results.append(
                        OmniRequestOutput.from_diffusion(
                            request_id=request_id,
                            images=[],
                            prompt=prompt,
                            metrics=metrics,
                            latents=output.trajectory_latents,
                            multimodal_output={"audio": audio_payload},
                            final_output_type="audio",
                        )
                    )
                else:
                    results.append(
                        OmniRequestOutput.from_diffusion(
                            request_id=request_id,
                            images=request_outputs,
                            prompt=prompt,
                            metrics=metrics,
                            latents=output.trajectory_latents,


Comment on lines +27 to +39
#### Overall Summary

| Field | Value |
|-----------------------------|--------------|
| e2e_requests | 1 |
| e2e_wall_time_ms | 41,299.190 |
| e2e_total_tokens | 5,202 |
| e2e_avg_time_per_request_ms | 41,299.190 |
| e2e_avg_tokens_per_s | 125.959 |
| e2e_stage_0_wall_time_ms | 10,192.289 |
| e2e_stage_1_wall_time_ms | 30,541.409 |
| e2e_stage_2_wall_time_ms | 207.496 |


Copilot AI Feb 9, 2026


The tables in this doc are written with a double leading pipe (|| ... |), which renders as an extra empty first column in Markdown. Use the standard single-pipe table syntax (| Field | Value |) for the examples and parameter tables so they render correctly in GitHub/Docs builds.

@Bounty-hunter
Contributor

@Bounty-hunter PTAL for final check

LGTM

Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


lgtm

@hsliuustc0106 hsliuustc0106 merged commit 9af6fb9 into vllm-project:main Feb 9, 2026
7 checks passed
@JuanPZuluaga JuanPZuluaga mentioned this pull request Feb 9, 2026
5 tasks
gerayking pushed a commit to gerayking/vllm-omni that referenced this pull request Feb 12, 2026
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: gerayking <399geray@gmail.com>
YanickSchraner pushed a commit to YanickSchraner/vllm-omni that referenced this pull request Feb 20, 2026
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>

Labels

ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC]: Optimize the metric.

8 participants