[Feature] Opt metrics structure#891
Conversation
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
…lm-omni into opt_metrics_structure
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 443022b0a4
vllm_omni/metrics/stats.py
Outdated
batch_id=metrics.get("batch_id", -1),
batch_size=metrics.get("batch_size"),
stage_gen_time_ms=self.accumulated_gen_time_ms.pop(req_id, 0.0),
Preserve stage generation time for sync orchestrator metrics
In _as_stage_request_stats, stage_gen_time_ms is always taken from accumulated_gen_time_ms.pop(req_id, 0.0) and the value provided in the metrics dict is ignored. That accumulator is only updated in the async pipeline; the synchronous Omni path never adds to it, so per-request stage timing (and any derived rates) become zero in non-async runs. This is a regression in metrics accuracy for synchronous serving; consider falling back to metrics.get("stage_gen_time_ms") when the accumulator is empty or populating the accumulator in the sync path.
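One way to implement the suggested fallback, as a minimal sketch. The function name and the flat dict shapes are illustrative, not the actual `vllm_omni` API; the idea is simply to prefer the async accumulator when it has an entry and otherwise use the value carried in the metrics dict, so the synchronous path is not reported as zero:

```python
def resolve_stage_gen_time_ms(
    accumulated_gen_time_ms: dict[str, float],
    req_id: str,
    metrics: dict,
) -> float:
    """Prefer the async accumulator; fall back to the metrics dict value."""
    if req_id in accumulated_gen_time_ms:
        # Async pipeline populated the accumulator; consume it.
        return accumulated_gen_time_ms.pop(req_id)
    # Sync path: the accumulator is empty, so use the value the caller
    # provided in the metrics dict (0.0 only if neither source has data).
    return float(metrics.get("stage_gen_time_ms", 0.0))
```

An alternative, also mentioned in the comment, is to have the sync path populate `accumulated_gen_time_ms` itself so both paths share one code path.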
vllm_omni/entrypoints/async_omni.py
# Derive inputs for the next stage, record preprocess time
_prep_t0 = time.perf_counter()
next_inputs = next_stage.process_engine_inputs(self.stage_list, prompt)
_prep_ms = (time.perf_counter() - _prep_t0) * 1000.0
metrics.record_stage_preprocess_time(next_stage_id, req_id, _prep_ms)
Avoid dropping preprocess timing before stage stats exist
The preprocess time is recorded immediately after process_engine_inputs but before the next stage has produced any metrics. record_stage_preprocess_time only updates existing stage_events entries, so at this point there is no entry for next_stage_id, causing the value to be dropped and leaving preprocess_time_ms at 0 for all requests in async multi-stage runs. To make this metric usable, buffer it until on_stage_metrics creates the stage entry or move the recording to after metrics are emitted.
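The buffering option suggested above can be sketched as follows. The class and dict shapes here are hypothetical (the real `stage_events` structure in `vllm_omni/metrics/stats.py` may differ); the point is to hold the timing until the stage's metrics entry exists, then attach it:

```python
class StagePreprocessBuffer:
    """Defer preprocess timings until the stage's metrics entry exists."""

    def __init__(self) -> None:
        self._pending: dict[tuple[int, str], float] = {}

    def record(self, stage_events: dict, stage_id: int, req_id: str, ms: float) -> None:
        entry = stage_events.get(stage_id)
        if entry is not None:
            # Stage entry already exists: attach the timing directly.
            entry["preprocess_time_ms"] = ms
        else:
            # No entry yet (the stage has not emitted metrics): buffer it
            # instead of silently dropping the value.
            self._pending[(stage_id, req_id)] = ms

    def flush(self, stage_events: dict, stage_id: int, req_id: str) -> None:
        # Called once on_stage_metrics has created the stage entry.
        ms = self._pending.pop((stage_id, req_id), None)
        if ms is not None:
            stage_events[stage_id]["preprocess_time_ms"] = ms
```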
Does this also apply to DiT models?
| `size_kbytes` | Total kbytes transferred. |
| `tx_time_ms` | Sender transfer time in ms. |
| `rx_decode_time_ms` | Receiver decode time in ms. |
| `in_flight_time_ms` | In-flight time in ms. |
Yes; about 90% of the cost comes from deserialize/serialize and shm_write/shm_read.
OVERALL_FIELDS: list[str] | None = None
STAGE_FIELDS = _build_field_defs(StageRequestStats, STAGE_EXCLUDE, FIELD_TRANSFORMS)
TRANSFER_FIELDS = _build_field_defs(TransferEdgeStats, TRANSFER_EXCLUDE, FIELD_TRANSFORMS)
E2E_FIELDS = _build_field_defs(RequestE2EStats, E2E_EXCLUDE, FIELD_TRANSFORMS)
Should the definitions above be maintained inside StageRequestStats/TransferEdgeStats/RequestE2EStats instead?

I put it in vllm_omni/metrics/utils.py, because this function is not related to the XXXStats classes.
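For context, a helper like `_build_field_defs` can be written generically against any dataclass, which is why it need not live inside the stats classes. The sketch below is an assumption about its behavior (enumerate fields, drop excluded ones, attach an optional display transform), not the actual implementation in `vllm_omni/metrics/utils.py`:

```python
from dataclasses import dataclass, fields
from typing import Callable

def build_field_defs(
    cls: type,
    exclude: set[str],
    transforms: dict[str, Callable],
) -> list[tuple[str, Callable]]:
    """Return (field_name, transform) pairs for a dataclass, skipping excludes."""
    defs = []
    for f in fields(cls):
        if f.name in exclude:
            continue
        # Identity transform when no formatter is registered for the field.
        defs.append((f.name, transforms.get(f.name, lambda v: v)))
    return defs

# Illustrative reduced stats dataclass (the real one has more fields).
@dataclass
class TransferEdgeStats:
    size_kbytes: float
    tx_time_ms: float
    internal_id: int  # excluded from display

FIELD_DEFS = build_field_defs(TransferEdgeStats, {"internal_id"}, {})
```

Because the helper only touches `dataclasses.fields`, it works unchanged for stage, transfer, and e2e stats alike.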
vllm_omni/metrics/stats.py
Outdated
E2E_FIELDS = _build_field_defs(RequestE2EStats, E2E_EXCLUDE, FIELD_TRANSFORMS)

def _get_or_create_transfer_event(
Should this be put into OrchestratorAggregator?
vllm_omni/metrics/stats.py
Outdated
if self.log_stats:
    self.log_request_stats(stats, "stage_stats")
if stats.stage_stats is not None:
    self.log_request_stats(stats, "stage_running_avg")
I don't see any explanation about stage_running_avg.

Deleted; it is now only logged in the summary, so log_request_stats is no longer needed there.
vllm_omni/metrics/stats.py
Outdated
tx_time_ms=tx_ms,
used_shm=used_shm,
)
if self.log_stats and evt is not None:
Why is self.log_request_stats needed here? Isn't it already logged in build_and_log_summary?
…lm-omni into opt_metrics_structure
Fix comments from @yenuo26
Signed-off-by: Junhong Liu <ljh_lbj@163.com>
@Bounty-hunter PTAL for final check
Pull request overview
Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
vllm_omni/entrypoints/omni_stage.py:1401
make_request_stats constructs StageRequestStats with stage_stats=None, but StageRequestStats.stage_stats is currently a required (non-Optional) field. This can lead to runtime errors if any downstream code expects a StageStats instance. Either make stage_stats optional in the dataclass (with a default), or always pass a StageStats value here (e.g., a zero/default instance) and update it later if needed.
from vllm_omni.metrics import StageRequestStats
num_tokens_in = count_prompt_tokens_from_outputs(req_output)
num_tokens_out = count_tokens_from_outputs(req_output)
return StageRequestStats(
num_tokens_in=num_tokens_in,
num_tokens_out=num_tokens_out,
stage_gen_time_ms=stage_gen_time_ms,
batch_id=batch_id,
batch_size=batch_size,
rx_decode_time_ms=rx_decode_time_ms,
rx_transfer_bytes=rx_transfer_bytes,
rx_in_flight_time_ms=rx_in_flight_time_ms,
stage_stats=None,
)
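The first fix suggested above (make `stage_stats` optional with a default) can be sketched as follows. These are illustrative reduced versions of the dataclasses; the real `StageRequestStats` has more fields:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StageStats:
    wall_time_ms: float = 0.0

@dataclass
class StageRequestStats:
    num_tokens_in: int
    num_tokens_out: int
    # Default of None lets make_request_stats omit stage_stats safely;
    # downstream code must then guard with an explicit None check.
    stage_stats: Optional[StageStats] = None
```

Note that in a dataclass, fields with defaults must come after all required fields, so `stage_stats` would need to sit at the end of the declaration (or the other fields would need defaults too).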
vllm_omni/diffusion/diffusion_engine.py:177
The metrics dict is created once and then passed to multiple OmniRequestOutput.from_diffusion(...) results. Since the dict is mutable and not copied, multiple outputs will share the same metrics object, so mutations on one output's metrics will affect the others. Pass a per-output copy (or construct per-request metrics inside the loop) to keep outputs isolated.
metrics = {
"image_num": int(request.sampling_params.num_outputs_per_prompt),
"resolution": int(request.sampling_params.resolution),
"postprocess_time_ms": postprocess_time * 1000,
}
if self.pre_process_func is not None:
metrics["preprocessing_time_ms"] = preprocess_time * 1000
if output.trajectory_timesteps is not None:
metrics["trajectory_timesteps"] = output.trajectory_timesteps
# Handle single request or multiple requests
if len(request.prompts) == 1:
# Single request: return single OmniRequestOutput
prompt = request.prompts[0]
request_id = request.request_ids[0] if request.request_ids else ""
if supports_audio_output(self.od_config.model_class_name):
audio_payload = outputs[0] if len(outputs) == 1 else outputs
return [
OmniRequestOutput.from_diffusion(
request_id=request_id,
images=[],
prompt=prompt,
metrics=metrics,
latents=output.trajectory_latents,
multimodal_output={"audio": audio_payload},
final_output_type="audio",
),
]
else:
return [
OmniRequestOutput.from_diffusion(
request_id=request_id,
images=outputs,
prompt=prompt,
metrics=metrics,
latents=output.trajectory_latents,
),
]
else:
# Multiple requests: return list of OmniRequestOutput
# Split images based on num_outputs_per_prompt for each request
results = []
output_idx = 0
for i, prompt in enumerate(request.prompts):
request_id = request.request_ids[i] if i < len(request.request_ids) else ""
# Get images for this request
num_outputs = request.sampling_params.num_outputs_per_prompt
request_outputs = outputs[output_idx : output_idx + num_outputs] if output_idx < len(outputs) else []
output_idx += num_outputs
if supports_audio_output(self.od_config.model_class_name):
audio_payload = request_outputs[0] if len(request_outputs) == 1 else request_outputs
results.append(
OmniRequestOutput.from_diffusion(
request_id=request_id,
images=[],
prompt=prompt,
metrics=metrics,
latents=output.trajectory_latents,
multimodal_output={"audio": audio_payload},
final_output_type="audio",
)
)
else:
results.append(
OmniRequestOutput.from_diffusion(
request_id=request_id,
images=request_outputs,
prompt=prompt,
metrics=metrics,
latents=output.trajectory_latents,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
#### Overall Summary

| Field | Value |
|-----------------------------|------------|
| e2e_requests | 1 |
| e2e_wall_time_ms | 41,299.190 |
| e2e_total_tokens | 5,202 |
| e2e_avg_time_per_request_ms | 41,299.190 |
| e2e_avg_tokens_per_s | 125.959 |
| e2e_stage_0_wall_time_ms | 10,192.289 |
| e2e_stage_1_wall_time_ms | 30,541.409 |
| e2e_stage_2_wall_time_ms | 207.496 |
The tables in this doc are written with a double leading pipe (|| ... |), which renders as an extra empty first column in Markdown. Use the standard single-pipe table syntax (| Field | Value |) for the examples and parameter tables so they render correctly in GitHub/Docs builds.
LGTM
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Signed-off-by: Junhong Liu <ljh_lbj@163.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: gerayking <399geray@gmail.com>
Purpose
Resolves: #533
Make metrics clearer and optimize the metrics format.
design doc:
https://docs.google.com/document/d/1St1tHMyp1kPwbYzHUFJYQHBGoWcQJA_dcGb9pemUZGI/edit?tab=t.0
Test Plan
Test 1
Omni online inference
Test 2
Omni offline inference
Note: --log-stats needs to be added in run_multiple_prompts.sh.
Test 3
Test Result
Test result 1
Test result 2
Test result 3
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.