[Profile] Adding metrics for Diffusion/DiT Single diffusion Pipeline #668
erfgss wants to merge 15 commits into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a443eb8546
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```diff
 def close(self) -> None:
-    self._finalizer()
-
-def abort(self, request_id: str | Iterable[str]) -> None:
-    # TODO implement it
-    logger.warning("DiffusionEngine abort is not implemented yet")
-    pass
+    self._finalizer()
```
Restore DiffusionEngine.abort used by async stages
The DiffusionEngine class no longer defines abort(), but AsyncOmniDiffusion.abort() (and the async stage worker’s ABORT handling) still calls self.engine.abort(...). When an abort task is issued, this will now raise AttributeError and fail to cancel requests. This is a regression for any deployment that uses abort (e.g., client cancellation or timeout handling) and should be fixed by reintroducing DiffusionEngine.abort() or updating callers to guard/route aborts.
vllm_omni/entrypoints/omni_stage.py
```diff
 _recv_dequeue_ts = _time.time()
 task_type = task.get("type", OmniStageTaskType.GENERATE)
 if task_type == OmniStageTaskType.SHUTDOWN:
-    logger.info("Received shutdown signal")
+    logger.error("Received shutdown signal")
```
Handle profiler start/stop tasks in stage worker
Profiler control tasks are still submitted from omni.py (PROFILER_START/PROFILER_STOP), but the stage worker no longer handles them. As a result, these tasks fall through into the batching path, and the worker immediately accesses t["request_id"], which profiler tasks don’t include, causing KeyError and breaking profiling control. This is a functional regression for any user toggling profiling and should be addressed by reinstating the profiler-task handling or filtering those tasks before batching.
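One way to address this is to separate control tasks from generate tasks before the batching path ever touches `t["request_id"]`. The sketch below is illustrative: the enum members and the plain-dict task shape mirror the snippet above, but `split_control_tasks` is a hypothetical helper, not existing project code.

```python
from enum import Enum


class OmniStageTaskType(str, Enum):
    GENERATE = "generate"
    SHUTDOWN = "shutdown"
    PROFILER_START = "profiler_start"
    PROFILER_STOP = "profiler_stop"


def split_control_tasks(tasks: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate profiler control tasks from generate tasks so the
    batching path only ever sees tasks that carry a request_id."""
    control, generate = [], []
    for task in tasks:
        task_type = task.get("type", OmniStageTaskType.GENERATE)
        if task_type in (OmniStageTaskType.PROFILER_START,
                         OmniStageTaskType.PROFILER_STOP):
            control.append(task)
        else:
            generate.append(task)
    return control, generate
```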
@gcanlin @lishunyang12 PTAL

cc @SamitHuang @ZJY0516.

Thanks for your contribution. Please show the test results.
vllm_omni/entrypoints/log_utils.py
```python
sum_total_ms = float(agg.get("sum_total_ms", 0.0))
samples_total = int(agg.get("total_count", 0.0))
total_mbps = (sum_bytes * 8.0) / (max(sum_total_ms, 1e-6) * 1000.0) if sum_bytes > 0 else 0.0
sum_vae_ms = float(agg.get("sum_vae_time_ms", 0.0))
```
Why put these into transfer_summary? Adding them in another function might be better.
You should describe your overall design and the effect you want to achieve, as well as the test results.

@erfgss An LLM can help you generate this required information, so please work with it. Next time, please describe your ideas before submitting your PR.

@codex Review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 16b1ebcedc
```python
metrics = {
    "preprocess_time_ms": preprocess_ms,
    "dit_time_ms": infer_ms,
    "denoise_time_ms": infer_ms,
    "vae_time_ms": postprocess_ms,
```
Avoid double-counting diffusion phase timings
Here both dit_time_ms and denoise_time_ms are set to the same infer_ms value, which already represents the full diffusion inference window. This makes per-phase profiling misleading (every request appears to spend the entire inference time in both phases) and inflates any summaries that interpret these as distinct phases. If per-phase timing isn’t available, consider leaving one field unset or only reporting a single aggregate to avoid double-counting.
@wuhang2014 PTAL
```python
metrics = {
    "preprocess_time_ms": preprocess_ms,
    "dit_time_ms": infer_ms,
    "denoise_time_ms": infer_ms,
```
Why do we need these two fields with the same infer_ms value?
ZJY0516 left a comment:
I don't want to introduce this now honestly.
Given that the DiT component dominates runtime in diffusion models, I'd prefer to keep our focus on total end-to-end performance for now.
```diff
-metrics={},
+metrics={
+    "preprocess_time_ms": preprocess_ms,
+    "dit_time_ms": infer_ms,
```
First, dit_time_ms seems to duplicate denoise_time_ms. And we'd better remove the VAE time, since we cannot measure it.
The Multi-Stage Pipeline logs are spamming the output in this PR.
Agreed. We should focus on e2e performance for now.
Could you explain the purpose of this PR? I'm a little confused.
Use contextlib for a more elegant coding style; one example is https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/worker/model_runner_v1.py#L1496
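The suggestion above can be sketched with `contextlib.contextmanager`: a reusable timer that records each pipeline phase into the metrics dict, keeping the timing bookkeeping out of the pipeline body. This is a generic sketch, not the vllm-ascend code being linked to; `phase_timer` and the metric key are illustrative.

```python
import time
from contextlib import contextmanager


@contextmanager
def phase_timer(metrics: dict, key: str):
    """Record the wall-clock duration of a pipeline phase in ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        metrics[key] = round((time.perf_counter() - start) * 1000, 2)


metrics: dict = {}
with phase_timer(metrics, "dit_time_ms"):
    time.sleep(0.01)  # stand-in for the denoising loop
```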
In the vllm-omni project, the logs printed by the Diffusion/DiT Single diffusion Pipeline model lack some diffusion feature information. This PR supplements this information and improves the log printing format. |
Force-pushed from d37f6c1 to 2f704e4
FYI — user feedback indicates the diffusion logs are excessive and feel like spam now (not this PR, but on the main branch).
@LJH-LBJ PTAL, thanks
There are two metrics in the result. Moreover, I think it would be better to split the metrics from the output and use a separate class to record all the metrics.
I think we can start by providing simple metrics, and then you can refactor them in your PR. |
LGTM
```python
"preprocess_time_ms": preprocess_ms,
"dit_time_ms": infer_ms,
"denoise_time_per_step_ms": per_step_ms,
"vae_time_ms": postprocess_ms,
```
Postprocess time is not VAE time; see
vllm-omni/vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py
Lines 801 to 802 in 9f552d0
What is the relationship with #533?
@erfgss Hey, the improved metrics logging for diffusion pipelines would be really helpful for debugging — covering T2I, I2I, bagel and other inference paths. Is this still compatible with the current pipeline code? Would be good to get this in. |
Signed-off-by: Chen Yang <2082464740@qq.com>
Force-pushed from 0c1bb01 to 39fe587
I think it would be better to merge the metrics info into the metrics summary when --log-stats is on.
```python
metrics["preprocessing_time_ms"] = round(preprocess_time * 1000, 2)

# Handle single request or multiple requests
dit_time_seconds = metrics["dit_time_ms"] / 1000
```
Why not just use dit_time_ms directly? dit_time_seconds seems unnecessary.
```python
total_denoise_time = dit_time_seconds
metrics["denoise_time_per_step_ms"] = round((total_denoise_time / num_steps) * 1000, 2)

metrics["vae_time_ms"] = round(dit_time_seconds * 1000, 2)
```
I think the vae_time_ms should measure the duration where the VAE is actually executed.
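A sketch of what that would mean in practice: time only the decode call itself, leaving the rest of postprocessing outside the measurement. `FakeVAE` and `postprocess` are illustrative stand-ins, not the actual pipeline code.

```python
import time


class FakeVAE:
    """Stand-in for the real VAE; decode just burns a little time."""

    def decode(self, latents):
        time.sleep(0.005)
        return latents


def postprocess(vae, latents, metrics: dict):
    start = time.perf_counter()
    image = vae.decode(latents)  # only the VAE decode is timed
    metrics["vae_time_ms"] = round((time.perf_counter() - start) * 1000, 2)
    # ...tensor-to-image conversion, saving, etc. stay outside vae_time_ms
    return image
```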
lishunyang12 left a comment:
Left a couple comments. The main issues are around vae_time_ms still being wrong and a silent exception swallow in omni.py.
```python
# Handle single request or multiple requests
metrics["postprocess_time_ms"] = round(postprocess_time * 1000, 2)
metrics["vae_time_ms"] = metrics["postprocess_time_ms"]
```
vae_time_ms is set to postprocess_time_ms but postprocess includes more than just the VAE decode. Same issue flagged on earlier revisions. If you can't isolate the actual VAE duration, drop this field rather than report a misleading number.
```python
try:
    if stage.final_output_type == "text" or metrics.log_stats:
        output_to_yield.metrics = metrics.build_output_metrics(stage_id, req_id)
except Exception as e:
```
Bare except Exception that silently swallows any bug in build_output_metrics. This makes metric issues very hard to debug. Why not let it propagate, or at minimum set output_to_yield.metrics = {} in the except block so the contract is explicit?
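The reviewer's second option could look like the sketch below: log the failure with a traceback and make the empty-dict fallback explicit. `attach_metrics` and its parameters are illustrative, not the project's actual API.

```python
import logging

logger = logging.getLogger(__name__)


def attach_metrics(output, metrics_builder, stage_id, req_id):
    """Attach per-request metrics, keeping the failure contract explicit:
    on error, log with traceback and fall back to an empty dict."""
    try:
        output.metrics = metrics_builder(stage_id, req_id)
    except Exception:
        logger.exception("build_output_metrics failed for stage=%s req=%s",
                         stage_id, req_id)
        output.metrics = {}
    return output
```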
```diff
 )

-        return results
+    return results
```
This indentation change moves return results out of the else branch -- was this intentional? Double-check the single-prompt case still returns correctly.
vllm_omni/metrics/stats.py
```python
if not evt.diffusion_metrics:
    continue
for key, value in evt.diffusion_metrics.items():
    merged[key] = merged.get(key, 0) + int(value)
```
Return type says dict[str, int] but values can be floats (e.g. denoise_time_per_step_ms). The int(value) cast truncates them. Should be float and float(value).
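A sketch of the corrected accumulation, using floats throughout so fractional values like denoise_time_per_step_ms survive. Events are modeled here as plain dicts for self-containedness; the real code iterates event objects with a `diffusion_metrics` attribute.

```python
def merge_diffusion_metrics(events) -> dict[str, float]:
    """Sum per-request diffusion metrics as floats so fractional
    values (e.g. denoise_time_per_step_ms) are not truncated."""
    merged: dict[str, float] = {}
    for evt in events:
        if not evt.get("diffusion_metrics"):
            continue
        for key, value in evt["diffusion_metrics"].items():
            merged[key] = merged.get(key, 0.0) + float(value)
    return merged
```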
Adding profiling for vllm-omni
Purpose
In the vllm-omni project, the logs printed by the Diffusion/DiT Single diffusion Pipeline model lack some diffusion feature information. This PR supplements this information and improves the log printing format.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)