
[Profile] Adding metrics for Diffusion/DiT Single diffusion Pipeline #668

Open
erfgss wants to merge 15 commits into vllm-project:main from erfgss:feat/vllmomni_profiling

Conversation

@erfgss
Contributor

@erfgss erfgss commented Jan 6, 2026

Adding profiling for vllm-omni

Purpose

In the vllm-omni project, the logs printed by the Diffusion/DiT Single diffusion Pipeline model lack some diffusion feature information. This PR supplements this information and improves the log printing format.

Test Plan

Test Result

python path/text_to_image/text_to_image.py --log-stats
Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:18<00:00, 18.15s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 03-02 09:11:19 [stats.py:538] ██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:18<00:00, 18.15s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 03-02 09:11:19 [stats.py:538] [Overall Summary]
INFO 03-02 09:11:19 [stats.py:538] +-----------------------------+------------+
INFO 03-02 09:11:19 [stats.py:538] | Field                       |      Value |
INFO 03-02 09:11:19 [stats.py:538] +-----------------------------+------------+
INFO 03-02 09:11:19 [stats.py:538] | e2e_requests                |          1 |
INFO 03-02 09:11:19 [stats.py:538] | e2e_wall_time_ms            | 18,156.183 |
INFO 03-02 09:11:19 [stats.py:538] | e2e_avg_time_per_request_ms | 18,156.183 |
INFO 03-02 09:11:19 [stats.py:538] | e2e_stage_0_wall_time_ms    | 18,153.808 |
INFO 03-02 09:11:19 [stats.py:538] +-----------------------------+------------+
INFO 03-02 09:11:19 [stats.py:564] 
INFO 03-02 09:11:19 [stats.py:564] [RequestE2EStats [request_id=0_a1198bcc-5334-48a8-ab97-c42bbf24241c]]
INFO 03-02 09:11:19 [stats.py:564] +--------------+------------+
INFO 03-02 09:11:19 [stats.py:564] | Field        |      Value |
INFO 03-02 09:11:19 [stats.py:564] +--------------+------------+
INFO 03-02 09:11:19 [stats.py:564] | e2e_total_ms | 18,152.964 |
INFO 03-02 09:11:19 [stats.py:564] +--------------+------------+
INFO 03-02 09:11:19 [stats.py:617] 
INFO 03-02 09:11:19 [stats.py:617] [StageRequestStats [request_id=0_a1198bcc-5334-48a8-ab97-c42bbf24241c]]
INFO 03-02 09:11:19 [stats.py:617] +-------------------------------+------------+
INFO 03-02 09:11:19 [stats.py:617] | Field                         |          0 |
INFO 03-02 09:11:19 [stats.py:617] +-------------------------------+------------+
INFO 03-02 09:11:19 [stats.py:617] | batch_id                      |          1 |
INFO 03-02 09:11:19 [stats.py:617] | batch_size                    |          1 |
INFO 03-02 09:11:19 [stats.py:617] | diffusion_engine_exec_time_ms | 18,131.630 |
INFO 03-02 09:11:19 [stats.py:617] | executor_time_ms              | 18,019.350 |
INFO 03-02 09:11:19 [stats.py:617] | image_num                     |      1.000 |
INFO 03-02 09:11:19 [stats.py:617] | num_inference_steps           |     50.000 |
INFO 03-02 09:11:19 [stats.py:617] | postprocess_time_ms           |    111.360 |
INFO 03-02 09:11:19 [stats.py:617] | resolution                    |    640.000 |
INFO 03-02 09:11:19 [stats.py:617] | stage_gen_time_ms             | 18,131.823 |
INFO 03-02 09:11:19 [stats.py:617] +-------------------------------+------------+

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

@erfgss erfgss requested a review from hsliuustc0106 as a code owner January 6, 2026 09:16
@erfgss erfgss changed the title feat: add profiling for vllm-omni [Profile] Adding profiling for vllm-omni Jan 6, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a443eb8546

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +386 to +387
def close(self) -> None:
self._finalizer()

def abort(self, request_id: str | Iterable[str]) -> None:
# TODO implement it
logger.warning("DiffusionEngine abort is not implemented yet")
pass
self._finalizer()


P1 Badge Restore DiffusionEngine.abort used by async stages

The DiffusionEngine class no longer defines abort(), but AsyncOmniDiffusion.abort() (and the async stage worker’s ABORT handling) still calls self.engine.abort(...). When an abort task is issued, this will now raise AttributeError and fail to cancel requests. This is a regression for any deployment that uses abort (e.g., client cancellation or timeout handling) and should be fixed by reintroducing DiffusionEngine.abort() or updating callers to guard/route aborts.


Comment on lines +633 to +636
_recv_dequeue_ts = _time.time()
task_type = task.get("type", OmniStageTaskType.GENERATE)
if task_type == OmniStageTaskType.SHUTDOWN:
logger.info("Received shutdown signal")
logger.error("Received shutdown signal")


P1 Badge Handle profiler start/stop tasks in stage worker

Profiler control tasks are still submitted from omni.py (PROFILER_START/PROFILER_STOP), but the stage worker no longer handles them. As a result, these tasks fall through into the batching path, and the worker immediately accesses t["request_id"], which profiler tasks don’t include, causing KeyError and breaking profiling control. This is a functional regression for any user toggling profiling and should be addressed by reinstating the profiler-task handling or filtering those tasks before batching.
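One way to keep profiler tasks out of the batching path is to route them before batching, as in this sketch (the `route_task` helper and its parameters are hypothetical; the real worker loop has more state):

```python
from enum import Enum, auto


class OmniStageTaskType(Enum):
    GENERATE = auto()
    SHUTDOWN = auto()
    PROFILER_START = auto()
    PROFILER_STOP = auto()


def route_task(task: dict, batch: list, profiler_events: list) -> bool:
    """Return False when the worker loop should exit.

    Control tasks (profiler start/stop) are handled here so they never
    reach the batching path, which assumes a "request_id" key.
    """
    task_type = task.get("type", OmniStageTaskType.GENERATE)
    if task_type == OmniStageTaskType.SHUTDOWN:
        return False
    if task_type in (OmniStageTaskType.PROFILER_START, OmniStageTaskType.PROFILER_STOP):
        profiler_events.append(task_type.name)  # toggle the profiler here
        return True
    batch.append(task)  # only GENERATE tasks carry "request_id"
    return True
```

With this shape, a `PROFILER_START` task can never trigger the `t["request_id"]` KeyError described above.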


@hsliuustc0106
Collaborator

@gcanlin @lishunyang12 PTAL

@gcanlin
Contributor

gcanlin commented Jan 6, 2026

cc @SamitHuang @ZJY0516.

@lishunyang12
Contributor

Thanks for your contribution. Please show the test results.

sum_total_ms = float(agg.get("sum_total_ms", 0.0))
samples_total = int(agg.get("total_count", 0.0))
total_mbps = (sum_bytes * 8.0) / (max(sum_total_ms, 1e-6) * 1000.0) if sum_bytes > 0 else 0.0
sum_vae_ms = float(agg.get("sum_vae_time_ms", 0.0))
Collaborator


Why put these into transfer_summary? Adding them in another function might be better.

@david6666666
Collaborator

david6666666 commented Jan 6, 2026

You should add your general design and the effect you want to achieve, as well as the test results.

@hsliuustc0106
Collaborator

You should add your general design and the effect you want to achieve, as well as the test results.

@erfgss An LLM can help you generate this required information; please work with it. Next time, please describe your ideas before submitting your PR.

@erfgss
Contributor Author

erfgss commented Jan 7, 2026

@codex Review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16b1ebcedc


Comment on lines +137 to +141
metrics = {
"preprocess_time_ms": preprocess_ms,
"dit_time_ms": infer_ms,
"denoise_time_ms": infer_ms,
"vae_time_ms": postprocess_ms,


P2 Badge Avoid double-counting diffusion phase timings

Here both dit_time_ms and denoise_time_ms are set to the same infer_ms value, which already represents the full diffusion inference window. This makes per-phase profiling misleading (every request appears to spend the entire inference time in both phases) and inflates any summaries that interpret these as distinct phases. If per-phase timing isn’t available, consider leaving one field unset or only reporting a single aggregate to avoid double-counting.


@erfgss
Contributor Author

erfgss commented Jan 7, 2026

@wuhang2014 PTAL

metrics = {
"preprocess_time_ms": preprocess_ms,
"dit_time_ms": infer_ms,
"denoise_time_ms": infer_ms,
Contributor


Why do we need these two fields with the same infer_ms value?

Collaborator

@ZJY0516 ZJY0516 left a comment


I don't want to introduce this now honestly.

Given that the DiT component dominates runtime in diffusion models, I'd prefer to keep our focus on total end-to-end performance for now.

metrics={},
metrics={
"preprocess_time_ms": preprocess_ms,
"dit_time_ms": infer_ms,
Collaborator


First, dit_time_ms seems to duplicate denoise_time_ms. And we'd better remove the VAE time since we cannot measure it.

@ZJY0516
Collaborator

ZJY0516 commented Jan 7, 2026

The Multi-Stage Pipeline logs are spamming the output in this PR.

@lishunyang12
Contributor

Agreed. We should focus on e2e performance for now.

@hsliuustc0106
Collaborator

Could you explain the purpose of this PR? I'm a bit confused.

@wuhang2014
Contributor

Use contextlib for a more elegant coding style; one example is https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/worker/model_runner_v1.py#L1496
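The suggestion above can be sketched as a small context manager that times a phase and records it in the metrics dict (the `timed` helper and metric key are illustrative names, not the project's actual API):

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(metrics: dict, key: str):
    """Record the wrapped block's elapsed wall time (ms) under `key`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        metrics[key] = (time.perf_counter() - start) * 1000.0


# Usage: each pipeline phase is timed without manual start/stop bookkeeping.
metrics: dict = {}
with timed(metrics, "preprocess_time_ms"):
    time.sleep(0.01)  # stand-in for preprocessing work
```

Because the timing lives in the `finally` clause, the metric is recorded even if the phase raises.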

@erfgss
Contributor Author

erfgss commented Jan 12, 2026

Could you explain the purpose of this PR? I'm a bit confused.

In the vllm-omni project, the logs printed by the Diffusion/DiT Single diffusion Pipeline model lack some diffusion feature information. This PR supplements this information and improves the log printing format.

@erfgss erfgss force-pushed the feat/vllmomni_profiling branch 2 times, most recently from d37f6c1 to 2f704e4 Compare January 13, 2026 07:38
@ZJY0516
Collaborator

ZJY0516 commented Jan 15, 2026

FYI: user feedback indicates the diffusion logs are excessive and feel like spam now (not from this PR; this applies to the main branch).

@erfgss
Contributor Author

erfgss commented Jan 15, 2026

FYI: user feedback indicates the diffusion logs are excessive and feel like spam now (not from this PR; this applies to the main branch).

Which information from users' workloads is the most valuable, and what can we trim, so that we retain only the most valuable information? Thank you.

@david6666666
Collaborator

@LJH-LBJ ptal thx

@LJH-LBJ
Contributor

LJH-LBJ commented Jan 20, 2026

INFO 01-16 09:24:55 [text_to_image.py:196] metrics={'preprocess_time_ms': 0.0, 'dit_time_ms': 37358.25538635254, 'denoise_time_per_step_ms': 747.1651077270508, 'vae_time_ms': 92.57125854492188, 'total_time_ms': 37450.82664489746},
INFO 01-16 09:24:55 [text_to_image.py:196] )], images=[], prompt=None, latents=None, metrics={})]

There are two metrics entries in the result. Moreover, I think it would be better to split the metrics from the output and use a separate class to record all the metrics.

@david6666666
Collaborator

INFO 01-16 09:24:55 [text_to_image.py:196] metrics={'preprocess_time_ms': 0.0, 'dit_time_ms': 37358.25538635254, 'denoise_time_per_step_ms': 747.1651077270508, 'vae_time_ms': 92.57125854492188, 'total_time_ms': 37450.82664489746},
INFO 01-16 09:24:55 [text_to_image.py:196] )], images=[], prompt=None, latents=None, metrics={})]

There are two metrics entries in the result. Moreover, I think it would be better to split the metrics from the output and use a separate class to record all the metrics.

I think we can start by providing simple metrics, and then you can refactor them in your PR.

@david6666666
Collaborator

LGTM

@david6666666 david6666666 added the ready label to trigger buildkite CI label Jan 21, 2026
"preprocess_time_ms": preprocess_ms,
"dit_time_ms": infer_ms,
"denoise_time_per_step_ms": per_step_ms,
"vae_time_ms": postprocess_ms,
Collaborator


Postprocess time is not VAE time.

see

image = self.vae.decode(latents, return_dict=False)[0][:, :, 0]
# processed_image = self.image_processor.postprocess(image, output_type=output_type)

@hsliuustc0106 hsliuustc0106 removed the ready label to trigger buildkite CI label Jan 21, 2026
@hsliuustc0106
Collaborator

What is the relationship with #533?

@erfgss
Contributor Author

erfgss commented Jan 22, 2026

What is the relationship with #533?

There is no connection; this PR and #533 address different directions.

@LJH-LBJ LJH-LBJ mentioned this pull request Jan 23, 2026
5 tasks
@erfgss erfgss changed the title [Profile] Adding profiling for vllm-omni [Profile] Adding metrics for Diffusion/DiT Single diffusion Pipeline Jan 26, 2026
@lishunyang12
Contributor

@erfgss Hey, the improved metrics logging for diffusion pipelines would be really helpful for debugging — covering T2I, I2I, bagel and other inference paths. Is this still compatible with the current pipeline code? Would be good to get this in.

@david6666666
Collaborator

@erfgss cc @LJH-LBJ

@erfgss
Contributor Author

erfgss commented Feb 25, 2026

@erfgss Hey, the improved metrics logging for diffusion pipelines would be really helpful for debugging — covering T2I, I2I, bagel and other inference paths. Is this still compatible with the current pipeline code? Would be good to get this in.
Yes, it is compatible with the current version.

Signed-off-by: Chen Yang <2082464740@qq.com>
@erfgss erfgss force-pushed the feat/vllmomni_profiling branch from 0c1bb01 to 39fe587 Compare February 25, 2026 08:37
erfgss and others added 9 commits February 25, 2026 16:41
Signed-off-by: Chen Yang <2082464740@qq.com>
Signed-off-by: Chen Yang <2082464740@qq.com>
Signed-off-by: Chen Yang <2082464740@qq.com>
Signed-off-by: Chen Yang <2082464740@qq.com>
Signed-off-by: Chen Yang <2082464740@qq.com>
Signed-off-by: Chen Yang <2082464740@qq.com>
Signed-off-by: Chen Yang <2082464740@qq.com>
@LJH-LBJ
Contributor

LJH-LBJ commented Feb 26, 2026

I think it would be better to merge the metrics info into the metrics summary when --log-stats is on.

@erfgss
Contributor Author

erfgss commented Feb 26, 2026

I think it would be better to merge the metrics info into the metrics summary when --log-stats is on.
This is a good suggestion; I'll go and correct it.

metrics["preprocessing_time_ms"] = round(preprocess_time * 1000, 2)

# Handle single request or multiple requests
dit_time_seconds = metrics["dit_time_ms"] / 1000
Contributor


Why not just use dit_time_ms? dit_time_seconds seems unnecessary.

total_denoise_time = dit_time_seconds
metrics["denoise_time_per_step_ms"] = round((total_denoise_time / num_steps) * 1000, 2)

metrics["vae_time_ms"] = round(dit_time_seconds * 1000, 2)
Contributor


I think the vae_time_ms should measure the duration where the VAE is actually executed.

Contributor

@lishunyang12 lishunyang12 left a comment


Left a couple of comments. The main issues are vae_time_ms still being wrong and a silent exception swallow in omni.py.


# Handle single request or multiple requests
metrics["postprocess_time_ms"] = round(postprocess_time * 1000, 2)
metrics["vae_time_ms"] = metrics["postprocess_time_ms"]
Contributor


vae_time_ms is set to postprocess_time_ms but postprocess includes more than just the VAE decode. Same issue flagged on earlier revisions. If you can't isolate the actual VAE duration, drop this field rather than report a misleading number.

try:
if stage.final_output_type == "text" or metrics.log_stats:
output_to_yield.metrics = metrics.build_output_metrics(stage_id, req_id)
except Exception as e:
Contributor


Bare except Exception that silently swallows any bug in build_output_metrics. This makes metric issues very hard to debug. Why not let it propagate, or at minimum set output_to_yield.metrics = {} in the except block so the contract is explicit?
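The explicit-contract version suggested above might look like this sketch (the `attach_metrics` helper and its parameters are illustrative, not the actual omni.py code):

```python
import logging

logger = logging.getLogger(__name__)


def attach_metrics(output, metrics_builder, stage_id: int, req_id: str) -> None:
    """Attach output metrics, making the failure contract explicit.

    Rather than silently swallowing exceptions, log them with a traceback
    and fall back to an empty dict so downstream consumers always see a
    consistent type.
    """
    try:
        output.metrics = metrics_builder(stage_id, req_id)
    except Exception:
        logger.exception("build_output_metrics failed for request %s", req_id)
        output.metrics = {}
```

The `logger.exception` call preserves the traceback, so metric bugs remain debuggable without crashing the pipeline.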

)

return results
return results
Contributor


This indentation change moves `return results` out of the else branch -- was this intentional? Double-check that the single-prompt case still returns correctly.

if not evt.diffusion_metrics:
continue
for key, value in evt.diffusion_metrics.items():
merged[key] = merged.get(key, 0) + int(value)
Contributor


The return type says dict[str, int], but the values can be floats (e.g. denoise_time_per_step_ms), and the int(value) cast truncates them. It should be dict[str, float] with float(value).
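The corrected merge could be sketched as below; for self-containment this version reads `diffusion_metrics` from dicts rather than event objects, which is an assumption about the surrounding code:

```python
def merge_diffusion_metrics(events) -> dict:
    """Sum per-event metric dicts without truncating fractional values."""
    merged: dict = {}
    for evt in events:
        # Skip events with no diffusion metrics attached.
        if not evt.get("diffusion_metrics"):
            continue
        for key, value in evt["diffusion_metrics"].items():
            # float(), not int(): per-step times carry fractional ms.
            merged[key] = merged.get(key, 0.0) + float(value)
    return merged
```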
