[Profile] Adding metrics for Diffusion/DiT Single diffusion Pipeline #668
erfgss wants to merge 15 commits into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a443eb8546
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```diff
 def close(self) -> None:
-    self._finalizer()
-
-def abort(self, request_id: str | Iterable[str]) -> None:
-    # TODO implement it
-    logger.warning("DiffusionEngine abort is not implemented yet")
-    pass
+    self._finalizer()
```
Restore DiffusionEngine.abort used by async stages
The DiffusionEngine class no longer defines abort(), but AsyncOmniDiffusion.abort() (and the async stage worker’s ABORT handling) still calls self.engine.abort(...). When an abort task is issued, this will now raise AttributeError and fail to cancel requests. This is a regression for any deployment that uses abort (e.g., client cancellation or timeout handling) and should be fixed by reintroducing DiffusionEngine.abort() or updating callers to guard/route aborts.
vllm_omni/entrypoints/omni_stage.py
```diff
 _recv_dequeue_ts = _time.time()
 task_type = task.get("type", OmniStageTaskType.GENERATE)
 if task_type == OmniStageTaskType.SHUTDOWN:
-    logger.info("Received shutdown signal")
+    logger.error("Received shutdown signal")
```
Handle profiler start/stop tasks in stage worker
Profiler control tasks are still submitted from omni.py (PROFILER_START/PROFILER_STOP), but the stage worker no longer handles them. As a result, these tasks fall through into the batching path, and the worker immediately accesses t["request_id"], which profiler tasks don’t include, causing KeyError and breaking profiling control. This is a functional regression for any user toggling profiling and should be addressed by reinstating the profiler-task handling or filtering those tasks before batching.
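One way to address this is to separate control tasks from generate tasks before the batching path ever touches `t["request_id"]`. The sketch below is illustrative: the enum members and the plain-dict task shape mirror the snippet above, but `split_control_tasks` is a hypothetical helper, not existing project code.

```python
from enum import Enum


class OmniStageTaskType(str, Enum):
    GENERATE = "generate"
    SHUTDOWN = "shutdown"
    PROFILER_START = "profiler_start"
    PROFILER_STOP = "profiler_stop"


def split_control_tasks(tasks: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate profiler control tasks from generate tasks so the
    batching path only ever sees tasks that carry a request_id."""
    control, generate = [], []
    for task in tasks:
        task_type = task.get("type", OmniStageTaskType.GENERATE)
        if task_type in (OmniStageTaskType.PROFILER_START,
                         OmniStageTaskType.PROFILER_STOP):
            control.append(task)
        else:
            generate.append(task)
    return control, generate
```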
@gcanlin @lishunyang12 PTAL

cc @SamitHuang @ZJY0516.

Thanks for your contribution. Please show the test results.
vllm_omni/entrypoints/log_utils.py
```python
sum_total_ms = float(agg.get("sum_total_ms", 0.0))
samples_total = int(agg.get("total_count", 0.0))
total_mbps = (sum_bytes * 8.0) / (max(sum_total_ms, 1e-6) * 1000.0) if sum_bytes > 0 else 0.0
sum_vae_ms = float(agg.get("sum_vae_time_ms", 0.0))
```
Why put these into transfer_summary? Adding them in another function might be better.
You should describe your overall design and the effect you want to achieve, as well as the test results.

@erfgss An LLM can help you generate this required information, so please work with it. Next time, please describe your ideas before submitting your PR.

@codex Review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 16b1ebcedc
```python
metrics = {
    "preprocess_time_ms": preprocess_ms,
    "dit_time_ms": infer_ms,
    "denoise_time_ms": infer_ms,
    "vae_time_ms": postprocess_ms,
```
Avoid double-counting diffusion phase timings
Here both dit_time_ms and denoise_time_ms are set to the same infer_ms value, which already represents the full diffusion inference window. This makes per-phase profiling misleading (every request appears to spend the entire inference time in both phases) and inflates any summaries that interpret these as distinct phases. If per-phase timing isn’t available, consider leaving one field unset or only reporting a single aggregate to avoid double-counting.
@wuhang2014 PTAL
```python
metrics = {
    "preprocess_time_ms": preprocess_ms,
    "dit_time_ms": infer_ms,
    "denoise_time_ms": infer_ms,
```
Why do we need these two fields with the same infer_ms value?
ZJY0516 left a comment:
I don't want to introduce this now honestly.
Given that the DiT component dominates runtime in diffusion models, I'd prefer to keep our focus on total end-to-end performance for now.
```diff
-metrics={},
+metrics={
+    "preprocess_time_ms": preprocess_ms,
+    "dit_time_ms": infer_ms,
```
First, dit_time_ms seems to duplicate denoise_time_ms. And we'd better remove the VAE time, since we cannot measure it.
The Multi-Stage Pipeline logs are spamming the output in this PR.
Agreed. We should focus on e2e performance for now.
Could you explain the purpose of this PR? I'm a little confused.
Use contextlib for a more elegant coding style; one example is https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/worker/model_runner_v1.py#L1496
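The suggestion above can be sketched with `contextlib.contextmanager`: a reusable timer that records each pipeline phase into the metrics dict, keeping the timing bookkeeping out of the pipeline body. This is a generic sketch, not the vllm-ascend code being linked to; `phase_timer` and the metric key are illustrative.

```python
import time
from contextlib import contextmanager


@contextmanager
def phase_timer(metrics: dict, key: str):
    """Record the wall-clock duration of a pipeline phase in ms."""
    start = time.perf_counter()
    try:
        yield
    finally:
        metrics[key] = round((time.perf_counter() - start) * 1000, 2)


metrics: dict = {}
with phase_timer(metrics, "dit_time_ms"):
    time.sleep(0.01)  # stand-in for the denoising loop
```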
In the vllm-omni project, the logs printed by the Diffusion/DiT Single diffusion Pipeline model lack some diffusion feature information. This PR supplements this information and improves the log printing format. |
Force-pushed from d37f6c1 to 2f704e4
FYI — user feedback indicates the diffusion logs are excessive and feel like spam now (not this PR, but on the main branch).
@LJH-LBJ PTAL, thanks
There are two metrics in the result. Moreover, I think it would be better to split the metrics from the output and use a separate class to record all the metrics.
I think we can start by providing simple metrics, and then you can refactor them in your PR. |
LGTM
```python
"preprocess_time_ms": preprocess_ms,
"dit_time_ms": infer_ms,
"denoise_time_per_step_ms": per_step_ms,
"vae_time_ms": postprocess_ms,
```
Postprocess time is not VAE time; see
vllm-omni/vllm_omni/diffusion/models/qwen_image/pipeline_qwen_image.py
Lines 801 to 802 in 9f552d0
What is the relationship with #533?
@erfgss Hey, the improved metrics logging for diffusion pipelines would be really helpful for debugging — covering T2I, I2I, bagel and other inference paths. Is this still compatible with the current pipeline code? Would be good to get this in. |
Signed-off-by: Chen Yang <2082464740@qq.com>
Force-pushed from 0c1bb01 to 39fe587
I think it would be better to merge the metrics info into the metrics summary when --log-stats is on.
```python
metrics["preprocessing_time_ms"] = round(preprocess_time * 1000, 2)

# Handle single request or multiple requests
dit_time_seconds = metrics["dit_time_ms"] / 1000
```
Why not just use dit_time_ms directly? dit_time_seconds seems unnecessary.
```python
total_denoise_time = dit_time_seconds
metrics["denoise_time_per_step_ms"] = round((total_denoise_time / num_steps) * 1000, 2)

metrics["vae_time_ms"] = round(dit_time_seconds * 1000, 2)
```
I think the vae_time_ms should measure the duration where the VAE is actually executed.
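A sketch of what that would mean in practice: time only the decode call itself, leaving the rest of postprocessing outside the measurement. `FakeVAE` and `postprocess` are illustrative stand-ins, not the actual pipeline code.

```python
import time


class FakeVAE:
    """Stand-in for the real VAE; decode just burns a little time."""

    def decode(self, latents):
        time.sleep(0.005)
        return latents


def postprocess(vae, latents, metrics: dict):
    start = time.perf_counter()
    image = vae.decode(latents)  # only the VAE decode is timed
    metrics["vae_time_ms"] = round((time.perf_counter() - start) * 1000, 2)
    # ...tensor-to-image conversion, saving, etc. stay outside vae_time_ms
    return image
```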
lishunyang12 left a comment:
Left a couple comments. The main issues are around vae_time_ms still being wrong and a silent exception swallow in omni.py.
```python
# Handle single request or multiple requests
metrics["postprocess_time_ms"] = round(postprocess_time * 1000, 2)
metrics["vae_time_ms"] = metrics["postprocess_time_ms"]
```
vae_time_ms is set to postprocess_time_ms but postprocess includes more than just the VAE decode. Same issue flagged on earlier revisions. If you can't isolate the actual VAE duration, drop this field rather than report a misleading number.
```python
try:
    if stage.final_output_type == "text" or metrics.log_stats:
        output_to_yield.metrics = metrics.build_output_metrics(stage_id, req_id)
except Exception as e:
```
Bare except Exception that silently swallows any bug in build_output_metrics. This makes metric issues very hard to debug. Why not let it propagate, or at minimum set output_to_yield.metrics = {} in the except block so the contract is explicit?
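The reviewer's second option could look like the sketch below: log the failure with a traceback and make the empty-dict fallback explicit. `attach_metrics` and its parameters are illustrative, not the project's actual API.

```python
import logging

logger = logging.getLogger(__name__)


def attach_metrics(output, metrics_builder, stage_id, req_id):
    """Attach per-request metrics, keeping the failure contract explicit:
    on error, log with traceback and fall back to an empty dict."""
    try:
        output.metrics = metrics_builder(stage_id, req_id)
    except Exception:
        logger.exception("build_output_metrics failed for stage=%s req=%s",
                         stage_id, req_id)
        output.metrics = {}
    return output
```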
```diff
 )

-        return results
+    return results
```
This indentation change moves return results out of the else branch -- was this intentional? Double-check the single-prompt case still returns correctly.
vllm_omni/metrics/stats.py
```python
if not evt.diffusion_metrics:
    continue
for key, value in evt.diffusion_metrics.items():
    merged[key] = merged.get(key, 0) + int(value)
```
Return type says dict[str, int] but values can be floats (e.g. denoise_time_per_step_ms). The int(value) cast truncates them. Should be float and float(value).
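A sketch of the corrected accumulation, using floats throughout so fractional values like denoise_time_per_step_ms survive. Events are modeled here as plain dicts for self-containedness; the real code iterates event objects with a `diffusion_metrics` attribute.

```python
def merge_diffusion_metrics(events) -> dict[str, float]:
    """Sum per-request diffusion metrics as floats so fractional
    values (e.g. denoise_time_per_step_ms) are not truncated."""
    merged: dict[str, float] = {}
    for evt in events:
        if not evt.get("diffusion_metrics"):
            continue
        for key, value in evt["diffusion_metrics"].items():
            merged[key] = merged.get(key, 0.0) + float(value)
    return merged
```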
Adding profiling for vllm-omni
Purpose
In the vllm-omni project, the logs printed by the Diffusion/DiT Single diffusion Pipeline model lack some diffusion feature information. This PR supplements this information and improves the log printing format.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)