
[Model] Support HunyuanImage3 Diffusion Model in vllm-omni #1085

Merged

hsliuustc0106 merged 1 commit into vllm-project:main from Semmer2:HunyuanImage3IntergrationGPU on Feb 9, 2026

Conversation

ElleElleWu (Contributor) commented Jan 29, 2026

Co-authored-by: skf1999 13234016272@163.com
Co-authored-by: Just-it 1161406585@qq.com
Co-authored-by: Semmer2 semmer@live.cn

Purpose

Support HunyuanImage3 as a DiT model and integrate it into the Omni framework as a standalone stage, enabling the core image generation workflow.

Test Result

1. Test Environment

CUDA         Version: 12.9
torch        Version: 2.9.1
vllm         Version: 0.15.0
vllm-omni    Version: 0.15.0

2. Offline inference

- CMD

```shell
python3 vllm-omni/examples/offline_inference/text_to_image/text_to_image.py \
  --model /data/HunyuanImage-3.0/ \
  --prompt "Add a white art board written with colorful text vLLM-Omni on grassland.Add a paintbrush in the bears hands. position the bear standing in front of the art board as if painting" \
  --output output_image_edit.png \
  --num_inference_steps 50 \
  --cfg_scale 4.0 \
  --tensor_parallel_size 8
```

- Execution Result Output

[Stage-0] INFO 01-29 01:28:14 [diffusion_engine.py:104] Generation completed successfully.
[Stage-0] INFO 01-29 01:28:14 [diffusion_engine.py:127] Post-processing completed in 0.0000 seconds
INFO 01-29 01:28:14 [log_utils.py:550] {'type': 'request_level_metrics',
INFO 01-29 01:28:14 [log_utils.py:550]  'request_id': '0_ecebd709-8b6f-4f1d-9c85-2e9fd03ff0ba',
INFO 01-29 01:28:14 [log_utils.py:550]  'e2e_time_ms': 86126.50895118713,
INFO 01-29 01:28:14 [log_utils.py:550]  'e2e_tpt': 0.0,
INFO 01-29 01:28:14 [log_utils.py:550]  'e2e_total_tokens': 0,
INFO 01-29 01:28:14 [log_utils.py:550]  'transfers_total_time_ms': 0.0,
INFO 01-29 01:28:14 [log_utils.py:550]  'transfers_total_bytes': 0,
INFO 01-29 01:28:14 [log_utils.py:550]  'stages': {0: {'stage_gen_time_ms': 86079.94437217712,
INFO 01-29 01:28:14 [log_utils.py:550]                 'num_tokens_out': 0,
INFO 01-29 01:28:14 [log_utils.py:550]                 'num_tokens_in': 0}}}
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:26<00:00, 86.13s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-29 01:28:14 [omni.py:782] [Summary] {'e2e_requests': 1,
INFO 01-29 01:28:14 [omni.py:782]  'e2e_total_time_ms': 86127.75588035583,
INFO 01-29 01:28:14 [omni.py:782]  'e2e_sum_time_ms': 86126.50895118713,
INFO 01-29 01:28:14 [omni.py:782]  'e2e_total_tokens': 0,
INFO 01-29 01:28:14 [omni.py:782]  'e2e_avg_time_per_request_ms': 86126.50895118713,
INFO 01-29 01:28:14 [omni.py:782]  'e2e_avg_tokens_per_s': 0.0,
INFO 01-29 01:28:14 [omni.py:782]  'wall_time_ms': 86127.75588035583,
INFO 01-29 01:28:14 [omni.py:782]  'final_stage_id': {'0_ecebd709-8b6f-4f1d-9c85-2e9fd03ff0ba': 0},
INFO 01-29 01:28:14 [omni.py:782]  'stages': [{'stage_id': 0,
INFO 01-29 01:28:14 [omni.py:782]              'requests': 1,
INFO 01-29 01:28:14 [omni.py:782]              'tokens': 0,
INFO 01-29 01:28:14 [omni.py:782]              'total_time_ms': 86126.80411338806,
INFO 01-29 01:28:14 [omni.py:782]              'avg_time_per_request_ms': 86126.80411338806,
INFO 01-29 01:28:14 [omni.py:782]              'avg_tokens_per_s': 0.0}],
INFO 01-29 01:28:14 [omni.py:782]  'transfers': []}
[Stage-0] INFO 01-29 01:28:14 [omni_stage.py:677] Received shutdown signal
Total generation time: 89.6901 seconds (89690.07 ms)

3. Online Inference

- command

```shell
vllm serve "/data/HunyuanImage-3.0/" --omni --port "8091" --tensor_parallel_size 8
```

- Online Request

```shell
curl -X POST http://localhost:8091/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Add a white art board written with colorful text vLLM-Omni on grassland. Add a paintbrush in the bear hands. position the bear standing in front of the art board as if painting",
    "num_inference_steps": 50,
    "n": 4,
    "size": "1024x1024",
    "seed": 123
  }' | jq -r '.data[0].b64_json' | base64 -d > dragon.png
```
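Note that the pipeline above decodes only `data[0]` even though the request asks for `n: 4` images. A small helper like the following (a sketch assuming the same OpenAI-style `{"data": [{"b64_json": ...}]}` response schema shown above) saves every returned image:

```python
import base64
from pathlib import Path


def save_images(response: dict, prefix: str = "output") -> list[str]:
    """Decode every b64_json entry in an images/generations response."""
    paths = []
    for i, item in enumerate(response.get("data", [])):
        path = f"{prefix}_{i}.png"
        Path(path).write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths


# Stubbed response for illustration; a real call would POST to
# http://localhost:8091/v1/images/generations as in the curl example.
fake = {"data": [{"b64_json": base64.b64encode(b"\x89PNG").decode()}]}
print(save_images(fake))
```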

- Execution Result Output

[Stage-0] INFO 01-29 00:47:26 [diffusion_engine.py:104] Generation completed successfully.
[Stage-0] INFO 01-29 00:47:26 [diffusion_engine.py:127] Post-processing completed in 0.0000 seconds
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550] {'type': 'request_level_metrics',
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'request_id': 'img_gen_1769673990',
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'e2e_time_ms': 86665.24529457092,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'e2e_tpt': 0.0,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'e2e_total_tokens': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'transfers_total_time_ms': 0.0,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'transfers_total_bytes': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'stages': {0: {'stage_gen_time_ms': 86644.5779800415,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]                 'num_tokens_out': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]                 'num_tokens_in': 0}}}
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467] [Summary] {'e2e_requests': 1,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'e2e_total_time_ms': 86665.33899307251,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'e2e_sum_time_ms': 86665.24529457092,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'e2e_total_tokens': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'e2e_avg_time_per_request_ms': 86665.24529457092,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'e2e_avg_tokens_per_s': 0.0,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'wall_time_ms': 86665.33899307251,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'final_stage_id': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'stages': [{'stage_id': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]              'requests': 1,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]              'tokens': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]              'total_time_ms': 86665.29130935669,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]              'avg_time_per_request_ms': 86665.29130935669,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]              'avg_tokens_per_s': 0.0}],
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'transfers': []}
(APIServer pid=967459) INFO 01-29 00:07:57 [api_server.py:696] Successfully generated 1 image(s)
(APIServer pid=967459) INFO:     127.0.0.1:40300 - "POST /v1/images/generations HTTP/1.1" 200 OK


chatgpt-codex-connector bot left a comment:
💡 Codex Review

```python
if profiler_enabled:
    print("[Profiler] Starting profiling...")
    omni.start_profile()
```

P2: Restore `profiler_enabled` initialization

The script still branches on profiler_enabled, but this commit removed its initialization (profiler_enabled = bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))). As a result, running the offline text-to-image example now raises a NameError at runtime before any generation happens, even when profiling is not requested. Reintroduce the initialization or fold the environment check directly into the if condition.


david6666666 (Collaborator) commented Jan 30, 2026

Semmer2 (Contributor) commented Jan 30, 2026

The HunyuanImage3 model supports both AR and DiT, and we designed the two stages together: the first part runs AR (VL) inference alone, and the second part supports DiT inference. In the next few days we will combine the two parts, so it can run AR + DiT.

david6666666 (Collaborator) commented Jan 30, 2026

> HunyuanImage3 model supports both AR and DiT, we designed the two stages together, the first part AR runs VL inference alone, and the second part supports DiT inference. In next few days, we will combine the two parts together, which means it can run AR + DiT.

OK. Please provide additional details in the Purpose section of this PR to avoid ambiguity. Thank you.

Semmer2 (Contributor) commented Jan 30, 2026

@princepride

hsliuustc0106 (Collaborator) commented:

Please use the diffusion benchmark to report the performance.

Copilot AI left a comment:

Pull request overview

Adds GPU support for the HunyuanImage-3 diffusion model and integrates it into the vLLM-Omni diffusion framework as a selectable pipeline.

Changes:

  • Extend diffusion entrypoints to infer model_class_name from config.json architectures.
  • Register and add a new HunyuanImage3 diffusion pipeline implementation (tokenizer/image/VAE/ViT utilities).
  • Adjust diffusion worker/context initialization and parallel-state exposure for integration.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 38 comments.

Summary per file:

| File | Description |
| --- | --- |
| vllm_omni/entrypoints/omni_diffusion.py | Adds fallback to use architectures[0] as model_class_name for non-diffusers models. |
| vllm_omni/entrypoints/async_omni_diffusion.py | Adds similar architectures fallback logic for the async diffusion entrypoint. |
| vllm_omni/diffusion/worker/diffusion_worker.py | Updates worker init to set the vLLM compilation config and current vLLM config context. |
| vllm_omni/diffusion/worker/diffusion_model_runner.py | Moves/adjusts model-loading context usage and minor whitespace changes. |
| vllm_omni/diffusion/registry.py | Registers the new HunyuanImage3 architecture → module/class mapping. |
| vllm_omni/diffusion/models/hunyuan/tokenizer_wrapper.py | Adds tokenizer + message/template utilities for HunyuanImage3. |
| vllm_omni/diffusion/models/hunyuan/siglip2.py | Adds a SigLIP2 vision transformer implementation used by the pipeline. |
| vllm_omni/diffusion/models/hunyuan/image_processor.py | Adds image preprocessing and image-info construction for the pipeline. |
| vllm_omni/diffusion/models/hunyuan/hunyuan_image_3.py | Implements the HunyuanImage3 pipeline integration with the vLLM-Omni request flow. |
| vllm_omni/diffusion/models/hunyuan/hunyuan_image3_utils.py | Adds custom RoPE2D + attention/KV-cache utilities used by the model. |
| vllm_omni/diffusion/models/hunyuan/autoencoder_kl_3d.py | Adds a 3D-conv VAE implementation used by the model. |
| vllm_omni/diffusion/distributed/parallel_state.py | Exposes additional parallel groups via vllm_parallel_state. |
| examples/offline_inference/text_to_image/text_to_image.py | Updates example logging/output formatting. |


```python
            and self.cache_backend.is_enabled()
        ):
            self.cache_backend.refresh(self.pipeline, req.sampling_params.num_inference_steps)
```

Copilot AI commented on Jan 30, 2026:

There is trailing whitespace / an empty indented line after the cache refresh block. Please remove it to satisfy linting (ruff/pycodestyle) and keep diffs clean.

Comment on lines +1 to +12
```python
# Licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://github.com/Tencent-Hunyuan/HunyuanImage-3.0/blob/main/LICENSE
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
```
Copilot AI commented on Jan 30, 2026:

These newly added Hunyuan files are marked as licensed under the "TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT" (see header). Since this repository is Apache-2.0, please confirm license compatibility and add any required third-party notices or relicense/rewrite the code as needed before merging (otherwise this can block redistribution).

Comment on lines 95 to 104
```python
        model_type = cfg.get("model_type")
        architectures = cfg.get("architectures") or []
        if architectures and len(architectures) == 1:
            od_config.model_class_name = architectures[0]
        else:
            raise

        if model_type == "bagel" or "BagelForConditionalGeneration" in architectures:
            od_config.model_class_name = "BagelPipeline"
            od_config.tf_model_config = TransformerConfig()
```
Copilot AI commented on Jan 30, 2026:

In this config.json fallback, the new if architectures and len(architectures) == 1: ... else: raise runs before the Bagel detection below, so Bagel models with missing/empty architectures (or multiple entries) will re-raise the earlier exception and never reach the Bagel handling. Please mirror entrypoints/omni_diffusion.py by checking model_type == "bagel"/BagelForConditionalGeneration first, then fallback to the single-architecture case, and otherwise raise a clear ValueError.
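The ordering Copilot suggests can be sketched as a standalone helper (the function name here is illustrative, not the PR's actual code; in the PR the result would be assigned to `od_config.model_class_name`):

```python
def resolve_model_class_name(cfg: dict) -> str:
    """Check Bagel first, then fall back to a single-entry architectures list."""
    model_type = cfg.get("model_type")
    architectures = cfg.get("architectures") or []
    # Bagel detection must run before the generic fallback, otherwise Bagel
    # configs with empty/multiple architectures raise prematurely.
    if model_type == "bagel" or "BagelForConditionalGeneration" in architectures:
        return "BagelPipeline"
    if len(architectures) == 1:
        return architectures[0]
    raise ValueError(
        f"Cannot infer model_class_name from config.json: architectures={architectures!r}"
    )


print(resolve_model_class_name({"model_type": "bagel"}))                # BagelPipeline
print(resolve_model_class_name({"architectures": ["HunyuanImage3"]}))   # HunyuanImage3
```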

Comment on lines +264 to +280
```python
        elif isinstance(image_i, list):
            # time_embed needs a 1-D tensor as input
            t_i_emb = self.time_embed(t_i)  # n_{i} x d
            image_i_seq_list = [], []
            for j in range(len(image_i)):
                image_ij = image_i[j]
                if image_ij.dim() == 4:
                    assert image_i[j].shape[0] == 1, "image_i[j] should have a batch dimension of 1"
                elif image_ij.dim() == 3:
                    image_ij = image_ij.unsqueeze(0)
                else:
                    raise ValueError(f"image_i[j] should have 3 or 4 dimensions, got {image_ij.dim()}")
                # 1 x one_image_seq_len_{j} x n_embd
                image_i_seq_j, _, _ = self.patch_embed(image_ij, t_i_emb[j:j + 1])
                image_i_seq_list.append(image_i_seq_j)
            # 1 x sum_{j}(one_image_seq_len_{j}) x n_embd
            image_i_seq = torch.cat(image_i_seq_list, dim=1)
```
Copilot AI commented on Jan 30, 2026:

image_i_seq_list = [], [] initializes a tuple of two lists, so the next line image_i_seq_list.append(...) will raise AttributeError. This should be a single list (e.g., image_i_seq_list = []) before appending patch embeddings.
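The failure mode is easy to reproduce in isolation — the comma makes the right-hand side a tuple of two lists, not a list:

```python
# `seqs = [], []` binds a *tuple of two lists*.
seqs = [], []
assert isinstance(seqs, tuple)
try:
    seqs.append("x")
except AttributeError as err:
    print(f"caught: {err}")  # tuples have no append

# The fix is a single list:
seqs = []
seqs.append("x")
print(seqs)
```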

```python
        for section in sections_i:
            if 'image' in section['type']:
                if isinstance(section['token_height'], list):
                    assert len(section['token_height']) == len(section['token_height']), \
```
Copilot AI commented on Jan 30, 2026:

This assertion compares len(section['token_height']) to itself, so it can never fail even if token_width is a different length. It should validate len(token_height) == len(token_width) to avoid silently producing mismatched (h, w) pairs for RoPE image shapes.

Suggested change:

```diff
-assert len(section['token_height']) == len(section['token_height']), \
+assert len(section['token_height']) == len(section['token_width']), \
```

Comment on lines +1292 to +1300
```python
        self.attn = Attention(
            self.num_heads,
            self.head_dim,
            self.scaling,
            num_kv_heads=self.num_kv_heads,
            cache_config=cache_config,
            quant_config=quant_config,
            prefix=f"{prefix}.attn",
            attn_type=AttentionType.ENCODER_DECODER,
```
Copilot AI commented on Jan 30, 2026:

Keyword argument 'cache_config' is not a supported parameter name of Attention.init.
Keyword argument 'quant_config' is not a supported parameter name of Attention.init.
Keyword argument 'attn_type' is not a supported parameter name of Attention.init.

Suggested change (this version also requires `import inspect` at the top of the file):

```python
# Pass only the kwargs that this vLLM version's Attention.__init__ accepts.
attn_kwargs = {}
attn_sig = inspect.signature(Attention)
if "cache_config" in attn_sig.parameters:
    attn_kwargs["cache_config"] = cache_config
if "quant_config" in attn_sig.parameters:
    attn_kwargs["quant_config"] = quant_config
if "attn_type" in attn_sig.parameters:
    attn_kwargs["attn_type"] = AttentionType.ENCODER_DECODER
if "prefix" in attn_sig.parameters:
    attn_kwargs["prefix"] = f"{prefix}.attn"
self.attn = Attention(
    self.num_heads,
    self.head_dim,
    self.scaling,
    num_kv_heads=self.num_kv_heads,
    **attn_kwargs,
)
```

```python
        params_dict = dict(self.named_parameters())
        loaded_params: set[str] = set()
        pass
```
Copilot AI commented on Jan 30, 2026:

Unnecessary 'pass' statement.

Suggested change:

```diff
-        pass
```
```python
    def __init__(self, config: HunyuanImage3Config, prefix: str = ""):
        super().__init__()

        config = config
```
Copilot AI commented on Jan 30, 2026:

This assignment assigns a variable to itself.

Suggested change:

```diff
-        config = config
```
```python
            return_all_pos=return_all_pos,
        )
        if return_all_pos:
            cos, sin, all_pos = res
```
Copilot AI commented on Jan 30, 2026:

Left hand side of assignment contains 3 variables, but right hand side is a tuple of length 2.

Suggested change:

```python
# Be robust to both 2-tuple and 3-tuple returns from build_2d_rope
if isinstance(res, tuple) and len(res) == 3:
    cos, sin, all_pos = res
elif isinstance(res, tuple) and len(res) == 2:
    cos, sin = res
    all_pos = None
else:
    raise ValueError(
        "build_2d_rope must return a tuple of length 2 or 3 "
        f"when return_all_pos={return_all_pos}, got: {type(res)} with length "
        f"{len(res) if isinstance(res, tuple) else 'N/A'}"
    )
```

```python
        if return_all_pos:
            cos, sin, all_pos = res
        else:
            cos, sin = res
```
Copilot AI commented on Jan 30, 2026:

Left hand side of assignment contains 2 variables, but right hand side is a tuple of length 3.

Suggested change:

```python
if len(res) == 3:
    cos, sin, _ = res
else:
    cos, sin = res
```

Semmer2 (Contributor) left a comment:

Please check for redundant code carefully.

```python
        )
        self.model_runner.load_model(
            memory_pool_context_fn=self._maybe_get_memory_pool_context,
        )
```
Review comment (Contributor):

Add more model inference verification.

```python
        return setattr(self, key, value)


class Siglip2VisionEmbeddings(nn.Module):
```
Review comment (Contributor):

Check whether this class can be replaced by a vLLM module, and whether other classes can be replaced as well.

@@ -0,0 +1,125 @@
# Licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT (the "License");
Review comment (Contributor):

Move image_processor.py into the model file.

@@ -0,0 +1,1426 @@
# Licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT (the "License");
Review comment (Contributor):

Move some of the pre-processing code into the processor.

```python
        return self.weight * hidden_states.to(input_dtype)


class HunyuanImage3SDPAAttention(nn.Module):
```
Review comment (Contributor):

Remove this unnecessary code.

```python
        return [], {}

    # rename for delay load
    def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]):
```
Review comment (Contributor):

Check whether these load_weights implementations can use vLLM's load_weights.

```python
    prev_sample: torch.FloatTensor


class FlowMatchDiscreteScheduler(SchedulerMixin, ConfigMixin):
```
Review comment (Contributor):

Check whether this scheduler can be replaced by a diffusers scheduler.

```python
        backend=backend,
        parallel_mode="data",
    )
    vllm_parallel_state._DP = _DP
```
Review comment (Contributor):

Double-check whether this code is necessary.

skf-1999 force-pushed the HunyuanImage3IntergrationGPU branch 8 times, most recently from 3ea33b9 to f0f984c, on February 2, 2026 at 09:09
skf-1999 force-pushed the HunyuanImage3IntergrationGPU branch 4 times, most recently from dd9e888 to 1b38e43, on February 3, 2026 at 16:01
hsliuustc0106 (Collaborator) commented:

Is there any accuracy test?
What type of parallelism are you using?

skf-1999 force-pushed the HunyuanImage3IntergrationGPU branch 2 times, most recently from 0216c48 to 632b83e, on February 4, 2026 at 06:46
skf-1999 force-pushed the HunyuanImage3IntergrationGPU branch from 0d7c4c0 to 7daca33 on February 4, 2026 at 08:07
Semmer2 force-pushed the HunyuanImage3IntergrationGPU branch from 7daca33 to 0d9878c on February 4, 2026 at 11:58
Semmer2 (Contributor) commented Feb 5, 2026

This is the omni benchmark report:

serve cmdline:

```shell
vllm serve /data/HunyuanImage-3.0/ --omni --port 8080 --tensor_parallel_size 8
```

result

```shell
python3 benchmarks/diffusion/diffusion_benchmark_serving.py \
    --dataset vbench --task t2i --num-prompts 10 \
    --height 1024 --width 1024 --port 8080
```
Downloading VBench T2V prompts to /root/.cache/vllm-omni/vbench_subject_consistency.txt...
Failed to download VBench prompts: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /Vchitect/VBench/master/prompts/prompts_per_dimension/subject_consistency.txt (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1010)')))
Loading requests...
Prepared 50 requests from vbench dataset.
Running 1 warmup request(s) with num_inference_steps=1...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [1:13:31<00:00, 88.23s/it]

================= Serving Benchmark Result =================
Model:                                   default        
Dataset:                                 vbench         
Task:                                    t2i            
--------------------------------------------------
Benchmark duration (s):                  4403.81        
Request rate:                            inf            
Max request concurrency:                 1              
Successful requests:                     50/50             
--------------------------------------------------
Request throughput (req/s):              0.01           
Latency Mean (s):                        88.0760        
Latency Median (s):                      88.1351        
Latency P99 (s):                         88.6123        

============================================================

Semmer2 (Contributor) commented Feb 5, 2026

> is there any acc test? what type of parallelism are you using?

We did not run a full accuracy test, but we did compare the results of HunyuanImage3 (vllm-omni) against HunyuanImage3 (vLLM):
[image]

We fully aligned the results layer-wise and step-wise (with an aligned top-k function and an output tensor cosine similarity of 1.0).

We ran all the tests above on 8x NVIDIA A100 80GB GPUs with TP=8.

```python
        result_row = []
        for i, tile in enumerate(row):
            if i > 0:
                tile = self.blend_t(row[i - 1], tile, blend_extent)
```
Review comment (Collaborator):

It seems the blend_t method is never defined in the AutoencoderKLConv3D class.

Comment on lines +549 to +554
```python
        if self.use_temporal_tiling and x.shape[-3] > self.tile_sample_min_tsize:
            return self.temporal_tiled_encode(x)
        if self.use_spatial_tiling and (
            x.shape[-1] > self.tile_sample_min_size or x.shape[-2] > self.tile_sample_min_size
        ):
            return self.spatial_tiled_encode(x)
```
Review comment (Collaborator):
The reason will be displayed to describe this comment to others. Learn more.

It seems self.temporal_tiled_encode(x) and self.spatial_tiled_encode(x) are not defined in the AutoencoderKLConv3D class.

Comment on lines +523 to +526
```python
        if batch_cond_image_info is not None and len(batch_cond_image_info[0]) > 0:
            cond_vae_images, cond_timestep, cond_vit_images = self._encode_cond_image(
                batch_cond_image_info, cfg_factor[mode]
            )
```
Review comment (Collaborator):

self._encode_cond_image is never defined.

Comment on lines +953 to +954
```python
        height = req.sampling_params.height or height or self.default_sample_size * self.vae_scale_factor
        width = req.sampling_params.width or width or self.default_sample_size * self.vae_scale_factor
```
Review comment (Collaborator):

self.default_sample_size and self.vae_scale_factor are never defined.

Reply (Contributor):

> self.default_sample_size and self.vae_scale_factor are never defined

Sure. We do not fully support i2i for now, so some code may be incomplete; all the missing functions and variables have been added.

princepride (Collaborator) commented:

@Semmer2 PTAL, and pre-commit failed.

princepride (Collaborator) commented:

@Semmer2 you need to add __init__.py under the model's folder, otherwise the docs can't be created.

Semmer2 force-pushed the HunyuanImage3IntergrationGPU branch 9 times, most recently from d2513f2 to f72f476, on February 5, 2026 at 11:48
hsliuustc0106 added the 'ready' label (to trigger buildkite CI) on Feb 5, 2026
Semmer2 (Contributor) commented Feb 6, 2026

> @Semmer2 PTAL, and pre-commit failed.

Hi, all the PR checks passed, and the code has been rebased on the latest main branch. Any further comments are welcome.

hsliuustc0106 changed the title from "[Model] SupportHunyuanImage3 Diffusion Model in GPU" to "[Model] SupportHunyuanImage3 Diffusion Model in vllm-omni" on Feb 6, 2026
```python
        return module(*inputs)


class Conv3d(nn.Conv3d):
```
Review comment (Collaborator):

For your information, conv3d in torch 2.9 has a critical performance bug; see #982.

Reply (Contributor):

Got it, thank you.

```python
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(
```
Review comment (Collaborator):

Could you please try to use the RoPE layer implemented in vllm-omni instead?

Reply (Contributor):

Sure, replaced with the Omni built-in RotaryEmbedding class.

```python
    sep: str = "\n\n"


class TokenizerWrapper:
```
Review comment (Collaborator):

Let's put this into a separate file.

Reply (Contributor):

Sure, moved it to hunyuan_image_3_tokenizer.py

Semmer2 force-pushed the HunyuanImage3IntergrationGPU branch 2 times, most recently from 5fb88fe to 703d750, on February 8, 2026 at 08:52
hsliuustc0106 (Collaborator) commented:

Please add it to the supported models list.

Co-authored-by: ElleElleWu <1608928702@qq.com>
Co-authored-by: skf1999 <13234016272@163.com>
Co-authored-by: Just-it <1161406585@qq.com>
Co-authored-by: Semmer2 <semmer@live.cn>

Signed-off-by: Semmer2 <semmer@live.cn>
Semmer2 force-pushed the HunyuanImage3IntergrationGPU branch from a9ca3de to d1b5088 on February 9, 2026 at 12:01
hsliuustc0106 merged commit 5fea482 into vllm-project:main on Feb 9, 2026
7 checks passed
YanickSchraner pushed a commit to YanickSchraner/vllm-omni that referenced this pull request Feb 20, 2026

Labels

ready label to trigger buildkite CI

7 participants