[Model] Support HunyuanImage3 Diffusion Model in vllm-omni #1085
hsliuustc0106 merged 1 commit into vllm-project:main
Conversation
💡 Codex Review
vllm-omni/examples/offline_inference/text_to_image/text_to_image.py
Lines 183 to 185 in 9f2fb36
The script still branches on profiler_enabled, but this commit removed its initialization (profiler_enabled = bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))). As a result, running the offline text-to-image example now raises a NameError at runtime before any generation happens, even when profiling is not requested. Reintroduce the initialization or fold the environment check directly into the if condition.
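A minimal sketch of the suggested fix, folding the environment check directly into the condition (the `maybe_profile` wrapper is a hypothetical stand-in for the example script's branch, not the actual code):

```python
import os

def maybe_profile() -> bool:
    # Instead of a separately initialized profiler_enabled flag, check the
    # environment variable directly in the condition, so removing the flag's
    # initialization can never cause a NameError.
    if os.getenv("VLLM_TORCH_PROFILER_DIR"):
        return True  # profiling path
    return False  # normal generation path
```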
The HunyuanImage3 model supports both AR and DiT; we designed the two stages together. The first part runs AR (VL) inference alone, and the second part supports DiT inference. In the next few days we will combine the two parts, so the model can run AR + DiT end to end.

OK. Please provide additional details in the Purpose section of this PR to avoid ambiguity. Thank you.

Please use the diffusion benchmark to report the performance.
Pull request overview
Adds GPU support for the HunyuanImage-3 diffusion model and integrates it into the vLLM-Omni diffusion framework as a selectable pipeline.
Changes:
- Extend diffusion entrypoints to infer model_class_name from config.json architectures.
- Register and add a new HunyuanImage3 diffusion pipeline implementation (tokenizer/image/VAE/ViT utilities).
- Adjust diffusion worker/context initialization and parallel-state exposure for integration.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 38 comments.
Show a summary per file
| File | Description |
|---|---|
| vllm_omni/entrypoints/omni_diffusion.py | Adds fallback to use architectures[0] as model_class_name for non-diffusers models. |
| vllm_omni/entrypoints/async_omni_diffusion.py | Adds similar architectures fallback logic for async diffusion entrypoint. |
| vllm_omni/diffusion/worker/diffusion_worker.py | Updates worker init to set vLLM compilation config and current vLLM config context. |
| vllm_omni/diffusion/worker/diffusion_model_runner.py | Moves/adjusts model-loading context usage and minor whitespace changes. |
| vllm_omni/diffusion/registry.py | Registers the new HunyuanImage3 architecture → module/class mapping. |
| vllm_omni/diffusion/models/hunyuan/tokenizer_wrapper.py | Adds tokenizer + message/template utilities for HunyuanImage3. |
| vllm_omni/diffusion/models/hunyuan/siglip2.py | Adds a SigLIP2 vision transformer implementation used by the pipeline. |
| vllm_omni/diffusion/models/hunyuan/image_processor.py | Adds image preprocessing and image-info construction for the pipeline. |
| vllm_omni/diffusion/models/hunyuan/hunyuan_image_3.py | Implements the HunyuanImage3 pipeline integration with vLLM-Omni request flow. |
| vllm_omni/diffusion/models/hunyuan/hunyuan_image3_utils.py | Adds custom RoPE2D + attention/KV-cache utilities used by the model. |
| vllm_omni/diffusion/models/hunyuan/autoencoder_kl_3d.py | Adds a 3D-conv VAE implementation used by the model. |
| vllm_omni/diffusion/distributed/parallel_state.py | Exposes additional parallel groups via vllm_parallel_state. |
| examples/offline_inference/text_to_image/text_to_image.py | Updates example logging/output formatting. |
```python
    and self.cache_backend.is_enabled()
):
    self.cache_backend.refresh(self.pipeline, req.sampling_params.num_inference_steps)
```
There is trailing whitespace / an empty indented line after the cache refresh block. Please remove it to satisfy linting (ruff/pycodestyle) and keep diffs clean.
```python
# Licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://github.com/Tencent-Hunyuan/HunyuanImage-3.0/blob/main/LICENSE
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
```
These newly added Hunyuan files are marked as licensed under the "TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT" (see header). Since this repository is Apache-2.0, please confirm license compatibility and add any required third-party notices or relicense/rewrite the code as needed before merging (otherwise this can block redistribution).
```python
model_type = cfg.get("model_type")
architectures = cfg.get("architectures") or []
if architectures and len(architectures) == 1:
    od_config.model_class_name = architectures[0]
else:
    raise

if model_type == "bagel" or "BagelForConditionalGeneration" in architectures:
    od_config.model_class_name = "BagelPipeline"
    od_config.tf_model_config = TransformerConfig()
```
In this config.json fallback, the new if architectures and len(architectures) == 1: ... else: raise runs before the Bagel detection below, so Bagel models with missing/empty architectures (or multiple entries) will re-raise the earlier exception and never reach the Bagel handling. Please mirror entrypoints/omni_diffusion.py by checking model_type == "bagel"/BagelForConditionalGeneration first, then fallback to the single-architecture case, and otherwise raise a clear ValueError.
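A hedged sketch of that suggested ordering (the function name and bare-dict input are illustrative; the real code sets fields on od_config rather than returning a string):

```python
def resolve_model_class_name(cfg: dict) -> str:
    # Check the Bagel special case first, then fall back to a single-entry
    # architectures list, and otherwise raise a clear ValueError.
    model_type = cfg.get("model_type")
    architectures = cfg.get("architectures") or []
    if model_type == "bagel" or "BagelForConditionalGeneration" in architectures:
        return "BagelPipeline"
    if len(architectures) == 1:
        return architectures[0]
    raise ValueError(f"Cannot infer model_class_name from architectures={architectures!r}")
```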
```python
elif isinstance(image_i, list):
    # time_embed needs a 1-D tensor as input
    t_i_emb = self.time_embed(t_i)  # n_{i} x d
    image_i_seq_list = [], []
    for j in range(len(image_i)):
        image_ij = image_i[j]
        if image_ij.dim() == 4:
            assert image_i[j].shape[0] == 1, "image_i[j] should have a batch dimension of 1"
        elif image_ij.dim() == 3:
            image_ij = image_ij.unsqueeze(0)
        else:
            raise ValueError(f"image_i[j] should have 3 or 4 dimensions, got {image_ij.dim()}")
        # 1 x one_image_seq_len_{j} x n_embd
        image_i_seq_j, _, _ = self.patch_embed(image_ij, t_i_emb[j:j + 1])
        image_i_seq_list.append(image_i_seq_j)
    # 1 x sum_{j}(one_image_seq_len_{j}) x n_embd
    image_i_seq = torch.cat(image_i_seq_list, dim=1)
```
image_i_seq_list = [], [] initializes a tuple of two lists, so the next line image_i_seq_list.append(...) will raise AttributeError. This should be a single list (e.g., image_i_seq_list = []) before appending patch embeddings.
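A tiny self-contained illustration of why the tuple initialization fails, alongside the one-line fix:

```python
# Buggy form: the comma creates a *tuple* of two empty lists, and tuples
# have no .append method, so the first append raises AttributeError.
buggy = [], []
try:
    buggy.append("x")
    appended = True
except AttributeError:
    appended = False

# Fixed form: a single list, which supports .append as intended.
fixed = []
fixed.append("x")
```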
```python
for section in sections_i:
    if 'image' in section['type']:
        if isinstance(section['token_height'], list):
            assert len(section['token_height']) == len(section['token_height']), \
```
This assertion compares len(section['token_height']) to itself, so it can never fail even if token_width is a different length. It should validate len(token_height) == len(token_width) to avoid silently producing mismatched (h, w) pairs for RoPE image shapes.
```diff
-            assert len(section['token_height']) == len(section['token_height']), \
+            assert len(section['token_height']) == len(section['token_width']), \
```
```python
self.attn = Attention(
    self.num_heads,
    self.head_dim,
    self.scaling,
    num_kv_heads=self.num_kv_heads,
    cache_config=cache_config,
    quant_config=quant_config,
    prefix=f"{prefix}.attn",
    attn_type=AttentionType.ENCODER_DECODER,
```
Keyword argument 'cache_config' is not a supported parameter name of Attention.__init__.
Keyword argument 'quant_config' is not a supported parameter name of Attention.__init__.
Keyword argument 'attn_type' is not a supported parameter name of Attention.__init__.
```diff
-self.attn = Attention(
-    self.num_heads,
-    self.head_dim,
-    self.scaling,
-    num_kv_heads=self.num_kv_heads,
-    cache_config=cache_config,
-    quant_config=quant_config,
-    prefix=f"{prefix}.attn",
-    attn_type=AttentionType.ENCODER_DECODER,
+attn_kwargs = {}
+attn_sig = inspect.signature(Attention)
+if "cache_config" in attn_sig.parameters:
+    attn_kwargs["cache_config"] = cache_config
+if "quant_config" in attn_sig.parameters:
+    attn_kwargs["quant_config"] = quant_config
+if "attn_type" in attn_sig.parameters:
+    attn_kwargs["attn_type"] = AttentionType.ENCODER_DECODER
+if "prefix" in attn_sig.parameters:
+    attn_kwargs["prefix"] = f"{prefix}.attn"
+self.attn = Attention(
+    self.num_heads,
+    self.head_dim,
+    self.scaling,
+    num_kv_heads=self.num_kv_heads,
+    **attn_kwargs,
```
```python
params_dict = dict(self.named_parameters())
loaded_params: set[str] = set()
pass
```
Unnecessary 'pass' statement.
```diff
-pass
```
```python
def __init__(self, config: HunyuanImage3Config, prefix: str = ""):
    super().__init__()

    config = config
```
This assignment assigns a variable to itself.
```diff
-config = config
```
```python
    return_all_pos=return_all_pos,
)
if return_all_pos:
    cos, sin, all_pos = res
```
Left hand side of assignment contains 3 variables, but right hand side is a tuple of length 2.
```diff
-cos, sin, all_pos = res
+# Be robust to both 2-tuple and 3-tuple returns from build_2d_rope
+if isinstance(res, tuple) and len(res) == 3:
+    cos, sin, all_pos = res
+elif isinstance(res, tuple) and len(res) == 2:
+    cos, sin = res
+    all_pos = None
+else:
+    raise ValueError(
+        "build_2d_rope must return a tuple of length 2 or 3 "
+        f"when return_all_pos={return_all_pos}, got: {type(res)} with length "
+        f"{len(res) if isinstance(res, tuple) else 'N/A'}"
+    )
```
```python
if return_all_pos:
    cos, sin, all_pos = res
else:
    cos, sin = res
```
Semmer2 left a comment:

Please check carefully for redundant code.
```python
)
self.model_runner.load_model(
    memory_pool_context_fn=self._maybe_get_memory_pool_context,
)
```
Please add more model-inference verification.
```python
        return setattr(self, key, value)


class Siglip2VisionEmbeddings(nn.Module):
```
Check whether this class can be replaced by a vLLM module, and whether other classes can be replaced as well.
```diff
@@ -0,0 +1,125 @@
+# Licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT (the "License");
```
Please move image_processor.py into the model file.
```diff
@@ -0,0 +1,1426 @@
+# Licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT (the "License");
```
Please move some of the pre-processing code into the processor.
```python
        return self.weight * hidden_states.to(input_dtype)


class HunyuanImage3SDPAAttention(nn.Module):
```
Please remove this unnecessary code.
```python
        return [], {}

    # rename for delay load
    def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]):
```
Check whether these load_weights implementations can reuse vLLM's load_weights.
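For context, a vLLM-style load_weights loop matches checkpoint tensor names against the module's parameters and returns the set of loaded names. This stdlib-only sketch shows only the shape of that loop; the real vLLM pattern calls each parameter's weight_loader rather than assigning directly:

```python
def load_weights_sketch(params: dict, weights) -> set:
    # Match checkpoint (name, tensor) pairs against the module's parameter
    # dict and record which names were loaded. Real vLLM code would call
    # param.weight_loader(param, tensor) instead of plain assignment.
    loaded = set()
    for name, tensor in weights:
        if name in params:
            params[name] = tensor
            loaded.add(name)
    return loaded
```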
```python
prev_sample: torch.FloatTensor


class FlowMatchDiscreteScheduler(SchedulerMixin, ConfigMixin):
```
Check whether this scheduler can be replaced by a diffusers scheduler.
```python
    backend=backend,
    parallel_mode="data",
)
vllm_parallel_state._DP = _DP
```
Please double-check whether this code is necessary.
Is there any accuracy test?
This is the omni benchmark report. Benchmark command and result:

```shell
python3 benchmarks/diffusion/diffusion_benchmark_serving.py --dataset vbench --task t2i --num-prompts 10 --height 1024 --width 1024 --port 8080
```

```
Downloading VBench T2V prompts to /root/.cache/vllm-omni/vbench_subject_consistency.txt...
Failed to download VBench prompts: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /Vchitect/VBench/master/prompts/prompts_per_dimension/subject_consistency.txt (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1010)')))
Loading requests...
Prepared 50 requests from vbench dataset.
Running 1 warmup request(s) with num_inference_steps=1...
100%|##########| 50/50 [1:13:31<00:00, 88.23s/it]
================= Serving Benchmark Result =================
Model: default
Dataset: vbench
Task: t2i
--------------------------------------------------
Benchmark duration (s): 4403.81
Request rate: inf
Max request concurrency: 1
Successful requests: 50/50
--------------------------------------------------
Request throughput (req/s): 0.01
Latency Mean (s): 88.0760
Latency Median (s): 88.1351
Latency P99 (s): 88.6123
============================================================
```
```python
result_row = []
for i, tile in enumerate(row):
    if i > 0:
        tile = self.blend_t(row[i - 1], tile, blend_extent)
```
It seems the blend_t method is never defined in the AutoencoderKLConv3D class.
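Presumably blend_t is the temporal analogue of diffusers' blend_v/blend_h tile-blending helpers. A numpy sketch of what such a linear cross-fade along the time axis might look like (the name, axis convention, and signature are assumptions, not the eventual fix):

```python
import numpy as np

def blend_t(a: np.ndarray, b: np.ndarray, blend_extent: int) -> np.ndarray:
    # Cross-fade the trailing frames of tile `a` into the leading frames of
    # tile `b` along the temporal axis (-3), as diffusers' blend_v/blend_h
    # do along height/width.
    blend_extent = min(a.shape[-3], b.shape[-3], blend_extent)
    b = b.copy()
    for t in range(blend_extent):
        w = t / blend_extent
        b[..., t, :, :] = (
            a[..., a.shape[-3] - blend_extent + t, :, :] * (1 - w)
            + b[..., t, :, :] * w
        )
    return b
```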
```python
if self.use_temporal_tiling and x.shape[-3] > self.tile_sample_min_tsize:
    return self.temporal_tiled_encode(x)
if self.use_spatial_tiling and (
    x.shape[-1] > self.tile_sample_min_size or x.shape[-2] > self.tile_sample_min_size
):
    return self.spatial_tiled_encode(x)
```
It seems self.temporal_tiled_encode(x) and self.spatial_tiled_encode(x) are not defined in the AutoencoderKLConv3D class.
```python
if batch_cond_image_info is not None and len(batch_cond_image_info[0]) > 0:
    cond_vae_images, cond_timestep, cond_vit_images = self._encode_cond_image(
        batch_cond_image_info, cfg_factor[mode]
    )
```
self._encode_cond_image is never defined.
```python
height = req.sampling_params.height or height or self.default_sample_size * self.vae_scale_factor
width = req.sampling_params.width or width or self.default_sample_size * self.vae_scale_factor
```
self.default_sample_size and self.vae_scale_factor are never defined.
Sure. We do not fully support i2i for now, so some code may be incomplete; all the missing functions and variables have been added.
@Semmer2 PTAL; pre-commit failed.
@Semmer2 you need to add an __init__.py under the model's folder, otherwise the docs can't be built.
Hi, all the PR checks have passed, and the code has been rebased on the latest main branch. Any further comments are welcome.
```python
        return module(*inputs)


class Conv3d(nn.Conv3d):
```
For your information, conv3d in torch 2.9 has a critical performance bug; see #982.
```python
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(
```
Could you please try to use the RoPE layer implemented in vllm-omni instead?

Sure, replaced with the Omni internal RotaryEmbedding class.
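For reference, the rotate-half RoPE application that the custom helper implemented (and that a shared RotaryEmbedding layer also computes) can be sketched in numpy; this is a generic illustration of the technique, not the vllm-omni implementation:

```python
import numpy as np

def rotate_half(x: np.ndarray) -> np.ndarray:
    # GPT-NeoX-style rotate-half: split the last dim in two and
    # map (x1, x2) -> (-x2, x1), matching torch.cat((-x2, x1), dim=-1).
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate((-x2, x1), axis=-1)

def apply_rotary_pos_emb(q: np.ndarray, cos: np.ndarray, sin: np.ndarray) -> np.ndarray:
    # Standard RoPE application: q * cos + rotate_half(q) * sin.
    return q * cos + rotate_half(q) * sin
```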
```python
    sep: str = "\n\n"


class TokenizerWrapper:
```
Let's put this into a separate file.

Sure, moved it to hunyuan_image_3_tokenizer.py.
vllm_omni/diffusion/models/hunyuan_image_3/pipeline_hunyuan_image_3.py (outdated; resolved)
Please add this to the supported models list.
Co-authored-by: ElleElleWu <1608928702@qq.com>
Co-authored-by: skf1999 <13234016272@163.com>
Co-authored-by: Just-it <1161406585@qq.com>
Co-authored-by: Semmer2 <semmer@live.cn>
Signed-off-by: Semmer2 <semmer@live.cn>

Co-authored-by: skf1999 <13234016272@163.com>
Co-authored-by: Just-it <1161406585@qq.com>
Co-authored-by: Semmer2 <semmer@live.cn>
Purpose
Support HunyuanImage as a DiT model and integrate it into the Omni framework as a standalone stage, enabling the core image-generation workflow.
Test Result
1. Test Environment
2. Offline inference
- CMD
- Execution Result Output
3. Online Inference
- command
- Online Request
- Execution Result Output
Essential Elements of an Effective PR Description Checklist
- supported_models.md and examples for a new model.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)