[Model] Add Hunyuan Image3 AR Support #759
usberkeley wants to merge 18 commits into vllm-project:main
Conversation
Force-pushed from ed4d687 to bb011f2
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bb011f27c7
vllm_omni/model_executor/models/hunyuan_image3_0/hunyuan_image3_0.py (2 outdated review threads, resolved)
Please paste your test example command.
Force-pushed from 8e586d1 to 274518b
Pull request overview
Adds initial vLLM-Omni autoregressive (AR) integration for Tencent's Hunyuan Image3 model, including model registration and a default stage config.
Changes:
- Updates AR GPU runner postprocessing to use a shared multimodal-output extraction helper.
- Registers HunyuanImage3ForCausalMM in the Omni model registry.
- Introduces a new Hunyuan Image3 model implementation + utilities and a new stage config YAML.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| `vllm_omni/worker/gpu_ar_model_runner.py` | Switches to `extract_multimodal_outputs` for postprocessing model outputs. |
| `vllm_omni/model_executor/stage_configs/hunyuan_image_3_moe.yaml` | Adds a default stage config for running Hunyuan Image3 with the AR worker/scheduler. |
| `vllm_omni/model_executor/models/registry.py` | Registers the Hunyuan Image3 model architecture for lazy loading. |
| `vllm_omni/model_executor/models/hunyuan_image3_0/hunyuan_image3_0_utils.py` | Adds Hunyuan-specific RoPE2D + image KV cache helper utilities. |
| `vllm_omni/model_executor/models/hunyuan_image3_0/hunyuan_image3_0.py` | Adds the main Hunyuan Image3 model implementation (decoder, attention, MoE, weight loading). |
| `vllm_omni/model_executor/models/hunyuan_image3_0/__init__.py` | Exposes the new model class for import. |
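The RoPE2D utilities listed above give each image token rotary angles derived from both its row and its column position, rather than a single sequence index. A minimal sketch of that idea (the function name `rope_2d_angles` and the exact split of the head dimension are assumptions for illustration, not this PR's actual code):

```python
# Illustrative 2D rotary-position (RoPE2D) angle computation: half of the
# rotation pairs encode the token's row index, the other half its column.
# rope_2d_angles and this exact frequency layout are assumptions made for
# demonstration, not the utilities' real API.
def rope_2d_angles(row: int, col: int, head_dim: int, base: float = 10000.0):
    half = head_dim // 2          # rotation pairs available per head
    pairs = half // 2             # split evenly between row and col axes
    row_angles = [row * base ** (-2 * i / half) for i in range(pairs)]
    col_angles = [col * base ** (-2 * i / half) for i in range(pairs)]
    return row_angles + col_angles

# A token at the image origin gets zero rotation on every dimension.
print(rope_2d_angles(0, 0, head_dim=8))  # [0.0, 0.0, 0.0, 0.0]
```

Two tokens in the same row then share identical row-angle components, which is what lets attention generalize across image positions.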
Outdated review threads (resolved):
- vllm_omni/model_executor/stage_configs/hunyuan_image_3_moe.yaml (1 thread)
- vllm_omni/model_executor/models/hunyuan_image3_0/hunyuan_image3_0_utils.py (2 threads)
- vllm_omni/model_executor/models/hunyuan_image3_0/hunyuan_image3_0.py (6 threads)
Any updates? Tencent has just released https://huggingface.co/tencent/HunyuanImage-3.0-Instruct and https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil
Got it. We are working on the image encoder and will follow up with the new release.
Force-pushed from 5fabd63 to eb269b8
Hi @princepride, when you have a moment, please review this code. Thanks!
@usberkeley Can you rebase your code first? We have changed some code in ar_model_runner.
Force-pushed from 71570e7 to b8d58b5
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b8d58b560e
vllm_omni/model_executor/models/hunyuan_image3/hunyuan_image3.py (2 outdated review threads, resolved)
@usberkeley pre-commit failed, PTAL.
princepride left a comment:
@usberkeley Good job! Just a little advice.
vllm_omni/model_executor/models/hunyuan_image3/hunyuan_image3.py (6 outdated review threads, resolved)
@usberkeley I think, as an AR model, it should have text as output.
@usberkeley I used transformers to execute your prompt.
"hunyuan_image3",
"hunyuan_image3",
"HunyuanImage3ForConditionalGeneration",
),
This PR adds 3057 lines of new model code with ZERO test coverage. Add tests to verify: (1) the model loads correctly, (2) the forward pass produces expected output shapes, (3) memory usage is reasonable, (4) integration with the vllm-omni pipeline works. Without tests, we cannot validate correctness or prevent regressions.
# The following config has been verified on 8x L40S-48G GPU.
stage_args:
  - stage_id: 0
    stage_type: llm  # Use llm stage type to launch OmniLLM
This config file has no schema validation or documentation. Add comments explaining each parameter's purpose, valid ranges, and default values. Consider adding a schema validator to catch configuration errors early.
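In the spirit of that suggestion, even a small hand-rolled validator would catch malformed stage entries before launch. A minimal sketch (the field names follow the YAML shown in this PR; `validate_stage` and the restriction to only `llm` as a stage type are illustrative assumptions):

```python
# Minimal stage-config validator sketch; stage_id/stage_type follow the
# YAML in this PR, while the set of accepted stage types is a placeholder
# for whatever the runtime actually supports.
def validate_stage(stage: dict) -> list[str]:
    errors = []
    stage_id = stage.get("stage_id")
    if not isinstance(stage_id, int) or stage_id < 0:
        errors.append("stage_id must be a non-negative integer")
    if stage.get("stage_type") not in {"llm"}:
        errors.append("stage_type must be 'llm'")
    return errors

print(validate_stage({"stage_id": 0, "stage_type": "llm"}))  # []
```

Running this over every entry of `stage_args` at startup turns silent misconfiguration into an immediate, readable error list.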
@vllm-omni-reviewer
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Force-pushed from 251d076 to ed2eaf5
@vllm-omni-reviewer
lishunyang12 left a comment:
Left a couple more comments on the latest revision. The mRoPE addition looks solid. The main remaining concern is the load_weights indentation bug and dead code.
Outdated review threads (resolved):
- vllm_omni/model_executor/models/hunyuan_image3/hunyuan_image3.py (3 threads)
- vllm_omni/model_executor/models/hunyuan_image3/autoencoder_kl_3d.py (1 thread)
Removed the load_sharded_safetensors function that manually loads sharded safetensors files. Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Removed unused imports for cleaner code. Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
This guy doesn't seem to be doing any work anymore; I'll take a look tomorrow to see why.
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
Please remove it; we don't need it.

Purpose
This PR adds support for the Hunyuan Image3 model to vLLM-Omni. Hunyuan Image3 is a multimodal image generation model developed by Tencent, supporting text-to-image generation tasks.
Test Plan
Note: The default configuration in hunyuan_image_3_moe.yaml is tensor_parallel_size: 8.
TODO
Test Result
TODO
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.