
[Model] Add Hunyuan Image3 AR Support #759

Open
usberkeley wants to merge 18 commits into vllm-project:main from usberkeley:hunyuan-image3

Conversation


@usberkeley usberkeley commented Jan 13, 2026

Purpose

This PR adds support for the Hunyuan Image3 model to vLLM-Omni. Hunyuan Image3 is a multimodal image generation model developed by Tencent, supporting text-to-image generation tasks.

Test Plan

  1. Text input test
  • GPU: 8 x L40S (48GB)
  • TP: 8

Note: The default configuration in hunyuan_image_3_moe.yaml is tensor_parallel_size: 8.

from vllm_omni.entrypoints.omni import Omni

if __name__ == "__main__":
    omni = Omni(model="tencent/HunyuanImage-3.0")
    prompts = [
    {
        "prompt": "<|im_start|>system\nYou are Qwen.<|im_end|>\n<|im_start|>user\nExplain the system architecture for a scalable audio generation pipeline. Answer in 15 words.<|im_end|>\n<|im_start|>assistant\n",
        "modalities": ["text"]
    }
    ]
    omni_outputs = omni.generate(prompts)
    print(omni_outputs[0].request_output[0].outputs[0].text)
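For reference, the defaults mentioned above suggest the relevant portion of hunyuan_image_3_moe.yaml looks roughly like the sketch below. Only stage_args, stage_id, stage_type, and tensor_parallel_size: 8 are stated in this thread; the exact nesting and any other key names are assumptions:

```yaml
# Sketch of the stage config (verified setup per the note above: 8 x L40S-48G).
stage_args:
- stage_id: 0
  stage_type: llm           # use the llm stage type to launch OmniLLM
  tensor_parallel_size: 8   # default TP degree; matches the 8-GPU test setup
```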
  2. Multimodal input test
    TODO

Test Result

TODO


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

@usberkeley usberkeley force-pushed the hunyuan-image3 branch 2 times, most recently from ed4d687 to bb011f2 Compare January 14, 2026 09:09
@usberkeley usberkeley marked this pull request as ready for review January 15, 2026 03:21

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb011f27c7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@hsliuustc0106
Collaborator

Please paste your test example command.

@usberkeley
Author

usberkeley commented Jan 15, 2026

Please paste your test example command.

Hi @hsliuustc0106

  1. Text input test
  • GPU: 8 x L40S (48GB)
  • TP: 8

Note: The default configuration in hunyuan_image_3_moe.yaml is tensor_parallel_size: 8.

from vllm_omni.entrypoints.omni import Omni

if __name__ == "__main__":
    omni = Omni(model="tencent/HunyuanImage-3.0")
    prompts = [
    {
        "prompt": "<|im_start|>system\nYou are Qwen.<|im_end|>\n<|im_start|>user\nExplain the system architecture for a scalable audio generation pipeline. Answer in 15 words.<|im_end|>\n<|im_start|>assistant\n",
        "modalities": ["text"]
    }
    ]
    omni_outputs = omni.generate(prompts)
    print(omni_outputs[0].request_output[0].outputs[0].text)

Contributor

Copilot AI left a comment


Pull request overview

Adds initial vLLM-Omni autoregressive (AR) integration for Tencent’s Hunyuan Image3 model, including model registration and a default stage config.

Changes:

  • Updates AR GPU runner postprocessing to use a shared multimodal-output extraction helper.
  • Registers HunyuanImage3ForCausalMM in the Omni model registry.
  • Introduces a new Hunyuan Image3 model implementation + utilities and a new stage config YAML.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 12 comments.

Show a summary per file

  • vllm_omni/worker/gpu_ar_model_runner.py: Switches to extract_multimodal_outputs for postprocessing model outputs.
  • vllm_omni/model_executor/stage_configs/hunyuan_image_3_moe.yaml: Adds a default stage config for running Hunyuan Image3 with the AR worker/scheduler.
  • vllm_omni/model_executor/models/registry.py: Registers the Hunyuan Image3 model architecture for lazy loading.
  • vllm_omni/model_executor/models/hunyuan_image3_0/hunyuan_image3_0_utils.py: Adds Hunyuan-specific RoPE2D + image KV cache helper utilities.
  • vllm_omni/model_executor/models/hunyuan_image3_0/hunyuan_image3_0.py: Adds the main Hunyuan Image3 model implementation (decoder, attention, MoE, weight loading).
  • vllm_omni/model_executor/models/hunyuan_image3_0/__init__.py: Exposes the new model class for import.
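The RoPE2D helper in hunyuan_image3_0_utils.py is not shown in this thread. As background, a common way to implement 2D rotary embeddings for image tokens is to apply standard 1D RoPE separately to the two halves of the head dimension, one half keyed by the token's row index and the other by its column index. A minimal NumPy sketch of that idea (not the PR's actual implementation):

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Rotate feature pairs of x by position-dependent angles (standard 1D RoPE).

    x: (..., seq, dim) with even dim; pos: (seq,) integer positions.
    """
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))   # (dim/2,)
    angles = pos[:, None] * inv_freq[None, :]                 # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_2d(x, pos_h, pos_w):
    """Apply 1D RoPE to each half of the head dim, using row/column indices."""
    half = x.shape[-1] // 2
    return np.concatenate(
        [rope_1d(x[..., :half], pos_h), rope_1d(x[..., half:], pos_w)], axis=-1
    )

# For a 2x2 patch grid flattened row-major to seq=4:
# pos_h = [0, 0, 1, 1], pos_w = [0, 1, 0, 1]
```

Since rotation is norm-preserving, attention logits depend only on relative row/column offsets between query and key tokens, which is the property the 2D variant carries over from 1D RoPE.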


@david6666666
Collaborator

@usberkeley
Author

Any updates? Tencent has just released https://huggingface.co/tencent/HunyuanImage-3.0-Instruct and https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil

Got it. We are working on the image encoder and will follow up on the new release.

@usberkeley
Author

Hi @princepride

When you have a moment, please review this code. Thanks!

@princepride
Collaborator

@usberkeley Can you rebase your code first? We have changed some code in ar_model_runner.

@usberkeley usberkeley marked this pull request as draft February 2, 2026 10:14
@usberkeley usberkeley force-pushed the hunyuan-image3 branch 3 times, most recently from 71570e7 to b8d58b5 Compare February 4, 2026 03:17
@usberkeley usberkeley marked this pull request as ready for review February 4, 2026 03:19

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8d58b560e


@princepride
Collaborator

@usberkeley pre-commit failed, PTAL

Collaborator

@princepride princepride left a comment


@usberkeley Good job! Just a little advice.

@princepride
Collaborator

@usberkeley I think that, as an AR model, it should have text as the output.

@usberkeley usberkeley marked this pull request as draft February 4, 2026 09:48
@hsliuustc0106 hsliuustc0106 requested a review from Copilot February 5, 2026 15:31
@princepride
Collaborator

[image] @usberkeley I used transformers to execute your prompt.

"hunyuan_image3",
"hunyuan_image3",
"HunyuanImage3ForConditionalGeneration",
),
Collaborator


This PR adds 3057 lines of new model code with ZERO test coverage. Add tests to verify: (1) model loads correctly, (2) forward pass produces expected output shapes, (3) memory usage is reasonable, (4) integration with vllm-omni pipeline works. Without tests, we cannot validate correctness or prevent regressions.

# The following config has been verified on 8x L40S-48G GPU.
stage_args:
- stage_id: 0
  stage_type: llm # Use llm stage type to launch OmniLLM
Collaborator


This config file has no schema validation or documentation. Add comments explaining each parameter's purpose, valid ranges, and default values. Consider adding a schema validator to catch configuration errors early.
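One way to act on this suggestion is a small programmatic check run when the stage config is loaded. In the sketch below, only stage_id and stage_type come from the PR's YAML snippet; the helper's name and the set of allowed stage types are assumptions for illustration:

```python
# Minimal sanity check for a parsed stage_args list (as loaded from the YAML).
# "stage_id" and "stage_type" appear in the PR snippet; the allowed stage-type
# set and this function's name are hypothetical.
def validate_stage_args(stage_args: list) -> None:
    if not stage_args:
        raise ValueError("stage_args must contain at least one stage")
    allowed_types = {"llm"}  # extend as more stage types are added
    for i, stage in enumerate(stage_args):
        if stage.get("stage_id") != i:
            raise ValueError(
                f"stage {i}: expected stage_id {i}, got {stage.get('stage_id')}"
            )
        if stage.get("stage_type") not in allowed_types:
            raise ValueError(
                f"stage {i}: unknown stage_type {stage.get('stage_type')!r}"
            )
```

Failing fast with a clear message at load time is cheaper than letting a typo surface as a worker-startup error deep in the pipeline.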

@princepride
Collaborator

@vllm-omni-reviewer

usberkeley and others added 12 commits February 27, 2026 22:42
Signed-off-by: Bradley <bradley.b.pitt@gmail.com> (11 commits)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
@usberkeley
Author

@vllm-omni-reviewer

Contributor

@lishunyang12 lishunyang12 left a comment


Left a couple more comments on the latest revision. The mRoPE addition looks solid. Main remaining concern is the load_weights indentation bug and dead code.

Removed the load_sharded_safetensors function that manually loads sharded safetensors files.

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Removed unused imports for cleaner code.

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
@hsliuustc0106
Collaborator

@vllm-omni-reviewer

This guy doesn't seem to be working any more; I'll take a look tomorrow at why.

Signed-off-by: Bradley <bradley.b.pitt@gmail.com> (4 commits)
Collaborator


Please remove it; we don't need it.


6 participants