
[Model] Support HunyuanImage3 Diffusion Model in vllm-omni #1085

Merged

hsliuustc0106 merged 1 commit into vllm-project:main from Semmer2:HunyuanImage3IntergrationGPU on Feb 9, 2026

Conversation

ElleElleWu (Contributor) commented Jan 29, 2026

Co-authored-by: skf1999 13234016272@163.com
Co-authored-by: Just-it 1161406585@qq.com
Co-authored-by: Semmer2 semmer@live.cn

Purpose

Support HunyuanImage3 as a DiT model and integrate it into the Omni framework as a standalone stage, enabling the core image generation workflow.

Test Result

1. Test Environment

CUDA         Version: 12.9
torch        Version: 2.9.1
vllm         Version: 0.15.0
vllm-omni    Version: 0.15.0

2. Offline inference

- CMD

```shell
python3 vllm-omni/examples/offline_inference/text_to_image/text_to_image.py \
  --model /data/HunyuanImage-3.0/ \
  --prompt "Add a white art board written with colorful text vLLM-Omni on grassland.Add a paintbrush in the bears hands. position the bear standing in front of the art board as if painting" \
  --output output_image_edit.png \
  --num_inference_steps 50 \
  --cfg_scale 4.0 \
  --tensor_parallel_size 8
```

- Execution Result Output

[Stage-0] INFO 01-29 01:28:14 [diffusion_engine.py:104] Generation completed successfully.
[Stage-0] INFO 01-29 01:28:14 [diffusion_engine.py:127] Post-processing completed in 0.0000 seconds
INFO 01-29 01:28:14 [log_utils.py:550] {'type': 'request_level_metrics',
INFO 01-29 01:28:14 [log_utils.py:550]  'request_id': '0_ecebd709-8b6f-4f1d-9c85-2e9fd03ff0ba',
INFO 01-29 01:28:14 [log_utils.py:550]  'e2e_time_ms': 86126.50895118713,
INFO 01-29 01:28:14 [log_utils.py:550]  'e2e_tpt': 0.0,
INFO 01-29 01:28:14 [log_utils.py:550]  'e2e_total_tokens': 0,
INFO 01-29 01:28:14 [log_utils.py:550]  'transfers_total_time_ms': 0.0,
INFO 01-29 01:28:14 [log_utils.py:550]  'transfers_total_bytes': 0,
INFO 01-29 01:28:14 [log_utils.py:550]  'stages': {0: {'stage_gen_time_ms': 86079.94437217712,
INFO 01-29 01:28:14 [log_utils.py:550]                 'num_tokens_out': 0,
INFO 01-29 01:28:14 [log_utils.py:550]                 'num_tokens_in': 0}}}
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:26<00:00, 86.13s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]
INFO 01-29 01:28:14 [omni.py:782] [Summary] {'e2e_requests': 1,
INFO 01-29 01:28:14 [omni.py:782]  'e2e_total_time_ms': 86127.75588035583,
INFO 01-29 01:28:14 [omni.py:782]  'e2e_sum_time_ms': 86126.50895118713,
INFO 01-29 01:28:14 [omni.py:782]  'e2e_total_tokens': 0,
INFO 01-29 01:28:14 [omni.py:782]  'e2e_avg_time_per_request_ms': 86126.50895118713,
INFO 01-29 01:28:14 [omni.py:782]  'e2e_avg_tokens_per_s': 0.0,
INFO 01-29 01:28:14 [omni.py:782]  'wall_time_ms': 86127.75588035583,
INFO 01-29 01:28:14 [omni.py:782]  'final_stage_id': {'0_ecebd709-8b6f-4f1d-9c85-2e9fd03ff0ba': 0},
INFO 01-29 01:28:14 [omni.py:782]  'stages': [{'stage_id': 0,
INFO 01-29 01:28:14 [omni.py:782]              'requests': 1,
INFO 01-29 01:28:14 [omni.py:782]              'tokens': 0,
INFO 01-29 01:28:14 [omni.py:782]              'total_time_ms': 86126.80411338806,
INFO 01-29 01:28:14 [omni.py:782]              'avg_time_per_request_ms': 86126.80411338806,
INFO 01-29 01:28:14 [omni.py:782]              'avg_tokens_per_s': 0.0}],
INFO 01-29 01:28:14 [omni.py:782]  'transfers': []}
[Stage-0] INFO 01-29 01:28:14 [omni_stage.py:677] Received shutdown signal
Total generation time: 89.6901 seconds (89690.07 ms)

3. Online Inference

- command

```shell
vllm serve "/data/HunyuanImage-3.0/" --omni --port "8091" --tensor_parallel_size 8
```

- Online Request

```shell
curl -X POST http://localhost:8091/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Add a white art board written with colorful text vLLM-Omni on grassland. Add a paintbrush in the bear hands. position the bear standing in front of the art board as if painting",
    "num_inference_steps": 50,
    "n": 4,
    "size": "1024x1024",
    "seed": 123
  }' | jq -r '.data[0].b64_json' | base64 -d > dragon.png
```
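Note that the pipeline above decodes only `data[0]` even though the request asks for `n: 4` images. A small helper like the following (a sketch assuming the same OpenAI-style `{"data": [{"b64_json": ...}]}` response schema shown above) saves every returned image:

```python
import base64
from pathlib import Path


def save_images(response: dict, prefix: str = "output") -> list[str]:
    """Decode every b64_json entry in an images/generations response."""
    paths = []
    for i, item in enumerate(response.get("data", [])):
        path = f"{prefix}_{i}.png"
        Path(path).write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths


# Stubbed response for illustration; a real call would POST to
# http://localhost:8091/v1/images/generations as in the curl example.
fake = {"data": [{"b64_json": base64.b64encode(b"\x89PNG").decode()}]}
print(save_images(fake))
```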

- Execution Result Output

[Stage-0] INFO 01-29 00:47:26 [diffusion_engine.py:104] Generation completed successfully.
[Stage-0] INFO 01-29 00:47:26 [diffusion_engine.py:127] Post-processing completed in 0.0000 seconds
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550] {'type': 'request_level_metrics',
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'request_id': 'img_gen_1769673990',
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'e2e_time_ms': 86665.24529457092,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'e2e_tpt': 0.0,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'e2e_total_tokens': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'transfers_total_time_ms': 0.0,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'transfers_total_bytes': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]  'stages': {0: {'stage_gen_time_ms': 86644.5779800415,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]                 'num_tokens_out': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [log_utils.py:550]                 'num_tokens_in': 0}}}
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467] [Summary] {'e2e_requests': 1,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'e2e_total_time_ms': 86665.33899307251,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'e2e_sum_time_ms': 86665.24529457092,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'e2e_total_tokens': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'e2e_avg_time_per_request_ms': 86665.24529457092,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'e2e_avg_tokens_per_s': 0.0,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'wall_time_ms': 86665.33899307251,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'final_stage_id': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'stages': [{'stage_id': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]              'requests': 1,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]              'tokens': 0,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]              'total_time_ms': 86665.29130935669,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]              'avg_time_per_request_ms': 86665.29130935669,
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]              'avg_tokens_per_s': 0.0}],
(APIServer pid=967459) INFO 01-29 00:07:57 [async_omni.py:467]  'transfers': []}
(APIServer pid=967459) INFO 01-29 00:07:57 [api_server.py:696] Successfully generated 1 image(s)
(APIServer pid=967459) INFO:     127.0.0.1:40300 - "POST /v1/images/generations HTTP/1.1" 200 OK


chatgpt-codex-connector bot left a comment:
💡 Codex Review

```python
if profiler_enabled:
    print("[Profiler] Starting profiling...")
    omni.start_profile()
```

P2: Restore `profiler_enabled` initialization

The script still branches on profiler_enabled, but this commit removed its initialization (profiler_enabled = bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))). As a result, running the offline text-to-image example now raises a NameError at runtime before any generation happens, even when profiling is not requested. Reintroduce the initialization or fold the environment check directly into the if condition.


david6666666 (Collaborator) commented Jan 30, 2026

Semmer2 (Contributor) commented Jan 30, 2026

The HunyuanImage3 model supports both AR and DiT, and we designed the two stages together: the first part runs AR (VL) inference alone, and the second part supports DiT inference. In the next few days we will combine the two parts, so it can run AR + DiT.

david6666666 (Collaborator) commented Jan 30, 2026

> HunyuanImage3 model supports both AR and DiT, we designed the two stages together, the first part AR runs VL inference alone, and the second part supports DiT inference. In next few days, we will combine the two parts together, which means it can run AR + DiT.

OK. Please provide additional details in the Purpose section of this PR to avoid ambiguity. Thank you.

Semmer2 (Contributor) commented Jan 30, 2026

@princepride

hsliuustc0106 (Collaborator) commented:

Please use the diffusion benchmark to report the performance.

Copilot AI left a comment:

Pull request overview

Adds GPU support for the HunyuanImage-3 diffusion model and integrates it into the vLLM-Omni diffusion framework as a selectable pipeline.

Changes:

  • Extend diffusion entrypoints to infer model_class_name from config.json architectures.
  • Register and add a new HunyuanImage3 diffusion pipeline implementation (tokenizer/image/VAE/ViT utilities).
  • Adjust diffusion worker/context initialization and parallel-state exposure for integration.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 38 comments.

Summary per file:

| File | Description |
| --- | --- |
| vllm_omni/entrypoints/omni_diffusion.py | Adds fallback to use architectures[0] as model_class_name for non-diffusers models. |
| vllm_omni/entrypoints/async_omni_diffusion.py | Adds similar architectures fallback logic for the async diffusion entrypoint. |
| vllm_omni/diffusion/worker/diffusion_worker.py | Updates worker init to set the vLLM compilation config and current vLLM config context. |
| vllm_omni/diffusion/worker/diffusion_model_runner.py | Moves/adjusts model-loading context usage and minor whitespace changes. |
| vllm_omni/diffusion/registry.py | Registers the new HunyuanImage3 architecture → module/class mapping. |
| vllm_omni/diffusion/models/hunyuan/tokenizer_wrapper.py | Adds tokenizer + message/template utilities for HunyuanImage3. |
| vllm_omni/diffusion/models/hunyuan/siglip2.py | Adds a SigLIP2 vision transformer implementation used by the pipeline. |
| vllm_omni/diffusion/models/hunyuan/image_processor.py | Adds image preprocessing and image-info construction for the pipeline. |
| vllm_omni/diffusion/models/hunyuan/hunyuan_image_3.py | Implements the HunyuanImage3 pipeline integration with the vLLM-Omni request flow. |
| vllm_omni/diffusion/models/hunyuan/hunyuan_image3_utils.py | Adds custom RoPE2D + attention/KV-cache utilities used by the model. |
| vllm_omni/diffusion/models/hunyuan/autoencoder_kl_3d.py | Adds a 3D-conv VAE implementation used by the model. |
| vllm_omni/diffusion/distributed/parallel_state.py | Exposes additional parallel groups via vllm_parallel_state. |
| examples/offline_inference/text_to_image/text_to_image.py | Updates example logging/output formatting. |


```python
            and self.cache_backend.is_enabled()
        ):
            self.cache_backend.refresh(self.pipeline, req.sampling_params.num_inference_steps)
```

Copilot AI commented on Jan 30, 2026:

There is trailing whitespace / an empty indented line after the cache refresh block. Please remove it to satisfy linting (ruff/pycodestyle) and keep diffs clean.

Comment on lines +1 to +12
```python
# Licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://github.com/Tencent-Hunyuan/HunyuanImage-3.0/blob/main/LICENSE
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
```
Copilot AI commented on Jan 30, 2026:

These newly added Hunyuan files are marked as licensed under the "TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT" (see header). Since this repository is Apache-2.0, please confirm license compatibility and add any required third-party notices or relicense/rewrite the code as needed before merging (otherwise this can block redistribution).

Comment on lines 95 to 104
```python
        model_type = cfg.get("model_type")
        architectures = cfg.get("architectures") or []
        if architectures and len(architectures) == 1:
            od_config.model_class_name = architectures[0]
        else:
            raise

        if model_type == "bagel" or "BagelForConditionalGeneration" in architectures:
            od_config.model_class_name = "BagelPipeline"
            od_config.tf_model_config = TransformerConfig()
```
Copilot AI commented on Jan 30, 2026:

In this config.json fallback, the new if architectures and len(architectures) == 1: ... else: raise runs before the Bagel detection below, so Bagel models with missing/empty architectures (or multiple entries) will re-raise the earlier exception and never reach the Bagel handling. Please mirror entrypoints/omni_diffusion.py by checking model_type == "bagel"/BagelForConditionalGeneration first, then fallback to the single-architecture case, and otherwise raise a clear ValueError.
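The ordering Copilot suggests can be sketched as a standalone helper (the function name here is illustrative, not the PR's actual code; in the PR the result would be assigned to `od_config.model_class_name`):

```python
def resolve_model_class_name(cfg: dict) -> str:
    """Check Bagel first, then fall back to a single-entry architectures list."""
    model_type = cfg.get("model_type")
    architectures = cfg.get("architectures") or []
    # Bagel detection must run before the generic fallback, otherwise Bagel
    # configs with empty/multiple architectures raise prematurely.
    if model_type == "bagel" or "BagelForConditionalGeneration" in architectures:
        return "BagelPipeline"
    if len(architectures) == 1:
        return architectures[0]
    raise ValueError(
        f"Cannot infer model_class_name from config.json: architectures={architectures!r}"
    )


print(resolve_model_class_name({"model_type": "bagel"}))                # BagelPipeline
print(resolve_model_class_name({"architectures": ["HunyuanImage3"]}))   # HunyuanImage3
```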

Comment on lines +264 to +280
```python
        elif isinstance(image_i, list):
            # time_embed needs a 1-D tensor as input
            t_i_emb = self.time_embed(t_i)  # n_{i} x d
            image_i_seq_list = [], []
            for j in range(len(image_i)):
                image_ij = image_i[j]
                if image_ij.dim() == 4:
                    assert image_i[j].shape[0] == 1, "image_i[j] should have a batch dimension of 1"
                elif image_ij.dim() == 3:
                    image_ij = image_ij.unsqueeze(0)
                else:
                    raise ValueError(f"image_i[j] should have 3 or 4 dimensions, got {image_ij.dim()}")
                # 1 x one_image_seq_len_{j} x n_embd
                image_i_seq_j, _, _ = self.patch_embed(image_ij, t_i_emb[j:j + 1])
                image_i_seq_list.append(image_i_seq_j)
            # 1 x sum_{j}(one_image_seq_len_{j}) x n_embd
            image_i_seq = torch.cat(image_i_seq_list, dim=1)
```
Copilot AI commented on Jan 30, 2026:

image_i_seq_list = [], [] initializes a tuple of two lists, so the next line image_i_seq_list.append(...) will raise AttributeError. This should be a single list (e.g., image_i_seq_list = []) before appending patch embeddings.
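The failure mode is easy to reproduce in isolation — the comma makes the right-hand side a tuple of two lists, not a list:

```python
# `seqs = [], []` binds a *tuple of two lists*.
seqs = [], []
assert isinstance(seqs, tuple)
try:
    seqs.append("x")
except AttributeError as err:
    print(f"caught: {err}")  # tuples have no append

# The fix is a single list:
seqs = []
seqs.append("x")
print(seqs)
```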

```python
        for section in sections_i:
            if 'image' in section['type']:
                if isinstance(section['token_height'], list):
                    assert len(section['token_height']) == len(section['token_height']), \
```
Copilot AI commented on Jan 30, 2026:

This assertion compares len(section['token_height']) to itself, so it can never fail even if token_width is a different length. It should validate len(token_height) == len(token_width) to avoid silently producing mismatched (h, w) pairs for RoPE image shapes.

Suggested change:

```diff
-assert len(section['token_height']) == len(section['token_height']), \
+assert len(section['token_height']) == len(section['token_width']), \
```

Comment on lines +1292 to +1300
```python
        self.attn = Attention(
            self.num_heads,
            self.head_dim,
            self.scaling,
            num_kv_heads=self.num_kv_heads,
            cache_config=cache_config,
            quant_config=quant_config,
            prefix=f"{prefix}.attn",
            attn_type=AttentionType.ENCODER_DECODER,
```
Copilot AI commented on Jan 30, 2026:

Keyword argument 'cache_config' is not a supported parameter name of Attention.init.
Keyword argument 'quant_config' is not a supported parameter name of Attention.init.
Keyword argument 'attn_type' is not a supported parameter name of Attention.init.

Suggested change (this version also requires `import inspect` at the top of the file):

```python
# Pass only the kwargs that this vLLM version's Attention.__init__ accepts.
attn_kwargs = {}
attn_sig = inspect.signature(Attention)
if "cache_config" in attn_sig.parameters:
    attn_kwargs["cache_config"] = cache_config
if "quant_config" in attn_sig.parameters:
    attn_kwargs["quant_config"] = quant_config
if "attn_type" in attn_sig.parameters:
    attn_kwargs["attn_type"] = AttentionType.ENCODER_DECODER
if "prefix" in attn_sig.parameters:
    attn_kwargs["prefix"] = f"{prefix}.attn"
self.attn = Attention(
    self.num_heads,
    self.head_dim,
    self.scaling,
    num_kv_heads=self.num_kv_heads,
    **attn_kwargs,
)
```

```python
        params_dict = dict(self.named_parameters())
        loaded_params: set[str] = set()
        pass
```
Copilot AI commented on Jan 30, 2026:

Unnecessary 'pass' statement.

Suggested change:

```diff
-        pass
```
```python
    def __init__(self, config: HunyuanImage3Config, prefix: str = ""):
        super().__init__()

        config = config
```
Copilot AI commented on Jan 30, 2026:

This assignment assigns a variable to itself.

Suggested change:

```diff
-        config = config
```
```python
            return_all_pos=return_all_pos,
        )
        if return_all_pos:
            cos, sin, all_pos = res
```
Copilot AI commented on Jan 30, 2026:

Left hand side of assignment contains 3 variables, but right hand side is a tuple of length 2.

Suggested change:

```python
# Be robust to both 2-tuple and 3-tuple returns from build_2d_rope
if isinstance(res, tuple) and len(res) == 3:
    cos, sin, all_pos = res
elif isinstance(res, tuple) and len(res) == 2:
    cos, sin = res
    all_pos = None
else:
    raise ValueError(
        "build_2d_rope must return a tuple of length 2 or 3 "
        f"when return_all_pos={return_all_pos}, got: {type(res)} with length "
        f"{len(res) if isinstance(res, tuple) else 'N/A'}"
    )
```

```python
        if return_all_pos:
            cos, sin, all_pos = res
        else:
            cos, sin = res
```
Copilot AI commented on Jan 30, 2026:

Left hand side of assignment contains 2 variables, but right hand side is a tuple of length 3.

Suggested change:

```python
if len(res) == 3:
    cos, sin, _ = res
else:
    cos, sin = res
```

Semmer2 (Contributor) left a comment:

Please check for redundant code carefully.

```python
        )
        self.model_runner.load_model(
            memory_pool_context_fn=self._maybe_get_memory_pool_context,
        )
```
Review comment (Contributor):

Add more model inference verification.

```python
        return setattr(self, key, value)


class Siglip2VisionEmbeddings(nn.Module):
```
Review comment (Contributor):

Check whether this class can be replaced by a vLLM module, and whether other classes can be replaced as well.

@@ -0,0 +1,125 @@
# Licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT (the "License");
Review comment (Contributor):

Move image_processor.py into the model file.

@@ -0,0 +1,1426 @@
# Licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT (the "License");
Review comment (Contributor):

Move some of the pre-processing code into the processor.

```python
        return self.weight * hidden_states.to(input_dtype)


class HunyuanImage3SDPAAttention(nn.Module):
```
Review comment (Contributor):

Remove this unnecessary code.

```python
        return [], {}

    # rename for delay load
    def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]):
```
Review comment (Contributor):

Check whether these load_weights implementations can use vLLM's load_weights.

```python
    prev_sample: torch.FloatTensor


class FlowMatchDiscreteScheduler(SchedulerMixin, ConfigMixin):
```
Review comment (Contributor):

Check whether this scheduler can be replaced by a diffusers scheduler.

```python
        backend=backend,
        parallel_mode="data",
    )
    vllm_parallel_state._DP = _DP
```
Review comment (Contributor):

Double-check whether this code is necessary.

skf-1999 force-pushed the HunyuanImage3IntergrationGPU branch 8 times, most recently from 3ea33b9 to f0f984c, on February 2, 2026 at 09:09
skf-1999 force-pushed the HunyuanImage3IntergrationGPU branch 4 times, most recently from dd9e888 to 1b38e43, on February 3, 2026 at 16:01
hsliuustc0106 (Collaborator) commented:

Is there any accuracy test?
What type of parallelism are you using?

skf-1999 force-pushed the HunyuanImage3IntergrationGPU branch 2 times, most recently from 0216c48 to 632b83e, on February 4, 2026 at 06:46
skf-1999 force-pushed the HunyuanImage3IntergrationGPU branch from 0d7c4c0 to 7daca33 on February 4, 2026 at 08:07
Semmer2 force-pushed the HunyuanImage3IntergrationGPU branch from 7daca33 to 0d9878c on February 4, 2026 at 11:58
Semmer2 (Contributor) commented Feb 5, 2026

This is the omni benchmark report:

serve cmdline:

```shell
vllm serve /data/HunyuanImage-3.0/ --omni --port 8080 --tensor_parallel_size 8
```

result

```shell
python3 benchmarks/diffusion/diffusion_benchmark_serving.py \
    --dataset vbench --task t2i --num-prompts 10 \
    --height 1024 --width 1024 --port 8080
```
Downloading VBench T2V prompts to /root/.cache/vllm-omni/vbench_subject_consistency.txt...
Failed to download VBench prompts: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /Vchitect/VBench/master/prompts/prompts_per_dimension/subject_consistency.txt (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1010)')))
Loading requests...
Prepared 50 requests from vbench dataset.
Running 1 warmup request(s) with num_inference_steps=1...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [1:13:31<00:00, 88.23s/it]

================= Serving Benchmark Result =================
Model:                                   default        
Dataset:                                 vbench         
Task:                                    t2i            
--------------------------------------------------
Benchmark duration (s):                  4403.81        
Request rate:                            inf            
Max request concurrency:                 1              
Successful requests:                     50/50             
--------------------------------------------------
Request throughput (req/s):              0.01           
Latency Mean (s):                        88.0760        
Latency Median (s):                      88.1351        
Latency P99 (s):                         88.6123        

============================================================

Semmer2 (Contributor) commented Feb 5, 2026

> is there any acc test? what type of parallelism are you using?

We did not run a full accuracy test, but we did compare the results of HunyuanImage3 (vllm-omni) against HunyuanImage3 (vLLM):
[image]

We fully aligned the results layer-wise and step-wise (with an aligned top-k function and an output tensor cosine similarity of 1.0).

We ran all the tests above on 8x NVIDIA A100 80GB GPUs with TP=8.

```python
        result_row = []
        for i, tile in enumerate(row):
            if i > 0:
                tile = self.blend_t(row[i - 1], tile, blend_extent)
```
Review comment (Collaborator):

It seems the blend_t method is never defined in the AutoencoderKLConv3D class.

Comment on lines +549 to +554
```python
        if self.use_temporal_tiling and x.shape[-3] > self.tile_sample_min_tsize:
            return self.temporal_tiled_encode(x)
        if self.use_spatial_tiling and (
            x.shape[-1] > self.tile_sample_min_size or x.shape[-2] > self.tile_sample_min_size
        ):
            return self.spatial_tiled_encode(x)
```
Review comment (Collaborator):
The reason will be displayed to describe this comment to others. Learn more.

It seems self.temporal_tiled_encode(x) and self.spatial_tiled_encode(x) are not defined in the AutoencoderKLConv3D class.

Comment on lines +523 to +526
```python
        if batch_cond_image_info is not None and len(batch_cond_image_info[0]) > 0:
            cond_vae_images, cond_timestep, cond_vit_images = self._encode_cond_image(
                batch_cond_image_info, cfg_factor[mode]
            )
```
Review comment (Collaborator):

self._encode_cond_image is never defined.

Comment on lines +953 to +954
```python
        height = req.sampling_params.height or height or self.default_sample_size * self.vae_scale_factor
        width = req.sampling_params.width or width or self.default_sample_size * self.vae_scale_factor
```
Review comment (Collaborator):

self.default_sample_size and self.vae_scale_factor are never defined.

Reply (Contributor):

> self.default_sample_size and self.vae_scale_factor are never defined

Sure. We do not fully support i2i for now, so some code may be incomplete; all the missing functions and variables have been added.

princepride (Collaborator) commented:

@Semmer2 PTAL, and pre-commit failed.

princepride (Collaborator) commented:

@Semmer2 you need to add __init__.py under the model's folder, otherwise the docs can't be created.

Semmer2 force-pushed the HunyuanImage3IntergrationGPU branch 9 times, most recently from d2513f2 to f72f476, on February 5, 2026 at 11:48
hsliuustc0106 added the 'ready' label (to trigger buildkite CI) on Feb 5, 2026
Semmer2 (Contributor) commented Feb 6, 2026

> @Semmer2 PTAL, and pre-commit failed.

Hi, all the PR checks passed, and the code has been rebased on the latest main branch. Any further comments are welcome.

hsliuustc0106 changed the title from "[Model] SupportHunyuanImage3 Diffusion Model in GPU" to "[Model] SupportHunyuanImage3 Diffusion Model in vllm-omni" on Feb 6, 2026
```python
        return module(*inputs)


class Conv3d(nn.Conv3d):
```
Review comment (Collaborator):

For your information, conv3d in torch 2.9 has a critical performance bug; see #982.

Reply (Contributor):

Got it, thank you.

```python
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(
```
Review comment (Collaborator):

Could you please try to use the RoPE layer implemented in vllm-omni instead?

Reply (Contributor):

Sure, replaced with the Omni built-in RotaryEmbedding class.

```python
    sep: str = "\n\n"


class TokenizerWrapper:
```
Review comment (Collaborator):

Let's put this into a separate file.

Reply (Contributor):

Sure, moved it to hunyuan_image_3_tokenizer.py

Semmer2 force-pushed the HunyuanImage3IntergrationGPU branch 2 times, most recently from 5fb88fe to 703d750, on February 8, 2026 at 08:52
hsliuustc0106 (Collaborator) commented:

Please add it to the supported models list.

Co-authored-by: ElleElleWu <1608928702@qq.com>
Co-authored-by: skf1999 <13234016272@163.com>
Co-authored-by: Just-it <1161406585@qq.com>
Co-authored-by: Semmer2 <semmer@live.cn>

Signed-off-by: Semmer2 <semmer@live.cn>
Semmer2 force-pushed the HunyuanImage3IntergrationGPU branch from a9ca3de to d1b5088 on February 9, 2026 at 12:01
hsliuustc0106 merged commit 5fea482 into vllm-project:main on Feb 9, 2026
7 checks passed
YanickSchraner pushed a commit to YanickSchraner/vllm-omni that referenced this pull request Feb 20, 2026

Labels

ready label to trigger buildkite CI

7 participants