add support for MammothModa2 model #336
HonestDeng wants to merge 125 commits into vllm-project:main
Conversation
Signed-off-by: HonestDeng <2958906959@qq.com>
For simplicity, most of the DiT-stage code is copied from https://github.com/bytedance/mammothmoda. This code will be simplified and reviewed once the pipeline runs successfully. Signed-off-by: HonestDeng <2958906959@qq.com>
because the preview version of mammothmoda2 only uses the last hidden state. Signed-off-by: HonestDeng <2958906959@qq.com>
Hi, will the model be ready before the 1230 release?
Yes. MammothModa2-Preview combines Qwen2.5-VL (with extra gen-experts in the MLP layers) with a DiT module for image generation. I have already implemented the Qwen2.5-VL part of MammothModa2-Preview by reusing existing vLLM code, and I'm currently working on the DiT part. Hopefully I will finish it this weekend and review my code before 1230. I'm not very familiar with supporting new models, so if there is any problem in my code, please correct me. Thanks!
The model seems quite similar to the Qwen-Image structure, with a Qwen-VL for encoding and a DiT module for image generation.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 982e321f6d
vllm_omni/model_executor/models/mammoth_moda2/mammoth_moda2_ar.py (3 resolved review threads)
Thanks! I will review it tomorrow.
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
You can open an issue at the original project.
Any update on the performance speedup?
Why can't we directly use vllm_omni/model_executor/stage_configs/mammoth_moda2.yaml? Maybe you can refer to qwen3-omni, which lets final_output be emitted at different stages.
The two tasks need different stage topologies: summarization uses engine_output_type: text and terminates at Stage 0, while T2I uses engine_output_type: latent and must continue to Stage 1's ar2dit processor. Routing a comprehension request through the two-stage config would break ar2dit on the incompatible format.
The Qwen3-Omni pattern works because every request always goes through all stages sequentially. MammothModa2 needs a true branch (stop at Stage 0 for text, continue to Stage 1 for image), which requires per-request dynamic stage skipping.
Therefore, we can't directly use vllm_omni/model_executor/stage_configs/mammoth_moda2.yaml for the summarize task.
I've moved examples/offline_inference/mammothmodal2_preview/mammoth_moda2_t2i.yaml and examples/offline_inference/mammothmodal2_preview/mammoth_moda2_image_summarize.yaml to vllm_omni/model_executor/stage_configs/mammoth_moda2.yaml and vllm_omni/model_executor/stage_configs/mammoth_moda2_ar.yaml for simplicity.
Now we use vllm_omni/model_executor/stage_configs/mammoth_moda2.yaml for the t2i task.
I don't think we need to provide an example where one stage is deployed on two devices.
I've deleted this config file.
I think we can directly use the DiT model under the diffusion folder.
Thanks for the suggestion! The real implementation is already in mammoth_moda2_dit.py — the file under model_executor/ is just a thin re-export shim.
The shim is needed because OmniModelRegistry in registry.py hardcodes the prefix vllm_omni.model_executor.models when resolving module paths, so a model living under vllm_omni.diffusion can't be registered there directly without the shim.
The DiffusionModelRegistry in registry.py does use the correct vllm_omni.diffusion.models. prefix, but it's a separate registry for pipeline-style models instantiated with OmniDiffusionConfig. MammothModa2DiTForConditionalGeneration is a vLLM nn.Module loaded with VllmConfig, so it can't be plugged into that registry either.
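The shim pattern described here can be demonstrated in a self-contained way with stand-in module names (`pkg.*` and `DiTModel` are hypothetical; the real shim would simply import the class from its home under vllm_omni.diffusion):

```python
import importlib
import sys
import types

# Stand-in for the real implementation module under the diffusion package.
impl = types.ModuleType("pkg.diffusion.models.mammoth_moda2_dit")


class DiTModel:  # stand-in for the DiT model class
    pass


impl.DiTModel = DiTModel
sys.modules[impl.__name__] = impl

# The shim: a module under the registry's hardcoded prefix that simply
# re-exports the class from where it actually lives.
shim = types.ModuleType("pkg.model_executor.models.mammoth_moda2_dit")
shim.DiTModel = sys.modules["pkg.diffusion.models.mammoth_moda2_dit"].DiTModel
sys.modules[shim.__name__] = shim

# A registry that only resolves names under "pkg.model_executor.models"
# now finds the class, even though the implementation lives elsewhere.
resolved = importlib.import_module("pkg.model_executor.models.mammoth_moda2_dit")
assert resolved.DiTModel is DiTModel
```

In the PR, the shim file under model_executor/ would contain nothing but the import and an `__all__` entry.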
Please create a new folder like https://github.com/vllm-project/vllm/tree/main/vllm/transformers_utils/configs and put the custom config under it.
Typically, in vLLM, we put the processor and model implementation in the same file; please combine mammoth_moda2_ar.py and mammoth_moda2.py.
    from .registry import OmniModelRegistry  # noqa: F401

    -__all__ = ["Qwen3OmniMoeForConditionalGeneration"]
    +__all__ = ["Qwen3OmniMoeForConditionalGeneration", "Mammothmoda2Config"]
After putting Mammothmoda2Config under transformers_utils/configs, we can remove it from here.
I've moved Mammothmoda2Config to transformers_utils/configs and deleted the code in vllm_omni/model_executor/models/__init__.py that imports Mammothmoda2Config.
However, we need an eager import to register Mammothmoda2Config to the model_type mammothmoda2. Therefore, I added some code in vllm_omni/__init__.py to import these configs.
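A minimal sketch of such eager registration (the dict-based registry and register_config helper are hypothetical stand-ins; the actual code would register with the Hugging Face transformers config machinery, e.g. AutoConfig.register):

```python
# Hypothetical model_type -> config class registry.
CONFIG_REGISTRY: dict[str, type] = {}


class Mammothmoda2Config:  # stand-in for the real PretrainedConfig subclass
    model_type = "mammothmoda2"


def register_config(cls: type) -> type:
    """Map a config class's model_type to the class itself."""
    CONFIG_REGISTRY[cls.model_type] = cls
    return cls


# This line runs at module import time; importing the module from
# vllm_omni/__init__.py is what makes the registration "eager", so the
# mapping exists before any checkpoint config is parsed.
register_config(Mammothmoda2Config)
```

Without the eager import, the registration line never executes and lookups by model_type fail.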
    from .registry import OmniModelRegistry  # noqa: F401

    -__all__ = ["Qwen3OmniMoeForConditionalGeneration"]
    +__all__ = ["Qwen3OmniMoeForConditionalGeneration", "Mammothmoda2Config"]
@hsliuustc0106 @ZJY0516 Why do we have Qwen3OmniMoeForConditionalGeneration in this file? Is there something special about it?
I think we don't need a mammothmoda2_dit_layer folder to store the model's module files. You can refer to other DiT models' file structure.
@HonestDeng PTAL
hsliuustc0106 left a comment
PR #336 Review: Add support for MammothModa2 model
Overview
This PR adds support for MammothModa2, a multi-modal image generation model with a two-stage architecture:
- AR Stage: Based on Qwen2.5-VL with MoE (Mixture of Experts) for dual vocabulary handling
- DiT Stage: Diffusion transformer for image generation via flow-matching
Scale: 4,151 additions across 27 files
Critical Issues: 0 found ✓
Important Issues: 4 found
1. Potential Shape Mismatch in moe_forward Not Fully Validated
File: vllm_omni/model_executor/models/mammoth_moda2/mammoth_moda2_ar.py:126-131
The flat_mask.reshape(-1) operation could silently produce incorrect results if the original shape has different dimensions that happen to have the same product.
Suggestion: Validate that gen_token_mask.numel() == total_tokens (raising otherwise) before the reshape operation.
2. Chinese Comment in Production Code
File: vllm_omni/diffusion/models/mammoth_moda2/mammoth_moda2_dit.py:196
    text_hidden_states=inputs_embeds,  # 占位,runner 不会用到
Please replace the Chinese comment with English: # placeholder, not used by runner
3. Unused Parameter
File: vllm_omni/model_executor/stage_input_processors/mammoth_moda2.py:13
The requires_multimodal_data parameter is accepted but never used. Either use it or remove it.
4. Hardcoded num_reqs=1 Silently Ignores Caller's Argument
File: vllm_omni/diffusion/models/mammoth_moda2/mammoth_moda2_dit.py:84
    def get_dummy_runtime_additional_information(self, num_reqs: int) -> list[dict[str, object]]:
        num_reqs = 1  # TODO: support num_reqs > 1
Consider raising NotImplementedError for num_reqs > 1 instead of silently ignoring it.
Strengths
- Well-structured architecture: Clean separation between AR and DiT stages
- Comprehensive config design: Proper handling of dual vocabulary
- Good test coverage: Unit tests for config parsing, stage processor, and e2e tests
- Proper weight loading: Weight mappers correctly filter stage-specific weights
- Token constraints: _apply_t2i_token_constraints properly constrains sampling
MRO Pattern Check ✓
All model classes follow proper inheritance order (nn.Module before mixins). No MRO issues detected.
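For illustration, the checked ordering can be shown with plain stand-in classes (Module and SupportsMultiModalMixin are stand-ins; in the PR the concrete base would be torch.nn.Module and the project's interface mixins):

```python
class Module:  # stand-in for torch.nn.Module
    def __init__(self) -> None:
        self.initialized = True


class SupportsMultiModalMixin:  # hypothetical interface mixin
    def supports_multimodal(self) -> bool:
        return True


# Correct order: the concrete base (Module) listed before the mixins, so
# Module precedes them in the method resolution order and its __init__ is
# found first during cooperative lookup.
class GoodModel(Module, SupportsMultiModalMixin):
    pass


mro_names = [c.__name__ for c in GoodModel.__mro__]
```

Here mro_names resolves to ["GoodModel", "Module", "SupportsMultiModalMixin", "object"], which is the pattern the review checked for.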
Recommendation
After addressing the 4 important issues (especially the Chinese comment and the num_reqs handling), this PR should be ready for merge.
    raise ValueError(f"Unexpected hidden_states shape: {tuple(hidden_states.shape)}")

    # mask: [num_tokens] or [B, L] -> flatten to [total_tokens]
    flat_mask = gen_token_mask.reshape(-1)  # type: ignore[union-attr]
Shape validation order
Consider validating that gen_token_mask.numel() == total_tokens (raising on mismatch) before the reshape operation. The current reshape(-1) could succeed but produce semantically incorrect results if the original shape has different dimensions that happen to have the same product.
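A sketch of the suggested guard, using nested lists as tensor stand-ins (with torch, the check would compare gen_token_mask.numel() against the expected token count before calling reshape(-1)):

```python
def flatten_mask(gen_token_mask, total_tokens: int) -> list[int]:
    """Flatten a [num_tokens] or [B, L] mask after validating its size."""
    if gen_token_mask and isinstance(gen_token_mask[0], list):
        flat = [x for row in gen_token_mask for x in row]  # [B, L] case
    else:
        flat = list(gen_token_mask)  # already [num_tokens]
    # The guard from the review: fail loudly instead of flattening a mask
    # whose element count does not match the token count.
    if len(flat) != total_tokens:
        raise ValueError(
            f"gen_token_mask has {len(flat)} elements, expected {total_tokens}"
        )
    return flat
```

The point is the ordering: validate the element count first, then flatten, so a wrong-but-reshapeable mask cannot slip through.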
    # Prepare negative prompt (for CFG). If none provided, fall back to unconditional.
    negative_prompt_embeds = None
    negative_prompt_attention_mask = None
    if text_guidance_scale > 1.0:
Chinese comment in production code
Please replace with English: # placeholder, not used by runner
    def ar2dit(
        stage_list: list[Any],
        engine_input_source: list[int],
Unused parameter
The requires_multimodal_data parameter is accepted but never used in the function body. Either use this parameter or remove it to match the interface contract.
It's an interface param. Cannot be deleted.
        theta=10000,
    )

    # vLLM PP interface compatibility
Silently ignores caller argument
This hardcoded num_reqs = 1 silently ignores the caller's num_reqs argument, which could cause issues in batched scenarios. Consider raising NotImplementedError for num_reqs > 1 instead.
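The suggested fix might look like this (signature adapted from the quoted code, written as a free function for brevity; the body is illustrative):

```python
def get_dummy_runtime_additional_information(num_reqs: int) -> list[dict[str, object]]:
    # Fail loudly instead of silently overwriting the caller's argument.
    if num_reqs > 1:
        raise NotImplementedError("num_reqs > 1 is not supported yet")
    return [{} for _ in range(num_reqs)]
```

A batched caller then gets an immediate, explicit error instead of a silently truncated result.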
I've addressed the requested changes. PTAL when you have a moment.
Code review: Found 1 issue:
vllm-omni/vllm_omni/model_executor/models/mammoth_moda2/tokenizer.py, lines 123 to 125 in 5329e53. Fix: Use
Signed-off-by: HonestDeng <2958906959@qq.com>
Done
Purpose
Resolves #314: add support for the MammothModa2 model (https://github.com/bytedance/mammothmoda).
Test Plan
Machine:
Parallel:
Image:
Machine:
Parallel:
Image:
Test Result
The image on the left is generated by the official MammothModa2 implementation, while the one on the right is from vllm-omni:

This table compares the performance of the two implementations:
Transfer time: 4.012ms
We get better performance.
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.