
add support for MammothModa2 model#336

Open
HonestDeng wants to merge 125 commits into vllm-project:main from HonestDeng:add-mammoth-moda2-support

Conversation


@HonestDeng HonestDeng commented Dec 16, 2025


Purpose

Resolves #314: add support for the MammothModa2 model (https://github.com/bytedance/mammothmoda).

Test Plan

Machine:

  • H200(140GB) x 1

Parallel:

  • TP: None

Image:

  • Size: 1024 x 1024
  • DiT Step: 50
  1. Image Summary

Machine:

  • H200(140GB) x 1

Parallel:

  • TP: None

Image:

  • Size: 1024 x 1024

Test Result

The image on the left is generated by the official MammothModa2 implementation, while the one on the right is from vllm-omni:
image

This table compares the performance of the two implementations:

Stage       official-impl   vllm-omni
AR stage    83.529 s        74.06 s
DiT stage   10.320 s        9.65 s

Transfer time: 4.012 ms

vllm-omni achieves better performance in both stages.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


Signed-off-by: HonestDeng <2958906959@qq.com>
For simplicity, most of the DiT-stage code is copied from https://github.com/bytedance/mammothmoda.
This code will be simplified and reviewed once the pipeline runs
successfully.

Signed-off-by: HonestDeng <2958906959@qq.com>
because the preview version of MammothModa2 only uses the last hidden state

Signed-off-by: HonestDeng <2958906959@qq.com>
@hsliuustc0106
Collaborator

Hi, will the model be ready before the 12/30 release?

@HonestDeng
Author

HonestDeng commented Dec 20, 2025

Yes.

MammothModa2-Preview combines Qwen2.5-VL (with extra gen-experts in the MLP layers) with a DiT module for image generation. I have already implemented the Qwen2.5-VL part of MammothModa2-Preview by reusing vLLM code such as Qwen2Attention and Qwen2MLP, and it can take text and images as input to generate text tokens.

I'm currently working on the DiT part. Hopefully I will finish it this weekend and have my code reviewed before 12/30.

I'm not very familiar with supporting new models, so if there is any problem in my code, please correct me. Thanks!

Signed-off-by: HonestDeng <2958906959@qq.com>
@hsliuustc0106
Collaborator


The model seems quite similar to the Qwen-Image structure, with a Qwen-VL for encoding and a DiT module for image generation.

@HonestDeng HonestDeng force-pushed the add-mammoth-moda2-support branch from 8e2db46 to c6deeb1 Compare March 1, 2026 10:54
@HonestDeng HonestDeng marked this pull request as ready for review March 1, 2026 11:00

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 982e321f6d


@princepride
Collaborator

I've run test_mammoth_moda2.py on my local machine and all the test cases pass. Is it OK?

Thanks! I will review it tomorrow.

HonestDeng and others added 2 commits March 1, 2026 20:31
Signed-off-by: HonestDeng <2958906959@qq.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
@hsliuustc0106
Collaborator

Thanks for reviewing. I've fixed these 5 issues. The first issue is strange: answer_start_index = max(L - 10, 0).

Actually, I just followed the original project in using the last 10 tokens. I didn't figure out the reason; it's quite weird.

Anyway, I've fixed the problem by using all generated tokens as the answer.

You could open an issue at the original project.

@hsliuustc0106
Collaborator

I've run test_mammoth_moda2.py on my local machine and all the test cases pass. Is it OK?

Any performance speedup updates?

Collaborator


Why can't we directly use vllm_omni/model_executor/stage_configs/mammoth_moda2.yaml? Maybe you can refer to qwen3-omni, which lets final_output be produced at different stages.

Author

@HonestDeng HonestDeng Mar 2, 2026


The two tasks need different stage topologies: summarization uses engine_output_type: text and terminates at Stage 0, while T2I uses engine_output_type: latent and must continue to Stage 1's ar2dit processor — routing a comprehension request through the two-stage config would break ar2dit on the incompatible format.

The Qwen3-Omni pattern works because every request always goes through all stages sequentially. MammothModa2 needs a true branch (stop at Stage 0 for text, continue to Stage 1 for image), which requires per-request dynamic stage skipping.

Therefore, we can't directly use vllm_omni/model_executor/stage_configs/mammoth_moda2.yaml for the summarize task.
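The branch described above can be sketched as a per-task config selection. The YAML paths follow this PR; the selector function itself is hypothetical, for illustration only:

```python
# Hypothetical sketch: pick a stage config per request type, since a request
# cannot dynamically skip stages inside a single two-stage topology.
T2I_CONFIG = "vllm_omni/model_executor/stage_configs/mammoth_moda2.yaml"
AR_ONLY_CONFIG = "vllm_omni/model_executor/stage_configs/mammoth_moda2_ar.yaml"

def select_stage_config(task: str) -> str:
    if task == "t2i":
        # Stage 0 (AR, engine_output_type: latent) -> Stage 1 (DiT)
        return T2I_CONFIG
    if task == "image_summarize":
        # Single stage (AR, engine_output_type: text), terminates at Stage 0
        return AR_ONLY_CONFIG
    raise ValueError(f"unknown task: {task}")
```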

I've moved examples/offline_inference/mammothmodal2_preview/mammoth_moda2_t2i.yaml and examples/offline_inference/mammothmodal2_preview/mammoth_moda2_image_summarize.yaml to vllm_omni/model_executor/stage_configs/mammoth_moda2.yaml and vllm_omni/model_executor/stage_configs/mammoth_moda2_ar.yaml for simplicity.

Collaborator


Same here.

Author


Now we use vllm_omni/model_executor/stage_configs/mammoth_moda2.yaml for the t2i task.

Collaborator


I don't think we need to provide an example where one stage is deployed on two devices.

Author


I've deleted this config file.

Collaborator


I think we can directly use the DiT model under the diffusion folder.

Author


Thanks for the suggestion! The real implementation is already in mammoth_moda2_dit.py — the file under model_executor/ is just a thin re-export shim.

The shim is needed because OmniModelRegistry in registry.py hardcodes the prefix vllm_omni.model_executor.models when resolving module paths, so a model living under vllm_omni.diffusion can't be registered there directly without the shim.

The DiffusionModelRegistry in registry.py does use the correct vllm_omni.diffusion.models. prefix, but it's a separate registry for pipeline-style models instantiated with OmniDiffusionConfig. MammothModa2DiTForConditionalGeneration is a vLLM nn.Module loaded with VllmConfig, so it can't be plugged into that registry either.
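A minimal sketch of the constraint and the shim described above. Module paths mirror the PR discussion; the resolver below is illustrative, not the actual OmniModelRegistry code:

```python
# OmniModelRegistry is described as hardcoding this prefix when resolving
# model modules; the resolver here is a stand-in for illustration.
OMNI_MODEL_PREFIX = "vllm_omni.model_executor.models"

def resolve_module_path(model_file: str) -> str:
    # Anything outside this prefix (e.g. vllm_omni.diffusion.*) can never
    # be produced here, hence the thin re-export shim under models/.
    return f"{OMNI_MODEL_PREFIX}.{model_file}"

# The shim itself is just a re-export module, e.g.
# vllm_omni/model_executor/models/mammoth_moda2_dit.py:
#
#   from vllm_omni.diffusion.models.mammoth_moda2.mammoth_moda2_dit import (
#       MammothModa2DiTForConditionalGeneration,
#   )
```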

Collaborator


Please create a new folder like https://github.com/vllm-project/vllm/tree/main/vllm/transformers_utils/configs and put the custom config under it.

Collaborator


Typically, in vLLM, we put the processor and the model implementation in the same file. Please combine mammoth_moda2_ar.py and mammoth_moda2.py.

Author


Done

from .registry import OmniModelRegistry # noqa: F401

__all__ = ["Qwen3OmniMoeForConditionalGeneration"]
__all__ = ["Qwen3OmniMoeForConditionalGeneration", "Mammothmoda2Config"]
Collaborator


After putting Mammothmoda2Config under transformers_utils/configs, we can remove it from here.

Author


I've moved Mammothmoda2Config to transformers_utils/configs and deleted the code in vllm_omni/model_executor/models/__init__.py that imports Mammothmoda2Config.

However, we need an eager import to register Mammothmoda2Config for the model_type mammothmoda2. Therefore, I added some code in vllm_omni/__init__.py to import these configs.
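The eager-import requirement can be illustrated with a pure-Python registry. The real code registers Mammothmoda2Config with transformers-style config auto-mapping; the decorator below is a stand-in:

```python
# Stand-in registry illustrating why the import must be eager: the mapping
# from model_type to config class only exists after this module is imported.
CONFIG_REGISTRY: dict[str, type] = {}

def register_config(model_type: str):
    def wrap(cls: type) -> type:
        CONFIG_REGISTRY[model_type] = cls
        return cls
    return wrap

@register_config("mammothmoda2")
class Mammothmoda2Config:
    model_type = "mammothmoda2"

# Importing this module from vllm_omni/__init__.py runs the decorator at
# package import time; without that eager import, CONFIG_REGISTRY stays
# empty and a checkpoint with model_type "mammothmoda2" cannot be resolved.
```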

from .registry import OmniModelRegistry # noqa: F401

__all__ = ["Qwen3OmniMoeForConditionalGeneration"]
__all__ = ["Qwen3OmniMoeForConditionalGeneration", "Mammothmoda2Config"]
Collaborator


@hsliuustc0106 @ZJY0516 Why do we have Qwen3OmniMoeForConditionalGeneration in this file? Is there something special about it?

Collaborator


I don't think we need a mammothmoda2_dit_layer folder to store the model's module files. You can refer to other DiT models' file structure.

Author


Done

@princepride
Collaborator

@HonestDeng PTAL


@hsliuustc0106 hsliuustc0106 left a comment


PR #336 Review: Add support for MammothModa2 model

Overview

This PR adds support for MammothModa2, a multi-modal image generation model with a two-stage architecture:

  • AR Stage: Based on Qwen2.5-VL with MoE (Mixture of Experts) for dual vocabulary handling
  • DiT Stage: Diffusion transformer for image generation via flow-matching

Scale: 4,151 additions across 27 files

Critical Issues: 0 found ✓

Important Issues: 4 found

1. Potential Shape Mismatch in moe_forward Not Fully Validated

File: vllm_omni/model_executor/models/mammoth_moda2/mammoth_moda2_ar.py:126-131

The flat_mask.reshape(-1) operation could silently produce incorrect results if the original shape has different dimensions that happen to have the same product.

Suggestion: Validate gen_token_mask.numel() != total_tokens before the reshape operation.

2. Chinese Comment in Production Code

File: vllm_omni/diffusion/models/mammoth_moda2/mammoth_moda2_dit.py:196

text_hidden_states=inputs_embeds,  # 占位,runner 不会用到

Please replace with English: # placeholder, not used by runner

3. Unused Parameter

File: vllm_omni/model_executor/stage_input_processors/mammoth_moda2.py:13

The requires_multimodal_data parameter is accepted but never used. Either use it or remove it.

4. Hardcoded num_reqs=1 Silently Ignores Caller's Argument

File: vllm_omni/diffusion/models/mammoth_moda2/mammoth_moda2_dit.py:84

def get_dummy_runtime_additional_information(self, num_reqs: int) -> list[dict[str, object]]:
    num_reqs = 1  # TODO: support num_reqs > 1

Consider raising NotImplementedError for num_reqs > 1 instead of silently ignoring.
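A sketch of the suggested fix. The method signature follows the excerpt above; the class wrapper and the return value are assumptions:

```python
class MammothModa2DiTStub:
    # Hypothetical wrapper; only the method below mirrors the excerpt.
    def get_dummy_runtime_additional_information(
        self, num_reqs: int
    ) -> list[dict[str, object]]:
        if num_reqs > 1:
            # Fail loudly instead of silently pinning num_reqs to 1.
            raise NotImplementedError("num_reqs > 1 is not supported yet")
        return [{}]
```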

Strengths

  1. Well-structured architecture: Clean separation between AR and DiT stages
  2. Comprehensive config design: Proper handling of dual vocabulary
  3. Good test coverage: Unit tests for config parsing, stage processor, and e2e tests
  4. Proper weight loading: Weight mappers correctly filter stage-specific weights
  5. Token constraints: _apply_t2i_token_constraints properly constrains sampling

MRO Pattern Check ✓

All model classes follow proper inheritance order (nn.Module before mixins). No MRO issues detected.

Recommendation

After addressing the 4 important issues (especially the Chinese comment and the num_reqs handling), this PR should be ready for merge.

raise ValueError(f"Unexpected hidden_states shape: {tuple(hidden_states.shape)}")

# mask: [num_tokens] or [B, L] -> flatten to [total_tokens]
flat_mask = gen_token_mask.reshape(-1) # type: ignore[union-attr]
Collaborator


Shape validation order

Consider validating gen_token_mask.numel() != total_tokens before the reshape operation. The current reshape(-1) could succeed but produce semantically incorrect results if the original shape has different dimensions that happen to have the same product.
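That validation can be sketched as follows. Names follow the excerpt; gen_token_mask is assumed to be a torch.Tensor, but the check itself is duck-typed here:

```python
def flatten_gen_token_mask(gen_token_mask, total_tokens: int):
    # Validate before reshaping: reshape(-1) would happily succeed on a
    # mask whose dimensions multiply out to the wrong token count.
    if gen_token_mask.numel() != total_tokens:
        raise ValueError(
            f"gen_token_mask has {gen_token_mask.numel()} elements, "
            f"expected {total_tokens}"
        )
    # mask: [num_tokens] or [B, L] -> flatten to [total_tokens]
    return gen_token_mask.reshape(-1)
```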

Author


Done

# Prepare negative prompt (for CFG). If none provided, fall back to unconditional.
negative_prompt_embeds = None
negative_prompt_attention_mask = None
if text_guidance_scale > 1.0:
Collaborator


Chinese comment in production code

Please replace with English: # placeholder, not used by runner

Author


Done


def ar2dit(
stage_list: list[Any],
engine_input_source: list[int],
Collaborator


Unused parameter

The requires_multimodal_data parameter is accepted but never used in the function body. Either use this parameter or remove it to match the interface contract.

Author


It's an interface parameter, so it cannot be removed.

theta=10000,
)

# vLLM PP interface compatibility
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silently ignores caller argument

This hardcoded num_reqs = 1 silently ignores the caller's num_reqs argument, which could cause issues in batched scenarios. Consider raising NotImplementedError for num_reqs > 1 instead.

Author


Done

Signed-off-by: HonestDeng <2958906959@qq.com>
@HonestDeng
Author

HonestDeng commented Mar 2, 2026

I've addressed the requested changes. PTAL when you have a moment.

@hsliuustc0106
Collaborator

Code review

Found 1 issue:

  1. Unclosed file handle - open(special_tokens_file) is called without using a context manager, causing a file handle leak. While Python's garbage collector will eventually close it, the timing is non-deterministic. In long-running inference servers or with repeated model loads, unclosed handles can accumulate.

self.mergeable_ranks = _load_tiktoken_bpe(vocab_file)
vision_tokens = [t.strip() for t in open(special_tokens_file).readlines() if len(t.strip()) > 0]
SPECIAL_TOKENS = tuple(

Fix: Use with open(special_tokens_file) as f: vision_tokens = [t.strip() for t in f.readlines() if len(t.strip()) > 0]
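The fix can be sketched as a small helper; the function name and return type are illustrative:

```python
def load_vision_tokens(special_tokens_file: str) -> list[str]:
    # The with-block closes the handle deterministically, even if an
    # exception is raised mid-parse, instead of relying on GC timing.
    with open(special_tokens_file) as f:
        return [t.strip() for t in f if t.strip()]
```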

🤖 Generated with Claude Code


Signed-off-by: HonestDeng <2958906959@qq.com>
@HonestDeng
Author


Done



Development

Successfully merging this pull request may close these issues.

[New Model]: bytedance-research/MammothModa2-Preview