Added support for qwen3-next quantization and export #323
Conversation
Walkthrough
Extends recognition and handling for Qwen3Next MoE blocks: layer utilities now detect Qwen3Next model types and MoE block names, and the HuggingFace quantization plugin optionally imports Qwen3NextSparseMoeBlock and, when the import succeeds, registers it as "hf.Qwen3NextSparseMoeBlock" with the quant module registry.
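For orientation, the guarded registration described above presumably follows the try/except pattern sketched below. This is a sketch of the assumed shape only: the exact QuantModuleRegistry.register signature and the _QuantMoeSparseMoe class live inside the plugin module (where both names are in scope) and are not reproduced verbatim here.

```python
# Sketch of the optional registration (assumed API shape, not the verbatim plugin code).
# QuantModuleRegistry and _QuantMoeSparseMoe are assumed to be in scope inside
# modelopt/torch/quantization/plugins/huggingface.py.
try:
    from transformers.models.qwen3_next.modeling_qwen3_next import Qwen3NextSparseMoeBlock

    # Assumed: register() takes a {module class: registry key} mapping and
    # returns a decorator that binds the quantized replacement class.
    QuantModuleRegistry.register({Qwen3NextSparseMoeBlock: "hf.Qwen3NextSparseMoeBlock"})(
        _QuantMoeSparseMoe
    )
except ImportError:
    # Older transformers releases without Qwen3-Next: silently skip registration.
    pass
```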
Sequence Diagram(s)
```mermaid
sequenceDiagram
autonumber
participant Loader as ModelLoader
participant LU as layer_utils
participant HF as HF_Quant_Plugin
participant QR as QuantModuleRegistry
participant Q as Quantizer
Loader->>LU: get_experts_list(model_type)
LU-->>Loader: returns gate_proj/down_proj/up_proj for qwen3next*
Loader->>LU: is_moe(block)
LU-->>Loader: true for Qwen3NextSparseMoeBlock
note over HF,QR: Optional registration at import time
HF->>HF: try import Qwen3NextSparseMoeBlock
alt import succeeds
HF->>QR: register "hf.Qwen3NextSparseMoeBlock" -> _QuantMoeSparseMoe
QR-->>HF: registered
else import fails
HF-->>HF: skip registration
end
Loader->>Q: request quantization
Q->>QR: resolve handler for block type
QR-->>Q: _QuantMoeSparseMoe (if registered)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
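Read as loader code, the sequence diagram above corresponds roughly to the sketch below. The helper name collect_moe_experts is hypothetical; is_moe and get_experts_list are called with the signatures shown in the diagram and in the diff later in this review, and the real loader may pass additional arguments.

```python
import torch
from modelopt.torch.export import layer_utils

def collect_moe_experts(model: torch.nn.Module) -> dict:
    """Hypothetical sketch of the loader-side flow, not the actual loader code."""
    model_type = type(model).__name__.lower()  # e.g. "qwen3nextforcausallm"
    experts_by_block = {}
    for name, block in model.named_modules():
        if layer_utils.is_moe(block):
            # For qwen3next* this groups experts by gate_proj / down_proj / up_proj.
            experts_by_block[name] = layer_utils.get_experts_list(block, model_type)
    return experts_by_block
```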
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms (4)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #323      +/-   ##
==========================================
- Coverage   73.88%   73.88%   -0.01%
==========================================
  Files         172      172
  Lines       17444    17444
==========================================
- Hits        12889    12888       -1
- Misses       4555     4556       +1
```
☔ View full report in Codecov by Sentry.
Actionable comments posted: 0
🧹 Nitpick comments (1)
modelopt/torch/export/layer_utils.py (1)
88-103: Qwen3-Next detection: good; normalize model_type and de-duplicate the variant list.
Works as intended. To harden against caller casing and avoid future copy-paste, normalize once and keep a single tuple of supported Qwen MoE variants.
Apply:
```diff
 def get_experts_list(module: torch.nn.Module, model_type: str):
     """Returns list of grouped experts by linear name for given module."""
     experts_list = []
     # Define linear layer names for different model types
-    if "mixtralforcausallm" in model_type:
+    model_type = model_type.lower()
+    if "mixtralforcausallm" in model_type:
         linear_names = ["w1", "w2", "w3"]
-    elif any(
-        qwen_variant in model_type
-        for qwen_variant in [
-            "qwenmoeforcausallm",
-            "qwen2moeforcausallm",
-            "qwen3moeforcausallm",
-            "qwen3nextforcausallm",
-        ]
-    ):
+    elif any(qv in model_type for qv in (
+        "qwenmoeforcausallm",
+        "qwen2moeforcausallm",
+        "qwen3moeforcausallm",
+        "qwen3nextforcausallm",
+    )):
         linear_names = ["gate_proj", "down_proj", "up_proj"]
     else:
         raise NotImplementedError(f" {model_type} not supported")
```
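As a quick illustration of why the .lower() normalization matters, here is a standalone spot-check of the matching logic with hypothetical model_type strings; the variant tuple mirrors the one in the diff above.

```python
# Hypothetical spot-check of the case-insensitive matching proposed above.
for mt in ("Qwen3NextForCausalLM", "qwen3nextforcausallm", "MixtralForCausalLM"):
    normalized = mt.lower()
    is_qwen_moe = any(
        qv in normalized
        for qv in (
            "qwenmoeforcausallm",
            "qwen2moeforcausallm",
            "qwen3moeforcausallm",
            "qwen3nextforcausallm",
        )
    )
    print(mt, "->", "qwen-moe" if is_qwen_moe else "other")
```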
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
modelopt/torch/export/layer_utils.py (3 hunks)
modelopt/torch/quantization/plugins/huggingface.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/torch/quantization/plugins/huggingface.py (1)
modelopt/torch/quantization/conversion.py (1)
register (326-367)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: multi-torch (27)
- GitHub Check: windows
🔇 Additional comments (3)
modelopt/torch/export/layer_utils.py (2)
329-343: Include Qwen3NextSparseMoeBlock in is_moe — LGTM.
Adds the new block to MoE detection without altering existing behavior.
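For reference, a minimal sketch of what the detection amounts to, assuming is_moe dispatches on the block's class name; the actual check in layer_utils.py may differ (for example, it may also inspect module attributes).

```python
import torch.nn as nn

# Minimal sketch of a name-based MoE block check; the real is_moe() in
# layer_utils.py may use a different mechanism.
def is_moe(module: nn.Module) -> bool:
    return type(module).__name__ in {
        "MixtralSparseMoeBlock",
        "Qwen2MoeSparseMoeBlock",
        "Qwen3MoeSparseMoeBlock",
        "Qwen3NextSparseMoeBlock",  # newly added in this PR
    }
```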
996-1005: LGTM — verify HF Qwen3NextSparseMoeBlock exposes gate_proj / down_proj / up_proj.
Consistent with Qwen2/3 MoE. Verification here failed (ModuleNotFoundError: No module named 'transformers'). Run locally and paste output:
```bash
#!/bin/bash
python - <<'PY'
import importlib, inspect
m = importlib.import_module("transformers.models.qwen3_next.modeling_qwen3_next")
blk = getattr(m, "Qwen3NextSparseMoeBlock", None)
print("Has Qwen3NextSparseMoeBlock:", blk is not None)
if blk:
    print("Constructor signature:", inspect.signature(blk.__init__))
    print("Class attrs (proj/gate/expert):", [a for a in dir(blk) if any(k in a for k in ("proj", "gate", "expert"))])
PY
```
modelopt/torch/quantization/plugins/huggingface.py (1)
562-571: Optional registration for Qwen3NextSparseMoeBlock — keep the try/except; confirm the HF export.
modelopt/torch/quantization/plugins/huggingface.py (≈lines 562–571) contains the registration. Runtime verification failed here because 'transformers' is not installed and the modelopt package import failed — confirm that transformers.models.qwen3_next.modeling_qwen3_next exports Qwen3NextSparseMoeBlock in your target Transformers release and that QuantModuleRegistry.get(Qwen3NextSparseMoeBlock) returns the registered entry.
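A minimal local check along those lines, assuming transformers ships Qwen3-Next and that QuantModuleRegistry.get accepts the module class; the import path for QuantModuleRegistry shown here is an assumption and should be adjusted to the actual package layout.

```python
# Hedged verification sketch: confirm the block exists in transformers and
# that the quantization registry resolved it. Requires a transformers release
# that ships Qwen3-Next.
from transformers.models.qwen3_next.modeling_qwen3_next import Qwen3NextSparseMoeBlock
from modelopt.torch.quantization.nn import QuantModuleRegistry  # assumed import path

handler = QuantModuleRegistry.get(Qwen3NextSparseMoeBlock)
print("Registered handler:", handler)  # expected: _QuantMoeSparseMoe if registration succeeded
```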
This works for now.
We need to look into how the deployment framework handles expert quantization: whether the experts are quantized in isolation or all at once, and whether quantization parameters such as per-tensor scales are shared between the experts (see the toy sketch below).
Figuring out these details will be critical for QAT support. Cc @cjluo-nv @RalphMao
Please share if you have any particular thoughts.
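To make the trade-off concrete, here is a toy sketch, unrelated to any specific deployment framework, contrasting per-expert per-tensor scales with a single scale shared across all experts.

```python
import torch

# Toy illustration only (not the deployment framework's logic): three "expert"
# weight tensors with very different magnitudes, mapped to an int8 range.
experts = [torch.randn(4, 4) * s for s in (0.5, 1.0, 4.0)]

# Option A: each expert keeps its own per-tensor scale.
per_expert_scales = [(w.abs().max() / 127.0).item() for w in experts]

# Option B: one scale shared across all experts; simpler to ship, but the
# small-magnitude experts lose quantization resolution.
shared_scale = max(w.abs().max().item() for w in experts) / 127.0

print("per-expert:", [round(s, 5) for s in per_expert_scales])
print("shared:    ", round(shared_scale, 5))
```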
Force-pushed from 1ceb40e to 49c23e9
Signed-off-by: Kinjal Patel <[email protected]>
Signed-off-by: Ye Yu <[email protected]>
What does this PR do?
Support for Qwen3-Next quantization and HF export
Overview:
Added support for quantizing the new Qwen3-Next models and exporting them in a Hugging Face compatible format.
Usage
See example/llm_ptq/hf_ptq.py
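For readers who prefer the API view over the script, a rough sketch of the PTQ-plus-export path follows. Assumptions: mtq.quantize, FP8_DEFAULT_CFG, and export_hf_checkpoint are the modelopt entry points in play, the one-prompt forward_loop is only a stand-in for a real calibration dataset, and the export directory name is arbitrary; the model id matches the one used for testing below.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def forward_loop(m):
    # Minimal stand-in calibration pass; the example script drives this with a real dataset.
    inputs = tokenizer("Hello, Qwen3-Next!", return_tensors="pt")
    inputs = {k: v.to(next(m.parameters()).device) for k, v in inputs.items()}
    m(**inputs)

# Quantize (FP8 config chosen for illustration), then export in HF-compatible format.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
export_hf_checkpoint(model, export_dir="qwen3-next-fp8")
```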
Testing
Tested by quantizing and exporting the Qwen/Qwen3-Next-80B-A3B-Instruct and Qwen/Qwen3-Next-80B-A3B-Thinking models