Add MoE (e.g. Qwen3-30B-A3B, Mamba hybrid) pruning support in Minitron #467
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #467      +/-   ##
==========================================
+ Coverage   73.52%   73.54%   +0.01%
==========================================
  Files         181      181
  Lines       18207    18220      +13
==========================================
+ Hits        13387    13400      +13
  Misses       4820     4820
@coderabbitai review

✅ Actions performed: Review triggered.
Walkthrough

This PR adds comprehensive Mixture of Experts (MoE) support to the model optimization framework: it introduces new dynamic module classes for MoE components, expands the Megatron plugin to handle MoE-specific hyperparameters (num_moe_experts, moe_ffn_hidden_size, moe_shared_expert_intermediate_size), updates the test utilities with MoE-aware model factories, and extends documentation and tests across multiple areas.
Sequence Diagram(s)

sequenceDiagram
participant User
participant DynMLP as _DynamicMLP
participant Export
participant MoE as MoE Layer
participant Router as TopKRouter
User->>DynMLP: initialize (with MoE context)
DynMLP->>DynMLP: detect shared_expert or moe_experts
DynMLP->>DynMLP: set hparam_name (moe_shared_expert_intermediate_size / moe_ffn_hidden_size / ffn_hidden_size)
User->>DynMLP: forward(input)
DynMLP->>DynMLP: apply hidden_size hooks with dynamic hparam
DynMLP->>MoE: route/process experts
DynMLP-->>User: output
User->>Export: export_searchspace()
Export->>DynMLP: export()
DynMLP->>Router: finalize token dispatcher
DynMLP->>MoE: export nested submodules
Export-->>User: converted model
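
As a rough illustration of the branching shown in the diagram above, the sketch below picks which hyperparameter name governs an MLP's intermediate size. This is not the actual `_DynamicMLP` plugin code; the boolean flags are assumptions standing in for however the module detects shared vs. routed experts.

```python
# Illustrative sketch only -- not the real _DynamicMLP from the Megatron plugin.
# The flag names are assumptions that mirror the decision in the diagram above.


def pick_ffn_hparam_name(is_shared_expert: bool, is_moe_expert: bool) -> str:
    """Return the hyperparameter name that governs this MLP's intermediate size."""
    if is_shared_expert:
        # Shared-expert width is tracked separately from routed experts.
        return "moe_shared_expert_intermediate_size"
    if is_moe_expert:
        # All routed experts share one prunable width hyperparameter.
        return "moe_ffn_hidden_size"
    # Dense (non-MoE) MLPs keep the usual knob.
    return "ffn_hidden_size"


assert pick_ffn_hparam_name(is_shared_expert=False, is_moe_expert=True) == "moe_ffn_hidden_size"
assert pick_ffn_hparam_name(is_shared_expert=False, is_moe_expert=False) == "ffn_hidden_size"
```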
sequenceDiagram
participant Convert as convert()
participant DML as DynamicModuleList
participant Depth as depth hyperparameter
participant Sort as sort_parameters()
Convert->>DML: wrap ModuleList
DML->>Depth: register TracedHp (1..len)
DML->>DML: set _modules dynamic attribute
Convert-->>DML: instance created
User->>DML: access m.depth choices
DML-->>User: depth options
User->>DML: set m.depth = trimmed_len
DML->>DML: _get_modules applies depth slice
User->>DML: get state_dict()
DML-->>User: state_dict with active keys
User->>Sort: sort_parameters(m, importance_fn)
Sort->>Depth: register importance scores
Sort->>DML: reorder modules by importance
User->>DML: export()
DML-->>User: standard nn.ModuleList (converted back)
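
The second diagram corresponds roughly to the usage pattern sketched below. The toy class here is a stand-in, not the `DynamicModuleList` added in this PR: the method names and the plain integer `depth` attribute are assumptions used only to illustrate the depth-knob-plus-importance-reordering idea.

```python
# Toy sketch of depth pruning over an nn.ModuleList, following the diagram above.
# NOT the DynamicModuleList from modelopt; names and API shape are assumptions.
import torch.nn as nn


class ToyDepthList(nn.Module):
    def __init__(self, modules):
        super().__init__()
        self.full = nn.ModuleList(modules)  # all candidate blocks
        self.depth = len(self.full)  # active prefix length, 1..len(full)

    def active(self):
        # The depth slice keeps only the first `depth` blocks.
        return list(self.full)[: self.depth]

    def sort_by_importance(self, scores):
        # Reorder so the most important blocks come first and survive the slice.
        order = sorted(range(len(self.full)), key=lambda i: scores[i], reverse=True)
        self.full = nn.ModuleList([self.full[i] for i in order])

    def export(self):
        # Collapse back to a plain nn.ModuleList containing only the active blocks.
        return nn.ModuleList(self.active())


blocks = [nn.Linear(8, 8) for _ in range(4)]
m = ToyDepthList(blocks)
m.sort_by_importance(scores=[0.1, 0.9, 0.4, 0.7])
m.depth = 2  # keep the two most important blocks
assert len(m.export()) == 2
```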
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/_test_utils/torch/misc.py (1)
24-38: LGTM! The debug parameter addition is useful for diagnosing test failures. The implementation preserves backward compatibility and functional equivalence.

Minor: For consistency, you could simplify the debug print statements:
if debug:
    diff = torch.abs(t1 - t2)
-   print(f"\n{i=}")
-   print(f"{t1=}")
-   print(f"{t2=}")
-   print(f"{diff=}")
-   print(f"{diff.shape=}")
-   print(f"{diff.min()=}")
-   print(f"{diff.max()=}")
-   print(f"{diff.mean()=}")
+   print(
+       f"\n{i=}\n{t1=}\n{t2=}\n{diff=}\n"
+       f"{diff.shape=} {diff.min()=} {diff.max()=} {diff.mean()=}"
+   )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- CHANGELOG.rst (1 hunks)
- docs/source/guides/7_nas.rst (2 hunks)
- examples/megatron-lm/README.md (2 hunks)
- examples/pruning/README.md (3 hunks)
- modelopt/torch/nas/modules/container.py (2 hunks)
- modelopt/torch/nas/plugins/megatron.py (19 hunks)
- modelopt/torch/nas/search_space.py (0 hunks)
- modelopt/torch/opt/dynamic.py (0 hunks)
- modelopt/torch/prune/plugins/mcore_minitron.py (4 hunks)
- tests/_test_utils/torch/megatron/models.py (9 hunks)
- tests/_test_utils/torch/misc.py (1 hunks)
- tests/gpu/torch/export/test_unified_export_megatron.py (2 hunks)
- tests/gpu/torch/nas/plugins/test_megatron_gpt_dynamic_modules.py (8 hunks)
- tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py (6 hunks)
- tests/gpu/torch/prune/plugins/test_mcore_gpt_minitron_pruning.py (2 hunks)
- tests/gpu/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py (4 hunks)
- tests/unit/torch/nas/modules/test_container.py (2 hunks)
💤 Files with no reviewable changes (2)
- modelopt/torch/nas/search_space.py
- modelopt/torch/opt/dynamic.py
🧰 Additional context used
🪛 LanguageTool
examples/pruning/README.md
[grammar] ~92-~92: Use a hyphen to join words.
Context: ...---: | | Minitron | Megatron-core / NeMo based GPT / Mamba / MoE / Hybrid Models<...
(QB_NEW_EN_HYPHEN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (15)
docs/source/guides/7_nas.rst (1)
644-644: LGTM! Minor formatting fix removing a trailing space.
examples/megatron-lm/README.md (2)
23-24: LGTM! Support matrix correctly updated to reflect MoE pruning capabilities for Qwen3 models.
115-126: LGTM! Pruning dimensions properly documented with MoE-specific parameters. The terminology change from "options" to "dimensions" improves clarity.
examples/pruning/README.md (3)
9-9: LGTM! MoE support properly documented in the Minitron description with comprehensive parameter coverage.
92-92: LGTM! Support matrix comprehensively lists all MoE-related pruning dimensions: `num_moe_experts`, `moe_ffn_hidden_size`, and `moe_shared_expert_intermediate_size`.
125-125: LGTM! Width pruning section properly updated with MoE parameters.
modelopt/torch/nas/modules/container.py (1)
102-131: LGTM! The `DynamicModuleList` implementation correctly supports both depth pruning and module reordering based on importance. The key design differences from `_DynamicSequential` (depth range starting at 1, no `_dynamic_depth` flag) are appropriate for the ModuleList use case.

The comment on line 102 mentions not registering to DMRegistry. Could you clarify the design rationale? Is this intentional, to allow explicit conversion only?
tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py (3)
19-19: LGTM! Import of `compare_outputs` improves test maintainability and consistency.
70-82: LGTM! Updated to use the hybrid model factory with CUDA placement for GPU execution.
189-189: LGTM! Using the standardized `compare_outputs` utility with appropriate tolerances for hybrid model comparisons.

tests/gpu/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py (3)
25-25: LGTM! Import updated to use the hybrid model factory.
32-55: LGTM! Function renamed to reflect hybrid model testing, with consistent use of `.cuda()` for GPU execution.
133-136: LGTM! Test function properly renamed to match the hybrid variant.
tests/unit/torch/nas/modules/test_container.py (1)
111-142: LGTM! The test comprehensively validates `DynamicModuleList` functionality, including depth manipulation, importance-based sorting, and export behavior. The state_dict key assertions correctly account for modules without parameters.
296-336: Excellent MoE pruning coverage
Thoroughly checking router, expert, and shared expert shapes plus config values ensures the new MoE export knobs stay regression-proof. Nice work on mirroring the score-based rerun to keep parity with the dense path.
Qwen3-30B-A3B Pruning Results (without distillation) (p: pruned)
| num_layers | hidden_size | moe_ffn_hidden_size | num_moe_experts | Params (total, active) | MMLU |
|---|---|---|---|---|---|
| 48 | 2048 | 768 | 128 | 30.5B, 3.4B | 0.787 |
| 48 | 2048 | 768 | 112 p | 26.9B, 3.4B | 0.768 |
| 48 | 2048 | 768 | 96 p | 23.3B, 3.3B | 0.740 |
| 40 p | 2048 | 768 | 128 | 25.5B, 2.9B | 0.652 |
| 48 | 2048 | 512 p | 128 | 20.9B, 2.7B | 0.630 |
| 48 | 1536 p | 768 | 128 | 22.9B, 2.5B | 0.436 |
| 44 p | 1792 p | 640 p | 112 p | 18.2B, 2.5B | 0.491 |
| 44 p | 1792 p | 640 p | 96 p | 15.8B, x.xB | 0.313 |
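
For context, requesting one of the pruned configurations above follows the usual Minitron pruning flow from the pruning README. The sketch below is illustrative rather than verbatim from this PR: it assumes a Megatron-core MoE `model` and a real calibration `forward_loop` are already in scope, and the exact mode string and constraint keys should be checked against the README.

```python
# Hedged sketch: pruning a Megatron-core MoE model along the new MoE dimensions.
import modelopt.torch.prune as mtp


def forward_loop(model):
    # Placeholder: run a few batches of calibration data through the model so
    # activation-based importance scores can be collected.
    ...


export_config = {
    "num_layers": 44,
    "hidden_size": 1792,
    "moe_ffn_hidden_size": 640,
    "num_moe_experts": 112,
    # "moe_shared_expert_intermediate_size": ...,  # also prunable per this PR
}

pruned_model, _ = mtp.prune(
    model,  # a Megatron-core GPT/MoE model, assumed to be defined elsewhere
    mode="mcore_minitron",
    constraints={"export_config": export_config},
    dummy_input=None,  # not used by this mode
    config={"forward_loop": forward_loop},
)
```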
What does this PR do?
Type of change: New feature
Adds `num_moe_experts`, `moe_ffn_hidden_size`, and `moe_shared_expert_intermediate_size` support in `mcore_minitron` pruning.

Testing
Before your PR is "Ready for review"
Summary by CodeRabbit
Release Notes
New Features
Documentation