
Conversation


JRD971000 commented on Oct 27, 2025

What does this PR do?

Type of change: New feature

  • Support pruning num_moe_experts, moe_ffn_hidden_size, and moe_shared_expert_intermediate_size in mcore_minitron pruning
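
For orientation, the sketch below shows how these new dimensions could be requested through the same export_config mechanism the Minitron pruner uses for dense width pruning. It is a hedged illustration only: the mtp.prune call shape, the mode name, and the example values are assumptions; only the three MoE hparam keys come from this PR.

```python
# Hypothetical sketch: call signature, mode name, and values are assumptions,
# not confirmed by this PR; only the three MoE export_config keys come from it.
import modelopt.torch.prune as mtp

export_config = {
    "num_moe_experts": 96,                       # e.g. prune 128 -> 96 routed experts
    "moe_ffn_hidden_size": 512,                  # per-expert FFN width
    "moe_shared_expert_intermediate_size": 512,  # shared-expert FFN width (if present)
}

# `model` is a Megatron-core MoE model and `forward_loop` a small calibration loop
# used to score expert/neuron importance; both are placeholders here.
model, _ = mtp.prune(
    model,
    mode="mcore_minitron",
    constraints={"export_config": export_config},
    dummy_input=None,
    config={"forward_loop": forward_loop},
)
```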

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: Yes

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Mixture of Experts (MoE) pruning support with new configurable dimensions for expert count and intermediate sizes
    • Extended NAS architecture search capabilities to include MoE model parameters
  • Documentation

    • Updated support matrix and pruning documentation for MoE-compatible models
    • Clarified available pruning dimensions and parameters for MoE architectures

JRD971000 self-assigned this on Oct 27, 2025

copy-pr-bot bot commented Oct 27, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

kevalmorabia97 changed the title from "Alit/moe dev" to "Add MoE pruning support in Minitron" on Oct 27, 2025

copy-pr-bot bot commented Nov 3, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


codecov bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.54%. Comparing base (5adb9ba) to head (ebf60b7).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #467      +/-   ##
==========================================
+ Coverage   73.52%   73.54%   +0.01%     
==========================================
  Files         181      181              
  Lines       18207    18220      +13     
==========================================
+ Hits        13387    13400      +13     
  Misses       4820     4820              


kevalmorabia97 force-pushed the alit/moe_dev branch 5 times, most recently from 8c5a927 to dec5105 on November 4, 2025 at 18:01
kevalmorabia97 marked this pull request as ready for review on November 4, 2025 at 20:33
kevalmorabia97 requested review from a team as code owners on November 4, 2025 at 20:33
kevalmorabia97 force-pushed the alit/moe_dev branch 4 times, most recently from 5b62318 to 0794a1e on November 5, 2025 at 17:24
kevalmorabia97 (Collaborator) commented:

@coderabbitai review


coderabbitai bot commented Nov 5, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


coderabbitai bot commented Nov 5, 2025

Walkthrough

This PR adds comprehensive Mixture of Experts (MoE) support to the model optimization framework, introducing new dynamic module classes for MoE components, expanding the Megatron plugin to handle MoE-specific hyperparameters (num_moe_experts, moe_ffn_hidden_size, moe_shared_expert_intermediate_size), updating test utilities with MoE-aware model factories, and extending documentation and tests across multiple areas.

Changes

  • Documentation Updates (CHANGELOG.rst, docs/source/guides/7_nas.rst, examples/megatron-lm/README.md, examples/pruning/README.md): Added MoE-related entries to the changelog; the NAS documentation now lists MoE-specific searchable parameters; example READMEs document MoE pruning dimensions for Qwen models and clarify pruning terminology.
  • Core NAS Infrastructure (modelopt/torch/nas/modules/container.py): Introduced the new public class DynamicModuleList with a depth hyperparameter and dynamic module handling via _setup and _get_modules methods.
  • Dynamic Module Refactoring (modelopt/torch/nas/search_space.py, modelopt/torch/opt/dynamic.py): Removed the post-sort rebind step from sort_parameters and deleted the force_assign method from the DynamicModule class.
  • Megatron Plugin MoE Expansion (modelopt/torch/nas/plugins/megatron.py): Added three new dynamic wrapper classes (_DynamicTopKRouter, _DynamicSequentialMLP, _DynamicMoELayer) with export/modify capabilities; extended _DynamicMLP to support MoE contexts with dynamic hparam naming (moe_shared_expert_intermediate_size, moe_ffn_hidden_size); added per-expert/per-router importance hooks; updated TransformerLayer and language model setup to handle MoE layers.
  • Pruning Configuration (modelopt/torch/prune/plugins/mcore_minitron.py): Extended SUPPORTED_HPARAMS with MoE parameters; added validation assertions in the search before/run methods; introduced num_moe_experts_divisor to the default rule configurations.
  • Test Utilities (tests/_test_utils/torch/megatron/models.py, tests/_test_utils/torch/misc.py): Added MoE configuration parameters to get_mcore_gpt_model; renamed get_mcore_mamba_model to get_mcore_mamba_hybrid_model with MoE support; introduced hybrid pattern generation logic; added an optional debug parameter to compare_outputs for diagnostic output.
  • Export and GPU Tests (tests/gpu/torch/export/test_unified_export_megatron.py, tests/gpu/torch/nas/plugins/test_megatron_gpt_dynamic_modules.py): Added .cuda() calls to model initialization; introduced new MoE-focused tests (_test_gpt_moe_search_space, test_gpt_moe_search_space, _test_gpt_moe_parameter_sorting, test_gpt_moe_parameter_sorting); added compare_outputs, export_searchspace, and DynamicModuleList imports.
  • Mamba and Pruning Tests (tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py, tests/gpu/torch/prune/plugins/test_mcore_gpt_minitron_pruning.py, tests/gpu/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py): Updated to use the hybrid Mamba model factory; added an MoE pruning test workflow (_test_mcore_gpt_pruning_moe, test_mcore_gpt_pruning_moe); replaced manual output comparisons with the compare_outputs utility; appended .cuda() to model constructions.
  • Unit Tests (tests/unit/torch/nas/modules/test_container.py): Added comprehensive tests for the new DynamicModuleList covering conversion, trimming, sorting by importance, and the export lifecycle.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant DynMLP as _DynamicMLP
    participant Export
    participant MoE as MoE Layer
    participant Router as TopKRouter

    User->>DynMLP: initialize (with MoE context)
    DynMLP->>DynMLP: detect shared_expert or moe_experts
    DynMLP->>DynMLP: set hparam_name (moe_shared_expert_intermediate_size / moe_ffn_hidden_size / ffn_hidden_size)
    
    User->>DynMLP: forward(input)
    DynMLP->>DynMLP: apply hidden_size hooks with dynamic hparam
    DynMLP->>MoE: route/process experts
    DynMLP-->>User: output
    
    User->>Export: export_searchspace()
    Export->>DynMLP: export()
    DynMLP->>Router: finalize token dispatcher
    DynMLP->>MoE: export nested submodules
    Export-->>User: converted model
sequenceDiagram
    participant Convert as convert()
    participant DML as DynamicModuleList
    participant Depth as depth hyperparameter
    participant Sort as sort_parameters()

    Convert->>DML: wrap ModuleList
    DML->>Depth: register TracedHp (1..len)
    DML->>DML: set _modules dynamic attribute
    Convert-->>DML: instance created
    
    User->>DML: access m.depth choices
    DML-->>User: depth options
    
    User->>DML: set m.depth = trimmed_len
    DML->>DML: _get_modules applies depth slice
    User->>DML: get state_dict()
    DML-->>User: state_dict with active keys
    
    User->>Sort: sort_parameters(m, importance_fn)
    Sort->>Depth: register importance scores
    Sort->>DML: reorder modules by importance
    User->>DML: export()
    DML-->>User: standard nn.ModuleList (converted back)
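
To make the DynamicModuleList lifecycle above concrete, here is a self-contained toy sketch in plain PyTorch. It is not the modelopt implementation and the class/method names are illustrative; it only mirrors the mechanics in the diagram: keep a depth value, slice the active modules to that depth, reorder them by importance, and export back to a standard nn.ModuleList.

```python
import torch.nn as nn

class ToyDepthModuleList(nn.Module):
    """Toy illustration only (not modelopt's DynamicModuleList)."""

    def __init__(self, modules):
        super().__init__()
        self.blocks = nn.ModuleList(modules)
        self.depth = len(self.blocks)            # analogous to the depth hparam (1..len)

    def active_modules(self):
        return list(self.blocks)[: self.depth]   # depth slice, like _get_modules above

    def sort_by_importance(self, scores):
        # Reorder modules so the most important ones survive a later depth cut.
        order = sorted(range(len(self.blocks)), key=lambda i: scores[i], reverse=True)
        self.blocks = nn.ModuleList(self.blocks[i] for i in order)

    def export(self):
        return nn.ModuleList(self.active_modules())  # back to a plain nn.ModuleList

# Usage: keep the 2 most "important" of 4 expert MLPs.
experts = [nn.Linear(8, 8) for _ in range(4)]
dyn = ToyDepthModuleList(experts)
dyn.sort_by_importance(scores=[0.1, 0.9, 0.4, 0.7])
dyn.depth = 2
pruned = dyn.export()
assert len(pruned) == 2
```

Per the diagram, the real class additionally registers depth as a traced hyperparameter and plugs into sort_parameters, but the slice-and-reorder idea is the same.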

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

  • High-priority areas:
    • modelopt/torch/nas/plugins/megatron.py: Dense MoE support logic with multiple new dynamic wrapper classes, hparam naming decisions, importance hooks, and export paths requiring careful verification of correctness and edge cases.
    • tests/gpu/torch/prune/plugins/test_mcore_gpt_minitron_pruning.py: New MoE pruning test workflow with duplicate/similar logic patterns; verify correctness of MoE-specific pruning export_config and structural validation.
    • tests/_test_utils/torch/megatron/models.py: Hybrid pattern generation logic and MoE gate routing are non-obvious; verify pipeline-parallelism handling and pattern validation.
    • modelopt/torch/nas/modules/container.py and related test: New public class with depth hyperparameter registration; verify state_dict handling and export semantics align with existing dynamic module patterns.

Poem

🐰 Hops through experts, dynamic and deep,
MoE layers pruned, no secrets to keep,
DynamicLists now guide the depth,
Megatron routers take every step,
Sorted by importance, sparse yet fleet! 🌟

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title accurately summarizes the primary change: adding MoE pruning support in Minitron, which aligns with the changeset's focus on MoE parameters and pruning functionality.
  • Description check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai bot (Contributor) left a review comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/_test_utils/torch/misc.py (1)

24-38: LGTM!

The debug parameter addition is useful for diagnosing test failures. The implementation preserves backward compatibility and functional equivalence.

Minor: For consistency, you could simplify the debug print statements:

         if debug:
             diff = torch.abs(t1 - t2)
-            print(f"\n{i=}")
-            print(f"{t1=}")
-            print(f"{t2=}")
-            print(f"{diff=}")
-            print(f"{diff.shape=}")
-            print(f"{diff.min()=}")
-            print(f"{diff.max()=}")
-            print(f"{diff.mean()=}")
+            print(
+                f"\n{i=}\n{t1=}\n{t2=}\n{diff=}\n"
+                f"{diff.shape=} {diff.min()=} {diff.max()=} {diff.mean()=}"
+            )
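
For readers unfamiliar with the helper, below is a minimal sketch of what a compare_outputs-style utility with a debug flag typically looks like; the actual implementation in tests/_test_utils/torch/misc.py may differ in signature, tolerances, and defaults, which are assumptions here.

```python
# Illustrative sketch only; not the actual compare_outputs from the test utilities.
import torch

def compare_outputs(outputs1, outputs2, rtol=1e-5, atol=1e-5, debug=False):
    """Assert that two sequences of tensors match elementwise within tolerances."""
    for i, (t1, t2) in enumerate(zip(outputs1, outputs2)):
        if debug:
            diff = torch.abs(t1 - t2)
            print(f"\n{i=}\n{diff.shape=} {diff.min()=} {diff.max()=} {diff.mean()=}")
        assert torch.allclose(t1, t2, rtol=rtol, atol=atol), f"Mismatch at output {i}"
```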
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47ddd14 and 40902ed.

📒 Files selected for processing (17)
  • CHANGELOG.rst (1 hunks)
  • docs/source/guides/7_nas.rst (2 hunks)
  • examples/megatron-lm/README.md (2 hunks)
  • examples/pruning/README.md (3 hunks)
  • modelopt/torch/nas/modules/container.py (2 hunks)
  • modelopt/torch/nas/plugins/megatron.py (19 hunks)
  • modelopt/torch/nas/search_space.py (0 hunks)
  • modelopt/torch/opt/dynamic.py (0 hunks)
  • modelopt/torch/prune/plugins/mcore_minitron.py (4 hunks)
  • tests/_test_utils/torch/megatron/models.py (9 hunks)
  • tests/_test_utils/torch/misc.py (1 hunks)
  • tests/gpu/torch/export/test_unified_export_megatron.py (2 hunks)
  • tests/gpu/torch/nas/plugins/test_megatron_gpt_dynamic_modules.py (8 hunks)
  • tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py (6 hunks)
  • tests/gpu/torch/prune/plugins/test_mcore_gpt_minitron_pruning.py (2 hunks)
  • tests/gpu/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py (4 hunks)
  • tests/unit/torch/nas/modules/test_container.py (2 hunks)
💤 Files with no reviewable changes (2)
  • modelopt/torch/nas/search_space.py
  • modelopt/torch/opt/dynamic.py
🧰 Additional context used
🪛 LanguageTool
examples/pruning/README.md

[grammar] ~92-~92: Use a hyphen to join words.
Context: ...---: | | Minitron | Megatron-core / NeMo based GPT / Mamba / MoE / Hybrid Models<...

(QB_NEW_EN_HYPHEN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (15)
docs/source/guides/7_nas.rst (1)

644-644: LGTM!

Minor formatting fix removing trailing space.

examples/megatron-lm/README.md (2)

23-24: LGTM!

Support matrix correctly updated to reflect MoE pruning capabilities for Qwen3 models.


115-126: LGTM!

Pruning dimensions properly documented with MoE-specific parameters. The terminology change from "options" to "dimensions" improves clarity.

examples/pruning/README.md (3)

9-9: LGTM!

MoE support properly documented in the Minitron description with comprehensive parameter coverage.


92-92: LGTM!

Support matrix comprehensively lists all MoE-related pruning dimensions: num_moe_experts, moe_ffn_hidden_size, and moe_shared_expert_intermediate_size.


125-125: LGTM!

Width pruning section properly updated with MoE parameters.

modelopt/torch/nas/modules/container.py (1)

102-131: LGTM!

The DynamicModuleList implementation correctly supports both depth pruning and module reordering based on importance. The key design differences from _DynamicSequential (depth range starting at 1, no _dynamic_depth flag) are appropriate for the ModuleList use case.

The comment on line 102 mentions not registering to DMRegistry. Could you clarify the design rationale—is this intentional to allow explicit conversion only?

tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py (3)

19-19: LGTM!

Import of compare_outputs improves test maintainability and consistency.


70-82: LGTM!

Updated to use the hybrid model factory with CUDA placement for GPU execution.


189-189: LGTM!

Using the standardized compare_outputs utility with appropriate tolerances for hybrid model comparisons.

tests/gpu/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py (3)

25-25: LGTM!

Import updated to use hybrid model factory.


32-55: LGTM!

Function renamed to reflect hybrid model testing, with consistent use of .cuda() for GPU execution.


133-136: LGTM!

Test function properly renamed to match the hybrid variant.

tests/unit/torch/nas/modules/test_container.py (1)

111-142: LGTM!

The test comprehensively validates DynamicModuleList functionality including depth manipulation, importance-based sorting, and export behavior. The state_dict key assertions correctly account for modules without parameters.

tests/gpu/torch/prune/plugins/test_mcore_gpt_minitron_pruning.py (1)

296-336: Excellent MoE pruning coverage
Thoroughly checking router, expert, and shared expert shapes plus config values ensures the new MoE export knobs stay regression-proof. Nice work on mirroring the score-based rerun to keep parity with the dense path.

Signed-off-by: Keval Morabia <[email protected]>
kevalmorabia97 force-pushed the alit/moe_dev branch 4 times, most recently from 928eb54 to 2214a3d on November 6, 2025 at 07:34
kevalmorabia97 changed the title from "Add MoE pruning support in Minitron" to "Add MoE (e.g. Qwen3-30B-A3B, Mamba hybrid) pruning support in Minitron" on Nov 6, 2025
kevalmorabia97 (Collaborator) left a comment

Qwen3-30B-A3B Pruning Results (without distillation) (p: pruned)

| num_layers | hidden_size | moe_ffn_hidden_size | num_moe_experts | Params, Active | MMLU |
| --- | --- | --- | --- | --- | --- |
| 48 | 2048 | 768 | 128 | 30.5B, 3.4B | 0.787 |
| 48 | 2048 | 768 | 112 p | 26.9B, 3.4B | 0.768 |
| 48 | 2048 | 768 | 96 p | 23.3B, 3.3B | 0.740 |
| 40 p | 2048 | 768 | 128 | 25.5B, 2.9B | 0.652 |
| 48 | 2048 | 512 p | 128 | 20.9B, 2.7B | 0.630 |
| 48 | 1536 p | 768 | 128 | 22.9B, 2.5B | 0.436 |
| 44 p | 1792 p | 640 p | 112 p | 18.2B, 2.5B | 0.491 |
| 44 p | 1792 p | 640 p | 96 p | 15.8B, x.xB | 0.313 |
