Add MoE (e.g. Qwen3-30B-A3B, Mamba hybrid) pruning support in Minitron #467
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #467      +/-   ##
==========================================
+ Coverage   73.52%   73.54%   +0.01%
==========================================
  Files         181      181
  Lines       18207    18220      +13
==========================================
+ Hits        13387    13400      +13
  Misses       4820     4820
@coderabbitai review

✅ Actions performed: Review triggered.
Walkthrough

This PR adds comprehensive Mixture of Experts (MoE) support to the model optimization framework: it introduces new dynamic module classes for MoE components, expands the Megatron plugin to handle MoE-specific hyperparameters (num_moe_experts, moe_ffn_hidden_size, moe_shared_expert_intermediate_size), updates the test utilities with MoE-aware model factories, and extends documentation and tests across multiple areas.
Sequence Diagram(s)

sequenceDiagram
participant User
participant DynMLP as _DynamicMLP
participant Export
participant MoE as MoE Layer
participant Router as TopKRouter
User->>DynMLP: initialize (with MoE context)
DynMLP->>DynMLP: detect shared_expert or moe_experts
DynMLP->>DynMLP: set hparam_name (moe_shared_expert_intermediate_size / moe_ffn_hidden_size / ffn_hidden_size)
User->>DynMLP: forward(input)
DynMLP->>DynMLP: apply hidden_size hooks with dynamic hparam
DynMLP->>MoE: route/process experts
DynMLP-->>User: output
User->>Export: export_searchspace()
Export->>DynMLP: export()
DynMLP->>Router: finalize token dispatcher
DynMLP->>MoE: export nested submodules
Export-->>User: converted model
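
As a rough illustration of the branching shown in the diagram above, the sketch below picks which hyperparameter name governs an MLP's intermediate size. This is not the actual `_DynamicMLP` plugin code; the boolean flags are assumptions standing in for however the module detects shared vs. routed experts.

```python
# Illustrative sketch only -- not the real _DynamicMLP from the Megatron plugin.
# The flag names are assumptions that mirror the decision in the diagram above.


def pick_ffn_hparam_name(is_shared_expert: bool, is_moe_expert: bool) -> str:
    """Return the hyperparameter name that governs this MLP's intermediate size."""
    if is_shared_expert:
        # Shared-expert width is tracked separately from routed experts.
        return "moe_shared_expert_intermediate_size"
    if is_moe_expert:
        # All routed experts share one prunable width hyperparameter.
        return "moe_ffn_hidden_size"
    # Dense (non-MoE) MLPs keep the usual knob.
    return "ffn_hidden_size"


assert pick_ffn_hparam_name(is_shared_expert=False, is_moe_expert=True) == "moe_ffn_hidden_size"
assert pick_ffn_hparam_name(is_shared_expert=False, is_moe_expert=False) == "ffn_hidden_size"
```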
sequenceDiagram
participant Convert as convert()
participant DML as DynamicModuleList
participant Depth as depth hyperparameter
participant Sort as sort_parameters()
Convert->>DML: wrap ModuleList
DML->>Depth: register TracedHp (1..len)
DML->>DML: set _modules dynamic attribute
Convert-->>DML: instance created
User->>DML: access m.depth choices
DML-->>User: depth options
User->>DML: set m.depth = trimmed_len
DML->>DML: _get_modules applies depth slice
User->>DML: get state_dict()
DML-->>User: state_dict with active keys
User->>Sort: sort_parameters(m, importance_fn)
Sort->>Depth: register importance scores
Sort->>DML: reorder modules by importance
User->>DML: export()
DML-->>User: standard nn.ModuleList (converted back)
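
The second diagram corresponds roughly to the usage pattern sketched below. The toy class here is a stand-in, not the `DynamicModuleList` added in this PR: the method names and the plain integer `depth` attribute are assumptions used only to illustrate the depth-knob-plus-importance-reordering idea.

```python
# Toy sketch of depth pruning over an nn.ModuleList, following the diagram above.
# NOT the DynamicModuleList from modelopt; names and API shape are assumptions.
import torch.nn as nn


class ToyDepthList(nn.Module):
    def __init__(self, modules):
        super().__init__()
        self.full = nn.ModuleList(modules)  # all candidate blocks
        self.depth = len(self.full)  # active prefix length, 1..len(full)

    def active(self):
        # The depth slice keeps only the first `depth` blocks.
        return list(self.full)[: self.depth]

    def sort_by_importance(self, scores):
        # Reorder so the most important blocks come first and survive the slice.
        order = sorted(range(len(self.full)), key=lambda i: scores[i], reverse=True)
        self.full = nn.ModuleList([self.full[i] for i in order])

    def export(self):
        # Collapse back to a plain nn.ModuleList containing only the active blocks.
        return nn.ModuleList(self.active())


blocks = [nn.Linear(8, 8) for _ in range(4)]
m = ToyDepthList(blocks)
m.sort_by_importance(scores=[0.1, 0.9, 0.4, 0.7])
m.depth = 2  # keep the two most important blocks
assert len(m.export()) == 2
```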
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/_test_utils/torch/misc.py (1)
24-38: LGTM! The debug parameter addition is useful for diagnosing test failures. The implementation preserves backward compatibility and functional equivalence.

Minor: For consistency, you could simplify the debug print statements:
if debug:
    diff = torch.abs(t1 - t2)
-   print(f"\n{i=}")
-   print(f"{t1=}")
-   print(f"{t2=}")
-   print(f"{diff=}")
-   print(f"{diff.shape=}")
-   print(f"{diff.min()=}")
-   print(f"{diff.max()=}")
-   print(f"{diff.mean()=}")
+   print(
+       f"\n{i=}\n{t1=}\n{t2=}\n{diff=}\n"
+       f"{diff.shape=} {diff.min()=} {diff.max()=} {diff.mean()=}"
+   )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- CHANGELOG.rst (1 hunks)
- docs/source/guides/7_nas.rst (2 hunks)
- examples/megatron-lm/README.md (2 hunks)
- examples/pruning/README.md (3 hunks)
- modelopt/torch/nas/modules/container.py (2 hunks)
- modelopt/torch/nas/plugins/megatron.py (19 hunks)
- modelopt/torch/nas/search_space.py (0 hunks)
- modelopt/torch/opt/dynamic.py (0 hunks)
- modelopt/torch/prune/plugins/mcore_minitron.py (4 hunks)
- tests/_test_utils/torch/megatron/models.py (9 hunks)
- tests/_test_utils/torch/misc.py (1 hunks)
- tests/gpu/torch/export/test_unified_export_megatron.py (2 hunks)
- tests/gpu/torch/nas/plugins/test_megatron_gpt_dynamic_modules.py (8 hunks)
- tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py (6 hunks)
- tests/gpu/torch/prune/plugins/test_mcore_gpt_minitron_pruning.py (2 hunks)
- tests/gpu/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py (4 hunks)
- tests/unit/torch/nas/modules/test_container.py (2 hunks)
💤 Files with no reviewable changes (2)
- modelopt/torch/nas/search_space.py
- modelopt/torch/opt/dynamic.py
🧰 Additional context used
🪛 LanguageTool
examples/pruning/README.md
[grammar] ~92-~92: Use a hyphen to join words.
Context: ...---: | | Minitron | Megatron-core / NeMo based GPT / Mamba / MoE / Hybrid Models<...
(QB_NEW_EN_HYPHEN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (15)
docs/source/guides/7_nas.rst (1)
644-644: LGTM! Minor formatting fix removing a trailing space.
examples/megatron-lm/README.md (2)
23-24: LGTM! Support matrix correctly updated to reflect MoE pruning capabilities for Qwen3 models.
115-126: LGTM! Pruning dimensions properly documented with MoE-specific parameters. The terminology change from "options" to "dimensions" improves clarity.
examples/pruning/README.md (3)
9-9: LGTM! MoE support properly documented in the Minitron description with comprehensive parameter coverage.
92-92: LGTM! Support matrix comprehensively lists all MoE-related pruning dimensions: `num_moe_experts`, `moe_ffn_hidden_size`, and `moe_shared_expert_intermediate_size`.
125-125: LGTM! Width pruning section properly updated with MoE parameters.
modelopt/torch/nas/modules/container.py (1)
102-131: LGTM! The `DynamicModuleList` implementation correctly supports both depth pruning and module reordering based on importance. The key design differences from `_DynamicSequential` (depth range starting at 1, no `_dynamic_depth` flag) are appropriate for the ModuleList use case.

The comment on line 102 mentions not registering to DMRegistry. Could you clarify the design rationale? Is this intentional, to allow explicit conversion only?
tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py (3)
19-19: LGTM! Import of `compare_outputs` improves test maintainability and consistency.
70-82: LGTM! Updated to use the hybrid model factory with CUDA placement for GPU execution.
189-189: LGTM! Using the standardized `compare_outputs` utility with appropriate tolerances for hybrid model comparisons.

tests/gpu/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py (3)
25-25: LGTM! Import updated to use the hybrid model factory.
32-55: LGTM! Function renamed to reflect hybrid model testing, with consistent use of `.cuda()` for GPU execution.
133-136: LGTM! Test function properly renamed to match the hybrid variant.
tests/unit/torch/nas/modules/test_container.py (1)
111-142: LGTM! The test comprehensively validates `DynamicModuleList` functionality, including depth manipulation, importance-based sorting, and export behavior. The state_dict key assertions correctly account for modules without parameters.
296-336: Excellent MoE pruning coverage
Thoroughly checking router, expert, and shared expert shapes plus config values ensures the new MoE export knobs stay regression-proof. Nice work on mirroring the score-based rerun to keep parity with the dense path.
Qwen3-30B-A3B Pruning Results (without distillation) (p: pruned)
| num_layers | hidden_size | moe_ffn_hidden_size | num_moe_experts | Params (total, active) | MMLU |
|---|---|---|---|---|---|
| 48 | 2048 | 768 | 128 | 30.5B, 3.4B | 0.787 |
| 48 | 2048 | 768 | 112 p | 26.9B, 3.4B | 0.768 |
| 48 | 2048 | 768 | 96 p | 23.3B, 3.3B | 0.740 |
| 40 p | 2048 | 768 | 128 | 25.5B, 2.9B | 0.652 |
| 48 | 2048 | 512 p | 128 | 20.9B, 2.7B | 0.630 |
| 48 | 1536 p | 768 | 128 | 22.9B, 2.5B | 0.436 |
| 44 p | 1792 p | 640 p | 112 p | 18.2B, 2.5B | 0.491 |
| 44 p | 1792 p | 640 p | 96 p | 15.8B, x.xB | 0.313 |
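
For context, requesting one of the pruned configurations above follows the usual Minitron pruning flow from the pruning README. The sketch below is illustrative rather than verbatim from this PR: it assumes a Megatron-core MoE `model` and a real calibration `forward_loop` are already in scope, and the exact mode string and constraint keys should be checked against the README.

```python
# Hedged sketch: pruning a Megatron-core MoE model along the new MoE dimensions.
import modelopt.torch.prune as mtp


def forward_loop(model):
    # Placeholder: run a few batches of calibration data through the model so
    # activation-based importance scores can be collected.
    ...


export_config = {
    "num_layers": 44,
    "hidden_size": 1792,
    "moe_ffn_hidden_size": 640,
    "num_moe_experts": 112,
    # "moe_shared_expert_intermediate_size": ...,  # also prunable per this PR
}

pruned_model, _ = mtp.prune(
    model,  # a Megatron-core GPT/MoE model, assumed to be defined elsewhere
    mode="mcore_minitron",
    constraints={"export_config": export_config},
    dummy_input=None,  # not used by this mode
    config={"forward_loop": forward_loop},
)
```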
What does this PR do?
Type of change: New feature
Adds `num_moe_experts`, `moe_ffn_hidden_size`, and `moe_shared_expert_intermediate_size` support in `mcore_minitron` pruning.

Testing
Before your PR is "Ready for review"
Summary by CodeRabbit
Release Notes
New Features
Documentation