
Conversation


@kevalmorabia97 kevalmorabia97 commented Oct 10, 2025

What does this PR do?

  • Fix Minitron Megatron-LM sharded modelopt state restore when `subnet_config` is not present: we now skip re-conversion to the Minitron search space (which forces TP=1) during restore
  • Move NAS export-mode logic from `autonas.py` to `conversion.py`, since it applies to all NAS algorithms, not just AutoNAS. Also rename the mode from `export` to `export_nas`
  • Update the Megatron-LM pruning example README with pruning guidelines and an uneven PP command
  • Bring the Minitron import one level up: instead of `mtp.plugins.mcore_minitron.*` we can now write `mtp.mcore_minitron.*`, which reads more cleanly (see the sketch below)
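
A minimal sketch of the resulting import paths (assuming Megatron-Core is installed so the conditional plugin import succeeds; nothing below is copied from the PR diff):

```python
import modelopt.torch.prune as mtp

# Before this PR: the plugin had to be reached through the plugins subpackage.
fn_deep = mtp.plugins.mcore_minitron.drop_mcore_language_model_layers

# After this PR: the plugin module is also re-exported one level up.
fn_flat = mtp.mcore_minitron.drop_mcore_language_model_layers

assert fn_deep is fn_flat  # both paths resolve to the same function
```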

Summary by CodeRabbit

  • New Features

    • Improved plugin loading for pruning integrations.
    • NAS export workflow expanded for richer export/restore of subnet configurations and calibration; export route renamed to a NAS-specific "export_nas" path.
  • Documentation

    • Expanded pruning guide with getting-started link, depth-pruning example (e.g., 36→24 layers), output path clarification, and tip for uneven pipeline-parallel sizing.
  • Tests

    • Updated tests and inference checks to exercise the new export_nas flow and pruning behavior.


coderabbitai bot commented Oct 10, 2025

Walkthrough

Replaces the legacy "export" path with a new "export_nas" export workflow, adds NAS export primitives (ExportConfig, export_searchspace, restore_export), rewires mode descriptors, conditionally loads the mcore_minitron plugin, expands plugin exports, updates docs, and adapts tests to the new "export_nas" name and behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Docs: Megatron-LM pruning README**<br>`examples/megatron-lm/README.md` | Adds pruning docs: link to the pruning getting-started guide and guidelines, a depth-pruning example (36→24 layers) with an updated save path, and a TIP for uneven pipeline-parallel sizing via `MLM_EXTRA_ARGS`. Minor ancillary text edits. |
| **Prune package init: conditional plugin import**<br>`modelopt/torch/prune/__init__.py` | Adds `import_plugin` and conditionally imports `plugins.mcore_minitron` inside `with import_plugin("mcore_minitron", verbose=False)`; preserves existing wildcard imports. |
| **Minitron plugin: public API & mode rename**<br>`modelopt/torch/prune/plugins/mcore_minitron.py` | Updates `__all__` to export `SUPPORTED_HPARAMS` and `drop_mcore_language_model_layers` (re-exported from `modelopt.torch.nas.plugins.megatron`), makes `restore_mcore_minitron` a no-op, and renames mode strings to use `export_nas`. |
| **NAS conversion: new export-NAS workflow & APIs**<br>`modelopt/torch/nas/conversion.py` | Introduces `ExportConfig`, `ExportNASModeDescriptor`, `export_searchspace`, `restore_export`, metadata handling, and helpers (`PatchManager`/`SearchSpace`/`get_subnet_config`), and routes `export()` through `"export_nas"` (see the sketch after this table). Updates `__all__`. |
| **AutoNAS: mode rewiring & constants**<br>`modelopt/torch/nas/autonas.py` | Removes the prior `ExportConfig`/`ExportModeDescriptor`/export helpers; updates `AutoNASModeDescriptor` to reference `export_nas`; adds the `MODELOPT_QUEUE_MAXLEN` and `MODELOPT_BN_CALIB_ITERS` constants; updates `__all__`. |
| **NAS utils: constant removal**<br>`modelopt/torch/nas/utils.py` | Removes the top-level constants `MODELOPT_QUEUE_MAXLEN` and `MODELOPT_BN_CALIB_ITERS` (moved to `autonas.py`). |
| **FastNAS / Prune mode rename**<br>`modelopt/torch/prune/fastnas.py` | Replaces `"export"` with `"export_nas"` in `next_modes` and `export_mode`. |
| **Registry doc comment**<br>`modelopt/torch/nas/registry.py` | Updates a doc comment to refer specifically to the NAS registry; no functional change. |
| **Tests: rename export mode & adjust pruning test**<br>`tests/gpu/torch/prune/plugins/test_mcore_gpt_minitron_pruning.py`, `tests/unit/torch/nas/test_nas.py`, `tests/unit/torch/opt/test_chaining.py`, `tests/unit/torch/nas/plugins/test_hf_nas_save_restore.py` | Replaces `"export"` with `"export_nas"` across tests; updates sampling/export-related tests; modifies the pruning test to capture and reload the `state_dict`, run real inference via `run_mcore_inference`, and assert outputs match after the rerun. |
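
A rough sketch of the subnet-config round-trip these helpers enable (helper locations come from the code-graph section further down; exact signatures, and `model` itself, are assumptions for illustration):

```python
from modelopt.torch.nas.utils import get_subnet_config, select

# Capture the currently selected subnet as a plain config object...
subnet_config = get_subnet_config(model)

# ...so a later restore can re-apply the same selection instead of
# re-running conversion/search, e.g. after reloading sharded modelopt state.
select(model, subnet_config)
```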

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant App
    participant ModeMgr as ModeDescriptor
    participant NASConv as nas.conversion
    participant Model

    Note over ModeMgr: Legacy flow (before change)
    User->>App: apply_mode(..., mode="export")
    App->>ModeMgr: select "export"
    ModeMgr->>Model: legacy export/convert
    Model-->>User: exported model

    Note over ModeMgr: New NAS export flow (this change)
    User->>App: apply_mode(..., mode="export_nas")
    App->>ModeMgr: select "export_nas"
    ModeMgr->>NASConv: export_searchspace(model, ExportConfig)
    NASConv->>Model: in-place subnet export + optional BN calibration
    NASConv-->>ModeMgr: metadata (subnet_config, patches)
    ModeMgr-->>User: exported model + metadata
```
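
In code, the high-level entry point is unchanged; a sketch of the new route (assuming `model` was previously converted to a NAS search space via `mtn.convert(...)`):

```python
import modelopt.torch.nas as mtn

# export() now routes through the "export_nas" mode shown above: it exports
# the selected subnet in place (optionally running BN calibration first) and
# records subnet_config metadata so the export can be restored later.
model = mtn.export(model)
```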

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I hopped through modes and names today,
Swapped "export" for a NASy way.
Plugins peek if doors align,
Subnets saved and layers fine.
A rabbit cheers — hop, prune, hooray! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title concisely captures the two primary changes in this pull request by referencing the NAS export refactoring and the new behavior to skip conversion during Minitron restore. It aligns directly with the PR objectives and uses clear terminology without extraneous detail. This makes it immediately understandable to reviewers scanning the project history. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 82.14%, which is sufficient. The required threshold is 80.00%. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
examples/megatron-lm/README.md (1)

136-139: Consider showing a complete usage example.

The TIP explains uneven PP configuration but doesn't demonstrate how to integrate MLM_EXTRA_ARGS into the pruning command shown above (lines 129-134). Consider adding a concrete example:

````diff
+
+Example with uneven PP:
+```sh
+PP=4 \
+TARGET_NUM_LAYERS=24 \
+HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+MLM_MODEL_SAVE=Qwen3-8B-Pruned \
+MLM_EXTRA_ARGS="--decoder-first-pipeline-num-layers 7 --decoder-last-pipeline-num-layers 5" \
+bash megatron-lm/examples/post_training/modelopt/prune.sh qwen/Qwen3-8B
+```
````
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6dffcd0 and b6f831f.

📒 Files selected for processing (3)
  • examples/megatron-lm/README.md (2 hunks)
  • modelopt/torch/prune/__init__.py (1 hunks)
  • modelopt/torch/prune/plugins/mcore_minitron.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/torch/prune/plugins/mcore_minitron.py (1)
modelopt/torch/nas/plugins/megatron.py (1)
  • drop_mcore_language_model_layers (1392-1456)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (6)
examples/megatron-lm/README.md (2)

113-113: LGTM!

Good addition of links to the pruning getting started section and guidelines. This helps users find the detailed documentation they need.


126-134: Verify the path change is intentional.

The example now uses a relative path `Qwen3-8B-Pruned` instead of the absolute path `/tmp/Qwen3-8B-DPruned`. This changes the behavior:

  • Relative path: saves to `$PWD/Qwen3-8B-Pruned`
  • Absolute path: saves to `/tmp/Qwen3-8B-Pruned`

Ensure this change aligns with user expectations and is consistent with other examples in the file (e.g., lines 61, 67, 88, 94 use the `/tmp/` prefix).

modelopt/torch/prune/__init__.py (2)

24-24: LGTM!

Good addition of the import_plugin utility to support conditional plugin loading.


29-30: LGTM!

The conditional plugin loading pattern is appropriate for the mcore_minitron plugin. This allows the plugin to be loaded only when its dependencies are available, preventing import errors when Megatron-Core is not installed. The verbose=False flag suppresses unnecessary logging during import.
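
A paraphrased sketch of the conditional-import pattern being described (the helper's import path and the neighboring wildcard imports are assumptions, not copied from the file):

```python
# modelopt/torch/prune/__init__.py (sketch)
from modelopt.torch.utils import import_plugin  # assumed helper location

from .fastnas import *  # existing wildcard imports are preserved
from .pruning import *

with import_plugin("mcore_minitron", verbose=False):
    # Imported only when Megatron-Core and its dependencies are available,
    # so environments without Megatron don't fail at import time.
    from .plugins import mcore_minitron
```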

modelopt/torch/prune/plugins/mcore_minitron.py (2)

40-40: LGTM!

Good addition of the drop_mcore_language_model_layers import. This helper function is now available for re-export, making it accessible to users at the plugin level.


74-80: LGTM!

The expanded __all__ makes the plugin's public API clearer and aligns with the PR objective to bring minitron imports one level up. Users can now access:

  • SUPPORTED_HPARAMS: For discovering supported pruning hyperparameters
  • Configuration and descriptor classes for the pruning mode
  • drop_mcore_language_model_layers: Helper function for manual layer dropping

This enables imports like from modelopt.torch.prune.plugins.mcore_minitron import drop_mcore_language_model_layers or through the parent package as from modelopt.torch.prune import mcore_minitron (with mcore_minitron.drop_mcore_language_model_layers).


codecov bot commented Oct 10, 2025

Codecov Report

❌ Patch coverage is 96.77419% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.38%. Comparing base (5b02483) to head (7a4394e).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| `modelopt/torch/nas/conversion.py` | 95.91% | 2 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #424      +/-   ##
==========================================
+ Coverage   73.36%   73.38%   +0.01%     
==========================================
  Files         180      180              
  Lines       17919    17934      +15     
==========================================
+ Hits        13147    13160      +13     
- Misses       4772     4774       +2     
```

☔ View full report in Codecov by Sentry.

@kevalmorabia97 kevalmorabia97 changed the title [Minor] Pruning doc update + bring minitron import one level up NAS export refactor + skip conversion on minitron restore Oct 11, 2025
@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) October 11, 2025 08:37
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/pruning-doc-update-2 branch from 1016d82 to 7a4394e on October 11, 2025 08:50

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1016d82 and 7a4394e.

📒 Files selected for processing (10)
  • modelopt/torch/nas/autonas.py (3 hunks)
  • modelopt/torch/nas/conversion.py (3 hunks)
  • modelopt/torch/nas/registry.py (1 hunks)
  • modelopt/torch/nas/utils.py (0 hunks)
  • modelopt/torch/prune/fastnas.py (1 hunks)
  • modelopt/torch/prune/plugins/mcore_minitron.py (4 hunks)
  • tests/gpu/torch/prune/plugins/test_mcore_gpt_minitron_pruning.py (3 hunks)
  • tests/unit/torch/nas/plugins/test_hf_nas_save_restore.py (1 hunks)
  • tests/unit/torch/nas/test_nas.py (3 hunks)
  • tests/unit/torch/opt/test_chaining.py (3 hunks)
💤 Files with no reviewable changes (1)
  • modelopt/torch/nas/utils.py
✅ Files skipped from review due to trivial changes (1)
  • tests/unit/torch/nas/plugins/test_hf_nas_save_restore.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • modelopt/torch/prune/fastnas.py
  • modelopt/torch/nas/registry.py
  • tests/unit/torch/nas/test_nas.py
🧰 Additional context used
🧬 Code graph analysis (4)
modelopt/torch/nas/autonas.py (5)
modelopt/torch/opt/config.py (1)
  • get_kwargs_for_create_model_with_rules (322-383)
modelopt/torch/nas/search_space.py (1)
  • generate_search_space (199-260)
modelopt/torch/nas/utils.py (3)
  • get_subnet_config (160-170)
  • sample (131-142)
  • select (145-157)
modelopt/torch/prune/fastnas.py (2)
  • sample (141-144)
  • export_mode (349-351)
modelopt/torch/prune/plugins/mcore_minitron.py (1)
  • export_mode (305-307)
modelopt/torch/prune/plugins/mcore_minitron.py (3)
modelopt/torch/nas/plugins/megatron.py (1)
  • drop_mcore_language_model_layers (1392-1456)
modelopt/torch/nas/autonas.py (1)
  • export_mode (680-682)
modelopt/torch/prune/fastnas.py (1)
  • export_mode (349-351)
modelopt/torch/nas/conversion.py (6)
modelopt/torch/opt/config.py (2)
  • ModeloptBaseConfig (59-147)
  • ModeloptField (50-53)
modelopt/torch/opt/conversion.py (2)
  • ApplyModeError (314-315)
  • apply_mode (342-429)
modelopt/torch/opt/mode.py (2)
  • ModeDescriptor (56-259)
  • _ModeRegistryCls (267-344)
modelopt/torch/utils/network.py (2)
  • compare_dict (423-427)
  • unwrap_model (430-454)
modelopt/torch/nas/search_space.py (1)
  • SearchSpace (38-196)
modelopt/torch/nas/utils.py (2)
  • get_subnet_config (160-170)
  • select (145-157)
tests/gpu/torch/prune/plugins/test_mcore_gpt_minitron_pruning.py (1)
tests/_test_utils/torch_dist/plugins/megatron_common.py (1)
  • run_mcore_inference (326-379)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality

@kevalmorabia97 kevalmorabia97 merged commit 9e64f81 into main Oct 11, 2025
27 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/pruning-doc-update-2 branch October 11, 2025 10:21