Conversation

@kevalmorabia97 kevalmorabia97 commented Oct 7, 2025

What does this PR do?

Type of change: Implementation improvement

Overview: Remove the unnecessary complexity of the FastNAS/AutoNAS automatic recursive tracer and instead convert the top-level GPT/Mamba model directly to a DynamicModule. There is no change to user-facing features; only the internal implementation is improved.
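The direction of the change can be pictured with a toy sketch. All class and registry names below are illustrative stand-ins, not modelopt's actual API: a known top-level model class is looked up directly in a registry and converted in place, instead of being discovered by recursively tracing the module tree.

```python
class GPTModel:
    """Stand-in for the top-level Megatron-Core GPT model."""

    def __init__(self):
        self.num_layers = 12


class DynamicGPTModel(GPTModel):
    """Stand-in dynamic wrapper exposing searchable hyperparameters."""

    @classmethod
    def convert(cls, model):
        # Convert in place: swap the class and attach hyperparameter choices.
        model.__class__ = cls
        model.hparams = {"num_layers": list(range(1, model.num_layers + 1))}
        return model


# Before: a tracer would walk the module tree to find convertible submodules.
# After: the top-level class is mapped directly to its dynamic counterpart.
DM_REGISTRY = {GPTModel: DynamicGPTModel}


def convert_to_dynamic(model):
    dynamic_cls = DM_REGISTRY[type(model)]
    return dynamic_cls.convert(model)


model = convert_to_dynamic(GPTModel())
print(type(model).__name__)  # DynamicGPTModel
```

Because the registry is keyed on the top-level class, no graph tracing is needed; the conversion is a single dictionary lookup.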

Testing

  • All tests still passing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Summary by CodeRabbit

  • New Features

    • Added conversion and restore entrypoints for dynamic-space workflows; mode descriptor now guides next steps and supports export/restore.
  • Refactor

    • Consolidated layer-dropping under a single language-model API and removed the deprecated alias.
    • Centralized plugin import handling and removed legacy Megatron trace plugin and its optional registrations.
    • Updated wording to reference broader language-model support.
  • Tests

    • Updated tests to use the new dynamic-space conversion utilities and adjusted helpers accordingly.

@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner October 7, 2025 16:21

coderabbitai bot commented Oct 7, 2025

Walkthrough

Consolidates Megatron GPT-specific APIs, removes the Megatron trace plugin, adds a DynamicSpace conversion workflow for MCore Minitron with convert/restore entrypoints, and updates tests and trace plugin imports accordingly.

Changes

  • Megatron NAS plugin API consolidation (modelopt/torch/nas/plugins/megatron.py): Removes the deprecated drop_mcore_gpt_layers, reduces __all__ to ["drop_mcore_language_model_layers"], and updates the module docstring to reference language models generally.
  • MCore Minitron DynamicSpace conversion & mode descriptor (modelopt/torch/prune/plugins/mcore_minitron.py): Adds _convert_model_to_dynamic_space, convert_mcore_minitron, and restore_mcore_minitron; constructs and returns a DynamicSpace and validates configurability (raising ApplyModeError if none); MCoreMinitronModeDescriptor now inherits from ModeDescriptor and exposes next_modes, export_mode, convert, and restore; updates __all__.
  • Trace plugins import refactor (modelopt/torch/trace/plugins/__init__.py): Replaces per-plugin try/except with a centralized import_plugin context; removes explicit Megatron import attempts; wraps the transformers import via import_plugin.
  • Removed Megatron trace plugin module (modelopt/torch/trace/plugins/megatron.py): Deletes the module and its get_megatron_language_model_sym_info and SymMap registrations.
  • Tests updated to use DynamicSpace (tests/gpu/torch/nas/plugins/test_megatron_gpt_dynamic_modules.py, tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py): Replaces generate_search_space with _convert_model_to_dynamic_space; uses dynamic_space APIs (named_hparams, export(DMRegistry)); updates sorting to mtn.utils.sort_parameters; adds DMRegistry and conversion imports.
  • Trace registry adjustments in tests (tests/unit/torch/trace/test_symbol.py): Removes optional imports/registry additions for Megatron GPTModel/MambaModel; retains the remaining registry checks.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Search as Searcher / Mode runner
  participant Convert as convert_mcore_minitron
  participant Helper as _convert_model_to_dynamic_space
  participant DM as DMRegistry
  participant Model as nn.Module

  User->>Search: start(mode=MCoreMinitron, model, config)
  Search->>Convert: call convert_mcore_minitron(model, config)
  Convert->>Helper: _convert_model_to_dynamic_space(model, config?)
  Helper->>DM: request DMRegistry conversion / mark models
  DM-->>Helper: returns DynamicSpace
  Helper->>Helper: validate configurability (or raise ApplyModeError)
  Helper-->>Convert: DynamicSpace + metadata
  Convert-->>Search: (model, metadata)
  Note over Search,DM: Subsequent step: dynamic_space.export(DMRegistry) → export pipeline
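The flow in the diagram can be sketched in miniature. Function and class names follow the PR, but the bodies are illustrative assumptions rather than the real modelopt code:

```python
class ApplyModeError(RuntimeError):
    """Raised when a mode cannot be applied to the given model."""


class DynamicSpace:
    """Toy stand-in: holds hyperparameter names and their choice lists."""

    def __init__(self, hparams):
        self._hparams = hparams

    def named_hparams(self, configurable=True):
        return iter(self._hparams.items())

    def config(self):
        # Current subnet config: first (original) choice of each hparam.
        return {name: choices[0] for name, choices in self._hparams.items()}


def _convert_model_to_dynamic_space(model, hparams):
    dynamic_space = DynamicSpace(hparams)
    # Validate configurability, as the PR describes: if nothing is
    # configurable, applying the mode is an error.
    if not any(True for _ in dynamic_space.named_hparams(configurable=True)):
        raise ApplyModeError("model has no configurable hyperparameters")
    return dynamic_space


def convert_mcore_minitron(model, hparams):
    dynamic_space = _convert_model_to_dynamic_space(model, hparams)
    # Capture the current subnet configuration in metadata for restore.
    metadata = {"subnet_config": dynamic_space.config()}
    return model, metadata


_, metadata = convert_mcore_minitron(object(), {"num_layers": [12, 8, 4]})
print(metadata["subnet_config"])  # {'num_layers': 12}
```

The key property mirrored here is that convert both builds the dynamic space and records the subnet config, so a later restore can re-apply the same conversion and recover the configuration from metadata.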

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit hops through code at night,
Trims GPT paths with a gentle bite.
Minitron wakes in DynamicSpace bloom,
Convert and restore — a tidy room.
Plugins slimmed, tests follow suit — hop, onward to new light. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly captures the primary goal of the changeset by stating that tracing use in Minitron search space generation is being removed, which aligns with the conversion to a direct DynamicModule approach described in the PR. It is concise, specific, and understandable to a teammate scanning the history, and avoids vague or noisy phrasing.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 83.33%, above the required threshold of 80.00%.


@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/minitron-without-tracing branch from 00f4c5a to ce1c306 on October 7, 2025 16:24

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b39c73d and ce1c306.

📒 Files selected for processing (7)
  • modelopt/torch/nas/plugins/megatron.py (2 hunks)
  • modelopt/torch/prune/plugins/mcore_minitron.py (4 hunks)
  • modelopt/torch/trace/plugins/__init__.py (1 hunks)
  • modelopt/torch/trace/plugins/megatron.py (0 hunks)
  • tests/gpu/torch/nas/plugins/test_megatron_gpt_dynamic_modules.py (3 hunks)
  • tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py (3 hunks)
  • tests/unit/torch/trace/test_symbol.py (0 hunks)
💤 Files with no reviewable changes (2)
  • modelopt/torch/trace/plugins/megatron.py
  • tests/unit/torch/trace/test_symbol.py
🧰 Additional context used
🧬 Code graph analysis (3)
modelopt/torch/prune/plugins/mcore_minitron.py (6)
modelopt/torch/nas/plugins/megatron.py (3)
  • _DynamicMCoreLanguageModel (1150-1389)
  • convert (272-297)
  • convert (382-405)
modelopt/torch/nas/utils.py (1)
  • get_subnet_config (166-176)
modelopt/torch/opt/dynamic.py (3)
  • config (1265-1278)
  • DynamicSpace (1127-1368)
  • convert_to_dynamic (1138-1187)
modelopt/torch/opt/config.py (2)
  • ModeloptBaseConfig (59-147)
  • keys (132-134)
modelopt/torch/opt/conversion.py (1)
  • ApplyModeError (314-315)
modelopt/torch/opt/mode.py (1)
  • ModeDescriptor (56-259)
tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py (3)
modelopt/torch/prune/plugins/mcore_minitron.py (1)
  • _convert_model_to_dynamic_space (235-248)
modelopt/torch/nas/search_space.py (1)
  • sort_parameters (131-170)
modelopt/torch/opt/utils.py (1)
  • named_hparams (62-66)
tests/gpu/torch/nas/plugins/test_megatron_gpt_dynamic_modules.py (2)
modelopt/torch/prune/plugins/mcore_minitron.py (1)
  • _convert_model_to_dynamic_space (235-248)
modelopt/torch/nas/search_space.py (2)
  • sort_parameters (131-170)
  • export (172-174)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (10)
modelopt/torch/prune/plugins/mcore_minitron.py (5)

30-30: LGTM!

The import additions support the new dynamic-space conversion workflow and properly expose the updated public API.

Also applies to: 34-40, 45-54, 73-73


235-248: LGTM!

The function correctly implements the simplified dynamic-space conversion workflow by directly converting top-level language models (GPT/Mamba) without recursive tracing. The validation check ensures only models with configurable hyperparameters proceed.


251-264: LGTM!

The convert entrypoint correctly creates the dynamic space and captures the current subnet configuration in metadata for potential restoration later.


267-271: LGTM!

The restore entrypoint re-applies the mode conversion, which is appropriate when the conversion is idempotent and model state is restored separately via checkpoint loading mechanisms.


292-315: LGTM!

The mode descriptor properties correctly implement the ModeDescriptor interface and wire up the new convert/restore entrypoints. The next_modes and export_mode properties properly define the mode's compatibility and export behavior.

modelopt/torch/trace/plugins/__init__.py (1)

18-21: LGTM!

The centralized import_plugin utility simplifies plugin loading and aligns with the removal of the Megatron trace plugin, which is no longer needed after switching to direct dynamic-space conversion.

modelopt/torch/nas/plugins/megatron.py (1)

16-16: LGTM!

The API consolidation to drop_mcore_language_model_layers and removal of the deprecated GPT-specific variant simplifies the public interface while supporting both GPT and Mamba models.

Also applies to: 100-100

tests/gpu/torch/nas/plugins/test_megatron_gpt_dynamic_modules.py (2)

51-53: LGTM!

The added imports support the new dynamic-space conversion workflow.


182-182: LGTM!

The test correctly adopts the new dynamic-space conversion workflow:

  • Uses _convert_model_to_dynamic_space for conversion
  • Calls mtn.utils.sort_parameters utility function
  • Uses dynamic_space methods for querying hparams and exporting

Also applies to: 192-193, 196-197, 203-203
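The updated test pattern described above can be pictured with stand-in objects. Everything below is a hypothetical mock, not the actual GPU test code; it only illustrates the convert → query hparams → export sequence the tests now follow:

```python
class FakeDynamicSpace:
    """Mock of the object _convert_model_to_dynamic_space returns."""

    def __init__(self):
        self._hparams = {
            "decoder.num_layers": [12, 8],
            "hidden_size": [1024, 512],
        }
        self.exported = False

    def named_hparams(self, configurable=True):
        yield from self._hparams.items()

    def export(self, registry=None):
        # In the real tests this exports via DMRegistry; here we just flag it.
        self.exported = True


def _convert_model_to_dynamic_space(model):
    return FakeDynamicSpace()


# Convert, inspect hyperparameters, then export, mirroring the test flow.
dynamic_space = _convert_model_to_dynamic_space(model=None)
hparam_names = sorted(name for name, _ in dynamic_space.named_hparams())
dynamic_space.export(registry=None)
print(hparam_names)  # ['decoder.num_layers', 'hidden_size']
```

The point of the pattern is that the tests no longer touch tracing internals: every assertion goes through the dynamic space's own API.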

tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py (1)

48-48: LGTM!

The test adopts the new dynamic-space conversion workflow with _convert_model_to_dynamic_space and uses dynamic_space.named_hparams to query hyperparameters.

Also applies to: 166-166, 180-180

@codecov
Copy link

codecov bot commented Oct 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.79%. Comparing base (340eb7a) to head (ce1c306).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #405      +/-   ##
==========================================
- Coverage   73.79%   73.79%   -0.01%     
==========================================
  Files         171      171              
  Lines       17591    17591              
==========================================
- Hits        12982    12981       -1     
- Misses       4609     4610       +1     



@realAsma realAsma left a comment


looks great!!

@kevalmorabia97 kevalmorabia97 merged commit e09d36c into main Oct 8, 2025
27 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/minitron-without-tracing branch October 8, 2025 04:08