
Conversation


@kevalmorabia97 kevalmorabia97 commented Sep 19, 2025

What does this PR do?

Usage

From https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt:

PP=1 \
TARGET_NUM_LAYERS=24 \
HF_MODEL_CKPT=<pretrained_model_name_or_path> \
MLM_MODEL_SAVE=/tmp/Qwen3-8B-DPruned \
./prune.sh qwen/Qwen3-8B
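The depth-pruning setting above (`TARGET_NUM_LAYERS=24`) amounts to keeping the highest-importance layers of the model. Below is a toy Python sketch of that selection step, with made-up scores; it is not the actual Minitron implementation, which derives importance from activation statistics.

```python
# Toy sketch of depth pruning: keep the k layers with the highest importance
# scores. Scores and the 6-layer model are made up for illustration only.
def depth_prune(importance: dict[int, float], target_num_layers: int) -> list[int]:
    """Return the 1-indexed layers to keep, preserving original order."""
    ranked = sorted(importance, key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:target_num_layers])

# Hypothetical per-layer scores for a 6-layer model, pruned down to 4 layers.
scores = {1: 0.91, 2: 0.15, 3: 0.78, 4: 0.10, 5: 0.66, 6: 0.84}
kept = depth_prune(scores, target_num_layers=4)  # [1, 3, 5, 6]
```

The real pipeline also reassembles and saves the pruned checkpoint; this only illustrates the layer-selection idea.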

Testing

  • Pruned Qwen3-0.6B in the Megatron-LM framework

Summary by CodeRabbit

  • Documentation

    • Added CHANGELOG entry for a Minitron pruning example targeting Megatron-LM.
    • Updated pruning examples to include Megatron-LM and NeMo LLMs (e.g., Llama 3.1, Nemotron Nano 12B v2), refreshed links, placeholders, and notebook references.
    • Reorganized GradNAS sections and clarified guidance for Hugging Face LMs (e.g., BERT).
    • Expanded Megatron-LM docs with pruning support matrix, options, examples, and container notes.
  • Refactor

    • Removed a deprecated Minitron mode alias; standard Minitron mode remains.

@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner September 19, 2025 05:34

coderabbitai bot commented Sep 19, 2025

Walkthrough

Adds Megatron-LM / NeMo-targeted pruning documentation and examples, inserts a CHANGELOG entry for the Minitron pruning example, and removes an unused warnings import plus a deprecated MCoreGPTMinitronModeDescriptor export from the Minitron pruning plugin.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Changelog docs**<br>`CHANGELOG.rst` | Adds a New Features entry for the Minitron pruning example under the 0.37 section. |
| **Pruning examples docs**<br>`examples/pruning/README.md`, `examples/megatron-lm/README.md` | Reworks README content to reference Megatron-LM and NeMo frameworks; updates model names/placeholders (e.g., Llama 3 → Llama 3.1, Nemotron Nano 12B v2); adds a Pruning column/section with supported options and example invocations; reorganizes GradNAS/FastNAS sections; updates containerization and placeholder wording. |
| **Pruning plugin cleanup**<br>`modelopt/torch/prune/plugins/mcore_minitron.py` | Removes an unused `warnings.warn` import and deletes the deprecated exported class `MCoreGPTMinitronModeDescriptor`; keeps `MCoreMinitronModeDescriptor` and its config/search mappings unchanged. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I twitch my ears and nibble a log,
A deprecated hop left a tidy spot.
Docs now hum with Megatron and NeMo cheer,
Prune the weeds, the path is clear. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title is concise and directly describes the primary change—adding a Megatron-LM pruning example link and related usage documentation—so it accurately summarizes the main documentation update for reviewers. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
CHANGELOG.rst (1)

19-19: Changelog phrasing: this is a docs/examples link, not a product feature

Consider marking this as a docs/examples update and be explicit that it adds a link, to avoid implying a binary change. Also confirm this belongs under 0.37 (2025-09-xx) and not 0.35 as mentioned elsewhere.

Proposed edit:

```diff
-- Add Minitron pruning example for Megatron-LM framework.
+- Docs: Add link to Minitron pruning example in Megatron‑LM.
```
examples/pruning/README.md (2)

94-99: Deep-link to Megatron‑LM pruning section and add quick-start one‑liner

Align with the PR objective by deep-linking directly to the Pruning section and adding the usage one‑liner shown in the PR description.

Proposed diff:

````diff
-### Minitron Pruning for Megatron-LM / NeMo Framework LLMs (e.g. Llama 3.1, Nemotron Nano)
+### Minitron Pruning for Megatron‑LM / NeMo Framework LLMs (e.g., Llama 3.1, Nemotron Nano)

-Checkout the Minitron pruning example in the [Megatron-LM repository](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt) and [NeMo repository](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html) which showcases the usage of the powerful Minitron pruning algorithm developed by NVIDIA Research for pruning LLMs like Llama 3.1 8B, Qwen 3 8B, Nemotron Nano 12B v2, etc.
+Check out the Minitron pruning example in the [Megatron‑LM repository](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt#-pruning) and the [NeMo docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html), which showcase the Minitron pruning algorithm for LLMs like Llama 3.1 8B, Qwen3 8B, Nemotron Nano 12B v2, etc.
+
+Quick start (Megatron‑LM example):
+```bash
+PP=1 TARGET_NUM_LAYERS=24 \
+HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+MLM_MODEL_SAVE=/tmp/Qwen3-8B-DPruned \
+./prune.sh qwen/Qwen3-8B
+```

-You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
+You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation), which showcase Minitron pruning followed by distillation for Llama 3.1 8B step by step in the NeMo framework. Hugging Face models can be converted to NeMo format and used as shown in the tutorial.
````

Note: please verify the exact anchor slug in the Megatron‑LM README (the “-pruning” fragment).


112-117: Nit: “Hugging Face” spelling and minor copy tweak

Use the standard “Hugging Face” spelling and add an article.

```diff
-### GradNAS Pruning for HuggingFace Language Models (e.g. BERT)
+### GradNAS Pruning for Hugging Face Language Models (e.g., BERT)
@@
-which showcases the usage of GradNAS for pruning BERT model for Question Answering followed by fine-tuning
+which showcases the usage of GradNAS for pruning a BERT model for Question Answering, followed by fine-tuning
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c36abe and 35ea079.

📒 Files selected for processing (3)
  • CHANGELOG.rst (1 hunks)
  • examples/pruning/README.md (1 hunks)
  • modelopt/torch/prune/plugins/mcore_minitron.py (0 hunks)
💤 Files with no reviewable changes (1)
  • modelopt/torch/prune/plugins/mcore_minitron.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality


codecov bot commented Sep 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.84%. Comparing base (4c36abe) to head (e43f5fb).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
```
@@           Coverage Diff           @@
##             main     #344   +/-   ##
=======================================
  Coverage   73.83%   73.84%
=======================================
  Files         172      172
  Lines       17453    17453
=======================================
+ Hits        12887    12888    +1
+ Misses       4566     4565    -1
```

☔ View full report in Codecov by Sentry.

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/pruning-doc-update branch from 35ea079 to 7887d42 Compare September 19, 2025 05:47
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner September 19, 2025 05:47
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/pruning-doc-update branch from 7887d42 to c127b7f Compare September 19, 2025 05:50

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (9)
examples/megatron-lm/README.md (9)

20-26: Clarify "WAR" and add a PP footnote in the matrix.

  • Expand the "WAR" acronym directly in-table or in a footnote below to avoid ambiguity.
  • Since pruning is “PP only,” add a brief footnote under the table clarifying that pruning currently requires Pipeline Parallel integration (the examples use PP=1).

53-71: Fix copy-paste runnability: remove lone "\" lines and quote the placeholder.

A standalone "\" will error as a command when pasted. Also quote the placeholder to support local paths with spaces.

Apply this diff:

```diff
-\
     TP=1 \
-    HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+    HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
     MLM_MODEL_SAVE=/tmp/Llama-3.2-1B-Instruct-FP8 \
     bash megatron-lm/examples/post_training/modelopt/quantize.sh meta-llama/Llama-3.2-1B-Instruct fp8

-\
     PP=1 \
-    HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+    HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
     MLM_MODEL_LOAD=/tmp/Llama-3.2-1B-Instruct-FP8 \
     EXPORT_DIR=/tmp/Llama-3.2-1B-Instruct-Export \
     bash megatron-lm/examples/post_training/modelopt/export.sh meta-llama/Llama-3.2-1B-Instruct
```

84-97: Repeat the runnability fixes for the EAGLE3 examples.

Remove the leading "\" lines and quote the placeholder.

Apply this diff:

```diff
-\
     TP=1 \
-    HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+    HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
     MLM_MODEL_SAVE=/tmp/Llama-3.2-1B-Eagle3 \
     bash megatron-lm/examples/post_training/modelopt/eagle3.sh meta-llama/Llama-3.2-1B-Instruct

-\
     PP=1 \
-    HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+    HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
     MLM_MODEL_LOAD=/tmp/Llama-3.2-1B-Eagle3 \
     EXPORT_DIR=/tmp/Llama-3.2-1B-Eagle3-Export \
     bash megatron-lm/examples/post_training/modelopt/export.sh meta-llama/Llama-3.2-1B-Instruct
```

99-101: Typo: “checkpoiint” → “checkpoint”.

Apply this diff:

```diff
- Megatron-LM checkpoint (`/tmp/Llama-3.2-1B-Eagle3`) and a Hugging Face-Like exported checkpoiint
+ Megatron-LM checkpoint (`/tmp/Llama-3.2-1B-Eagle3`) and a Hugging Face-like exported checkpoint
```

107-111: Optional: link “Coming soon …” to a tracking issue.

Add a reference to a GH issue/milestone for discoverability.


111-122: Surround the list with blank lines (markdownlint MD032) and add upstream link.

  • Insert a blank line before the list to satisfy MD032.
  • Add a pointer to the upstream pruning doc for discoverability.

Apply this diff:

```diff
 ### ⭐ Pruning

-Pruning is supported for GPT and Mamba models in Pipeline Parallel mode. Available pruning options are:
+Pruning is supported for GPT and Mamba models in Pipeline Parallel mode. See upstream notes: https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt#-pruning
+
+Available pruning options are:
+
 - `TARGET_FFN_HIDDEN_SIZE`
 - `TARGET_HIDDEN_SIZE`
 - `TARGET_NUM_ATTENTION_HEADS`
 - `TARGET_NUM_QUERY_GROUPS`
 - `TARGET_MAMBA_NUM_HEADS`
 - `TARGET_MAMBA_HEAD_DIM`
 - `TARGET_NUM_LAYERS`
 - `LAYERS_TO_DROP` (comma separated, 1-indexed list of layer numbers to directly drop)
+
```
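To make the `LAYERS_TO_DROP` convention concrete (a comma-separated, 1-indexed list), here is a small self-contained sketch of that parsing convention; the function and layer names are illustrative, not the actual script internals:

```python
# Illustrative parsing of a LAYERS_TO_DROP-style value such as "1,4";
# the list is 1-indexed, matching the option described above.
def parse_layers_to_drop(spec: str) -> set[int]:
    """Parse a comma-separated, 1-indexed layer list like "1,4"."""
    return {int(tok) for tok in spec.split(",") if tok.strip()}

def drop_layers(layers: list[str], spec: str) -> list[str]:
    """Remove the given 1-indexed layers, keeping the rest in order."""
    to_drop = parse_layers_to_drop(spec)
    return [layer for idx, layer in enumerate(layers, start=1) if idx not in to_drop]

layers = [f"decoder.layer.{i}" for i in range(1, 7)]
remaining = drop_layers(layers, "1,4")  # drops layers 1 and 4, keeps the other four
```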

123-129: Pruning example looks good; consider adding a quick validation tip.

After prune.sh, suggest a short “load and run a sanity inference” snippet or pointer to export/eval to reduce confusion.


139-141: Quote the placeholder for robustness.

Apply this diff:

```diff
-    HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+    HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
```
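A quick self-contained illustration of why the quoting matters: an unquoted value containing a space is split into two words by the shell (the path below is made up):

```shell
# An unquoted expansion of a path with a space word-splits into two arguments;
# quoting keeps it as one. Counts are captured via the positional parameters.
ckpt="/tmp/my models/qwen3-8b"
set -- $ckpt          # unquoted: word-splits into 2 arguments
unquoted_count=$#
set -- "$ckpt"        # quoted: stays a single argument
quoted_count=$#
```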

35-37: Optional: avoid floating “latest” container tag.

Recommend a versioned tag or digest for reproducibility.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 35ea079 and 7887d42.

📒 Files selected for processing (4)
  • CHANGELOG.rst (1 hunks)
  • examples/megatron-lm/README.md (5 hunks)
  • examples/pruning/README.md (1 hunks)
  • modelopt/torch/prune/plugins/mcore_minitron.py (0 hunks)
💤 Files with no reviewable changes (1)
  • modelopt/torch/prune/plugins/mcore_minitron.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/pruning/README.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
examples/megatron-lm/README.md

114-114: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (1)
examples/megatron-lm/README.md (1)

79-79: Section heading change looks good.

@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/pruning-doc-update branch from c127b7f to e43f5fb Compare September 19, 2025 06:51

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
examples/pruning/README.md (2)

98-98: Fix capitalization of "framework" and verify model compatibility.

There's an inconsistency in capitalization and a minor phrasing issue.

```diff
-You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
+You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in NeMo Framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
```

30-30: Fix typo: "requisred" should be "required".

```diff
-For GradNAS pruning for Hugging Face BERT / GPT-J, no additional dependencies are requisred.
+For GradNAS pruning for Hugging Face BERT / GPT-J, no additional dependencies are required.
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c127b7f and e43f5fb.

📒 Files selected for processing (4)
  • CHANGELOG.rst (1 hunks)
  • examples/megatron-lm/README.md (5 hunks)
  • examples/pruning/README.md (1 hunks)
  • modelopt/torch/prune/plugins/mcore_minitron.py (0 hunks)
💤 Files with no reviewable changes (1)
  • modelopt/torch/prune/plugins/mcore_minitron.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • CHANGELOG.rst
  • examples/megatron-lm/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: wait-checks / wait
  • GitHub Check: code-quality
  • GitHub Check: build-docs
🔇 Additional comments (3)
examples/pruning/README.md (3)

94-94: LGTM! Updated heading accurately reflects expanded scope.

The heading update from "Minitron Pruning" to "Minitron Pruning for Megatron-LM / NeMo Framework LLMs" with specific model examples (Llama 3.1, Nemotron Nano) better captures the broader target ecosystem and provides concrete examples.


96-96: LGTM! Comprehensive documentation links for both frameworks.

The updated content properly references both Megatron-LM and NeMo frameworks with appropriate links, and uses current model examples. Based on my search, Nemotron Nano 12B v2 is indeed a legitimate NVIDIA model that showcases the hybrid Mamba-Transformer architecture.


112-117: LGTM! Well-structured reorganization of GradNAS section.

The reorganization into a dedicated subsection improves document structure and readability. The content is accurate and the links are appropriate.

@kevalmorabia97 kevalmorabia97 merged commit c60baae into main Sep 19, 2025
30 of 32 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/pruning-doc-update branch September 19, 2025 18:06