Add Megatron-LM pruning example link #344
Walkthrough: Adds Megatron-LM / NeMo-targeted pruning documentation and examples, inserts a CHANGELOG entry for the Minitron pruning example, and removes an unused warnings import plus a deprecated …

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (3)
CHANGELOG.rst (1)
`19-19`: **Changelog phrasing: this is a docs/examples link, not a product feature**

Consider marking this as a docs/examples update and being explicit that it adds a link, to avoid implying a binary change. Also confirm this belongs under 0.37 (2025-09-xx) and not 0.35 as mentioned elsewhere.
Proposed edit:
```diff
-- Add Minitron pruning example for Megatron-LM framework.
+- Docs: Add link to Minitron pruning example in Megatron-LM.
```

examples/pruning/README.md (2)
`94-99`: **Deep-link to Megatron-LM pruning section and add quick-start one-liner**

Align with the PR objective by deep-linking directly to the Pruning section and adding the usage one-liner shown in the PR description.
Proposed diff:
````diff
-### Minitron Pruning for Megatron-LM / NeMo Framework LLMs (e.g. Llama 3.1, Nemotron Nano)
+### Minitron Pruning for Megatron-LM / NeMo Framework LLMs (e.g., Llama 3.1, Nemotron Nano)

-Checkout the Minitron pruning example in the [Megatron-LM repository](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt) and [NeMo repository](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html) which showcases the usage of the powerful Minitron pruning algorithm developed by NVIDIA Research for pruning LLMs like Llama 3.1 8B, Qwen 3 8B, Nemotron Nano 12B v2, etc.
+Check out the Minitron pruning example in the [Megatron-LM repository](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt#-pruning) and the [NeMo docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html), which showcase the Minitron pruning algorithm for LLMs like Llama 3.1 8B, Qwen3 8B, Nemotron Nano 12B v2, etc.
+
+Quick start (Megatron-LM example):
+
+```bash
+PP=1 TARGET_NUM_LAYERS=24 \
+HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+MLM_MODEL_SAVE=/tmp/Qwen3-8B-DPruned \
+./prune.sh qwen/Qwen3-8B
+```

-You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
+You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation), which showcase Minitron pruning followed by distillation for Llama 3.1 8B step by step in the NeMo framework. Hugging Face models can be converted to NeMo format and used as shown in the tutorial.
````

Note: please verify the exact anchor slug in the Megatron-LM README (the "-pruning" fragment).
`112-117`: **Nit: "Hugging Face" spelling and minor copy tweak**

Use the standard "Hugging Face" spelling and add an article.

```diff
-### GradNAS Pruning for HuggingFace Language Models (e.g. BERT)
+### GradNAS Pruning for Hugging Face Language Models (e.g., BERT)
@@
-which showcases the usage of GradNAS for pruning BERT model for Question Answering followed by fine-tuning
+which showcases the usage of GradNAS for pruning a BERT model for Question Answering, followed by fine-tuning
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- CHANGELOG.rst (1 hunks)
- examples/pruning/README.md (1 hunks)
- modelopt/torch/prune/plugins/mcore_minitron.py (0 hunks)
💤 Files with no reviewable changes (1)
- modelopt/torch/prune/plugins/mcore_minitron.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #344   +/-   ##
=======================================
  Coverage   73.83%   73.84%
=======================================
  Files         172      172
  Lines       17453    17453
=======================================
+ Hits        12887    12888      +1
+ Misses       4566     4565      -1
```

☔ View full report in Codecov by Sentry.
Force-pushed from 35ea079 to 7887d42
Force-pushed from 7887d42 to c127b7f
Actionable comments posted: 0
🧹 Nitpick comments (9)
examples/megatron-lm/README.md (9)
`20-26`: **Clarify "WAR" and add a PP footnote in the matrix.**

- Expand the "WAR" acronym directly in-table or in a footnote below to avoid ambiguity.
- Since pruning is "PP only," add a brief footnote under the table clarifying that pruning currently requires Pipeline Parallel integration (the examples use PP=1); one possible wording is sketched below.
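For instance, the footnote might read as follows (a sketch assuming "WAR" here stands for "workaround"; the wording is illustrative, not from the PR):

```diff
+> WAR = workaround. Pruning currently requires Pipeline Parallel integration; the examples in this README use PP=1.
```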
`53-71`: **Fix copy-paste runnability: remove lone "\" lines and quote the placeholder.**

A standalone "\" will error as a command when pasted. Also quote the placeholder to support local paths with spaces.
Apply this diff:
```diff
-\
 TP=1 \
-  HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
   MLM_MODEL_SAVE=/tmp/Llama-3.2-1B-Instruct-FP8 \
   bash megatron-lm/examples/post_training/modelopt/quantize.sh meta-llama/Llama-3.2-1B-Instruct fp8

-\
 PP=1 \
-  HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
   MLM_MODEL_LOAD=/tmp/Llama-3.2-1B-Instruct-FP8 \
   EXPORT_DIR=/tmp/Llama-3.2-1B-Instruct-Export \
   bash megatron-lm/examples/post_training/modelopt/export.sh meta-llama/Llama-3.2-1B-Instruct
```

`84-97`: **Repeat the runnability fixes for the EAGLE3 examples.**

Remove the leading "\" lines and quote the placeholder. Apply this diff:

```diff
-\
 TP=1 \
-  HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
   MLM_MODEL_SAVE=/tmp/Llama-3.2-1B-Eagle3 \
   bash megatron-lm/examples/post_training/modelopt/eagle3.sh meta-llama/Llama-3.2-1B-Instruct

-\
 PP=1 \
-  HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
   MLM_MODEL_LOAD=/tmp/Llama-3.2-1B-Eagle3 \
   EXPORT_DIR=/tmp/Llama-3.2-1B-Eagle3-Export \
   bash megatron-lm/examples/post_training/modelopt/export.sh meta-llama/Llama-3.2-1B-Instruct
```

`99-101`: **Typo: "checkpoiint" → "checkpoint".**

Apply this diff:

```diff
- Megatron-LM checkpoint (`/tmp/Llama-3.2-1B-Eagle3`) and a Hugging Face-Like exported checkpoiint
+ Megatron-LM checkpoint (`/tmp/Llama-3.2-1B-Eagle3`) and a Hugging Face-like exported checkpoint
```
`107-111`: **Optional: link "Coming soon …" to a tracking issue.**

Add a reference to a GH issue/milestone for discoverability.
`111-122`: **Surround the list with blank lines (markdownlint MD032) and add an upstream link.**

- Insert a blank line before the list to satisfy MD032.
- Add a pointer to the upstream pruning doc for discoverability.

Apply this diff:

```diff
 ### ⭐ Pruning

-Pruning is supported for GPT and Mamba models in Pipeline Parallel mode. Available pruning options are:
+Pruning is supported for GPT and Mamba models in Pipeline Parallel mode. See upstream notes: https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt#-pruning
+
+Available pruning options are:
+
 - `TARGET_FFN_HIDDEN_SIZE`
 - `TARGET_HIDDEN_SIZE`
 - `TARGET_NUM_ATTENTION_HEADS`
 - `TARGET_NUM_QUERY_GROUPS`
 - `TARGET_MAMBA_NUM_HEADS`
 - `TARGET_MAMBA_HEAD_DIM`
 - `TARGET_NUM_LAYERS`
 - `LAYERS_TO_DROP` (comma separated, 1-indexed list of layer numbers to directly drop)
```
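To make the options concrete, a depth-pruning run that drops specific layers might look like the following (a sketch: it assumes `prune.sh` reads these variables from the environment like the quick start shown earlier and sits alongside `quantize.sh`/`export.sh`; the paths and layer list are placeholders):

```sh
# Hypothetical: drop layers 15 and 16 (1-indexed) from a Llama 3.2 1B checkpoint.
PP=1 LAYERS_TO_DROP="15,16" \
  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
  MLM_MODEL_SAVE=/tmp/Llama-3.2-1B-Dropped \
  bash megatron-lm/examples/post_training/modelopt/prune.sh meta-llama/Llama-3.2-1B-Instruct
```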
`123-129`: **Pruning example looks good; consider adding a quick validation tip.**

After prune.sh, suggest a short "load and run a sanity inference" snippet or a pointer to export/eval to reduce confusion; a possible shape for that snippet is sketched below.
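For example, such a snippet could follow the `export.sh` pattern used earlier in this README (a sketch; the pruned-checkpoint paths are hypothetical):

```sh
# Hypothetical sanity check: export the pruned Megatron-LM checkpoint
# to a Hugging Face-like format so it can be loaded and spot-checked.
PP=1 \
  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
  MLM_MODEL_LOAD=/tmp/Qwen3-8B-DPruned \
  EXPORT_DIR=/tmp/Qwen3-8B-DPruned-Export \
  bash megatron-lm/examples/post_training/modelopt/export.sh qwen/Qwen3-8B
```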
`139-141`: **Quote the placeholder for robustness.**

Apply this diff:

```diff
-  HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
```
`35-37`: **Optional: avoid a floating "latest" container tag.**

Recommend a versioned tag or digest for reproducibility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- CHANGELOG.rst (1 hunks)
- examples/megatron-lm/README.md (5 hunks)
- examples/pruning/README.md (1 hunks)
- modelopt/torch/prune/plugins/mcore_minitron.py (0 hunks)
💤 Files with no reviewable changes (1)
- modelopt/torch/prune/plugins/mcore_minitron.py
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/pruning/README.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
examples/megatron-lm/README.md
114-114: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (1)
examples/megatron-lm/README.md (1)
`79-79`: Section heading change looks good.
Signed-off-by: Keval Morabia <[email protected]>
Force-pushed from c127b7f to e43f5fb
Actionable comments posted: 0
🧹 Nitpick comments (2)
examples/pruning/README.md (2)
`98-98`: **Fix capitalization of "NeMo Framework" and verify model compatibility.**

There's an inconsistency in capitalization and a minor phrasing issue.

```diff
-You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
+You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in NeMo Framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
```
`30-30`: **Fix typo: "requisred" should be "required".**

```diff
-For GradNAS pruning for Hugging Face BERT / GPT-J, no additional dependencies are requisred.
+For GradNAS pruning for Hugging Face BERT / GPT-J, no additional dependencies are required.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- CHANGELOG.rst (1 hunks)
- examples/megatron-lm/README.md (5 hunks)
- examples/pruning/README.md (1 hunks)
- modelopt/torch/prune/plugins/mcore_minitron.py (0 hunks)
💤 Files with no reviewable changes (1)
- modelopt/torch/prune/plugins/mcore_minitron.py
🚧 Files skipped from review as they are similar to previous changes (2)
- CHANGELOG.rst
- examples/megatron-lm/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (3)
examples/pruning/README.md (3)
`94-94`: **LGTM! Updated heading accurately reflects expanded scope.**

The heading update from "Minitron Pruning" to "Minitron Pruning for Megatron-LM / NeMo Framework LLMs" with specific model examples (Llama 3.1, Nemotron Nano) better captures the broader target ecosystem and provides concrete examples.
`96-96`: **LGTM! Comprehensive documentation links for both frameworks.**

The updated content properly references both Megatron-LM and NeMo frameworks with appropriate links, and uses current model examples. Based on my search, Nemotron Nano 12B v2 is indeed a legitimate NVIDIA model that showcases the hybrid Mamba-Transformer architecture.
`112-117`: **LGTM! Well-structured reorganization of GradNAS section.**

The reorganization into a dedicated subsection improves document structure and readability. The content is accurate and the links are appropriate.
What does this PR do?
Usage
Usage from https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt
Testing
Summary by CodeRabbit
Documentation
Refactor