Add Megatron-LM pruning example link #344
Walkthrough: Adds Megatron-LM / NeMo-targeted pruning documentation and examples, inserts a CHANGELOG entry for the Minitron pruning example, and removes an unused warnings import plus a deprecated …

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (3)
CHANGELOG.rst (1)
`19-19`: **Changelog phrasing: this is a docs/examples link, not a product feature**

Consider marking this as a docs/examples update and being explicit that it adds a link, to avoid implying a binary change. Also confirm this belongs under 0.37 (2025-09-xx) and not 0.35 as mentioned elsewhere.
Proposed edit:
```diff
-- Add Minitron pruning example for Megatron-LM framework.
+- Docs: Add link to Minitron pruning example in Megatron-LM.
```

examples/pruning/README.md (2)
`94-99`: **Deep-link to Megatron-LM pruning section and add quick-start one-liner**

Align with the PR objective by deep-linking directly to the Pruning section and adding the usage one-liner shown in the PR description.
Proposed diff:
````diff
-### Minitron Pruning for Megatron-LM / NeMo Framework LLMs (e.g. Llama 3.1, Nemotron Nano)
+### Minitron Pruning for Megatron-LM / NeMo Framework LLMs (e.g., Llama 3.1, Nemotron Nano)

-Checkout the Minitron pruning example in the [Megatron-LM repository](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt) and [NeMo repository](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html) which showcases the usage of the powerful Minitron pruning algorithm developed by NVIDIA Research for pruning LLMs like Llama 3.1 8B, Qwen 3 8B, Nemotron Nano 12B v2, etc.
+Check out the Minitron pruning example in the [Megatron-LM repository](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt#-pruning) and the [NeMo docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html), which showcase the Minitron pruning algorithm for LLMs like Llama 3.1 8B, Qwen3 8B, Nemotron Nano 12B v2, etc.
+
+Quick start (Megatron-LM example):
+
+```bash
+PP=1 TARGET_NUM_LAYERS=24 \
+HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+MLM_MODEL_SAVE=/tmp/Qwen3-8B-DPruned \
+./prune.sh qwen/Qwen3-8B
+```

-You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
+You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation), which showcase Minitron pruning followed by distillation for Llama 3.1 8B step by step in the NeMo framework. Hugging Face models can be converted to NeMo format and used as shown in the tutorial.
````

Note: please verify the exact anchor slug in the Megatron-LM README (the "-pruning" fragment).
`112-117`: **Nit: "Hugging Face" spelling and minor copy tweak**

Use the standard "Hugging Face" spelling and add an article.

```diff
-### GradNAS Pruning for HuggingFace Language Models (e.g. BERT)
+### GradNAS Pruning for Hugging Face Language Models (e.g., BERT)
@@
-which showcases the usage of GradNAS for pruning BERT model for Question Answering followed by fine-tuning
+which showcases the usage of GradNAS for pruning a BERT model for Question Answering, followed by fine-tuning
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- CHANGELOG.rst (1 hunks)
- examples/pruning/README.md (1 hunks)
- modelopt/torch/prune/plugins/mcore_minitron.py (0 hunks)
💤 Files with no reviewable changes (1)
- modelopt/torch/prune/plugins/mcore_minitron.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #344   +/-   ##
=======================================
  Coverage   73.83%   73.84%
=======================================
  Files         172      172
  Lines       17453    17453
=======================================
+ Hits        12887    12888      +1
+ Misses       4566     4565      -1
```

☔ View full report in Codecov by Sentry.
Force-pushed from 35ea079 to 7887d42
Force-pushed from 7887d42 to c127b7f
Actionable comments posted: 0
🧹 Nitpick comments (9)
examples/megatron-lm/README.md (9)
`20-26`: **Clarify "WAR" and add a PP footnote in the matrix.**

- Expand the "WAR" acronym directly in-table or in a footnote below to avoid ambiguity.
- Since pruning is "PP only," add a brief footnote under the table clarifying that pruning currently requires Pipeline Parallel integration (the examples use PP=1); one possible wording is sketched below.
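For instance, the footnote might read as follows (a sketch assuming "WAR" here stands for "workaround"; the wording is illustrative, not from the PR):

```diff
+> WAR = workaround. Pruning currently requires Pipeline Parallel integration; the examples in this README use PP=1.
```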
`53-71`: **Fix copy-paste runnability: remove lone "\" lines and quote the placeholder.**

A standalone "\" will error as a command when pasted. Also quote the placeholder to support local paths with spaces.
Apply this diff:
```diff
-\
 TP=1 \
-  HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
   MLM_MODEL_SAVE=/tmp/Llama-3.2-1B-Instruct-FP8 \
   bash megatron-lm/examples/post_training/modelopt/quantize.sh meta-llama/Llama-3.2-1B-Instruct fp8

-\
 PP=1 \
-  HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
   MLM_MODEL_LOAD=/tmp/Llama-3.2-1B-Instruct-FP8 \
   EXPORT_DIR=/tmp/Llama-3.2-1B-Instruct-Export \
   bash megatron-lm/examples/post_training/modelopt/export.sh meta-llama/Llama-3.2-1B-Instruct
```

`84-97`: **Repeat the runnability fixes for the EAGLE3 examples.**

Remove the leading "\" lines and quote the placeholder. Apply this diff:

```diff
-\
 TP=1 \
-  HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
   MLM_MODEL_SAVE=/tmp/Llama-3.2-1B-Eagle3 \
   bash megatron-lm/examples/post_training/modelopt/eagle3.sh meta-llama/Llama-3.2-1B-Instruct

-\
 PP=1 \
-  HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
   MLM_MODEL_LOAD=/tmp/Llama-3.2-1B-Eagle3 \
   EXPORT_DIR=/tmp/Llama-3.2-1B-Eagle3-Export \
   bash megatron-lm/examples/post_training/modelopt/export.sh meta-llama/Llama-3.2-1B-Instruct
```

`99-101`: **Typo: "checkpoiint" → "checkpoint".**

Apply this diff:

```diff
- Megatron-LM checkpoint (`/tmp/Llama-3.2-1B-Eagle3`) and a Hugging Face-Like exported checkpoiint
+ Megatron-LM checkpoint (`/tmp/Llama-3.2-1B-Eagle3`) and a Hugging Face-like exported checkpoint
```
`107-111`: **Optional: link "Coming soon …" to a tracking issue.**

Add a reference to a GH issue/milestone for discoverability.
`111-122`: **Surround the list with blank lines (markdownlint MD032) and add an upstream link.**

- Insert a blank line before the list to satisfy MD032.
- Add a pointer to the upstream pruning doc for discoverability.

Apply this diff:

```diff
 ### ⭐ Pruning

-Pruning is supported for GPT and Mamba models in Pipeline Parallel mode. Available pruning options are:
+Pruning is supported for GPT and Mamba models in Pipeline Parallel mode. See upstream notes: https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt#-pruning
+
+Available pruning options are:
+
 - `TARGET_FFN_HIDDEN_SIZE`
 - `TARGET_HIDDEN_SIZE`
 - `TARGET_NUM_ATTENTION_HEADS`
 - `TARGET_NUM_QUERY_GROUPS`
 - `TARGET_MAMBA_NUM_HEADS`
 - `TARGET_MAMBA_HEAD_DIM`
 - `TARGET_NUM_LAYERS`
 - `LAYERS_TO_DROP` (comma separated, 1-indexed list of layer numbers to directly drop)
```
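To make the options concrete, a depth-pruning run that drops specific layers might look like the following (a sketch: it assumes `prune.sh` reads these variables from the environment like the quick start shown earlier and sits alongside `quantize.sh`/`export.sh`; the paths and layer list are placeholders):

```sh
# Hypothetical: drop layers 15 and 16 (1-indexed) from a Llama 3.2 1B checkpoint.
PP=1 LAYERS_TO_DROP="15,16" \
  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
  MLM_MODEL_SAVE=/tmp/Llama-3.2-1B-Dropped \
  bash megatron-lm/examples/post_training/modelopt/prune.sh meta-llama/Llama-3.2-1B-Instruct
```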
`123-129`: **Pruning example looks good; consider adding a quick validation tip.**

After prune.sh, suggest a short "load and run a sanity inference" snippet or a pointer to export/eval to reduce confusion; a possible shape for that snippet is sketched below.
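For example, such a snippet could follow the `export.sh` pattern used earlier in this README (a sketch; the pruned-checkpoint paths are hypothetical):

```sh
# Hypothetical sanity check: export the pruned Megatron-LM checkpoint
# to a Hugging Face-like format so it can be loaded and spot-checked.
PP=1 \
  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
  MLM_MODEL_LOAD=/tmp/Qwen3-8B-DPruned \
  EXPORT_DIR=/tmp/Qwen3-8B-DPruned-Export \
  bash megatron-lm/examples/post_training/modelopt/export.sh qwen/Qwen3-8B
```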
`139-141`: **Quote the placeholder for robustness.**

Apply this diff:

```diff
-  HF_MODEL_CKPT=<pretrained_model_name_or_path> \
+  HF_MODEL_CKPT="<pretrained_model_name_or_path>" \
```
`35-37`: **Optional: avoid a floating "latest" container tag.**

Recommend a versioned tag or digest for reproducibility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- CHANGELOG.rst (1 hunks)
- examples/megatron-lm/README.md (5 hunks)
- examples/pruning/README.md (1 hunks)
- modelopt/torch/prune/plugins/mcore_minitron.py (0 hunks)
💤 Files with no reviewable changes (1)
- modelopt/torch/prune/plugins/mcore_minitron.py
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/pruning/README.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
examples/megatron-lm/README.md
114-114: Lists should be surrounded by blank lines
(MD032, blanks-around-lists)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (1)
examples/megatron-lm/README.md (1)
`79-79`: Section heading change looks good.
Signed-off-by: Keval Morabia <[email protected]>
Force-pushed from c127b7f to e43f5fb
Actionable comments posted: 0
🧹 Nitpick comments (2)
examples/pruning/README.md (2)
`98-98`: **Fix capitalization of "NeMo Framework" and verify model compatibility.**

There's an inconsistency in capitalization and a minor phrasing issue.

```diff
-You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
+You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in NeMo Framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
```
`30-30`: **Fix typo: "requisred" should be "required".**

```diff
-For GradNAS pruning for Hugging Face BERT / GPT-J, no additional dependencies are requisred.
+For GradNAS pruning for Hugging Face BERT / GPT-J, no additional dependencies are required.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- CHANGELOG.rst (1 hunks)
- examples/megatron-lm/README.md (5 hunks)
- examples/pruning/README.md (1 hunks)
- modelopt/torch/prune/plugins/mcore_minitron.py (0 hunks)
💤 Files with no reviewable changes (1)
- modelopt/torch/prune/plugins/mcore_minitron.py
🚧 Files skipped from review as they are similar to previous changes (2)
- CHANGELOG.rst
- examples/megatron-lm/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (3)
examples/pruning/README.md (3)
`94-94`: **LGTM! Updated heading accurately reflects expanded scope.**

The heading update from "Minitron Pruning" to "Minitron Pruning for Megatron-LM / NeMo Framework LLMs" with specific model examples (Llama 3.1, Nemotron Nano) better captures the broader target ecosystem and provides concrete examples.
`96-96`: **LGTM! Comprehensive documentation links for both frameworks.**

The updated content properly references both Megatron-LM and NeMo frameworks with appropriate links, and uses current model examples. Based on my search, Nemotron Nano 12B v2 is indeed a legitimate NVIDIA model that showcases the hybrid Mamba-Transformer architecture.
`112-117`: **LGTM! Well-structured reorganization of GradNAS section.**

The reorganization into a dedicated subsection improves document structure and readability. The content is accurate and the links are appropriate.
What does this PR do?
Usage
Usage from https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt
Testing
Summary by CodeRabbit
Documentation
Refactor