Commit 35ea079

Add Megatron-LM pruning example link
Signed-off-by: Keval Morabia <[email protected]>
1 parent: 4c36abe

File tree

3 files changed: +10 -30 lines changed

- CHANGELOG.rst
- examples/pruning/README.md
- modelopt/torch/prune/plugins/mcore_minitron.py


CHANGELOG.rst

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ Model Optimizer Changelog (Linux)
 - ``high_precision_dtype`` defaults to fp16 in ONNX quantization, i.e. quantized output model weights are now FP16 by default.
 - Upgrade TensorRT-LLM dependency to 1.1.0rc2.
 - Support Phi-4-multimodal and Qwen2.5-VL quantized HF checkpoint export in ``examples/vlm_ptq``.
+- Add Minitron pruning example for the Megatron-LM framework.

 0.35 (2025-09-04)
 ^^^^^^^^^^^^^^^^^
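For context on the first changelog entry above, here is a minimal sketch of ONNX quantization with the new FP16 default. It assumes the `modelopt.onnx.quantization.quantize` entry point and its `high_precision_dtype` keyword; verify exact names and signatures against your installed release.

```python
# Hedged sketch: quantize an ONNX model while keeping unquantized weights in FP16.
# "model.onnx" and "model.quant.onnx" are hypothetical paths.
from modelopt.onnx.quantization import quantize

quantize(
    onnx_path="model.onnx",
    quantize_mode="int8",
    high_precision_dtype="fp16",  # now the default, per the changelog entry above
    output_path="model.quant.onnx",
)
```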

examples/pruning/README.md

Lines changed: 9 additions & 9 deletions
@@ -91,30 +91,30 @@ mtp.prune(
 
 ## Examples
 
-### Minitron Pruning for NVIDIA NeMo / Megatron-LM LLMs (e.g. Llama 3)
+### Minitron Pruning for Megatron-LM / NeMo Framework LLMs (e.g. Llama 3.1, Nemotron Nano)
 
-Checkout the Minitron pruning example in the [NVIDIA NeMo repository](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html) which showcases the usage of the powerful Minitron pruning algorithm developed by NVIDIA Research for pruning LLMs like Llama 3.1 8B, Qwen 3 8B, Mistral NeMo 12B, etc.
+Check out the Minitron pruning examples in the [Megatron-LM repository](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt) and the [NeMo repository](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html), which showcase the usage of the powerful Minitron pruning algorithm developed by NVIDIA Research for pruning LLMs like Llama 3.1 8B, Qwen 3 8B, Nemotron Nano 12B v2, etc.
 
-You can also look at the tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
+You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation), which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in the NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
 
 Some of the models pruned using the Minitron method followed by distillation and post-training are:
 
 - [Minitron Collection on Hugging Face](https://huggingface.co/collections/nvidia/minitron-669ac727dc9c86e6ab7f0f3e)
 - [NVIDIA-Nemotron-Nano-9B-v2](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2)
 
-### GradNAS Pruning for HuggingFace Language Models (e.g. BERT)
-
-Checkout the BERT pruning example in [chained_optimizations](../chained_optimizations/README.md) directory
-which showcases the usage of GradNAS for pruning BERT model for Question Answering followed by fine-tuning
-with distillation and quantization. The example also demonstrates how to save and restore pruned models.
-
 ### FastNAS Pruning for PyTorch Computer Vision Models
 
 Check out the FastNAS pruning interactive notebook [cifar_resnet](./cifar_resnet.ipynb) in this directory,
 which showcases the usage of FastNAS for pruning a ResNet 20 model for the CIFAR-10 dataset. The notebook
 also shows how to profile the model to understand the search space of possible pruning options and demonstrates
 the usage of saving and restoring pruned models.
 
+### GradNAS Pruning for HuggingFace Language Models (e.g. BERT)
+
+Check out the BERT pruning example in the [chained_optimizations](../chained_optimizations/README.md) directory,
+which showcases the usage of GradNAS for pruning a BERT model for Question Answering followed by fine-tuning
+with distillation and quantization. The example also demonstrates how to save and restore pruned models.
+
 ## Resources
 
 - 📅 [Roadmap](https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues/146)
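The hunk header above shows the README's surrounding `mtp.prune(` call. Here is a minimal sketch of invoking Minitron pruning through that API; the `export_config` targets and the `forward_loop` calibration hook are assumptions drawn from the pruning README, so verify them against the linked examples.

```python
# Hedged sketch: Minitron pruning of a Megatron-core GPT model via modelopt.
# `model` and `forward_loop` (runs a few calibration batches through the model)
# are assumed to be provided by the surrounding training setup.
import modelopt.torch.prune as mtp

pruned_model, _ = mtp.prune(
    model,
    mode="mcore_minitron",
    constraints={"export_config": {"hidden_size": 3072, "num_layers": 28}},  # illustrative targets
    dummy_input=None,  # unused here; activation scoring is driven by forward_loop
    config={"forward_loop": forward_loop},
)
```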

modelopt/torch/prune/plugins/mcore_minitron.py

Lines changed: 0 additions & 21 deletions
@@ -24,8 +24,6 @@
 Actual dynamic module implementations are at :mod:`modelopt.torch.nas.plugins.megatron`.
 """
 
-from warnings import warn
-
 import torch
 from pydantic import create_model
 
@@ -209,22 +207,3 @@ def config_class(self) -> type[ModeloptBaseConfig]:
     def search_algorithm(self) -> type[BaseSearcher]:
         """Specifies the search algorithm to use for this mode (if any)."""
         return MCoreMinitronSearcher
-
-
-@NASModeRegistry.register_mode
-@PruneModeRegistry.register_mode
-class MCoreGPTMinitronModeDescriptor(MCoreMinitronModeDescriptor):
-    """[Deprecated] Class to describe the ``"mcore_gpt_minitron"`` mode.
-
-    The properties of this mode can be inspected via the source code.
-    """
-
-    @property
-    def name(self) -> str:
-        """Returns the value (str representation) of the mode."""
-        warn(
-            "`mcore_gpt_minitron` mode is deprecated will be removed in a later release. "
-            "Please use `mcore_minitron` instead.",
-            DeprecationWarning,
-        )
-        return "mcore_gpt_minitron"
