Commit 35ea079

Add Megatron-LM pruning example link
Signed-off-by: Keval Morabia <[email protected]>
1 parent: 4c36abe

File tree

3 files changed: +10 -30 lines changed

- CHANGELOG.rst
- examples/pruning/README.md
- modelopt/torch/prune/plugins/mcore_minitron.py


CHANGELOG.rst

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ Model Optimizer Changelog (Linux)
 - ``high_precision_dtype`` defaults to fp16 in ONNX quantization, i.e. quantized output model weights are now FP16 by default.
 - Upgrade TensorRT-LLM dependency to 1.1.0rc2.
 - Support Phi-4-multimodal and Qwen2.5-VL quantized HF checkpoint export in ``examples/vlm_ptq``.
+- Add Minitron pruning example for the Megatron-LM framework.

 0.35 (2025-09-04)
 ^^^^^^^^^^^^^^^^^
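For context on the first changelog entry above, here is a minimal sketch of ONNX quantization with the new FP16 default. It assumes the `modelopt.onnx.quantization.quantize` entry point and its `high_precision_dtype` keyword; verify exact names and signatures against your installed release.

```python
# Hedged sketch: quantize an ONNX model while keeping unquantized weights in FP16.
# "model.onnx" and "model.quant.onnx" are hypothetical paths.
from modelopt.onnx.quantization import quantize

quantize(
    onnx_path="model.onnx",
    quantize_mode="int8",
    high_precision_dtype="fp16",  # now the default, per the changelog entry above
    output_path="model.quant.onnx",
)
```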

examples/pruning/README.md

Lines changed: 9 additions & 9 deletions
@@ -91,30 +91,30 @@ mtp.prune(
 
 ## Examples
 
-### Minitron Pruning for NVIDIA NeMo / Megatron-LM LLMs (e.g. Llama 3)
+### Minitron Pruning for Megatron-LM / NeMo Framework LLMs (e.g. Llama 3.1, Nemotron Nano)
 
-Checkout the Minitron pruning example in the [NVIDIA NeMo repository](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html) which showcases the usage of the powerful Minitron pruning algorithm developed by NVIDIA Research for pruning LLMs like Llama 3.1 8B, Qwen 3 8B, Mistral NeMo 12B, etc.
+Check out the Minitron pruning examples in the [Megatron-LM repository](https://github.com/NVIDIA/Megatron-LM/tree/main/examples/post_training/modelopt) and the [NeMo repository](https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/pruning/pruning.html), which showcase the usage of the powerful Minitron pruning algorithm developed by NVIDIA Research for pruning LLMs like Llama 3.1 8B, Qwen 3 8B, Nemotron Nano 12B v2, etc.
 
-You can also look at the tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation) which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
+You can also look at the NeMo tutorial notebooks [here](https://github.com/NVIDIA-NeMo/NeMo/tree/main/tutorials/llm/llama/pruning-distillation), which showcase the usage of Minitron pruning followed by distillation for Llama 3.1 8B step-by-step in the NeMo framework. Hugging Face models can also be converted to NeMo format and used subsequently as shown in the tutorial.
 
 Some of the models pruned using the Minitron method followed by distillation and post-training are:
 
 - [Minitron Collection on Hugging Face](https://huggingface.co/collections/nvidia/minitron-669ac727dc9c86e6ab7f0f3e)
 - [NVIDIA-Nemotron-Nano-9B-v2](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2)
 
-### GradNAS Pruning for HuggingFace Language Models (e.g. BERT)
-
-Checkout the BERT pruning example in [chained_optimizations](../chained_optimizations/README.md) directory
-which showcases the usage of GradNAS for pruning BERT model for Question Answering followed by fine-tuning
-with distillation and quantization. The example also demonstrates how to save and restore pruned models.
-
 ### FastNAS Pruning for PyTorch Computer Vision Models
 
 Check out the FastNAS pruning interactive notebook [cifar_resnet](./cifar_resnet.ipynb) in this directory,
 which showcases the usage of FastNAS for pruning a ResNet 20 model for the CIFAR-10 dataset. The notebook
 also shows how to profile the model to understand the search space of possible pruning options and demonstrates
 the usage of saving and restoring pruned models.
 
+### GradNAS Pruning for HuggingFace Language Models (e.g. BERT)
+
+Check out the BERT pruning example in the [chained_optimizations](../chained_optimizations/README.md) directory,
+which showcases the usage of GradNAS for pruning a BERT model for Question Answering followed by fine-tuning
+with distillation and quantization. The example also demonstrates how to save and restore pruned models.
+
 ## Resources
 
 - 📅 [Roadmap](https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues/146)
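The hunk header above shows the README's surrounding `mtp.prune(` call. Here is a minimal sketch of invoking Minitron pruning through that API; the `export_config` targets and the `forward_loop` calibration hook are assumptions drawn from the pruning README, so verify them against the linked examples.

```python
# Hedged sketch: Minitron pruning of a Megatron-core GPT model via modelopt.
# `model` and `forward_loop` (runs a few calibration batches through the model)
# are assumed to be provided by the surrounding training setup.
import modelopt.torch.prune as mtp

pruned_model, _ = mtp.prune(
    model,
    mode="mcore_minitron",
    constraints={"export_config": {"hidden_size": 3072, "num_layers": 28}},  # illustrative targets
    dummy_input=None,  # unused here; activation scoring is driven by forward_loop
    config={"forward_loop": forward_loop},
)
```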

modelopt/torch/prune/plugins/mcore_minitron.py

Lines changed: 0 additions & 21 deletions
@@ -24,8 +24,6 @@
 Actual dynamic module implementations are at :mod:`modelopt.torch.nas.plugins.megatron`.
 """
 
-from warnings import warn
-
 import torch
 from pydantic import create_model
 
@@ -209,22 +207,3 @@ def config_class(self) -> type[ModeloptBaseConfig]:
     def search_algorithm(self) -> type[BaseSearcher]:
         """Specifies the search algorithm to use for this mode (if any)."""
         return MCoreMinitronSearcher
-
-
-@NASModeRegistry.register_mode
-@PruneModeRegistry.register_mode
-class MCoreGPTMinitronModeDescriptor(MCoreMinitronModeDescriptor):
-    """[Deprecated] Class to describe the ``"mcore_gpt_minitron"`` mode.
-
-    The properties of this mode can be inspected via the source code.
-    """
-
-    @property
-    def name(self) -> str:
-        """Returns the value (str representation) of the mode."""
-        warn(
-            "`mcore_gpt_minitron` mode is deprecated will be removed in a later release. "
-            "Please use `mcore_minitron` instead.",
-            DeprecationWarning,
-        )
-        return "mcore_gpt_minitron"
