Pruning reduces model size by removing redundant parameters (e.g., shrinking hidden dimensions or layers) while preserving accuracy. In Megatron Bridge, pruning is provided by [NVIDIA Model Optimizer (ModelOpt)](https://github.com/NVIDIA/Model-Optimizer) using the Minitron algorithm for GPT and Mamba-based models loaded from HuggingFace.
## Pre-requisites
Running the pruning example requires Megatron-Bridge and Model-Optimizer dependencies. We recommend using the NeMo container (e.g., `nvcr.io/nvidia/nemo:26.02`). To use the latest ModelOpt scripts, mount your Model-Optimizer repo to the container.
```bash
export MODELOPT_DIR=${PWD}/Model-Optimizer # or set to your local Model-Optimizer repository path if you have cloned it
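# A sketch of mounting the repo at the path Megatron-Bridge expects inside the
# container (assumption: adjust the image tag and mount point to your setup).
docker run --gpus all -it \
  -v ${MODELOPT_DIR}:/opt/Megatron-Bridge/3rdparty/Model-Optimizer \
  nvcr.io/nvidia/nemo:26.02
```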
### Prune to a target parameter count (using Neural Architecture Search)
Example: prune Qwen3-8B to 6B parameters on 2 GPUs (Pipeline Parallelism = 2), skipping pruning of `num_attention_heads`. Defaults: 1024 samples from [nemotron-post-training-dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2) are used for calibration; at most 20% of depth (`num_layers`) and 40% of width per prunable hyperparameter (`hidden_size`, `ffn_hidden_size`, ...) can be pruned; and the top 10 candidate architectures are evaluated on MMLU (with 5% sampled data) to select the best model.
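The exact launch command is defined by the ModelOpt example script; the sketch below is a hypothetical invocation of the run described above. The script name `prune_minitron.py` and all flag spellings are assumptions, not the verified CLI; consult the ModelOpt pruning README for the real interface.

```bash
# Hypothetical sketch only: the script name and flags are illustrative, not the real CLI.
torchrun --nproc_per_node 2 prune_minitron.py \
  --hf-model Qwen/Qwen3-8B \
  --pipeline-model-parallel-size 2 \
  --target-num-params 6e9 \
  --skip-hparams num_attention_heads \
  --save-path ./Qwen3-6B-Pruned
```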
### Prune to a specific architecture (using manual configuration)
Example: prune Qwen3-8B to a fixed architecture. Defaults: 1024 samples from [nemotron-post-training-dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2) for calibration.
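As a hypothetical sketch with the same caveats as above (script name, flag spellings, and the target values are illustrative assumptions), a manual-configuration run would pass an explicit target architecture instead of a parameter budget:

```bash
# Hypothetical sketch only: flag names and target values are illustrative.
torchrun --nproc_per_node 1 prune_minitron.py \
  --hf-model Qwen/Qwen3-8B \
  --target-num-layers 24 \
  --target-hidden-size 3072 \
  --target-ffn-hidden-size 9216 \
  --save-path ./Qwen3-Pruned-Manual
```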
For more details, see the [ModelOpt pruning README](https://github.com/NVIDIA/Model-Optimizer).
## Next steps: Knowledge Distillation
Knowledge Distillation is required to recover the performance of the pruned model. See the [Knowledge Distillation](distillation.md) guide for more details.