Commit 5b9c930

update doc
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
1 parent 8e4d09d commit 5b9c930

1 file changed: docs/training/pruning.md (27 additions, 8 deletions)

# Pruning

Pruning reduces model size by removing redundant parameters (e.g., shrinking hidden dimensions or layers) while preserving accuracy. In Megatron Bridge, pruning is provided by [NVIDIA Model Optimizer (ModelOpt)](https://github.com/NVIDIA/Model-Optimizer) using the Minitron algorithm for GPT and Mamba-based models loaded from Hugging Face.

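The Minitron recipe scores structural components (layers, attention heads, MLP neurons) on calibration data and drops the least important ones. As an illustration of the idea only, not the ModelOpt API, here is a minimal NumPy sketch of activation-based width pruning for a single MLP block:

```python
# Illustrative sketch (not the ModelOpt implementation): rank the hidden
# neurons of an MLP block by mean absolute activation over calibration
# data, then slice the weight matrices to keep only the top-k neurons.
import numpy as np

def prune_ffn_width(fc1, fc2, calib_inputs, keep):
    """Keep the `keep` most active hidden neurons of an MLP block.

    fc1: (ffn_hidden, hidden) first projection
    fc2: (hidden, ffn_hidden) second projection
    calib_inputs: (n_samples, hidden) calibration activations
    """
    acts = calib_inputs @ fc1.T              # (n_samples, ffn_hidden)
    importance = np.abs(acts).mean(axis=0)   # one score per hidden neuron
    keep_idx = np.sort(np.argsort(importance)[-keep:])
    # Slice rows of fc1 and columns of fc2 consistently.
    return fc1[keep_idx, :], fc2[:, keep_idx]

rng = np.random.default_rng(0)
fc1 = rng.normal(size=(8, 4))
fc2 = rng.normal(size=(4, 8))
x = rng.normal(size=(16, 4))
p1, p2 = prune_ffn_width(fc1, fc2, x, keep=6)
print(p1.shape, p2.shape)  # → (6, 4) (4, 6)
```

The real algorithm additionally handles attention heads, depth, residual couplings, and distributed model state, which is why the packaged script below is the supported path.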
## Prerequisites

Running the pruning example requires the Megatron-Bridge and Model-Optimizer dependencies. We recommend using the NeMo container (e.g., `nvcr.io/nvidia/nemo:26.02`). To use the latest ModelOpt scripts, mount your Model-Optimizer repo into the container:

```bash
export MODELOPT_DIR=${PWD}/Model-Optimizer  # or set to your local Model-Optimizer repository path if you have cloned it
if [ ! -d "${MODELOPT_DIR}" ]; then
  git clone https://github.com/NVIDIA/Model-Optimizer.git ${MODELOPT_DIR}
fi

export DOCKER_IMAGE=nvcr.io/nvidia/nemo:26.02
docker run \
  --gpus all \
  --shm-size=20g \
  --net=host \
  --ulimit memlock=-1 \
  --rm -it \
  -v ${MODELOPT_DIR}:/opt/Model-Optimizer \
  -v ${MODELOPT_DIR}/modelopt:/opt/venv/lib/python3.12/site-packages/modelopt \
  -w /opt/Model-Optimizer/examples/megatron_bridge \
  ${DOCKER_IMAGE} bash
```

## Usage

### Prune to a target parameter count (using Neural Architecture Search)

Example: prune Qwen3-8B to 6B parameters on 2 GPUs (pipeline parallelism = 2), skipping pruning of `num_attention_heads`. Defaults: 1024 samples from [nemotron-post-training-dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2) are used for calibration; at most 20% of depth (`num_layers`) and 40% of width per prunable hyperparameter (`hidden_size`, `ffn_hidden_size`, ...) are pruned; and the top-10 candidates are evaluated on MMLU (5% sampled data) to select the best model.

```bash
torchrun --nproc_per_node 2 prune_minitron.py \
    --hf_model_name_or_path Qwen/Qwen3-8B \
    --prune_target_params 6e9 \
    --hparams_to_skip num_attention_heads \
    --output_hf_path /tmp/Qwen3-8B-Pruned-6B
```
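
To build intuition for a target like `--prune_target_params 6e9`, here is a hypothetical helper (not part of the pruning script) that estimates a dense transformer's parameter count from the hyperparameters the search varies. The attention/MLP layout below is an assumption; real counts depend on the exact architecture (GQA, biases, tied embeddings):

```python
# Hypothetical parameter-count estimator for a dense, gated-MLP transformer.
# This is a back-of-the-envelope model, not the formula the NAS search uses.
def estimate_params(vocab, hidden, ffn_hidden, num_layers,
                    num_heads, head_dim, tied_embeddings=False):
    qkv = hidden * head_dim * num_heads * 3   # Q, K, V projections (no GQA here)
    out = head_dim * num_heads * hidden       # attention output projection
    mlp = 3 * hidden * ffn_hidden             # gated MLP: gate, up, down
    per_layer = qkv + out + mlp
    emb = vocab * hidden * (1 if tied_embeddings else 2)
    return num_layers * per_layer + emb

# Toy hyperparameters only; real candidates come from the calibrated search.
print(estimate_params(vocab=32_000, hidden=2048, ffn_hidden=5632,
                      num_layers=22, num_heads=16, head_dim=128))
# → 1261436928 (~1.3B parameters)
```

Sweeping `hidden`, `ffn_hidden`, and `num_layers` through such a function shows why many different width/depth combinations can land near the same parameter budget, which is what the calibrated search then ranks by downstream quality.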

### Prune to a specific architecture (using manual configuration)

Example: prune Qwen3-8B to a fixed architecture. Defaults: 1024 samples from [nemotron-post-training-dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2) for calibration.

```bash
torchrun --nproc_per_node 2 prune_minitron.py \
    --hf_model_name_or_path Qwen/Qwen3-8B \
    --prune_export_config '{"hidden_size": 3584, "ffn_hidden_size": 9216}' \
    --output_hf_path /tmp/Qwen3-8B-Pruned-6B-manual
```
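
The `--prune_export_config` value is a JSON string, which is easy to mangle with shell quoting. A small sketch (an assumed workflow, not a documented interface) that composes the argument in Python so malformed JSON is caught before the launch:

```python
# Build the JSON passed to --prune_export_config programmatically.
# The launch command mirrors the documented example; running it requires
# the container environment, so it is left commented out here.
import json
import subprocess

export_config = {"hidden_size": 3584, "ffn_hidden_size": 9216}
arg = json.dumps(export_config)

cmd = [
    "torchrun", "--nproc_per_node", "2", "prune_minitron.py",
    "--hf_model_name_or_path", "Qwen/Qwen3-8B",
    "--prune_export_config", arg,
    "--output_hf_path", "/tmp/Qwen3-8B-Pruned-6B-manual",
]
print(arg)  # → {"hidden_size": 3584, "ffn_hidden_size": 9216}
# subprocess.run(cmd, check=True)  # uncomment inside the container
```

Passing the list form to `subprocess.run` avoids a shell entirely, so the JSON needs no extra quoting.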

To see the full list of options for advanced configurations, run:

```bash
torchrun --nproc_per_node 1 prune_minitron.py --help
```

### Uneven pipeline parallelism

For more details, see the [ModelOpt pruning README](https://github.com/NVIDIA/Mo…).

## Next steps: Knowledge Distillation

Knowledge Distillation is required to recover the performance of the pruned model. See the [Knowledge Distillation](distillation.md) guide for more details.
