10 changes: 9 additions & 1 deletion examples/megatron-lm/README.md
@@ -110,6 +110,8 @@ Coming soon ...

### ⭐ Pruning

Check out the [getting started section](../pruning/README.md#getting-started) and the [guidelines for configuring pruning parameters](../pruning/README.md#pruning-guidelines) in the pruning README.

Pruning is supported for GPT and Mamba models in Pipeline Parallel mode. Available pruning options are:

- `TARGET_FFN_HIDDEN_SIZE`
@@ -121,14 +123,20 @@ Pruning is supported for GPT and Mamba models in Pipeline Parallel mode. Availab
- `TARGET_NUM_LAYERS`
- `LAYERS_TO_DROP` (comma-separated, 1-indexed list of layer numbers to drop directly)

Example of depth pruning Qwen3-8B from 36 to 24 layers:

```sh
PP=1 \
TARGET_NUM_LAYERS=24 \
HF_MODEL_CKPT=<pretrained_model_name_or_path> \
MLM_MODEL_SAVE=/tmp/Qwen3-8B-DPruned \
MLM_MODEL_SAVE=Qwen3-8B-Pruned \
bash megatron-lm/examples/post_training/modelopt/prune.sh qwen/Qwen3-8B
```
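
The other options follow the same invocation pattern. As a minimal sketch (the dropped layer numbers and the save name below are illustrative, not a recommendation), directly dropping layers via `LAYERS_TO_DROP` could look like:

```sh
# Hypothetical example: directly drop the last two of Qwen3-8B's 36 layers (1-indexed).
PP=1 \
LAYERS_TO_DROP="35,36" \
HF_MODEL_CKPT=<pretrained_model_name_or_path> \
MLM_MODEL_SAVE=Qwen3-8B-LayersDropped \
bash megatron-lm/examples/post_training/modelopt/prune.sh qwen/Qwen3-8B
```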

> [!TIP]
> If the number of layers in the model is not divisible by the pipeline parallel size (PP), you can configure uneven
> PP by setting `MLM_EXTRA_ARGS="--decoder-first-pipeline-num-layers <X> --decoder-last-pipeline-num-layers <Y>"`.
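
For instance, a hypothetical run of the depth-pruning command above with `PP=8` on the 36-layer Qwen3-8B (36 is not divisible by 8) could give the first and last stages 3 layers each, assuming the remaining layers are split evenly across the middle stages (3 + 6×5 + 3 = 36); the exact split is illustrative:

```sh
# Illustrative uneven-PP sketch: 36 layers over PP=8 as 3 + 5*6 + 3.
PP=8 \
TARGET_NUM_LAYERS=24 \
HF_MODEL_CKPT=<pretrained_model_name_or_path> \
MLM_MODEL_SAVE=Qwen3-8B-Pruned \
MLM_EXTRA_ARGS="--decoder-first-pipeline-num-layers 3 --decoder-last-pipeline-num-layers 3" \
bash megatron-lm/examples/post_training/modelopt/prune.sh qwen/Qwen3-8B
```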

## Learn More About Configuration

For simplicity, we use `shell` scripts and variables as arguments. Each script has at least 1 positional
4 changes: 4 additions & 0 deletions modelopt/torch/prune/__init__.py
@@ -21,6 +21,10 @@

# nas is required - so let's check if it's available
import modelopt.torch.nas
from modelopt.torch.utils import import_plugin

from . import fastnas, gradnas, plugins
from .pruning import *

with import_plugin("mcore_minitron", verbose=False):
    from .plugins import mcore_minitron
9 changes: 8 additions & 1 deletion modelopt/torch/prune/plugins/mcore_minitron.py
@@ -37,6 +37,7 @@
HAS_MAMBA,
_DynamicMCoreLanguageModel,
SUPPORTED_MODELS,
drop_mcore_language_model_layers,
)
# isort: on

@@ -70,7 +71,13 @@
"num_layers",
}

__all__ = ["MCoreMinitronConfig", "MCoreMinitronModeDescriptor", "MCoreMinitronSearcher"]
__all__ = [
    "SUPPORTED_HPARAMS",
    "MCoreMinitronConfig",
    "MCoreMinitronModeDescriptor",
    "MCoreMinitronSearcher",
    "drop_mcore_language_model_layers",
]


class MCoreMinitronSearcher(BaseSearcher):