.gitlab/tests.yml (1 addition & 9 deletions)
@@ -54,20 +54,12 @@ example-torch:
   timeout: 30m
   parallel:
     matrix:
-      - EXAMPLE: [llm_distill, llm_sparsity, speculative_decoding]
+      - EXAMPLE: [llm_distill, llm_qat, llm_sparsity, speculative_decoding]
   script:
     - pip install ".[hf,dev-test]"
     - find examples/$EXAMPLE -name "requirements.txt" | while read req_file; do pip install -r "$req_file" || exit 1; done
     - pytest -s tests/examples/$EXAMPLE
 
-# TODO: Fix llm_qat test hang in GitLab CI
-example-failing:
-  extends: example-torch
-  allow_failure: true
-  parallel:
-    matrix:
-      - EXAMPLE: [llm_qat]
-
 example-trtllm:
   extends: example-torch
   timeout: 60m
docs/source/guides/7_nas.rst (9 additions & 0 deletions)
@@ -635,3 +635,12 @@ The difference between NAS and pruning is summarized below.
     increased training time.
   - May provide similar performance to NAS in particular applications, however, usually exhibits
     worse performance due to the limited search space and training time.
+
+
+[Advanced] Adding a new NAS/Prune Algorithm
+===========================================
+
+* Please refer to this `template <https://github.com/NVIDIA/TensorRT-Model-Optimizer/compare/template/new-nas-mode>`_
+  for adding a new NAS algorithm.
+* Please refer to `mcore_minitron.py <https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/modelopt/torch/prune/plugins/mcore_minitron.py>`_
+  for an actual example of adding the Minitron pruning algorithm.
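For orientation, below is a toy sketch of the mode-registration pattern that the new docs section points at. Every name in it (`MODE_REGISTRY`, `ModeDescriptor`, `register_mode`, `my_convert`, `my_restore`) is a hypothetical stand-in, not the actual modelopt API; the linked template branch and `mcore_minitron.py` define the real interface.

```python
# Hypothetical sketch of a NAS/prune mode registry; NOT the actual modelopt API.
from dataclasses import dataclass
from typing import Any, Callable

MODE_REGISTRY: dict[str, "ModeDescriptor"] = {}

@dataclass
class ModeDescriptor:
    """Bundles the entrypoints a NAS/prune mode must provide."""
    name: str
    convert: Callable[[Any, dict], Any]   # wraps a model with search/prune machinery
    restore: Callable[[Any, dict, dict], Any]  # rebuilds that state from checkpoint metadata

def register_mode(descriptor: ModeDescriptor) -> None:
    """Make the mode discoverable by name, mirroring how plugins self-register."""
    MODE_REGISTRY[descriptor.name] = descriptor

def my_convert(model, config):
    # Algorithm-specific setup goes here, e.g. marking prunable dimensions.
    return model

def my_restore(model, config, metadata):
    # Restore typically replays convert, then applies the saved metadata.
    return my_convert(model, config)

register_mode(ModeDescriptor("my_prune_mode", my_convert, my_restore))
```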
modelopt/torch/__init__.py (1 addition & 1 deletion)
@@ -34,7 +34,7 @@
 
     if not (_Version("4.48") <= _Version(_transformers_version) < _Version("5.0")):
         _warnings.warn(
-            f"transformers version {_transformers_version} is incompatible with nvidia-modelopt and may cause issues. "
+            f"transformers version {_transformers_version} is not tested with nvidia-modelopt and may cause issues. "
             "Please install recommended version with `pip install nvidia-modelopt[hf]` if working with HF models.",
         )
 except ImportError:
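The reworded warning sits inside a version gate. A self-contained sketch of that pattern follows, using `packaging.version.Version` in place of the module's private `_Version` alias; the try/except scaffolding is inferred from the visible `except ImportError:` context line, not copied from the file.

```python
import warnings

from packaging.version import Version

try:
    import transformers

    _v = transformers.__version__
    # Warn (rather than fail) when the installed version is outside the tested range.
    if not (Version("4.48") <= Version(_v) < Version("5.0")):
        warnings.warn(
            f"transformers version {_v} is not tested with nvidia-modelopt and may cause issues. "
            "Please install recommended version with `pip install nvidia-modelopt[hf]` if working with HF models.",
        )
except ImportError:
    # transformers is optional; nothing to check if it is not installed.
    pass
```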
modelopt/torch/opt/plugins/__init__.py (0 additions & 3 deletions)
@@ -19,9 +19,6 @@
 
 from .huggingface import *
 
-with import_plugin("megatron core model config"):
-    from .megatron_model_config import *
-
 with import_plugin("megatron core dist checkpointing"):
     from .mcore_dist_checkpointing import *
 
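The `import_plugin` context manager used here is not shown in the diff. A minimal sketch of the pattern, assuming it simply soft-fails optional imports with a warning, could look like this (illustrative, not modelopt's actual implementation):

```python
import warnings
from contextlib import contextmanager

@contextmanager
def import_plugin(plugin_name: str):
    """Turn a failed optional import into a warning instead of a hard error."""
    try:
        yield
    except ImportError:
        warnings.warn(f"Optional plugin '{plugin_name}' is unavailable and was skipped.")

# Usage mirrors the plugins __init__ above: the block is skipped gracefully
# when the optional dependency is missing.
with import_plugin("megatron core dist checkpointing"):
    import megatron.core  # noqa: F401
```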
@@ -173,7 +173,7 @@ def _test_mamba_parameter_sorting(rank, size):
     prompt_tokens = torch.randint(0, vocab_size, (batch_size, max_sequence_length)).cuda()
     y1 = run_mcore_inference(model, prompt_tokens)
 
-    dynamic_space.sort_parameters()
+    mtn.utils.sort_parameters(model)
 
     # check if all mamba_num_heads, mamba_head_dim, hidden_size have been sorted
     sortable_per_pp = [
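The test now calls the public utility with the model instead of going through a `dynamic_space` handle. A sketch of the updated call pattern, assuming `mtn` aliases `modelopt.torch.nas` in the test's imports and that sorting is meant to leave model outputs unchanged (which the `y1` before/after comparison in the surrounding test suggests):

```python
import torch
import modelopt.torch.nas as mtn  # assumption: `mtn` aliases modelopt.torch.nas

def check_sorting_preserves_outputs(model: torch.nn.Module, inputs: torch.Tensor) -> None:
    """Sort parameters by importance and verify the forward pass is unchanged."""
    model.eval()
    with torch.no_grad():
        y_before = model(inputs)
        mtn.utils.sort_parameters(model)  # new call style: pass the model explicitly
        y_after = model(inputs)
    torch.testing.assert_close(y_before, y_after)
```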