Commit 6ffe2d8

Fix multi-gpu mamba test
Signed-off-by: Keval Morabia <[email protected]>
Parent: 512e96f

File tree: 4 files changed (+12 -11 lines)


.gitlab/tests.yml

Lines changed: 1 addition & 9 deletions
@@ -54,20 +54,12 @@ example-torch:
   timeout: 30m
   parallel:
     matrix:
-      - EXAMPLE: [llm_distill, llm_sparsity, speculative_decoding]
+      - EXAMPLE: [llm_distill, llm_qat, llm_sparsity, speculative_decoding]
   script:
     - pip install ".[hf,dev-test]"
     - find examples/$EXAMPLE -name "requirements.txt" | while read req_file; do pip install -r "$req_file" || exit 1; done
     - pytest -s tests/examples/$EXAMPLE
 
-# TODO: Fix llm_qat test hang in GitLab CI
-example-failing:
-  extends: example-torch
-  allow_failure: true
-  parallel:
-    matrix:
-      - EXAMPLE: [llm_qat]
-
 example-trtllm:
   extends: example-torch
   timeout: 60m
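
For reference, GitLab expands each entry under parallel:matrix into its own job, so folding llm_qat back into the list above produces a job named "example-torch: [llm_qat]" in place of the deleted, allowed-to-fail example-failing job. A minimal Python sketch of that fan-out (illustrative only, not GitLab's implementation):

    # Illustrative sketch: mimics how GitLab CI fans a parallel:matrix list
    # out into one job per EXAMPLE value (GitLab names them "job: [value]").
    examples = ["llm_distill", "llm_qat", "llm_sparsity", "speculative_decoding"]
    for example in examples:
        # Each generated job runs the same script with EXAMPLE set accordingly.
        print(f"example-torch: [{example}]  ->  pytest -s tests/examples/{example}")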

docs/source/guides/7_nas.rst

Lines changed: 9 additions & 0 deletions
@@ -635,3 +635,12 @@ The difference between NAS and pruning is summarized below.
   increased training time.
 - May provide similar performance to NAS in particular applications, however, usually exhibits
   worse performance due to the limited search space and training time.
+
+
+[Advanced] Adding a new NAS/Prune Algorithm
+===========================================
+
+* Please refer to this `template <https://github.com/NVIDIA/TensorRT-Model-Optimizer/compare/template/new-nas-mode>`_
+  for adding a new NAS algorithm.
+* Please refer to `mcore_minitron.py <https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/modelopt/torch/prune/plugins/mcore_minitron.py>`_
+  for an actual example of adding Minitron Pruning algorithm.

modelopt/torch/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@
 
     if not (_Version("4.48") <= _Version(_transformers_version) < _Version("5.0")):
         _warnings.warn(
-            f"transformers version {_transformers_version} is incompatible with nvidia-modelopt and may cause issues. "
+            f"transformers version {_transformers_version} is not tested with nvidia-modelopt and may cause issues. "
             "Please install recommended version with `pip install nvidia-modelopt[hf]` if working with HF models.",
         )
 except ImportError:
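
The wording change above softens the warning from "incompatible" to "not tested", matching what the check actually verifies. A minimal, self-contained sketch of the same version-guard pattern, assuming the underscore-prefixed aliases map to packaging.version.Version and the stdlib warnings module (an assumption based on the names, not confirmed by this diff):

    import warnings
    from packaging.version import Version

    try:
        import transformers

        # Warn (but do not fail) when the installed transformers release
        # falls outside the tested [4.48, 5.0) range.
        if not (Version("4.48") <= Version(transformers.__version__) < Version("5.0")):
            warnings.warn(
                f"transformers version {transformers.__version__} is not tested "
                "with nvidia-modelopt and may cause issues. Please install "
                "recommended version with `pip install nvidia-modelopt[hf]` "
                "if working with HF models.",
            )
    except ImportError:
        pass  # transformers is optional; skip the check entirely if absent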

tests/gpu/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py

Lines changed: 1 addition & 1 deletion
@@ -173,7 +173,7 @@ def _test_mamba_parameter_sorting(rank, size):
     prompt_tokens = torch.randint(0, vocab_size, (batch_size, max_sequence_length)).cuda()
     y1 = run_mcore_inference(model, prompt_tokens)
 
-    dynamic_space.sort_parameters()
+    mtn.utils.sort_parameters(model)
 
     # check if all mamba_num_heads, mamba_head_dim, hidden_size have been sorted
     sortable_per_pp = [
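
The fix swaps the method on the local search-space handle for the module-level utility applied to the model itself. A hedged sketch of the updated call site, assuming mtn is the test file's alias for modelopt.torch.nas and model is the Megatron Mamba model built earlier in the test (neither is shown in this hunk):

    import modelopt.torch.nas as mtn  # assumed alias matching the diff's `mtn`

    # Old call, tied to the local search-space object:
    #   dynamic_space.sort_parameters()
    # New call, operating directly on the (assumed) converted model:
    mtn.utils.sort_parameters(model)  # sorts searchable parameters in place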
