Commit 5a6c995

Bump TRT-LLM to 1.1.0rc5 + fix failing CICD tests
Signed-off-by: Keval Morabia <[email protected]>
1 parent 6ef9954 commit 5a6c995

13 files changed, 30 additions and 35 deletions

.github/workflows/example_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ jobs:
       matrix:
         EXAMPLE: [llm_ptq]
     container: &example_container
-      image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
+      image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc5
       env:
         PIP_CONSTRAINT: "" # Disable pip constraint for upgrading packages
         HF_TOKEN: ${{ secrets.HF_TOKEN }}

.github/workflows/gpu_tests.yml

Lines changed: 1 addition & 2 deletions
@@ -73,8 +73,7 @@ jobs:
       - uses: nv-gha-runners/setup-proxy-cache@main
       - name: Setup environment variables
         run: |
-          echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib" >> $GITHUB_ENV
-          echo "PATH=${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin" >> $GITHUB_ENV
+          echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu" >> $GITHUB_ENV
       - name: Run gpu tests
         run: pip install tox-current-env && tox -e py312-cuda12-gpu --current-env
   gpu-tests-non-pr:

.gitlab/tests.yml

Lines changed: 2 additions & 4 deletions
@@ -35,9 +35,7 @@ unit:
   tags: [docker, linux, 2-gpu]
   before_script:
     # Add libcudnn*.so and libnv*.so to path
-    - export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib"
-    # Add trtexec to path
-    - export PATH="${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin"
+    - export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu"
     # Install git-lfs for Daring-Anteater dataset
     - apt-get update && apt-get install -y git-lfs
     - git lfs install --system
@@ -64,7 +62,7 @@ example-torch:
 example-trtllm:
   extends: example-torch
   timeout: 60m
-  image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
+  image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc5
   tags: [docker, linux, 2-gpu, sm>=89]
   parallel:
     matrix:

CHANGELOG.rst

Lines changed: 3 additions & 4 deletions
@@ -1,17 +1,16 @@
 Model Optimizer Changelog (Linux)
 =================================
 
-0.39 (2025-11-xx)
+0.39 (2025-11-07)
 ^^^^^^^^^^^^^^^^^
 
-**Deprecations**
-
 **New Features**
 
+- Upgrade TensorRT-LLM requirement to 1.1.0rc5.
 - Add flag ``op_types_to_exclude_fp16`` in ONNX quantization to exclude ops from being converted to FP16/BF16. Alternatively, for custom TensorRT ops, this can also be done by indicating ``'fp32'`` precision in ``trt_plugins_precision``.
 - Add LoRA mode support for MCore in a new peft submodule: ``modelopt.torch.peft.update_model(model, LORA_CFG)``.
 - Support PTQ and fakequant in vLLM for fast evaluation of arbitrary quantization formats. See ``examples/vllm_serve`` for more details.
-- Add support for ``nemotron-post-training-dataset-v2`` and ``nemotron-post-training-dataset-v1`` in ``examples/llm_ptq``. Default to a mix of ``cnn_dailymail`` and ``nemotron-post-training-dataset-v2`` if no dataset is specified.
+- Add support for ``nemotron-post-training-dataset-v2`` and ``nemotron-post-training-dataset-v1`` in ``examples/llm_ptq``. Default to a mix of ``cnn_dailymail`` and ``nemotron-post-training-dataset-v2`` (gated dataset accessed using ``HF_TOKEN`` environment variable) if no dataset is specified.
 - Allow specifying ``calib_seq`` in ``examples/llm_ptq`` to set the maximum sequence length for calibration.
 
 **Documentation**
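
The changelog entry above notes that the default calibration mix includes a gated dataset that is accessed through the ``HF_TOKEN`` environment variable. A minimal sketch of a pre-flight check for that situation follows. Both helper names are hypothetical and are not part of ``examples/llm_ptq``; the actual example may handle a missing token differently.

```python
import os


def hf_token_available(env=None):
    """Return True if a non-empty HF_TOKEN is present in the given mapping."""
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


def pick_calibration_datasets(requested, env=None):
    """Hypothetical helper: use the default mix only when no dataset is requested."""
    if requested:
        return list(requested)
    if not hf_token_available(env):
        # Assumption: failing early is preferable to failing mid-download.
        raise RuntimeError(
            "The default mix includes a gated dataset; "
            "set HF_TOKEN or specify a dataset explicitly."
        )
    return ["cnn_dailymail", "nemotron-post-training-dataset-v2"]
```

Passing an explicit dataset list bypasses the token check entirely, which mirrors the "if no dataset is specified" wording of the changelog entry.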

docs/source/getting_started/_installation_for_Linux.rst

Lines changed: 2 additions & 3 deletions
@@ -18,7 +18,7 @@ Latest Model Optimizer (``nvidia-modelopt``) currently has the following system
 +-------------------------+-----------------------------+
 | PyTorch                 | >=2.6                       |
 +-------------------------+-----------------------------+
-| TensorRT-LLM (Optional) | 1.1.0rc2.post2              |
+| TensorRT-LLM (Optional) | 1.1.0rc5                    |
 +-------------------------+-----------------------------+
 | ONNX Runtime (Optional) | 1.22                        |
 +-------------------------+-----------------------------+
@@ -41,8 +41,7 @@ Environment setup
 .. code-block:: shell
 
     export PIP_CONSTRAINT=""
-    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib"
-    export PATH="${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin"
+    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu"
 
 You may need to install additional dependencies from the respective examples's `requirements.txt` file.
 
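
The hunk above drops the explicit ``PATH`` entry that previously exposed ``trtexec``. The commit does not say where the tool lives in the newer container, so the sketch below only checks whether an executable can be resolved on the current ``PATH``. The helper names are illustrative and are not part of Model Optimizer.

```python
import shutil


def locate_tool(name):
    """Return the absolute path of an executable on PATH, or None if it is absent."""
    return shutil.which(name)


def report(tools=("trtexec",)):
    """Map each tool name to its resolved path (or None when not found)."""
    return {tool: locate_tool(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in report().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

Running this inside the container after the environment setup is a quick way to confirm whether any extra ``PATH`` export is still needed for a given workflow.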

examples/diffusers/quantization/requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -4,3 +4,6 @@ nvtx
 onnx_graphsurgeon
 opencv-python>=4.8.1.78,<4.12.0.88
 sentencepiece
+# TODO: Fix for torch 2.9
+torch<2.9
+torchvision<0.24.0

examples/llm_ptq/README.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ This section focuses on Post-training quantization, a technique that reduces mod
 
 ### Docker
 
-For Hugging Face models, please use the TensorRT-LLM docker image (e.g., `nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2`).
+For Hugging Face models, please use the TensorRT-LLM docker image (e.g., `nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc5`).
 For NeMo models, use the NeMo container (e.g., `nvcr.io/nvidia/nemo:25.07`).
 Visit our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more information.
 
examples/llm_sparsity/launch_finetune.sh

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ CMD="accelerate launch --multi_gpu --mixed_precision bf16 finetune.py \
     --warmup_ratio 0.0 \
     --lr_scheduler_type cosine \
     --logging_steps 1 \
-    --fsdp full_shard auto_wrap \
+    --fsdp 'full_shard auto_wrap' \
     --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \
     --tf32 True \
     --modelopt_restore_path $MODELOPT_RESTORE_PATH \
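
The quoting change in this hunk affects how the ``--fsdp`` value is split into arguments. The sketch below uses Python's ``shlex.split``, which follows POSIX shell word-splitting rules, to show the difference. It assumes the ``CMD`` string is eventually parsed by a POSIX shell; the shortened command strings are illustrative rather than the full command from the script.

```python
import shlex

# Shortened stand-ins for the old and new forms of the command string.
unquoted = "finetune.py --fsdp full_shard auto_wrap --tf32 True"
quoted = "finetune.py --fsdp 'full_shard auto_wrap' --tf32 True"

# Old form: the two FSDP options become two separate arguments.
unquoted_args = shlex.split(unquoted)

# New form: both options travel together as the single value of --fsdp.
quoted_args = shlex.split(quoted)

if __name__ == "__main__":
    print(unquoted_args)
    print(quoted_args)
```

With the quotes, the argument parser on the receiving side sees one value for ``--fsdp`` instead of a value followed by a stray positional token.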
Lines changed: 0 additions & 1 deletion
@@ -1,4 +1,3 @@
 flash-attn
 sentencepiece>=0.2.0
 tensorboardX
-transformers>=4.57.0

tests/_test_utils/torch_quantization/onnx_export.py

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ def forward_loop(model):
         input_names=input_names,
         output_names=output_names,
         do_constant_folding=constant_folding,
+        dynamo=False,
         **kwargs,
     )
 
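
The added ``dynamo=False`` sits directly before ``**kwargs`` in the call. The commit does not say why the flag was added; one plausible reading, stated here as an assumption, is that it keeps the test helper on a fixed export path regardless of the installed PyTorch default. A side effect of this call shape is that callers can no longer pass ``dynamo`` through ``kwargs``. The sketch below shows that behaviour with ``export_stub``, a hypothetical stand-in that is not the real exporter.

```python
def export_stub(model, *, do_constant_folding=True, dynamo=True, **extra):
    """Hypothetical stand-in for an exporter; returns the options it received."""
    return {"do_constant_folding": do_constant_folding, "dynamo": dynamo, **extra}


def export_with_pinned_flag(model, constant_folding=True, **kwargs):
    # Mirrors the call shape in the hunk: explicit dynamo=False, then **kwargs.
    return export_stub(
        model,
        do_constant_folding=constant_folding,
        dynamo=False,
        **kwargs,
    )


# Extra options still pass through untouched.
opts = export_with_pinned_flag(object(), opset_version=17)

# Supplying dynamo again via kwargs collides with the pinned keyword.
try:
    export_with_pinned_flag(object(), dynamo=True)
    duplicate_rejected = False
except TypeError:
    duplicate_rejected = True
```

If a test ever needs the other export path, the pinned keyword would have to become a parameter of the helper rather than being overridden through ``kwargs``.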
