Commit 2d68aff

Bump TRT-LLM to 1.1.0rc5 + fix failing CICD tests
Signed-off-by: Keval Morabia <[email protected]>
1 parent 6ef9954 commit 2d68aff

15 files changed (+76 / -55 lines)

.github/workflows/example_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ jobs:
  matrix:
  EXAMPLE: [llm_ptq]
  container: &example_container
- image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
+ image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc5
  env:
  PIP_CONSTRAINT: "" # Disable pip constraint for upgrading packages
  HF_TOKEN: ${{ secrets.HF_TOKEN }}

.github/workflows/gpu_tests.yml

Lines changed: 1 addition & 2 deletions
@@ -73,8 +73,7 @@ jobs:
  - uses: nv-gha-runners/setup-proxy-cache@main
  - name: Setup environment variables
  run: |
- echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib" >> $GITHUB_ENV
- echo "PATH=${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin" >> $GITHUB_ENV
+ echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu" >> $GITHUB_ENV
  - name: Run gpu tests
  run: pip install tox-current-env && tox -e py312-cuda12-gpu --current-env
  gpu-tests-non-pr:
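
The updated step still appends to whatever `LD_LIBRARY_PATH` the container already sets, now without the `/usr/local/tensorrt/...` directories. For jobs that launch subprocesses from Python rather than from a shell step, the same composition can be sketched as below. This is an illustration, not code from the repository: `extend_search_path` is a hypothetical helper, and dropping empty and duplicate entries is a choice of this sketch (the shell form `${LD_LIBRARY_PATH}:...` produces a leading `:` when the variable starts out unset).

```python
import os

# Directories appended by the updated CI step (taken from the diff above).
EXTRA_LIB_DIRS = ["/usr/include", "/usr/lib/x86_64-linux-gnu"]
# LD_LIBRARY_PATH is colon-separated on Linux.
SEP = ":"


def extend_search_path(current: str, extra: list[str]) -> str:
    """Hypothetical helper: append `extra` to a colon-separated search path.

    Empty entries are dropped, so an unset variable does not yield a
    leading ':', and directories already present are not repeated.
    """
    entries = [p for p in current.split(SEP) if p]
    for path in extra:
        if path not in entries:
            entries.append(path)
    return SEP.join(entries)


if __name__ == "__main__":
    # Build an environment for a child process without mutating os.environ.
    child_env = dict(os.environ)
    child_env["LD_LIBRARY_PATH"] = extend_search_path(
        child_env.get("LD_LIBRARY_PATH", ""), EXTRA_LIB_DIRS
    )
    print(child_env["LD_LIBRARY_PATH"])
```

The resulting `child_env` could then be passed as `env=` to `subprocess.run`.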

.gitlab/tests.yml

Lines changed: 2 additions & 4 deletions
@@ -35,9 +35,7 @@ unit:
  tags: [docker, linux, 2-gpu]
  before_script:
  # Add libcudnn*.so and libnv*.so to path
- - export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib"
- # Add trtexec to path
- - export PATH="${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin"
+ - export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu"
  # Install git-lfs for Daring-Anteater dataset
  - apt-get update && apt-get install -y git-lfs
  - git lfs install --system
@@ -64,7 +62,7 @@ example-torch:
  example-trtllm:
  extends: example-torch
  timeout: 60m
- image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
+ image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc5
  tags: [docker, linux, 2-gpu, sm>=89]
  parallel:
  matrix:

CHANGELOG.rst

Lines changed: 3 additions & 4 deletions
@@ -1,17 +1,16 @@
  Model Optimizer Changelog (Linux)
  =================================

- 0.39 (2025-11-xx)
+ 0.39 (2025-11-07)
  ^^^^^^^^^^^^^^^^^

- **Deprecations**
-
  **New Features**

+ - Upgrade TensorRT-LLM requirement to 1.1.0rc5.
  - Add flag ``op_types_to_exclude_fp16`` in ONNX quantization to exclude ops from being converted to FP16/BF16. Alternatively, for custom TensorRT ops, this can also be done by indicating ``'fp32'`` precision in ``trt_plugins_precision``.
  - Add LoRA mode support for MCore in a new peft submodule: ``modelopt.torch.peft.update_model(model, LORA_CFG)``.
  - Support PTQ and fakequant in vLLM for fast evaluation of arbitrary quantization formats. See ``examples/vllm_serve`` for more details.
- - Add support for ``nemotron-post-training-dataset-v2`` and ``nemotron-post-training-dataset-v1`` in ``examples/llm_ptq``. Default to a mix of ``cnn_dailymail`` and ``nemotron-post-training-dataset-v2`` if no dataset is specified.
+ - Add support for ``nemotron-post-training-dataset-v2`` and ``nemotron-post-training-dataset-v1`` in ``examples/llm_ptq``. Default to a mix of ``cnn_dailymail`` and ``nemotron-post-training-dataset-v2`` (gated dataset accessed using ``HF_TOKEN`` environment variable) if no dataset is specified.
  - Allow specifying ``calib_seq`` in ``examples/llm_ptq`` to set the maximum sequence length for calibration.

  **Documentation**
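
The updated changelog entry notes that the default calibration mix includes `nemotron-post-training-dataset-v2`, a gated dataset read through the `HF_TOKEN` environment variable (the CI workflows pass it in from `secrets.HF_TOKEN`). A minimal sketch of failing early when the token is missing follows; `resolve_calib_datasets` and `GATED_DATASETS` are hypothetical names, and this is not how `examples/llm_ptq` is actually implemented.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Default calibration mix named in the changelog entry.
DEFAULT_CALIB_DATASETS = ["cnn_dailymail", "nemotron-post-training-dataset-v2"]
# Only the dataset the changelog explicitly describes as gated.
GATED_DATASETS = {"nemotron-post-training-dataset-v2"}


def resolve_calib_datasets(
    requested: Optional[list[str]] = None,
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Hypothetical helper: choose calibration datasets, failing early without HF_TOKEN."""
    # Fall back to the default mix when no dataset is specified.
    datasets = list(requested) if requested else list(DEFAULT_CALIB_DATASETS)
    gated = [name for name in datasets if name in GATED_DATASETS]
    if gated and not env.get("HF_TOKEN"):
        raise RuntimeError(f"Set HF_TOKEN to access gated dataset(s): {', '.join(gated)}")
    return datasets
```

Checking up front gives a clear error message before any model loading or download work starts.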

docs/source/getting_started/_installation_for_Linux.rst

Lines changed: 6 additions & 3 deletions
@@ -18,7 +18,7 @@ Latest Model Optimizer (``nvidia-modelopt``) currently has the following system
  +-------------------------+-----------------------------+
  | PyTorch | >=2.6 |
  +-------------------------+-----------------------------+
- | TensorRT-LLM (Optional) | 1.1.0rc2.post2 |
+ | TensorRT-LLM (Optional) | 1.1.0rc5 |
  +-------------------------+-----------------------------+
  | ONNX Runtime (Optional) | 1.22 |
  +-------------------------+-----------------------------+
@@ -41,8 +41,7 @@ Environment setup
  .. code-block:: shell

  export PIP_CONSTRAINT=""
- export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib"
- export PATH="${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin"
+ export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu"

  You may need to install additional dependencies from the respective examples's `requirements.txt` file.

@@ -127,6 +126,10 @@ Additionally, we support installing dependencies for following 3rd-party package
  * - Huggingface (``transformers``, ``diffusers``, etc.)
  - ``[hf]``

+ **CUDA / Python specific dependencies**
+
+ * Onnxsim for Python 3.12+ requires CMake to build from source. If you are installing ``nvidia-modelopt`` on Python 3.12+, you need to run ``pip install cmake`` before installing ``nvidia-modelopt``.
+
  **Accelerated Quantization with Triton Kernels**

  ModelOpt includes optimized quantization kernels implemented with Triton language that accelerate quantization

examples/diffusers/quantization/requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -4,3 +4,6 @@ nvtx
  onnx_graphsurgeon
  opencv-python>=4.8.1.78,<4.12.0.88
  sentencepiece
+ # TODO: Fix for torch 2.9
+ torch<2.9
+ torchvision<0.24.0

examples/llm_ptq/README.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ This section focuses on Post-training quantization, a technique that reduces mod

  ### Docker

- For Hugging Face models, please use the TensorRT-LLM docker image (e.g., `nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2`).
+ For Hugging Face models, please use the TensorRT-LLM docker image (e.g., `nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc5`).
  For NeMo models, use the NeMo container (e.g., `nvcr.io/nvidia/nemo:25.07`).
  Visit our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more information.
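
This commit bumps the same TensorRT-LLM container tag in several places (both CI configs and this README). A small consistency check can catch a missed occurrence in a future bump. The sketch below is an illustration, not part of the repository: `find_trtllm_tags`, `check_consistent`, and the regex are assumptions, and the regex only matches full `nvcr.io/nvidia/tensorrt-llm/release:<tag>` references, so the version table in the installation docs would need a separate check.

```python
import re

# Matches container references such as
# nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc5 and captures the tag.
TRTLLM_IMAGE_RE = re.compile(r"nvcr\.io/nvidia/tensorrt-llm/release:([A-Za-z0-9._-]+)")


def find_trtllm_tags(text: str) -> set[str]:
    """Return every TensorRT-LLM release tag referenced in `text`."""
    return set(TRTLLM_IMAGE_RE.findall(text))


def check_consistent(files: dict[str, str]) -> str:
    """Return the single tag used across `files`, or raise if they disagree."""
    tags_by_file = {name: find_trtllm_tags(text) for name, text in files.items()}
    all_tags = set().union(*tags_by_file.values())
    if len(all_tags) != 1:
        raise ValueError(f"Expected exactly one TensorRT-LLM tag, found: {tags_by_file}")
    return all_tags.pop()
```

In practice the `files` mapping would be filled by reading paths such as `.github/workflows/example_tests.yml`, `.gitlab/tests.yml`, and `examples/llm_ptq/README.md`.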

examples/llm_sparsity/launch_finetune.sh

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ CMD="accelerate launch --multi_gpu --mixed_precision bf16 finetune.py \
  --warmup_ratio 0.0 \
  --lr_scheduler_type cosine \
  --logging_steps 1 \
- --fsdp full_shard auto_wrap \
+ --fsdp 'full_shard auto_wrap' \
  --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer \
  --tf32 True \
  --modelopt_restore_path $MODELOPT_RESTORE_PATH \
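
The quoting fix matters because `CMD` is built as one string. Assuming the script later hands that string to a shell for re-parsing (for example with `eval` or `sh -c "$CMD"`; the line that executes `CMD` is outside this hunk), the single quotes keep `full_shard auto_wrap` together as one value of `--fsdp`. Python's `shlex.split`, which follows POSIX shell word-splitting rules, shows the difference on a shortened fragment of the command:

```python
import shlex

# Shortened fragments of the CMD string before and after this commit.
before = "finetune.py --fsdp full_shard auto_wrap --tf32 True"
after = "finetune.py --fsdp 'full_shard auto_wrap' --tf32 True"

before_args = shlex.split(before)
after_args = shlex.split(after)

# Unquoted: 'auto_wrap' becomes a separate, stray argument.
print(before_args)  # ['finetune.py', '--fsdp', 'full_shard', 'auto_wrap', '--tf32', 'True']
# Quoted: both FSDP options arrive as a single value for --fsdp.
print(after_args)   # ['finetune.py', '--fsdp', 'full_shard auto_wrap', '--tf32', 'True']
```

If `CMD` were instead expanded unquoted as `$CMD` without re-parsing, the single quotes would be passed through literally, so the fix relies on the string being evaluated by a shell.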
Lines changed: 0 additions & 1 deletion
@@ -1,4 +1,3 @@
  flash-attn
  sentencepiece>=0.2.0
  tensorboardX
- transformers>=4.57.0

pyproject.toml

Lines changed: 7 additions & 2 deletions
@@ -2,7 +2,12 @@
  ############################### BUILD CONFIGURATION ##############################################
  ####################################################################################################
  [build-system]
- requires = ["cython", "setuptools>=80", "setuptools-scm>=8"]
+ requires = [
+ "cmake ; python_version >= '3.12'", # onnxsim for Python 3.12+ requires CMake to build from source
+ "cython",
+ "setuptools>=80",
+ "setuptools-scm>=8",
+ ]
  build-backend = "setuptools.build_meta"

  [tool.setuptools_scm]
@@ -154,7 +159,7 @@ exclude_lines = [


  [tool.bandit]
- exclude_dirs = ["examples/", "tests/"]
+ exclude_dirs = ["examples/", "tests/", "setup.py"]
  # Do not change `skips`. It should be consistent with NVIDIA's Wheel-CI-CD bandit.yml config.
  # Use of `# nosec BXXX` requires special approval
  skips = [
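
The new `cmake ; python_version >= '3.12'` entry uses a PEP 508 environment marker, so CMake is added to the build requirements only on Python 3.12 and newer, the same condition the installation docs give for onnxsim. The sketch below mimics that selection with the standard library. `build_requires` and `parse_python_version` are hypothetical helpers; real installers evaluate markers with a full PEP 508 implementation (such as the `packaging` library) rather than a hand-written check like this.

```python
import sys

# Unconditional build requirements from the updated [build-system] table.
BASE_REQUIRES = ["cython", "setuptools>=80", "setuptools-scm>=8"]


def parse_python_version(version: str) -> tuple[int, ...]:
    """Turn '3.12' into (3, 12) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def build_requires(python_version: str) -> list[str]:
    """Hypothetical helper mirroring `cmake ; python_version >= '3.12'`."""
    requires = list(BASE_REQUIRES)
    if parse_python_version(python_version) >= (3, 12):
        # onnxsim for Python 3.12+ needs CMake to build from source.
        requires.insert(0, "cmake")
    return requires


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(current, build_requires(current))
```

Comparing integer tuples is deliberate: as plain strings, `'3.9' >= '3.12'` is true, whereas a version-aware comparison correctly treats 3.9 as older than 3.12.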
