
Commit 6702f99

Bump TRT-LLM docker to 1.2.0rc4 (CUDA 13) (#578)
## What does this PR do?

Bump TRT-LLM docker from 1.1.0rc2.post2 (CUDA 12) to 1.2.0rc4 (CUDA 13)

## Testing

- [ ] 2-gpu tests for all examples: ?

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes

---------

Signed-off-by: Keval Morabia <[email protected]>
1 parent 7074615 commit 6702f99
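For anyone picking up this bump locally, here is a minimal sketch of pulling and entering the new container: the image tag comes from this commit, while the mount path and workdir are illustrative assumptions.

```bash
# Pull the CUDA 13 based TRT-LLM release image this commit moves CI and docs to.
docker pull nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4

# Open an interactive shell with all GPUs visible; the repo mount is illustrative.
docker run --rm -it --gpus all \
  -v "$PWD":/workspace/TensorRT-Model-Optimizer \
  -w /workspace/TensorRT-Model-Optimizer \
  nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4 bash
```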

7 files changed: +12 −13 lines

.github/workflows/example_tests.yml

Lines changed: 3 additions & 3 deletions
```diff
@@ -93,11 +93,11 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        example: [llm_ptq]
+        example: [llm_ptq, vlm_ptq]
     uses: ./.github/workflows/_example_tests_runner.yml
     secrets: inherit
     with:
-      docker_image: "nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2"
+      docker_image: "nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4"
       example: ${{ matrix.example }}
       pip_install_extras: "[hf,dev-test]"
       runner: linux-amd64-gpu-h100-latest-1
@@ -111,7 +111,7 @@ jobs:
     uses: ./.github/workflows/_example_tests_runner.yml
     secrets: inherit
     with:
-      docker_image: "nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2"
+      docker_image: "nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4"
       example: ${{ matrix.example }}
       pip_install_extras: "[hf,dev-test]"
       runner: linux-amd64-gpu-h100-latest-2
```
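To approximate this job locally, a hedged sketch: the image tag and `[hf,dev-test]` extras come from the workflow above, while the editable-install command and test path are assumptions.

```bash
# Run the vlm_ptq example tests inside the same container CI now uses.
docker run --rm --gpus all -v "$PWD":/workspace -w /workspace \
  nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4 \
  bash -c "pip install -e '.[hf,dev-test]' && pytest tests/examples/vlm_ptq"
```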

CHANGELOG.rst

Lines changed: 1 addition & 0 deletions
```diff
@@ -27,6 +27,7 @@ Model Optimizer Changelog (Linux)
 
 **Misc**
 
+- Bump TensorRT-LLM docker to 1.2.0rc4.
 - Bump minimum recommended transformers version to 4.53.
 - Replace ONNX simplification package from ``onnxsim`` to ``onnxslim``.
 
```

docs/source/getting_started/_installation_for_Linux.rst

Lines changed: 1 addition & 1 deletion
```diff
@@ -18,7 +18,7 @@ Latest Model Optimizer (``nvidia-modelopt``) currently has the following system
 +-------------------------+-----------------------------+
 | PyTorch                 | >=2.6                       |
 +-------------------------+-----------------------------+
-| TensorRT-LLM (Optional) | 1.1.0rc2.post2              |
+| TensorRT-LLM (Optional) | 1.2.0rc4                    |
 +-------------------------+-----------------------------+
 | ONNX Runtime (Optional) | 1.22                        |
 +-------------------------+-----------------------------+
```
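A quick way to check the optional components against this table from inside the container; reading `__version__` is the usual convention for these packages, so treat this as a sketch rather than a guarantee.

```bash
# Print the versions the table above pins.
python -c "import torch; print('torch', torch.__version__)"
python -c "import tensorrt_llm; print('tensorrt_llm', tensorrt_llm.__version__)"
python -c "import onnxruntime; print('onnxruntime', onnxruntime.__version__)"
```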

examples/llm_ptq/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -27,7 +27,7 @@ This section focuses on Post-training quantization, a technique that reduces mod
 
 ### Docker
 
-For Hugging Face models, please use the TensorRT-LLM docker image (e.g., `nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2`).
+For Hugging Face models, please use the TensorRT-LLM docker image (e.g., `nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4`).
 For NeMo models, use the NeMo container (e.g., `nvcr.io/nvidia/nemo:25.09`).
 Visit our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more information.
 
```
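Following the updated README text, a hedged sketch of starting either container; the image tags are the ones named above, everything else (flags, lack of mounts) is illustrative.

```bash
# Hugging Face models: the TRT-LLM image this commit bumps to.
docker run --rm -it --gpus all nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4 bash

# NeMo models: the NeMo container referenced in the README.
docker run --rm -it --gpus all nvcr.io/nvidia/nemo:25.09 bash
```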

examples/specdec_bench/README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -4,7 +4,7 @@
 
 This benchmark is meant to be a lightweight layer ontop of an existing vLLM/SGLang/TRTLLM installation. For example, no install
 is required if one is running in the following dockers: `vllm/vllm-openai:v0.11.0` (vLLM), `lmsysorg/sglang:v0.5.4.post2` (SGLang), or
-`nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc1` (TRT-LLM).
+`nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4` (TRT-LLM).
 
 Next
 
@@ -16,7 +16,7 @@ cd examples/specdec_bench
 
 Collect relevant metrics on acceptance rate, timing, and outputs for Speculative Decoding methods.
 Acceptance rate refers to the number of tokens generated on every iteration. For a standard Autoregressive LLM, this number
-is just 1. 
+is just 1.
 
 ## Getting Started
 
```
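Acceptance rate, as the README defines it, is just tokens generated per drafting iteration. A toy sketch of aggregating it from a hypothetical per-iteration log; the log file and its one-integer-per-line format are invented for illustration and are not part of the benchmark.

```bash
# tokens_per_iter.log: one integer per line, tokens accepted in that iteration.
# Mean acceptance rate; a plain autoregressive LLM would print 1.00.
awk '{ sum += $1; n += 1 } END { if (n) printf "mean acceptance rate: %.2f\n", sum / n }' tokens_per_iter.log
```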

examples/speculative_decoding/collect_hidden_states/slurm_dump.sh

Lines changed: 3 additions & 3 deletions
```diff
@@ -20,16 +20,16 @@
 # THE BIWEEKLY CAPACITY MEETING. IF YOU DON'T KNOW WHO IS THE PIC OF YOUR CSRG PPP
 # MANAGEMET, GO WITH `-p backfill -t 00:25:00`.
 
-#SBATCH -A coreai_dlalgo_modelopt
-#SBATCH --job-name=coreai_dlalgo_modelopt-generate_eagle_hidden_states
+#SBATCH -A <account_name>
+#SBATCH --job-name=<job_name>
 #SBATCH --nodes=1 --ntasks-per-node=4 --gpus-per-node=4
 #SBATCH -p batch
 #SBATCH -t 04:00:00
 
 echo "SLURM_ARRAY_TASK_ID: $SLURM_ARRAY_TASK_ID"
 echo "SLURM_ARRAY_TASK_COUNT: $SLURM_ARRAY_TASK_COUNT"
 
-CONTAINER="nvcr.io#nvidia/tensorrt-llm/release:1.2.0rc0"
+CONTAINER="nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4"
 
 INPUT_DIR="<Can be directory containing the .jsonl files, or path to single .jsonl file>"
 DUMP_DIR="<Directory for output hidden states>"
```
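Since the script reads `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT`, it is presumably submitted as an array job to shard the dump. A hedged sketch; the array bounds are illustrative and the placeholders edited in above must be filled first.

```bash
# Fill in <account_name> and <job_name> in slurm_dump.sh, then shard the
# hidden-state dump across 8 array tasks (the 0-7 range is illustrative).
sbatch --array=0-7 examples/speculative_decoding/collect_hidden_states/slurm_dump.sh
```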

tests/examples/vlm_ptq/test_qwen_vl.py

Lines changed: 1 addition & 3 deletions
```diff
@@ -17,10 +17,8 @@
 import pytest
 from _test_utils.examples.models import QWEN_VL_PATH
 from _test_utils.examples.run_command import run_vlm_ptq_command
-from _test_utils.torch.misc import minimum_gpu
 
 
 @pytest.mark.parametrize("quant", ["fp8", "int8_sq", "nvfp4"])
-@minimum_gpu(2)
-def test_qwen_vl_multi_gpu(quant):
+def test_qwen_vl(quant):
     run_vlm_ptq_command(model=QWEN_VL_PATH, quant=quant)
```
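With the `minimum_gpu(2)` gate removed, the renamed test should run on a single GPU. A hedged way to run just one parametrization locally: the `-k` filter is standard pytest, and the environment (container plus `[hf,dev-test]` extras) is assumed from the workflow change above.

```bash
# Run only the fp8 parametrization of the renamed test, verbosely.
pytest tests/examples/vlm_ptq/test_qwen_vl.py -k fp8 -v
```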
