
Commit e03f104

Bump trtllm to 1.2.0rc0.post1 and pytorch to 25.08 for cuda 13
Signed-off-by: Keval Morabia <[email protected]>
1 parent c692074

File tree: 10 files changed, +62 -22 lines

.github/workflows/example_tests.yml

Lines changed: 1 addition & 1 deletion

@@ -66,7 +66,7 @@ jobs:
       matrix:
         EXAMPLE: [llm_ptq]
     container: &example_container
-      image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
+      image: nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc0.post1
       env:
         PIP_CONSTRAINT: "" # Disable pip constraint for upgrading packages
         HF_TOKEN: ${{ secrets.HF_TOKEN }}

.github/workflows/gpu_tests.yml

Lines changed: 2 additions & 2 deletions

@@ -63,7 +63,7 @@ jobs:
     runs-on: linux-amd64-gpu-l4-latest-1
     timeout-minutes: 90
     container: &gpu_container
-      image: nvcr.io/nvidia/pytorch:25.06-py3
+      image: nvcr.io/nvidia/pytorch:25.08-py3
       env:
         GIT_DEPTH: 1000 # For correct version for tests/gpu/torch/quantization/plugins/test_megatron.py
         PIP_CONSTRAINT: "" # Disable pip constraint for upgrading packages
@@ -76,7 +76,7 @@ jobs:
          echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu:/usr/local/tensorrt/targets/x86_64-linux-gnu/lib" >> $GITHUB_ENV
          echo "PATH=${PATH}:/usr/local/tensorrt/targets/x86_64-linux-gnu/bin" >> $GITHUB_ENV
       - name: Run gpu tests
-        run: pip install tox-current-env && tox -e py312-cuda12-gpu --current-env
+        run: pip install tox-current-env && tox -e py312-cuda13-gpu --current-env
   gpu-tests-non-pr:
     if: ${{ !startsWith(github.ref, 'refs/heads/pull-request/') }}
     # Runner list at https://github.com/nv-gha-runners/enterprise-runner-configuration/blob/main/docs/runner-groups.md

.gitlab/tests.yml

Lines changed: 3 additions & 3 deletions

@@ -29,7 +29,7 @@ unit:
 .multi-gpu-tests-default:
   extends: .tests-default
   timeout: 90m
-  image: nvcr.io/nvidia/pytorch:25.06-py3
+  image: nvcr.io/nvidia/pytorch:25.08-py3
   variables:
     GIT_DEPTH: 1000 # For correct version for tests/gpu/torch/quantization/plugins/test_megatron.py
   tags: [docker, linux, 2-gpu]
@@ -47,7 +47,7 @@ multi-gpu:
   script:
     # Use pre-installed packages without a new venv with tox-current-env
     - pip install tox-current-env
-    - tox -e py312-cuda12-gpu --current-env
+    - tox -e py312-cuda13-gpu --current-env
 
 ##### Example Tests #####
 example-torch:
@@ -64,7 +64,7 @@ example-torch:
 example-trtllm:
   extends: example-torch
   timeout: 60m
-  image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2
+  image: nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc0.post1
   tags: [docker, linux, 2-gpu, sm>=89]
   parallel:
     matrix:

CHANGELOG.rst

Lines changed: 4 additions & 1 deletion

@@ -4,10 +4,13 @@ Model Optimizer Changelog (Linux)
 0.39 (2025-11-xx)
 ^^^^^^^^^^^^^^^^^
 
-**Deprecations**
+**Backward Breaking Changes**
+
+- Default ``cupy`` package (for INT4 ONNX quantization) is now ``cupy-cuda13x`` for CUDA 13 unless installed from source. If you install from PyPI wheel and have CUDA 12, you need to run ``pip uninstall -y cupy-cuda13x`` and ``pip install cupy-cuda12x`` separately.
 
 **New Features**
 
+- Upgrade TensorRT-LLM requirement to 1.2.0rc0.post1.
 - Add flag ``op_types_to_exclude_fp16`` in ONNX quantization to exclude ops from being converted to FP16/BF16. Alternatively, for custom TensorRT ops, this can also be done by indicating ``'fp32'`` precision in ``trt_plugins_precision``.
 - Add LoRA mode support for MCore in a new peft submodule: ``modelopt.torch.peft.update_model(model, LORA_CFG)``.
 - Support PTQ and fakequant in vLLM for fast evaluation of arbitrary quantization formats. See ``examples/vllm_serve`` for more details.
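
As a quick sanity check after swapping wheels, a minimal sketch (assuming `cupy` imports successfully; `runtimeGetVersion` is CuPy's standard runtime API):

```python
# Sanity check: which CUDA major version is the installed cupy wheel built for?
# runtimeGetVersion() encodes the version as 1000*major + 10*minor,
# e.g. 12040 for CUDA 12.4 or 13000 for CUDA 13.0.
import cupy

major = cupy.cuda.runtime.runtimeGetVersion() // 1000
print(f"cupy targets CUDA {major}.x")
# If this prints 13 on a CUDA 12 system, swap the wheel as described above:
#   pip uninstall -y cupy-cuda13x && pip install cupy-cuda12x
```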

docs/source/getting_started/_installation_for_Linux.rst

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ Latest Model Optimizer (``nvidia-modelopt``) currently has the following system
 +-------------------------+-----------------------------+
 | PyTorch                 | >=2.6                       |
 +-------------------------+-----------------------------+
-| TensorRT-LLM (Optional) | 1.1.0rc2.post2              |
+| TensorRT-LLM (Optional) | 1.2.0rc0.post1              |
 +-------------------------+-----------------------------+
 | ONNX Runtime (Optional) | 1.22                        |
 +-------------------------+-----------------------------+

examples/llm_ptq/README.md

Lines changed: 1 addition & 1 deletion

@@ -27,7 +27,7 @@ This section focuses on Post-training quantization, a technique that reduces mod
 
 ### Docker
 
-For Hugging Face models, please use the TensorRT-LLM docker image (e.g., `nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc2.post2`).
+For Hugging Face models, please use the TensorRT-LLM docker image (e.g., `nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc0.post1`).
 For NeMo models, use the NeMo container (e.g., `nvcr.io/nvidia/nemo:25.07`).
 Visit our [installation docs](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more information.
 

pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -154,7 +154,7 @@ exclude_lines = [
 
 
 [tool.bandit]
-exclude_dirs = ["examples/", "tests/"]
+exclude_dirs = ["examples/", "tests/", "setup.py"]
 # Do not change `skips`. It should be consistent with NVIDIA's Wheel-CI-CD bandit.yml config.
 # Use of `# nosec BXXX` requires special approval
 skips = [

setup.py

Lines changed: 44 additions & 9 deletions

@@ -13,14 +13,45 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-"""The package setup script for modelopt customizing certain aspects of the installation process."""
+"""The package setup script for modelopt customizing certain aspects of the installation process.
+
+If installing from source, the CUDA version is detected and the appropriate cupy package is selected.
+If installing from a wheel, cupy for CUDA 13 is installed by default.
+If you have CUDA 12, you need to run `pip uninstall -y cupy-cuda13x` and `pip install cupy-cuda12x` separately.
+"""
+
+import re
+import subprocess
 
 import setuptools
 from setuptools_scm import get_version
 
-# TODO: Set fallback_version to X.Y.Z release version when creating the release branch
 version = get_version(root=".", fallback_version="0.0.0")
 
+
+def get_cuda_major_version() -> int | None:
+    """Return CUDA major version installed on the system or None if detection fails."""
+    # Check nvcc version
+    try:
+        result = subprocess.run(
+            ["nvcc", "--version"],
+            capture_output=True,
+            text=True,
+            timeout=5,
+        )
+        if result.returncode == 0:
+            # Parse output like "release 12.0, V12.0.140" or "release 13.0, V13.0.0"
+            for line in result.stdout.split("\n"):
+                if "release" in line.lower():
+                    match = re.search(r"release (\d+)\.", line)
+                    if match:
+                        return int(match.group(1))
+    except Exception:
+        pass
+
+    return None
+
+
 # Required and optional dependencies ###############################################################
 required_deps = [
     # Common
@@ -43,7 +74,6 @@
 optional_deps = {
     "onnx": [
         "cppimport",
-        "cupy-cuda12x; platform_machine != 'aarch64' and platform_system != 'Darwin'",
         "ml_dtypes",  # for bfloat16 conversion
         "onnx-graphsurgeon",
         "onnx~=1.19.0",
@@ -93,14 +123,19 @@
         "sphinx-rtd-theme~=3.0.0",  # 3.0 does not show version, which we want as Linux & Windows have separate releases
         "sphinx-togglebutton>=0.3.2",
     ],
-    # build/packaging tools
-    "dev-build": [
-        "cython",
-        "setuptools>=80",
-        "setuptools-scm>=8",
-    ],
 }
 
+# Select the appropriate cupy package based on the detected CUDA version, or fall back to cupy-cuda13x
+cuda_version = get_cuda_major_version()
+
+if cuda_version is None:
+    # Default to CUDA 13 if detection fails
+    cuda_version = 13
+
+optional_deps["onnx"].append(
+    f"cupy-cuda{cuda_version}x ; platform_machine != 'aarch64' and platform_system != 'Darwin'"
+)
+
 # create "compound" optional dependencies
 optional_deps["all"] = [
     deps for k in optional_deps if not k.startswith("dev") for deps in optional_deps[k]
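
For illustration, the version parsing in `get_cuda_major_version()` can be exercised standalone; a minimal sketch (the sample banner string is hypothetical but follows nvcc's "release X.Y" output format):

```python
# Standalone sketch of the regex used by get_cuda_major_version() above.
# The sample banner is hypothetical but mirrors nvcc's real output format.
import re

sample = "Cuda compilation tools, release 13.0, V13.0.48"
match = re.search(r"release (\d+)\.", sample)
major = int(match.group(1)) if match else None
print(major)  # -> 13, so the "onnx" extra resolves to cupy-cuda13x
```

Note that the detection only runs at build time: a published wheel carries whichever cupy requirement the build machine selected, which is why the new docstring tells CUDA 12 wheel users to swap packages manually.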

tests/_test_utils/torch_quantization/onnx_export.py

Lines changed: 1 addition & 0 deletions

@@ -65,6 +65,7 @@ def forward_loop(model):
         input_names=input_names,
         output_names=output_names,
         do_constant_folding=constant_folding,
+        dynamo=False,  # torch 2.9 flips default to True
         **kwargs,
     )
 
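
`dynamo` is an existing keyword of `torch.onnx.export`; a minimal standalone sketch of the same pin (the tiny module and output file name are placeholders):

```python
# Minimal sketch: pin the legacy TorchScript exporter explicitly so the export
# path does not change when torch 2.9 flips the default of `dynamo` to True.
import torch


class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)


torch.onnx.export(
    TinyModel().eval(),
    (torch.randn(1, 4),),
    "tiny.onnx",
    dynamo=False,  # keep the pre-2.9 exporter regardless of torch version
)
```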

tox.ini

Lines changed: 4 additions & 3 deletions

@@ -17,6 +17,7 @@ deps =
     torch27: torchvision~=0.22.0
     torch28: torchvision~=0.23.0
 
+
     # Build onnxsim from sdists for Python 3.12 until http://github.com/daquexian/onnx-simplifier/pull/353
     py312: onnxsim
 
@@ -62,7 +63,7 @@ commands =
 ########################################################
 # GPU test environments (Can be used with --current-env)
 ########################################################
-[testenv:{py310,py311,py312}-cuda12-gpu]
+[testenv:{py310,py311,py312}-cuda13-gpu]
 setenv =
     MAMBA_FORCE_BUILD=TRUE
 commands_pre =
@@ -71,8 +72,8 @@ commands_pre =
     pip install git+https://github.com/Dao-AILab/fast-hadamard-transform.git
 
     # Install Mamba model dependencies (takes 8-10mins!)
-    # Triton 3.4.0 causes some real quant tests to fail
-    pip install "triton<3.4"
+    # Install same triton as pytorch-triton in the NGC PyTorch 25.08 docker otherwise Mamba may install an incompatible version
+    pip install triton==3.3.1
     pip install --no-build-isolation git+https://github.com/state-spaces/mamba.git
 
     # Install Eagle-3 test dependencies
