
Conversation

@EmmaQiaoCh
Collaborator

@EmmaQiaoCh EmmaQiaoCh commented Dec 9, 2025

Summary by CodeRabbit

  • Chores
    • Updated Docker base image registry and build tags for multi-stage builds
    • Upgraded CUDA toolkit to version 13.1.0
    • Updated PyTorch from 2.9.0 to 2.9.1
    • Upgraded TensorRT, cuDNN, NCCL, and cuBLAS dependencies to latest compatible versions
    • Updated Polygraphy to version 0.49.26
    • Updated transformers library dependency to 4.57.1


Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
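For illustration, a run that builds only the Docker images and keeps going past the first failure could be triggered as follows (flags combined here purely as an example of the syntax above; the stage name is the one used later in this thread):

    /bot run --disable-fail-fast --stage-list "Build-Docker-Images"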

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

@EmmaQiaoCh EmmaQiaoCh requested review from a team as code owners December 9, 2025 02:54
@EmmaQiaoCh EmmaQiaoCh marked this pull request as draft December 9, 2025 02:54
@coderabbitai
Contributor

coderabbitai bot commented Dec 9, 2025

📝 Walkthrough

Version updates across Docker build configuration and Python dependencies. Changes affect base images, CUDA toolkit, PyTorch, TensorRT, Polygraphy, and package versions without modifying any control flow or introducing new features.

Changes

Cohort / File(s) — Summary

  • Docker multi-stage build configuration — docker/Dockerfile.multi
    Updated default build arguments: BASE_IMAGE to gitlab-master.nvidia.com:5005/dl/dgx/pytorch, BASE_TAG to 25.12-py3.39451658-devel, TRITON_BASE_TAG to 25.11-py3
  • Docker CUDA toolkit installer — docker/common/install_cuda_toolkit.sh
    Updated CUDA_VER from 13.0.2_580.95.05 to 13.1.0_590.44.01
  • Docker Polygraphy installer — docker/common/install_polygraphy.sh
    Updated Polygraphy version from 0.49.9 to 0.49.26
  • Docker PyTorch installer — docker/common/install_pytorch.sh
    Updated PyTorch version from 2.9.0 to 2.9.1
  • Docker TensorRT installer — docker/common/install_tensorrt.sh
    Updated TensorRT to 10.14.1.48 and CUDA stack versions: CUDA_VER to 13.1, CUDNN_VER to 9.17.0.21-1, NCCL_VER to 2.28.9-1+cuda13.0, CUBLAS_VER to 13.2.0.9-1, NVRTC_VER to 13.1.80-1, CUDA_RUNTIME to 13.1.80-1, CUDA_DRIVER_VERSION to 590.44.01-1.el8
  • Python dependencies — requirements.txt
    Updated tensorrt to ~=10.14.1, torch range to >=2.1.0a0,<=2.9.1, nvidia-nccl-cu13 to 2.28.9, transformers to 4.57.1

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verification focus: Ensure all version updates across Docker scripts and requirements.txt are consistent (e.g., CUDA and TensorRT versions align across multiple files)
  • Cross-file consistency: Check that docker/common/install_tensorrt.sh and requirements.txt TensorRT versions match (10.14.1)
  • Dependency compatibility: Verify PyTorch 2.9.1 and TensorRT 10.14.1 compatibility with CUDA 13.1
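As a rough sketch of that cross-file check (plain grep over the files touched by this PR; the patterns are illustrative, not part of any CI tooling):

    # Compare the CUDA / TensorRT / cuDNN / NCCL / cuBLAS pins across the touched files.
    grep -nE 'TRT_VER|CUDA_VER|CUDNN_VER|NCCL_VER|CUBLAS_VER' docker/common/install_tensorrt.sh
    grep -nE 'CUDA_VER' docker/common/install_cuda_toolkit.sh
    grep -nE 'tensorrt|torch|nvidia-nccl|transformers' requirements.txt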

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description check — ⚠️ Warning: The PR description is entirely template boilerplate with no substantive content. All required sections (Description, Test Coverage) are empty; only a checked checklist box is present without actual details. Resolution: fill in the Description section explaining which dependencies were updated and why, and complete the Test Coverage section listing the tests that validate the dependency updates.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title check — ✅ Passed: The title 'Update dependencies' directly and accurately summarizes the main change—updating multiple dependency versions across Docker files and requirements.txt.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a3a66b and b065e62.

📒 Files selected for processing (6)
  • docker/Dockerfile.multi (1 hunks)
  • docker/common/install_cuda_toolkit.sh (1 hunks)
  • docker/common/install_polygraphy.sh (1 hunks)
  • docker/common/install_pytorch.sh (1 hunks)
  • docker/common/install_tensorrt.sh (1 hunks)
  • requirements.txt (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: farshadghodsian
Repo: NVIDIA/TensorRT-LLM PR: 7101
File: docs/source/blogs/tech_blog/blog9_Deploying_GPT_OSS_on_TRTLLM.md:36-36
Timestamp: 2025-08-21T00:16:56.457Z
Learning: TensorRT-LLM container release tags in documentation should only reference published NGC container images. The README badge version may be ahead of the actual published container versions.
📚 Learning: 2025-10-17T13:21:31.724Z
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 8398
File: tensorrt_llm/_torch/pyexecutor/sampling_utils.py:237-272
Timestamp: 2025-10-17T13:21:31.724Z
Learning: The setup.py file in TensorRT-LLM explicitly requires Python 3.10+ via `python_requires=">=3.10, <4"`, making match/case statements and other Python 3.10+ features appropriate throughout the codebase.

Applied to files:

  • requirements.txt
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
Repo: NVIDIA/TensorRT-LLM PR: 6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • requirements.txt
📚 Learning: 2025-09-23T15:12:38.312Z
Learnt from: nv-lschneider
Repo: NVIDIA/TensorRT-LLM PR: 7910
File: cpp/tensorrt_llm/thop/allreduceOp.cpp:352-446
Timestamp: 2025-09-23T15:12:38.312Z
Learning: In TensorRT-LLM NCCL device implementation, NCCL version 2.28+ requirements are handled at runtime in the nccl_device/config layer rather than with compile-time guards. This allows the allreduceOp to remain version-agnostic and delegates version compatibility validation to the appropriate lower-level components that can gracefully handle unsupported configurations.

Applied to files:

  • requirements.txt
📚 Learning: 2025-08-27T14:23:55.566Z
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 7294
File: tensorrt_llm/_torch/modules/rms_norm.py:17-17
Timestamp: 2025-08-27T14:23:55.566Z
Learning: The TensorRT-LLM project requires Python 3.10+ as evidenced by the use of TypeAlias from typing module, match/case statements, and union type | syntax throughout the codebase, despite some documentation still mentioning Python 3.8+.

Applied to files:

  • requirements.txt
📚 Learning: 2025-08-21T00:16:56.457Z
Learnt from: farshadghodsian
Repo: NVIDIA/TensorRT-LLM PR: 7101
File: docs/source/blogs/tech_blog/blog9_Deploying_GPT_OSS_on_TRTLLM.md:36-36
Timestamp: 2025-08-21T00:16:56.457Z
Learning: TensorRT-LLM container release tags in documentation should only reference published NGC container images. The README badge version may be ahead of the actual published container versions.

Applied to files:

  • requirements.txt
  • docker/common/install_tensorrt.sh
📚 Learning: 2025-08-20T15:04:42.885Z
Learnt from: dbari
Repo: NVIDIA/TensorRT-LLM PR: 7095
File: docker/Dockerfile.multi:168-168
Timestamp: 2025-08-20T15:04:42.885Z
Learning: In docker/Dockerfile.multi, wildcard COPY for benchmarks (${CPP_BUILD_DIR}/benchmarks/*Benchmark) is intentionally used instead of directory copy because the benchmarks directory contains various other build artifacts during C++ builds, and only specific benchmark executables should be copied to the final image.

Applied to files:

  • docker/Dockerfile.multi
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (5)
docker/Dockerfile.multi (2)

5-5: The Triton container image nvcr.io/nvidia/tritonserver:25.11-py3 is published and available on the NGC registry. The TRITON_BASE_TAG=25.11-py3 update is valid.


2-4: MAJOR: Base image switched from public NGC registry to internal GitLab registry.

Line 2 changes the BASE_IMAGE from the public NVIDIA NGC registry (nvcr.io/nvidia/pytorch) to an internal NVIDIA GitLab registry (gitlab-master.nvidia.com:5005/dl/dgx/pytorch). This prevents external users and CI/CD systems from accessing the image, breaks reproducibility, and contradicts the established practice that TensorRT-LLM should reference only published NGC container images.

Additionally, Line 4 BASE_TAG includes an internal commit hash (py3.39451658-devel), indicating a development snapshot rather than a stable release.

Confirm whether this is intentional for the main branch, and if a public release is intended, identify and use the corresponding published NGC image instead.

requirements.txt (3)

28-28: NCCL 2.28.9 is available on PyPI.

nvidia-nccl-cu13 version 2.28.9 is confirmed available on PyPI (released Nov 18, 2025) with wheels for x86_64 and aarch64 architectures. No compatibility concerns identified.


22-22: TensorRT 10.14.1 is available on PyPI (tensorrt-10.14.1.48.post1) and is compatible with CUDA 13.1. No compatibility issues identified.


30-30: No action needed — transformers 4.57.1 is available and compatible.

transformers 4.57.1 is available on PyPI (released Oct 14, 2025) and compatible with PyTorch 2.9.1. The package is tested against PyTorch 2.2+, and practical usage confirms it works reliably with torch ≥ 2.3. The version bump from 4.56.0 to 4.57.1 is safe.

docker/common/install_cuda_toolkit.sh
  # CUDA version is usually aligned with the latest NGC CUDA image tag.
  # Only use when public CUDA image is not ready.
- CUDA_VER="13.0.2_580.95.05"
+ CUDA_VER="13.1.0_590.44.01"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

CUDA 13.1.0 driver 590.44.01 NVIDIA availability compatibility

💡 Result:

Short answer: Yes — NVIDIA driver 590.44.01 (beta, released Dec 2, 2025) is a newer driver branch than the R580 baseline required for CUDA 13.x, so it should be compatible with CUDA 13.1.0. (docs.nvidia.com)

Notes:

  • CUDA 13.x requires a driver >= the R580 series (minimum driver ~580.xx); any later driver branch (including 590) is backward‑compatible for running CUDA 13.x applications. (docs.nvidia.com)
  • 590.44.01 is a Linux beta release (Dec 2, 2025) and its release notes / community reports indicate changes to supported hardware (some older GTX 9/10-series behavior on Linux). If you use older Maxwell/Pascal GPUs, test carefully before upgrading. (nvidia.com)

If you want, I can: (a) check the exact CUDA 13.1 toolkit release notes for any additional driver-minimums, or (b) verify whether a specific GPU model you have is listed as supported with the 590.44.01 driver.



CUDA 13.1.0 and driver 590.44.01 are compatible and available.

Both versions exist and are legitimate NVIDIA releases. CUDA 13.x requires driver >= R580 series, and driver 590.44.01 is backward-compatible. However, note that 590.44.01 is a beta driver (released Dec 2, 2025) with hardware compatibility changes for older Maxwell/Pascal GPUs—test thoroughly before upgrading on systems using legacy GPU models.

🤖 Prompt for AI Agents
In docker/common/install_cuda_toolkit.sh around line 8, the script sets
CUDA_VER="13.1.0_590.44.01" but lacks documentation about driver compatibility
and the fact that 590.44.01 is a beta release with potential hardware impacts;
add an inline comment above this variable stating that CUDA 13.1 requires driver
R580+ and that driver 590.44.01 is a beta release (Dec 2, 2025) which may affect
older Maxwell/Pascal GPUs and should be tested on legacy hardware before
deployment, so operators are warned to validate hardware compatibility and
consider pinning a stable driver if needed.
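A minimal sketch of that annotation in docker/common/install_cuda_toolkit.sh (comment wording is illustrative):

    # NOTE: CUDA 13.1 requires an R580+ driver. 590.44.01 is a beta driver (Dec 2, 2025)
    # that changes support for some older Maxwell/Pascal GPUs; validate on legacy hardware
    # before deploying, or pin a stable R580-series driver if needed.
    CUDA_VER="13.1.0_590.44.01"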

docker/common/install_pytorch.sh
  # and closest to the version specified in
  # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-10.html#rel-25-10
- TORCH_VERSION="2.9.0"
+ TORCH_VERSION="2.9.1"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

PyTorch 2.9.1 PyPI availability CUDA 13.1 compatibility

💡 Result:

Short answer: Yes — PyTorch 2.9.1 is published to PyPI (release files uploaded Nov 12, 2025). Official prebuilt CUDA wheels for 2.9.1 on download.pytorch.org are provided for CUDA 13.0 (cu130) — CUDA 13.1 / cu131 is not listed as an official wheel. [1][2][3]

Sources:

  • PyTorch 2.9.1 GA announcement (promotions to PyPI/download done). [1]
  • PyTorch wheel index showing torch-2.9.1+cu130 files. [2]
  • torch (PyPI) release files (torch‑2.9.1 upload entries). [3]

Address CUDA 13.1 compatibility gap for PyTorch 2.9.1.

PyTorch 2.9.1 is available on PyPI but official prebuilt wheels are only provided for CUDA 13.0 (cu130). CUDA 13.1 (cu131) is not officially supported. If this script targets CUDA 13.1, either use the cu130 wheel (with caution), downgrade to a version with CUDA 13.1 support, or add a note documenting this limitation.

🤖 Prompt for AI Agents
docker/common/install_pytorch.sh around line 8: TORCH_VERSION is set to "2.9.1"
but PyTorch 2.9.1 does not have official cu131 wheels; update the script to
explicitly pick a compatible CUDA wheel or document the limitation. Either (A)
set TORCH_VERSION and the corresponding CUDA tag to an officially supported
pairing (e.g., keep 2.9.1 but force the cu130 wheel tag when installing), or (B)
change TORCH_VERSION to a release that officially supports CUDA 13.1, and add a
short comment above the variable explaining which CUDA tag is targeted and that
cu131 has no official 2.9.1 wheel. Ensure the install command uses the matching
CUDA wheel tag and add the comment so future maintainers know the compatibility
choice.
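If option (A) is chosen, a hedged sketch of the install step could look like this (it assumes the script installs wheels via pip3; the actual install line in install_pytorch.sh may differ):

    # No official cu131 wheel exists for 2.9.1, so explicitly target the cu130 wheel index.
    TORCH_VERSION="2.9.1"
    pip3 install "torch==${TORCH_VERSION}" --index-url https://download.pytorch.org/whl/cu130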

docker/common/install_tensorrt.sh
Comment on lines 5 to 18
  TRT_VER="10.14.1.48"
  # Align with the pre-installed cuDNN / cuBLAS / NCCL versions from
- # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-10.html#rel-25-10
- CUDA_VER="13.0" # 13.0.2
+ # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-12.html#rel-25-12
+ CUDA_VER="13.1" # 13.1.0
  # Keep the installation for cuDNN if users want to install PyTorch with source codes.
  # PyTorch 2.x can compile with cuDNN v9.
- CUDNN_VER="9.14.0.64-1"
- NCCL_VER="2.27.7-1+cuda13.0"
- CUBLAS_VER="13.1.0.3-1"
+ CUDNN_VER="9.17.0.21-1"
+ NCCL_VER="2.28.9-1+cuda13.0"
+ CUBLAS_VER="13.2.0.9-1"
  # Align with the pre-installed CUDA / NVCC / NVRTC versions from
  # https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
- NVRTC_VER="13.0.88-1"
- CUDA_RUNTIME="13.0.96-1"
- CUDA_DRIVER_VERSION="580.95.05-1.el8"
+ NVRTC_VER="13.1.80-1"
+ CUDA_RUNTIME="13.1.80-1"
+ CUDA_DRIVER_VERSION="590.44.01-1.el8"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

NVIDIA CUDA 13.1 cuDNN 9.17.0 cuBLAS 13.2 compatibility

💡 Result:

Short answer: cuDNN 9.17.0 is officially supported on CUDA 13.1, but cuBLAS 13.2 is tied to CUDA 13.2 — so either use cuBLAS 13.1 with CUDA 13.1 + cuDNN 9.17.0, or upgrade the CUDA toolkit to 13.2 (and matching driver) if you want cuBLAS 13.2. [1][2][3]

Details / references:

  • cuDNN 9.17.0 support matrix — lists "cuDNN 9.17.0 for CUDA 13.x" and explicitly includes CUDA 13.1 (recommended config: cuDNN 9.17.0 with CUDA 13.1). [1]
  • cuBLAS is shipped as part of the CUDA Toolkit releases (cuBLAS 13.2 corresponds to CUDA 13.2 toolchain). Mixing a cuBLAS built for 13.2 with a CUDA 13.1 toolkit is not the recommended configuration; use the cuBLAS that matches your CUDA toolkit or upgrade CUDA. See CUDA/cuBLAS release notes and toolkit components. [2][3]

If you want, tell me which exact OS and driver version you plan to use and I’ll give the precise driver/CUDA/cuBLAS/cuDNN combo to install.

Sources:
[1] NVIDIA cuDNN 9.17.0 support matrix.
[2] NVIDIA CUDA Toolkit / cuBLAS release notes (CUDA 13.x).
[3] NVIDIA Frameworks / container support matrix (shows cuBLAS versions tied to CUDA).


🌐 Web query:

NVIDIA CUDA 13.1 package repository cuDNN 9.17.0.21-1

💡 Result:

You can install cuDNN 9.17 (for CUDA 13.1) from NVIDIA's CUDA package repository. Steps (Ubuntu/Debian example) — replace ${OS} with ubuntu2204|ubuntu2004|debian11 as appropriate:

  1. Add NVIDIA repo and keys:
    wget https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.pin
    sudo mv cuda-${OS}.pin /etc/apt/preferences.d/cuda-repository-pin-600
    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/3bf863cc.pub
    sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/ /"
    sudo apt-get update. [1][2]

  2. Install the cuDNN 9 (CUDA‑13) packages. Example (specify exact version if you need 9.17.0.21-1):
    sudo apt-get install libcudnn9=9.17.0.21-1+cuda13.1 libcudnn9-dev=9.17.0.21-1+cuda13.1
    or use the meta-package to get the latest cuDNN 9 for CUDA 13:
    sudo apt-get install cudnn9-cuda-13. [2][5]

Notes

  • Official cuDNN 9.17 supports CUDA 13.x (including 13.1) — see support matrix. [1]
  • If you use RHEL/Rocky/SUSE use the dnf/zypper instructions and package names (cudnn9-cuda-13 or libcudnn9-cuda-13). [2][5]
  • If the exact package build string you need (9.17.0.21-1) is not available via repo, download the archived packages directly from NVIDIA’s cuDNN redist/index pages and install locally. [4][6]

References
[1] NVIDIA cuDNN support matrix (cuDNN 9.17 for CUDA 13.x).
[2] NVIDIA cuDNN installation (Linux) — repo + apt/dnf/zypper install instructions.
[4] NVIDIA cuDNN redist/index (direct package archive).
[5] cuDNN base/meta package names (libcudnn9-cuda-13, cudnn9-cuda-13).


🌐 Web query:

NCCL 2.28.9 CUDA 13.1 compatibility

💡 Result:

Short answer: Yes — NCCL 2.28.9 supports CUDA 13.x (so it is compatible with CUDA 13.1). [1][2]

Sources:

  • NCCL 2.28.9 release notes (states "This NCCL release supports CUDA 12.x and CUDA 13.x"). [1]
  • NVIDIA frameworks/support matrix showing CUDA 13.x usage across NVIDIA DL components. [2]

Fix cuBLAS version mismatch with CUDA 13.1.

Line 13 specifies cuBLAS 13.2.0.9-1, but the script uses CUDA 13.1 (line 8). cuBLAS 13.2 is tied to CUDA 13.2; using it with CUDA 13.1 is an unsupported configuration. Either downgrade cuBLAS to 13.1.0.3-1 (or later 13.1.x) to match CUDA 13.1, or upgrade CUDA to 13.2 with a matching driver version.

The other updated versions are compatible: cuDNN 9.17.0.21-1 and NCCL 2.28.9 both support CUDA 13.1.

🤖 Prompt for AI Agents
In docker/common/install_tensorrt.sh around lines 5 to 18, CUBLAS_VER is set to
13.2.0.9-1 while CUDA_VER is 13.1 which is an unsupported mix; change the
CUBLAS_VER to a 13.1-compatible release (e.g., set CUBLAS_VER="13.1.0.3-1" or
another 13.1.x package) so cuBLAS matches CUDA 13.1, or alternatively bump
CUDA_VER (and any corresponding CUDA_RUNTIME/NVRTC/CUDA_DRIVER_VERSION) to 13.2
to match the existing cuBLAS—preferably update CUBLAS_VER to 13.1.0.3-1 to keep
CUDA at 13.1.
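A sketch of the preferred option (keep CUDA at 13.1 and pin a matching cuBLAS); the exact 13.1.x build string should be confirmed against the CUDA package repository:

    CUDA_VER="13.1"          # 13.1.0
    CUBLAS_VER="13.1.0.3-1"  # a 13.1.x cuBLAS matching CUDA 13.1, instead of 13.2.0.9-1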

requirements.txt Outdated
Comment on lines 23 to 24
  # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-10.html#rel-25-10 uses 2.9.0a0.
- torch>=2.9.0a0,<=2.9.0
+ torch>=2.1.0a0,<=2.9.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's confirm the actual torch version specifications in requirements.txt
cat -n requirements.txt | head -30

Repository: NVIDIA/TensorRT-LLM

Length of output: 182


🏁 Script executed:

# Check install_tensorrt.sh to understand the PyTorch release notes reference
cat -n docker/common/install_tensorrt.sh | head -20

Repository: NVIDIA/TensorRT-LLM

Length of output: 182


🏁 Script executed:

# Search for PyTorch version checks or compatibility requirements in the codebase
rg -i "torch.*version" --type py -A 3 | head -100

Repository: NVIDIA/TensorRT-LLM

Length of output: 182


🏁 Script executed:

# Look for CUDA version compatibility specifications
rg -i "cuda.*13\.1\|cuda.*version" --type py --type cpp -A 2 | head -50

Repository: NVIDIA/TensorRT-LLM

Length of output: 182


🏁 Script executed:

# Check if there are CI configurations testing specific PyTorch versions
fd "\.yml|\.yaml" | xargs grep -l "torch" 2>/dev/null | head -5

Repository: NVIDIA/TensorRT-LLM

Length of output: 182


🌐 Web query:

PyTorch 2.1 2.2 2.3 CUDA 13.1 compatibility support

💡 Result:

Short answer: Official PyTorch binaries for 2.1 / 2.2 / 2.3 do NOT include CUDA 13.1 (cu131) wheels. PyTorch 2.1–2.3 official releases provide CUDA 11.8 / 12.1 (and similar 12.x) builds — CUDA 13.x wheels are provided only in later PyTorch releases/nightlies. If you need CUDA 13.1 with one of those older PyTorch versions you must build PyTorch from source (or use a later PyTorch wheel compiled for cu131). [1][2][3]

References

  • PyTorch 2.3 RC/installer notes: official CUDA builds listed as CUDA 11.8 and 12.1 for 2.3. [1]
  • PyTorch Get Started / installer page shows the official supported CUDA build tags (stable/pip/conda) and up through 12.x for recent stable releases (no cu131 listed for 2.1–2.3). [2]
  • Community compatibility matrix / discussion showing 2.1–2.3 map to CUDA 12.1/11.8 (no cu131 wheels) and advice to build from source when you need a different local CUDA toolkit. [3]

If you want, tell me which PyTorch version and OS/package manager you use and I’ll give the exact build (pip/conda) command or step‑by‑step source-build instructions for CUDA 13.1.


🌐 Web query:

PyTorch 2.9 CUDA 13.1 support compatibility

💡 Result:

Short answer: PyTorch 2.9 does not officially ship a CUDA 13.1 wheel. PyTorch 2.9 added CUDA 13 support (CUDA 13.0 as a prototype/variant) while shipping stable wheels for CUDA 12.8 (and older 12.x); CUDA 13.0/variant wheels are the supported CUDA‑13 option for 2.9. [1][2][3]

If you need CUDA 13.1 specifically:

  • Use the CUDA 13.0 (cu130) PyTorch 2.9 wheel if that is acceptable. [2]
  • Otherwise you must build PyTorch 2.9 from source against your CUDA 13.1 toolkit (or use a newer PyTorch release that explicitly lists cu131 wheels). Building from source is the way to ensure compatibility when an official binary for a minor CUDA point‑release is not provided. [1][3]

Sources:

  • PyTorch 2.9 release notes / blog (CUDA 13 wheel/variant support). [1]
  • PyTorch previous‑versions / download instructions listing cu130/cu128/cu126 wheels for 2.9. [2]
  • PyTorch CUDA support RFC / release discussion (CUDA 13.0 marked prototype for 2.9). [3]

🌐 Web query:

PyTorch release notes CUDA version requirements 2024 2025

💡 Result:

Summary — PyTorch CUDA requirements (2024–2025)

  • 2024 (PyTorch 2.2 release): official PyTorch release notes / install guidance for the 2.2 timeframe list CUDA 11.8 and CUDA 12.x (12.1) as the primary supported binary runtimes for PyTorch 2.2. See the PyTorch 2.2 announcement and the Get Started installer guidance. [1][2]

  • 2024–2025 (later 2024 & 2025 builds/containers): PyTorch container images and NVIDIA-optimized builds moved to CUDA 12.x series (examples: CUDA 12.4.1 in the 24.05 image, CUDA 12.6.3 in 24.11, CUDA 12.8 in 25.01, and CUDA 12.9 in 25.04). For container/NGC builds verify the specific release’s CUDA toolkits and driver requirements. [3][4][5]

Notes:

  • PyTorch pip/Conda binaries are published for specific CUDA runtimes (e.g., cu118, cu121, etc.); drivers supporting a given CUDA toolkit are required but newer drivers are typically backward-compatible. If you need to run with a different local CUDA toolkit you can build from source. Check the PyTorch “Get Started / Locally” installer for the exact CUDA tags available for each PyTorch release. [2][3]

Sources

  1. PyTorch 2.2 release announcement. [PyTorch blog]. [1] (pytorch.org)
  2. PyTorch Get Started / Locally (installation matrix & CUDA choices). [PyTorch docs]. [2] (pytorch.org)
  3. NVIDIA PyTorch release notes — PyTorch Release 24.05 (CUDA 12.4.1). [3] (docs.nvidia.com)
  4. NVIDIA PyTorch release notes — PyTorch Release 24.11 (CUDA 12.6.3). [4] (docs.nvidia.com)
  5. NVIDIA PyTorch release notes — PyTorch Release 25.04 / 25.01 (CUDA 12.9 / 12.8). [5] (docs.nvidia.com)

If you want, I can list exact PyTorch binary CUDA tags (cu116/cu118/cu121/… ) for each PyTorch release (2024 and 2025) and the matching pip/conda install commands.



🌐 Web query:

PyTorch 2.4 2.5 2.6 2.7 2.8 CUDA 13 support official wheels

💡 Result:

Short answer — official PyTorch wheels with CUDA 13 were not provided for 2.4–2.8. PyTorch began publishing official CUDA‑13 wheel support with the 2.9 release (CUDA‑13 added to the wheel-variant matrix); 2.8 focused on CUDA 12.x (and experimental wheel‑variants) and earlier releases do not have CUDA‑13 wheels. [1][2][3][4]

Sources:

  • PyTorch 2.8 release (CUDA 12.x focus, wheel‑variants experimental). [1]
  • PyTorch 2.7 release notes (CUDA 12.8 wheels mentioned). [2]
  • PyTorch 2.9 release (explicitly expands wheel-variant support to include CUDA 13). [3]
  • Community/issue reports noting lack of CUDA‑13 wheels for earlier/nightly builds. [4]

References:
[1] https://pytorch.org/blog/pytorch-2-8/ (PyTorch 2.8)
[2] https://pytorch.org/blog/pytorch-2-7/ (PyTorch 2.7)
[3] https://pytorch.org/blog/pytorch-2-9/ (PyTorch 2.9: CUDA 13 added)
[4] pytorch/pytorch#167817 (community issue re: missing CUDA 13 wheels)


CRITICAL: Torch version range expansion is incompatible with CUDA 13.1 requirements.

The change from torch>=2.9.0a0,<=2.9.0 to torch>=2.1.0a0,<=2.9.1 creates an unresolvable dependency conflict. Official PyTorch wheels for versions 2.1–2.8 do not support CUDA 13.x at all (only CUDA 11.8/12.x). PyTorch 2.9, the first release with CUDA 13 support, only provides CUDA 13.0 (cu130) wheels, not CUDA 13.1. Since the requirements specify nvidia-nccl-cu13 (CUDA 13.1), users installing torch>=2.1.0a0 will receive CUDA 12.x wheels that are incompatible with the specified NCCL library. This breaks the installation for anyone not building PyTorch from source.

If the intent is to broaden PyTorch version support, the lower bound must remain at least >=2.9.0 to ensure CUDA 13.x compatibility. Alternatively, if older PyTorch versions are needed, the NCCL requirement must be adjusted to CUDA 12.x compatible versions, or documentation must explicitly state that source builds of PyTorch are required.

🤖 Prompt for AI Agents
In requirements.txt around lines 23-24, the expanded torch range
(torch>=2.1.0a0,<=2.9.1) creates an impossible combo with the repo's
nvidia-nccl-cu13 (CUDA 13.1) because official PyTorch wheels with CUDA 13
support start at 2.9.0 and only provide cu130, not cu131; fix by restoring a
compatible lower bound (e.g., set torch>=2.9.0,<=2.9.1 or torch==2.9.*) so
installed PyTorch has CUDA 13 support, or else change the NCCL package to a CUDA
12.x variant if you must allow torch 2.1–2.8, or if neither is acceptable, add a
clear comment in requirements and README stating that PyTorch must be built from
source for CUDA 13.1 and that the current pinned combos are incompatible.
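A sketch of the first suggested option for requirements.txt (the bounds follow the reviewer's proposal and are not otherwise verified here):

    # PyTorch 2.9 is the first release with CUDA 13 (cu130) wheels; keep the lower bound there.
    torch>=2.9.0,<=2.9.1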

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #27405 [ run ] triggered by Bot. Commit: b065e62

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #27436 [ run ] triggered by Bot. Commit: 66b5473

@tensorrt-cicd
Collaborator

PR_Github #27405 [ run ] completed with state ABORTED. Commit: b065e62
LLM/main/L0_MergeRequest_PR #20939 (Blue Ocean) completed with status: ABORTED

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #27462 [ run ] triggered by Bot. Commit: fdc2030

@tensorrt-cicd
Collaborator

PR_Github #27462 [ run ] completed with state DISABLED
L0 testing is limited to prioritized users. User EmmaQiaoCh is not in the prioritized list. L0 testing cannot be triggered.

@tensorrt-cicd
Collaborator

PR_Github #27436 [ run ] completed with state FAILURE. Commit: 66b5473
/LLM/main/L0_MergeRequest_PR pipeline #20964 (Partly Tested) completed with status: 'FAILURE'

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #27495 [ run ] triggered by Bot. Commit: fdc2030

@tensorrt-cicd
Collaborator

PR_Github #27495 [ run ] completed with state FAILURE. Commit: fdc2030
/LLM/main/L0_MergeRequest_PR pipeline #20975 (Partly Tested) completed with status: 'FAILURE'

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #27593 [ run ] triggered by Bot. Commit: c47fa37

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #27672 [ run ] triggered by Bot. Commit: 3996eef

@tensorrt-cicd
Collaborator

PR_Github #27593 [ run ] completed with state ABORTED. Commit: c47fa37
LLM/main/L0_MergeRequest_PR #21063 (Blue Ocean) completed with status: ABORTED

@tensorrt-cicd
Collaborator

PR_Github #27672 [ run ] completed with state FAILURE. Commit: 3996eef
/LLM/main/L0_MergeRequest_PR pipeline #21125 (Partly Tested) completed with status: 'FAILURE'

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #27689 [ run ] triggered by Bot. Commit: 3996eef

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #27778 [ run ] triggered by Bot. Commit: 5ab653d

@tensorrt-cicd
Collaborator

PR_Github #29878 [ run ] triggered by Bot. Commit: 9fbe4c6

@tensorrt-cicd
Collaborator

PR_Github #29879 [ run ] triggered by Bot. Commit: 9fbe4c6

@tensorrt-cicd
Collaborator

PR_Github #29878 [ run ] completed with state ABORTED. Commit: 9fbe4c6

@EmmaQiaoCh
Collaborator Author

/bot run --post-merge --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #29903 [ run ] triggered by Bot. Commit: c7537d6

@tensorrt-cicd
Collaborator

PR_Github #29903 [ run ] completed with state SUCCESS. Commit: c7537d6
/LLM/main/L0_MergeRequest_PR pipeline #22998 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #30133 [ run ] triggered by Bot. Commit: bfe1e20

@tensorrt-cicd
Collaborator

PR_Github #30133 [ run ] completed with state FAILURE. Commit: bfe1e20
/LLM/main/L0_MergeRequest_PR pipeline #23187 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: qqiao <[email protected]>
@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #30165 [ run ] triggered by Bot. Commit: aabca98

@tensorrt-cicd
Collaborator

PR_Github #30165 [ run ] completed with state FAILURE. Commit: aabca98
/LLM/main/L0_MergeRequest_PR pipeline #23213 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #30177 [ run ] triggered by Bot. Commit: cb0057a

@tensorrt-cicd
Collaborator

PR_Github #30177 [ run ] completed with state FAILURE. Commit: cb0057a
/LLM/main/L0_MergeRequest_PR pipeline #23224 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@EmmaQiaoCh
Collaborator Author

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Collaborator

PR_Github #30221 [ run ] triggered by Bot. Commit: 894b1bd

@tensorrt-cicd
Collaborator

PR_Github #30221 [ run ] completed with state FAILURE. Commit: 894b1bd
/LLM/main/L0_MergeRequest_PR pipeline #23265 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Signed-off-by: qqiao <[email protected]>
@EmmaQiaoCh
Collaborator Author

/bot run --post-merge --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30315 [ run ] triggered by Bot. Commit: bff85f4

@tensorrt-cicd
Collaborator

PR_Github #30315 [ run ] completed with state SUCCESS. Commit: bff85f4
/LLM/main/L0_MergeRequest_PR pipeline #23349 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again
