PyTorch 2.8, CUDA 12.8, TensorRT 10.12, Python 3.13
Torch-TensorRT 2.8.0 targets PyTorch 2.8, TensorRT 10.12, CUDA 12.6/12.8/12.9, and Python 3.9-3.13 on standard Linux x86-64 and Windows platforms.
- Linux x86-64 + Windows
- CUDA 12.8 + Python 3.9-3.13: available via PyPI: https://pypi.org/project/torch-tensorrt/
- CUDA 12.6/12.8/12.9 + Python 3.9-3.13: also available via the PyTorch index: https://download.pytorch.org/whl/torch-tensorrt
Platform support
In addition to the standard Linux x86-64 and Windows x86-64 releases, we now provide binary builds for SBSA and Jetson:
- SBSA aarch64
  - CUDA 12.9 + Python 3.9-3.13 + Torch 2.8 + TensorRT 10.12
  - Available via PyPI: https://pypi.org/project/torch-tensorrt/
  - Available via the PyTorch index: https://download.pytorch.org/whl/torch-tensorrt
- Jetson Orin
  - CUDA 12.6 + Python 3.10 + Torch 2.8 + TensorRT 10.3.0
  - Available at https://pypi.jetson-ai-lab.io/jp6/cu126
Deprecations
- TensorRT implicit quantization support has been deprecated since TensorRT 10.1. Torch-TensorRT APIs related to the INT8Calibrator will be removed in Torch-TensorRT 2.9.0. Quantization users should move to a workflow based on the TensorRT Model Optimizer toolkit. See https://docs.pytorch.org/TensorRT/tutorials/_rendered_examples/dynamo/vgg16_ptq.html for more information.
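For reference, here is a minimal sketch of the replacement post-training quantization workflow, assuming the modelopt.torch.quantization package is installed and using its INT8_DEFAULT_CFG config. The model, calibration loader, and example input names below are placeholders; the linked vgg16_ptq tutorial remains the authoritative version.

import torch
import torch_tensorrt
import modelopt.torch.quantization as mtq

model = MyModel().eval().cuda()              # placeholder: your FP32/FP16 model
calib_loader = get_calibration_loader()      # placeholder: small representative dataset
example_input = torch.randn(1, 3, 224, 224).cuda()  # placeholder input shape

def forward_loop(m):
    # ModelOpt observes activations on calibration data to compute quantization scales
    for batch in calib_loader:
        m(batch.cuda())

# Post-training INT8 quantization via TensorRT Model Optimizer (replaces the INT8Calibrator)
quantized = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# Export and compile the quantized module with Torch-TensorRT
ep = torch.export.export(quantized, (example_input,))
trt_model = torch_tensorrt.dynamo.compile(
    ep,
    arg_inputs=[example_input],
    enabled_precisions={torch.int8},
)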
New Features
AOT-Inductor Pythonless Deployment
Stability: Beta
Historically, TorchScript has been used to run Torch-TensorRT programs outside of a Python interpreter. Both the dynamo/torch.compile and TorchScript frontends supported this TorchScript deployment workflow.
Old
trt_model = torch_tensorrt.compile(model, ir="dynamo", arg_inputs=[...])
ts_model = torch.jit.trace(trt_model, inputs=[...])
ts_model.save("trt_model.ts")
Now you can achieve a similar result using AOT-Inductor. AOTInductor is a specialized version of TorchInductor, designed to process exported PyTorch models, optimize them, and produce shared libraries as well as other relevant artifacts. These compiled artifacts are specifically crafted for deployment in non-Python environments.
Torch-TensorRT can embed TensorRT engines in AOTInductor libraries to further accelerate models, and Inductor kernels can be combined with TensorRT engines via this method. This allows users to deploy their models outside of Python using torch.compile-native technologies.
New
with torch.no_grad():
    cg_trt_module = torch_tensorrt.compile(model, **compile_settings)
    torch_tensorrt.save(
        cg_trt_module,
        file_path=os.path.join(os.getcwd(), "model.pt2"),
        output_format="aot_inductor",
        retrace=True,
        arg_inputs=example_inputs,
    )
This model.pt2 file can then be loaded in either Python or C++ using Torch APIs.
Python:

import os

import torch
import torch_tensorrt

model = torch._inductor.aoti_load_package(os.path.join(os.getcwd(), "model.pt2"))
C++:

#include <iostream>
#include <vector>

#include "torch/torch.h"
#include "torch/csrc/inductor/aoti_package/model_package_loader.h"

int main(int argc, const char* argv[]) {
    std::string trt_aoti_module_path = "model.pt2";

    c10::InferenceMode mode;
    torch::inductor::AOTIModelPackageLoader loader(trt_aoti_module_path);
    std::vector<torch::Tensor> inputs = {torch::randn({8, 10}, at::kCUDA)};
    std::vector<torch::Tensor> outputs = loader.run(inputs);

    std::cout << "Result from the first inference:" << std::endl;
    std::cout << outputs << std::endl;

    return 0;
}
More information can be found at https://docs.pytorch.org/TensorRT/user_guide/runtime.html, as well as a code example here: https://github.com/pytorch/TensorRT/blob/release/2.8/examples/torchtrt_aoti_example/inference.cpp
PTX Plugins
Stability: Stable
In Torch-TensorRT 2.7.0 we introduced auto-generated plugins, which allow users to automatically wrap kernels / PyTorch custom operators into TensorRT plugins so their models run without a graph break. In 2.8.0 we extend this system to support PTX-based plugins, which enable users to serialize and run their TensorRT engines without requiring PyTorch, Triton, or Python in the runtime, and without access to the original kernel implementation. This approach also has lower overhead than the auto-generated plugin system, helping achieve maximum performance.
The example at https://github.com/pytorch/TensorRT/blob/main/examples/dynamo/aot_plugin.py shows how to register a custom operator, generate the necessary plugin, and integrate it into the TensorRT execution graph.
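For orientation, here is a minimal sketch of the first half of that workflow, assuming the auto-plugin helper torch_tensorrt.dynamo.conversion.plugins.custom_op introduced with the 2.7.0 plugin system. The op name mylib::scale_mul and its eager implementation are placeholders, and the PTX/AOT-specific plugin registration (a kernel compiled ahead of time and registered as an AOT plugin) is shown only in the linked example.

import torch
import torch_tensorrt

# Placeholder custom operator; real deployments typically back this with a
# Triton or CUDA kernel as in the linked aot_plugin.py example.
@torch.library.custom_op("mylib::scale_mul", mutates_args=())
def scale_mul(x: torch.Tensor, y: torch.Tensor, scale: float) -> torch.Tensor:
    return (x * y) * scale

@scale_mul.register_fake
def _(x: torch.Tensor, y: torch.Tensor, scale: float) -> torch.Tensor:
    # Shape/dtype propagation used during torch.export / dynamo tracing
    return torch.empty_like(x)

# Auto-generate a TensorRT plugin and converter for the op so compilation
# proceeds without a graph break (helper from the 2.7.0 auto-plugin system).
torch_tensorrt.dynamo.conversion.plugins.custom_op(
    "mylib::scale_mul", supports_dynamic_shapes=True
)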
Hierarchical Multi-backend Adjacency Partitioner
Stability: Experimental
The Hierarchical Multi-backend Adjacency Partitioner enables sophisticated model-partitioning strategies for distributing PyTorch models across multiple backends based on operator support and priority ordering. A prototype partitioner has been added to the package that allows graphs to be split across multiple backends (e.g., TensorRT, PyTorch Inductor) based on operator capabilities. By providing a backend preference order, operators are assigned to the highest-priority backend that supports them.
Please refer to the example for usage.
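To make the priority-ordering idea concrete, here is a small, self-contained sketch that is not the Torch-TensorRT API: it walks an exported ATen-level FX graph and tags each call node with the first backend in a preference list whose (hypothetical) support set contains the op. The real partitioner additionally groups adjacent nodes into per-backend subgraphs and queries actual converter registries rather than hard-coded sets.

import torch
import torch.fx as fx

# Hypothetical per-backend support sets; the real partitioner consults converter
# registries / backend capability checks instead of hard-coded op sets.
SUPPORT = {
    "tensorrt": {torch.ops.aten.relu.default, torch.ops.aten.convolution.default},
    "inductor": {torch.ops.aten.relu.default, torch.ops.aten.topk.default},
}
PRIORITY = ["tensorrt", "inductor"]  # highest-priority backend first

def assign_backends(gm: fx.GraphModule) -> dict:
    """Tag every call_function node with the highest-priority backend that supports it."""
    assignment = {}
    for node in gm.graph.nodes:
        if node.op != "call_function":
            continue
        backend = next((b for b in PRIORITY if node.target in SUPPORT[b]), "pytorch")
        assignment[node] = backend  # fall back to eager PyTorch when unsupported
    return assignment

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.topk(torch.relu(x), k=2).values

ep = torch.export.export(TinyModel(), (torch.randn(4, 8),))
print(assign_backends(ep.graph_module))
# relu -> tensorrt (supported by the top-priority backend),
# topk -> inductor (only the lower-priority backend supports it)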
Model Optimizer-Based NVFP4 Quantization (PTQ) Support for Linux
Stability: Stable
Introducing NVFP4 for efficient and accurate low-precision inference on the Blackwell GPU architecture.
Currently, the workflow supports quantizing models from FP16 → NVFP4.
Directly quantizing from FP32 → NVFP4 is not recommended as it may lead to accuracy degradation. Instead, first convert or train the model in FP16, then quantize to NVFP4.
Full example:
https://github.com/pytorch/TensorRT/blob/release/2.8/examples/apps/flux_demo.py
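The quantization step mirrors the ModelOpt sketch shown in the Deprecations section, swapping in an NVFP4 config. The config name NVFP4_DEFAULT_CFG and the FP16 model/calibration placeholders below are assumptions; the flux_demo linked above is the authoritative end-to-end reference.

import modelopt.torch.quantization as mtq

# Assumption: the model has already been converted to (or trained in) FP16,
# per the recommendation above; quantizing FP32 -> NVFP4 directly is not recommended.
model = load_fp16_model().eval().cuda()        # placeholder

def forward_loop(m):
    for batch in get_calibration_batches():    # placeholder calibration data
        m(batch.half().cuda())

# PTQ from FP16 to NVFP4 (config name assumed from ModelOpt's naming scheme)
quantized = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
# The quantized module is then compiled with Torch-TensorRT; see flux_demo.py
# above for the exact compile settings used on Blackwell GPUs.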
run_llm and KV Caching
Stability: Beta
We’ve introduced a KV caching implementation for Torch-TensorRT using native TensorRT operations, yielding significant improvements in inference performance for autoregressive large language models (LLMs). KV caching is a crucial optimization that reduces latency by reusing attention activations across decoding steps. In our approach, the KV cache is modeled as fixed-size tensor inputs and outputs, with outputs from each decoding step looped back as inputs to update the cache incrementally. This update is performed using TensorRT-supported operations such as slice, concat, and pad. The design allows step-wise cache updates while preserving compatibility with TensorRT’s optimization workflow and engine serialization.
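As an illustration of the fixed-size update pattern (plain PyTorch, not the actual TensorRT lowering), the sketch below appends one decoding step's key tensor using only slice, concat, and pad so the cache keeps a static shape; the helper name and shapes are illustrative only.

import torch
import torch.nn.functional as F

def update_static_k_cache(k_cache: torch.Tensor, new_k: torch.Tensor,
                          seq_len: int, max_seq: int) -> torch.Tensor:
    """Insert the current step's key at position seq_len, keeping a fixed shape.

    k_cache: [batch, heads, max_seq, head_dim]  fixed-size cache fed in as an input
    new_k:   [batch, heads, 1, head_dim]        key from the current decoding step
    """
    prefix = k_cache[:, :, :seq_len, :]             # slice: entries filled so far
    updated = torch.cat([prefix, new_k], dim=2)     # concat: append current step
    pad_len = max_seq - updated.shape[2]
    return F.pad(updated, (0, 0, 0, pad_len))       # pad: back to the fixed max_seq

# Example: batch=1, heads=8, max_seq=16, head_dim=64, 5 entries already filled
k_cache = torch.zeros(1, 8, 16, 64)
k_cache = update_static_k_cache(k_cache, torch.randn(1, 8, 1, 64), seq_len=5, max_seq=16)
# The value cache is updated identically, and both are looped back as engine
# inputs for the next decoding step.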
We’ve also introduced a new utility, run_llm.py, to run inference on popular LLMs with KV caching enabled.
To run a Qwen3 model using KV caching with Torch-TensorRT, use the following command:
python run_llm.py --model Qwen/Qwen3-8B --prompt "What is parallel programming?" --precision FP16 --num_tokens 128 --cache static_v2 --benchmark
Please refer to Compiling LLM models from Huggingface for more details and limitations.
Debugger
We introduced a new debugger to improve usability and the debugging experience for Torch-TensorRT. The debugger centralizes all debug settings, such as the logging level (from critical to info) and engine profiling. It also adds FX graph visualization, where you can specify the lowering pass before or after which the graph should be drawn. In addition, the debugger can emit engine profiling and layer information compatible with TREX, an engine-visualization tool from the TensorRT team, which helps explain the engine structure.
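A rough sketch of how the context-manager style debugger can be wrapped around compilation follows. The parameter names below (log_level, logging_dir, capture_fx_graph_after, save_engine_profile, profile_format) are assumptions inferred from the capabilities described above; check the Torch-TensorRT debugger documentation for the exact API.

import torch_tensorrt

# Assumed context-manager usage; parameter names are illustrative, not authoritative.
with torch_tensorrt.dynamo.Debugger(
    log_level="debug",                         # logging verbosity, critical ... info/debug
    logging_dir="./torchtrt_debug",            # where logs, graphs, and profiles are written
    capture_fx_graph_after=["constant_fold"],  # draw the FX graph after this lowering pass
    save_engine_profile=True,                  # dump engine/layer profiling information
    profile_format="trex",                     # emit profiles consumable by the TREX tool
):
    trt_module = torch_tensorrt.compile(model, ir="dynamo", arg_inputs=example_inputs)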
Model Zoo
We have expanded support to include several popular models from the Qwen3 and Llama3 series. In this release, we’ve also addressed various performance and accuracy issues to improve overall stability. For a complete list of supported models, please refer to the Supported Models section.
Bug Fixes
Refit
- Refit has been re-enabled for Python 3.13 after being disabled in 2.7.0.
- Reduced memory overhead by offloading the model to CPU.
Performance improvements
- The linear converter was reverted to the earlier implementation because it shows better FP16 performance on some models (e.g., BERT).
- Group Norm converter was simplified to reduce unnecessary TensorRT ILayers
- The constants in the BatchNorm converter are now folded at compile time, leading to significant performance improvements.
- The SDPA op decomposition has been optimized, resulting in performance on par with or better than ONNX-TensorRT for transformer-based diffusion models such as Stable Diffusion 3, WAN2.1, and FLUX.
What's Changed
- chore: bump torch to 2.8.0.dev by @zewenli98 in #3449
- Nccl ops correction changes by @apbose in #3387
- fix: Change the translational layer from numpy to torch during conversion to handle additional data types by @peri044 in #3445
- Fix grid_sample by @HolyWu in #3340
- fix: Destory cuda graphs before setting weight streaming by @keehyuna in #3461
- tool: uv setting to avoid the pip install -e by @narendasan in #3468
- chore: reenable py313 by @zewenli98 in #3455
- bf16 support for elementwise operation by @apbose in #3462
- feat: rmsnorm lowering by @bowang007 in #3440
- feat: Support flashinfer.rmsnorm by @bowang007 in #3424
- fix: support masked_scatter by lowering path and corner case of maske… by @chohk88 in #3476
- fix: index_put converter to handle multi-shape slicing with None by @chohk88 in #3475
- slight code reorg and bug correction for cross_compile by @apbose in #3472
- Enabled refit on Python 3.13 by @cehongwang in #3481
- fix: l2_limit_for_tiling by @zewenli98 in #3479
- chore: test bf16 fixes in CI by @peri044 in #3491
- add python3.13 into the final release artifact by @lanluo-nvidia in #3499
- chore: remove pre-cxx11 abi by @zewenli98 in #3473
- disabling dla args for hope igx platform by @apbose in #3487
- chore: remove pre-cxx11 abi references in doc by @zewenli98 in #3503
- Fix Windows CI for Release 2.7 (#3505) by @narendasan in #3506
- upgrade modelopt by @lanluo-nvidia in #3511
- chore: miscellaneous fixes for handling graph breaks by @peri044 in #3488
- add nspect ignore file by @lanluo-nvidia in #3514
- Update mutable_torchtrt_module_example.py by @cehongwang in #3519
- Add Linux CI build for aarch64 by @lanluo-nvidia in #3516
- chore: update the docstring for llama2 rmsnorm automatic plugin example by @bowang007 in #3512
- chore(deps): bump undici from 5.28.5 to 5.29.0 in /.github/actions/assigner by @dependabot[bot] in #3520
- fix docker build failure: add allow_empty to true by @lanluo-nvidia in #3526
- Added CPU offloading by @cehongwang in #3452
- chore(deps): bump setuptools from 70.2.0 to 78.1.1 in /toolchains/jp_workspaces by @dependabot[bot] in #3523
- add feature gate for tensorrt plugin by @lanluo-nvidia in #3518
- chore(deps): bump transformers from 4.48.0 to 4.50.0 in /examples/dynamo by @dependabot[bot] in #3497
- Minor fix - check for DTensor on igpu platform by @apbose in #3531
- fix: wrong dtype and device in aten.full_like decomposition by @junstar92 in #3535
- feat: Implement SDPA op converter / lowering pass as extensions by @peri044 in #3534
- nvidia-modelopt dependency fix by @lanluo-nvidia in #3544
- Add jetson build on CI by @lanluo-nvidia in #3524
- feat: TensorRT AOT Plugin by @bowang007 in #3504
- Publish jetson wheel to pytorch nightly index by @lanluo-nvidia in #3550
- fix: handle device in the same way as dtype in aten.full_like decomposition by @junstar92 in #3538
- fix the jetson nightly build check bug by @lanluo-nvidia in #3552
- fix int8/fp8 constant folding issue by @lanluo-nvidia in #3543
- Upgrade to TensorRT 10.11 by @lanluo-nvidia in #3557
- Cross compile guard by @apbose in #3486
- fix: Fix constant folding failure due to modelopt by @peri044 in #3565
- add --no-deps for tests/py/requirements.txt by @lanluo-nvidia in #3569
- Add fp4 support by @lanluo-nvidia in #3532
- fix: Fix a perf regression due to weights being ITensors by @peri044 in #3568
- Added flux demo by @cehongwang in #3418
- FX graph visualization by @cehongwang in #3528
- fix main test failure bug by @lanluo-nvidia in #3590
- Verify C++ tests, fix cuda graphs union issue by @narendasan in #3589
- Fix: fix aot plugin example docstring issue by @bowang007 in #3595
- feat: working uv pyproject.toml by @narendasan in #3597
- remove torchvision dependency from build, optional for test by @lanluo-nvidia in #3598
- Changed weight map to tensor and fix the refit bug by @cehongwang in #3573
- test failed but displayed as green by @lanluo-nvidia in #3599
- Import dllist only on linux by @HolyWu in #3592
- feat: Hierarchical Partitioner to support multi-backends by @zewenli98 in #3539
- fix dynamo converter test case failure by @lanluo-nvidia in #3594
- feat: Saving modules using the AOTI format by @narendasan in #3567
- skip flashinfer-python for py3.9 due to upstream error by @lanluo-nvidia in #3605
- fix enabled_precisions error in test cases by @lanluo-nvidia in #3606
- debug flag is deprecated, remove it so that test won't complain by @lanluo-nvidia in #3610
- fix: add prefix in hierarchical_partitioner_example by @zewenli98 in #3607
- fix: pre-commit issues by @zewenli98 in #3603
- py39 does not like | E TypeError: unsupported operand type(s) for |: 'type' and 'EnumMeta' by @lanluo-nvidia in #3611
- fix cross compilation test bug by @lanluo-nvidia in #3609
- TorchTensorRTModule Serialization Fix by @cehongwang in #3572
- a few CI changes by @lanluo-nvidia in #3612
- remove debug flag by @lanluo-nvidia in #3618
- fix: Fix unbacked sym int not found issue by @peri044 in #3617
- fix ts fe test error. by @lanluo-nvidia in #3619
- disable test on aarch64 for now by @lanluo-nvidia in #3623
- disable aoti format in windows by @lanluo-nvidia in #3632
- release 2.8 branch cut by @lanluo-nvidia in #3638
- cherry pick 3636 by @lanluo-nvidia in #3640
- cherry pick 3642 by @lanluo-nvidia in #3655
- Lluo/cherry pick 3629 by @lanluo-nvidia in #3656
- Lluo/cherry pick 3620 by @lanluo-nvidia in #3658
- cherry pick 3663: fix the int8 quantization error, remove duplicated lines by @lanluo-nvidia in #3665
- cherry pick 3660 to release/2.8 by @lanluo-nvidia in #3661
- cherry pick 3685: disable jetson build in ci by @lanluo-nvidia in #3688
- cherry pick 3680: fix refit test bug by @lanluo-nvidia in #3687
- cherry-pick 3686: upgrade tensorrt from 10.11 to 10.12 by @lanluo-nvidia in #3690
- cherry pick 3689 to 2.8 release:flux fp4 by @lanluo-nvidia in #3696
- chore: cherry pick of KV cache PR (3527) by @peri044 in #3667
- Cherrypick of PR 3513 by @apbose in #3664
- Cherrypick of PR 3570 by @apbose in #3662
- chore: cherry pick of bf16 cast PR (3643) by @peri044 in #3666
- Cherrypick #3719 for release/2.8 by @zewenli98 in #3734
- Cherrypick #3703 for release/2.8 by @zewenli98 in #3735
- enable back jetpack build by @lanluo-nvidia in #3720
- add typing_extensions as test dependencies which is required by modelopt by @lanluo-nvidia in #3743
- broadcast_remove - cherry pick 3700 by @lanluo-nvidia in #3757
- fix typing-extensions issue by @lanluo-nvidia in #3761
- Fix Jetson FP4 gate issue by @lanluo-nvidia in #3764
- fix build cancellation issue by @lanluo-nvidia in #3768
Full Changelog: v2.7.0...v2.8.0-rc6