v0.21.0 #6606
QiJune announced in Announcements
Replies: 0 comments
TensorRT-LLM Release 0.21.0
Key Features and Enhancements
Infrastructure Changes
- The base Docker image for TensorRT-LLM is updated to nvcr.io/nvidia/pytorch:25.05-py3.
- The base Docker image for the TensorRT-LLM Backend is updated to nvcr.io/nvidia/tritonserver:25.05-py3 (a version smoke-test sketch follows this list).
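As a quick way to confirm that an environment built on one of the updated base images is actually running this release, the sketch below prints the installed package version and runs a one-prompt generation. It is a hedged illustration only: it assumes the documented `tensorrt_llm.LLM` / `SamplingParams` quickstart API, and the TinyLlama model name is a placeholder, not something tied to this release.

```python
# Minimal smoke-test sketch for a container built on the updated base image.
# Assumes the documented LLM API quickstart; the model name below is a placeholder.
import tensorrt_llm
from tensorrt_llm import LLM, SamplingParams


def main():
    # Confirm the wheel installed on top of the base image reports the expected release.
    print(f"tensorrt_llm version: {tensorrt_llm.__version__}")

    # Run a single short generation as a functional check.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model
    for output in llm.generate(["Hello, my name is"], sampling_params):
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()
```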
API Changes
Fixed Issues
Known Issues
What's Changed
- [cherry-pick] [CI] Waive test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False] by @venkywonka in #5553
New Contributors
Full Changelog: v0.21.0rc2...v0.21.0
This discussion was created from the release v0.21.0.