Update on the development branch #2563

kaiyux · 2024-12-11T08:36:53Z

kaiyux
Dec 11, 2024
Maintainer

Hi,

The TensorRT-LLM team is pleased to announce that we pushed an update to the development branch (and the Triton backend) on Dec 11, 2024.

This update includes:

Features
- The LLM API
  - Added lookahead decoding support.
  - Added DeepSeek V1 support.
  - Added Medusa support.
- Added support for LogN scaling for Qwen models.
- Added quantization support for RecurrentGemma. Refer to examples/recurrentgemma/README.md.
- Added AutoAWQ checkpoints support for Qwen. Refer to the “INT4-AWQ” section in examples/qwen/README.md.
- Added allottedTimeMs to the C++ Request class to support per-request timeout.
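
For readers unfamiliar with the LogN scaling item above, here is a minimal sketch of the idea as it appears in the reference Qwen modeling code: queries at positions beyond the training sequence length are multiplied by the logarithm of the position taken in base "training length", and earlier positions are left unscaled. The function name `logn_scale` and the default length of 8192 are assumptions made for illustration; this is not the TensorRT-LLM implementation.

```python
import math


def logn_scale(position: int, train_seq_len: int = 8192) -> float:
    """Return the LogN attention scale for a 1-indexed token position.

    Hypothetical helper for illustration only. Positions within the
    training length keep a scale of 1.0; later positions are scaled by
    log(position) / log(train_seq_len).
    """
    if position > train_seq_len:
        return math.log(position, train_seq_len)
    return 1.0


# Scales for a few positions around an assumed training length of 8192.
scales = [logn_scale(p) for p in (1, 8192, 8193, 16384)]
```

The scale grows slowly with position, so the change only affects inputs that run past the training length.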
API
- [BREAKING CHANGE] Chunked context is now enabled by default when KV cache and paged context FMHA are enabled on non-RNN-based models.
- [BREAKING CHANGE] Embedding sharing is now enabled automatically when possible, and the --use_embedding_sharing flag has been removed from the convert checkpoint scripts.
- [BREAKING CHANGE] Cancelled requests now return empty results.
Bug fixes
- Fixed the in-place clamp operation usage in smooth quant. Thanks to @StarrickLiu for the contribution in #2485 ("The clamp in-place operation cannot modify the weight_scales tensor directly.").
Infra
- The base Docker image for TensorRT-LLM is updated to nvcr.io/nvidia/pytorch:24.11-py3.
- The base Docker image for TensorRT-LLM Backend is updated to nvcr.io/nvidia/tritonserver:24.11-py3.
- The dependent TensorRT version is updated to 10.7.
- The dependent CUDA version is updated to 12.6.3.
- Starting from the latest release, TensorRT-LLM Python wheels available on PyPI support both Python 3.10 and Python 3.12.
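
As a quick pre-install check, the following sketch tests whether the running interpreter matches one of the two Python versions listed above. The helper name `wheel_supported` is hypothetical, and treating every other version as unsupported is an assumption: the note above only states which versions the PyPI wheels cover.

```python
import sys

# Versions named in the note above; anything else is assumed unsupported here.
SUPPORTED_VERSIONS = {(3, 10), (3, 12)}


def wheel_supported(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) is one of the listed Python versions."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "listed" if wheel_supported() else "not listed"
    print(f"Python {major}.{minor} is {status} among the supported versions.")
```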
Known Issues
- The Windows build is currently broken; the team is working on a fix.

Thanks,
The TensorRT-LLM Engineering Team
