Update on the development branch #1690
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) on May 28, 2024.
This update includes:
- Supported Jais, see `examples/jais/README.md`.
- Supported DiT, see `examples/dit/README.md`.
- Supported Video NeVA, see the Video NeVA section in `examples/multimodal/README.md`.
- Supported `distil-whisper/distil-large-v3`, thanks to the contribution from @IbrahimAmin1 in [feat]: Add Option to convert and run distil-whisper large-v3 #1337.
- Migrated the Whisper example to the unified workflow (`trtllm-build` command), see documents: `examples/whisper/README.md`.
- Renamed `free_gpu_memory_fraction` in `ModelRunnerCpp` to `kv_cache_free_gpu_memory_fraction`.
- Added more options to `ModelRunnerCpp`, including `max_tokens_in_paged_kv_cache`, `kv_cache_enable_block_reuse` and `enable_chunked_context`; see the first sketch after this list.
- Removed `enable_executor` from the `tensorrt_llm.LLM` API, as it is using the C++ `Executor` API now.
- Added `OutputConfig` to the `generate` API.
- Added `BuildConfig` to the `tensorrt_llm.LLM` API; see the second sketch after this list.
- Streamlined the `LLM` construction phase and removed most of the trivial logs.
- Added `SpeculativeDecodingMode.h` to choose between different speculative decoding techniques.
- Added `SpeculativeDecodingModule.h`, a base class for speculative decoding techniques.
- Removed `decodingMode.h`.
- Updated the base Docker image for TensorRT-LLM to `nvcr.io/nvidia/pytorch:24.04-py3`.
- Updated the base Docker image for the TensorRT-LLM backend to `nvcr.io/nvidia/tritonserver:24.04-py3`.
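Below is a minimal sketch of the `ModelRunnerCpp` options mentioned above. Only the four keyword arguments named in the list come from this update; the engine path, input IDs, and `generate` arguments are placeholders for illustration.

```python
import torch
from tensorrt_llm.runtime import ModelRunnerCpp

# Hypothetical engine directory produced by trtllm-build.
ENGINE_DIR = "/path/to/engine_dir"

runner = ModelRunnerCpp.from_dir(
    engine_dir=ENGINE_DIR,
    # Renamed in this update (was free_gpu_memory_fraction):
    kv_cache_free_gpu_memory_fraction=0.9,
    # Options newly exposed through ModelRunnerCpp:
    max_tokens_in_paged_kv_cache=4096,
    kv_cache_enable_block_reuse=True,
    enable_chunked_context=True,
)

# Illustrative generation call with pre-tokenized input IDs.
batch_input_ids = [torch.tensor([1, 2, 3], dtype=torch.int32)]
outputs = runner.generate(batch_input_ids, max_new_tokens=32, end_id=2, pad_id=2)
```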
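And a sketch of the reshaped `tensorrt_llm.LLM` API: `enable_executor` is gone (the C++ `Executor` is always used), and `BuildConfig` now travels with the `LLM` constructor. The model path and the exact `BuildConfig` fields shown are assumptions for illustration; see the LLM API examples in the repository for the authoritative signatures.

```python
from tensorrt_llm import LLM, BuildConfig

# BuildConfig is now part of the tensorrt_llm.LLM API.
build_config = BuildConfig(max_batch_size=8, max_input_len=1024)

# No enable_executor flag anymore; the LLM API uses the C++ Executor.
llm = LLM(model="/path/to/hf_model_or_engine_dir", build_config=build_config)

for output in llm.generate(["Hello, my name is"]):
    print(output.outputs[0].text)
```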
Thanks,
The TensorRT-LLM Engineering Team