Update on the development branch #2298
DanBlanaru announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we pushed an update to the development branch (and the Triton backend) on Oct 08, 2024.
This update (#2297) includes:
- [...] examples/run.py and documentation is in examples/draft_target_model/README.md.
- [...] ModelRunnerCpp class.
- [...] isParticipant method to the C++ Executor API to check if the current process is a participant in the executor instance.
- [...] trtllm-build command.
- [...] strongly_typed=False to build the fp16 vision engine for the multimodal example. TensorRT 10 made the default strongly_typed=True, so fp32 vision engines are built even if the input ONNX files are fp16. This issue is now fixed.
- [...] trtllm-build --fast-build with fake or random weights. Thanks to @ZJLi2013 for flagging it in "trtllm-build with --fast-build ignore transformer layers" (#2135).
- [...] assistant_model.
- [...] customAllReduce performance by using Lamport-style AllReduce + Norm fusion.
- [...] memcpy over MPI to the target model's process in orchestrator mode. This reduces the latency between the end of the draft model generation and the beginning of target inference.

Thanks,
The TensorRT-LLM Engineering Team
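
For readers unfamiliar with the draft/target items above: they relate to speculative decoding with a separate draft model, where a small draft model proposes several tokens and the larger target model verifies them. The sketch below is a toy illustration of that general idea in plain Python with greedy decoding. It is not TensorRT-LLM's implementation and uses none of its APIs; `speculative_step`, `generate`, `toy_draft`, and `toy_target` are hypothetical names made up for this example.

```python
from typing import Callable, List

# Hypothetical type: a "model" maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(draft: NextToken, target: NextToken,
                     tokens: List[int], k: int) -> List[int]:
    """Run one draft-then-verify step and return the extended sequence."""
    # 1. The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(tokens + proposal))

    # 2. The target model checks each proposed position in order.
    #    (A real engine would score all k positions in one forward pass;
    #    this toy version simply loops.)
    accepted: List[int] = []
    for tok in proposal:
        expected = target(tokens + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return tokens + accepted
        accepted.append(tok)

    # 3. All k drafts were accepted: the target adds one extra token.
    accepted.append(target(tokens + accepted))
    return tokens + accepted


def generate(draft: NextToken, target: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 4) -> List[int]:
    """Repeat speculative steps until max_new_tokens have been produced."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens = speculative_step(draft, target, tokens, k)
    return tokens[:len(prompt) + max_new_tokens]


def toy_target(seq: List[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    """Toy draft model: agrees with the target except when the answer is 5."""
    nxt = (seq[-1] + 1) % 10
    return 0 if nxt == 5 else nxt


if __name__ == "__main__":
    print(generate(toy_draft, toy_target, [1], max_new_tokens=8))
```

With greedy verification, the output matches what the target model alone would produce; the draft model only changes how many target steps are needed. In a real deployment the two models may run in different processes, which is why the hand-off latency between draft generation and target inference matters.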