Update on the development branch #1316
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) on March 19, 2024.
This update includes:
- Support running `GptSession` without OpenMPI (Run GptSession without openmpi? #1220).
- Added Python bindings for the `executor` API; see the documentation and examples in `examples/bindings`.
- See `examples/gpt/README.md` for the latest commands.
- See `examples/qwen/README.md` for the latest commands.
- Moved the `max_prompt_embedding_table_size` argument into the `trtllm-build` command, to generalize the feature better to more models; use `trtllm-build --max_prompt_embedding_table_size` instead.
- Renamed the `trtllm-build --world_size` flag to `--auto_parallel`; the option is now used for the auto parallel planner only.
- `AsyncLLMEngine` is removed; the `tensorrt_llm.GenerationExecutor` class is refactored to work both when launched explicitly with `mpirun` at the application level and when given an MPI communicator created by `mpi4py`.
- `examples/server` is removed; see `examples/app` instead.
- Fixed `SamplingConfig` tensors in `ModelRunnerCpp` (ModelRunnerCpp does not transfer SamplingConfig Tensor fields correctly #1183).
- Fixed `examples/run.py` loading only one line from `--input_file`.
- Updated `benchmarks/cpp/README.md`.
- The base Docker image is updated to `nvcr.io/nvidia/pytorch:24.02-py3`.
- The base Docker image for the Triton backend is updated to `nvcr.io/nvidia/tritonserver:24.02-py3`.
- Added documentation for the `executor` API; see `docs/source/executor.md`.
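The `--input_file` fix above boils down to reading every prompt line from the file rather than just the first one. A minimal standalone sketch of that behavior (the `load_prompts` helper and the file contents are illustrative, not the actual TensorRT-LLM code):

```python
import os
import tempfile

def load_prompts(path):
    # Read *all* non-empty lines, not just the first one.
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]

# Demonstrate with a temporary file holding three prompts.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("first prompt\nsecond prompt\nthird prompt\n")
    path = f.name

prompts = load_prompts(path)
os.unlink(path)
print(len(prompts))  # 3
```

With the fix, each line of `--input_file` becomes one prompt in the batch instead of only the first line being used.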
Thanks,
The TensorRT-LLM Engineering Team