that will be used by the Triton server. The models can be found in the
[all_models](./tensorrt_llm/triton_backend/all_models) folder. The folder contains six groups of models:

- [`disaggregated_serving`](./tensorrt_llm/triton_backend/all_models/disaggregated_serving): Using the C++ TensorRT-LLM backend to run disaggregated serving.
- [`gpt`](./tensorrt_llm/triton_backend/all_models/gpt): Using the TensorRT-LLM pure Python runtime. This model is deprecated and will be removed in a future release.
- [`inflight_batcher_llm`](./tensorrt_llm/triton_backend/all_models/inflight_batcher_llm/): Using the C++ TensorRT-LLM backend with the executor API, which includes the latest features such as inflight batching.
…please see the [model config](./docs/model_config.md#tensorrt_llm_bls-model) section.

- [`llmapi`](./tensorrt_llm/triton_backend/all_models/llmapi/): Using the TensorRT-LLM LLM API with the PyTorch backend.
- [`multimodal`](./tensorrt_llm/triton_backend/all_models/multimodal/): Using the TensorRT-LLM Python runtime for multimodal models. See [`multimodal.md`](./docs/multimodal.md) for more details.
- [`whisper`](./tensorrt_llm/triton_backend/all_models/whisper/): Using the TensorRT-LLM Python runtime for Whisper. See [`whisper.md`](./docs/whisper.md) for more details.
#### Modify the Model Configuration
Use the script to fill in the parameters in the model configuration files. For
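Conceptually, filling in a model configuration amounts to substituting `${parameter}`-style placeholders in a template `config.pbtxt` with the values you supply. A minimal Python sketch of that substitution idea, assuming the `${...}` placeholder convention; the placeholder name and template snippet below are illustrative, not the backend's actual configuration:

```python
import re

def fill_template(text: str, **params) -> str:
    # Replace ${key} placeholders with the supplied values;
    # placeholders with no matching value are left untouched.
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(params.get(key, match.group(0)))
    return re.sub(r"\$\{(\w+)\}", substitute, text)

# Illustrative template snippet in the spirit of a config.pbtxt file.
template = 'backend: "tensorrtllm"\nmax_batch_size: ${triton_max_batch_size}\n'
print(fill_template(template, triton_max_batch_size=64))
```

Leaving unknown placeholders intact (rather than failing) makes it easy to fill a template in several passes, one group of parameters at a time.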