
Commit b85ab13

doc: add supported data modality and types on multimodal serve (NVIDIA#5988)
Signed-off-by: yechank <[email protected]>
1 parent 48ddc3d commit b85ab13


docs/source/commands/trtllm-serve.rst

Lines changed: 77 additions & 5 deletions
@@ -67,9 +67,14 @@ Another example uses ``curl``:
    :linenos:
 
 Multimodal Serving
-~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~
 
-For multimodal models (e.g., Qwen2-VL), you'll need to create a configuration file and start the server with additional options:
+For multimodal models, you need to create a configuration file and start the server with additional options due to the following limitations:
+
+* TRT-LLM multimodal is currently not compatible with ``kv_cache_reuse``
+* Multimodal models require ``chat_template``, so only the Chat API is supported
+
+To set up multimodal models:
 
 First, create a configuration file:
 
@@ -78,7 +83,6 @@ First, create a configuration file:
    cat >./extra-llm-api-config.yml<<EOF
    kv_cache_config:
        enable_block_reuse: false
-       free_gpu_memory_fraction: 0.6
    EOF
 
 Then, start the server with the configuration file:
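
The hunk above keeps ``enable_block_reuse: false`` as the one required multimodal setting and drops the ``free_gpu_memory_fraction`` override. For readers driving the model through the Python LLM API instead of ``trtllm-serve``, a minimal sketch of the equivalent setting follows; it assumes ``tensorrt_llm.llmapi.KvCacheConfig`` exposes an ``enable_block_reuse`` flag, and the checkpoint name is a placeholder, neither taken from this commit.

.. code-block:: python

   # Sketch only: the same kv_cache_config intent expressed through the Python
   # LLM API instead of the extra-llm-api-config.yml file used by trtllm-serve.
   from tensorrt_llm import LLM
   from tensorrt_llm.llmapi import KvCacheConfig

   llm = LLM(
       model="Qwen/Qwen2-VL-7B-Instruct",  # placeholder checkpoint, not from the commit
       kv_cache_config=KvCacheConfig(enable_block_reuse=False),  # multimodal needs reuse off
   )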
@@ -89,8 +93,8 @@ Then, start the server with the configuration file:
        --extra_llm_api_options ./extra-llm-api-config.yml \
        --backend pytorch
 
-Completions API
-~~~~~~~~~~~~~~~
+Multimodal Chat API
+~~~~~~~~~~~~~~~~~~~
 
 You can query Completions API with any http clients, a typical example is OpenAI Python client:
 
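
The hunk above renames the section to "Multimodal Chat API" while keeping the pointer to the OpenAI Python client. A minimal sketch of such a request against a locally launched server is shown below; the ``base_url`` assumes the server's default ``localhost:8000`` endpoint, and the model name and image URL are illustrative placeholders rather than values from the commit.

.. code-block:: python

   # Sketch only: an image + text chat request sent to a local trtllm-serve
   # instance through the OpenAI-compatible Chat API.
   from openai import OpenAI

   client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

   response = client.chat.completions.create(
       model="Qwen2-VL-7B-Instruct",  # placeholder: use the name the server was started with
       messages=[{
           "role": "user",
           "content": [
               {"type": "text", "text": "What's in this image?"},
               {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
           ],
       }],
       max_tokens=64,
   )
   print(response.choices[0].message.content)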
@@ -104,6 +108,74 @@ Another example uses ``curl``:
    :language: bash
    :linenos:
 
+Multimodal Modality Coverage
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TRT-LLM multimodal supports the following modalities and data types (depending on the model):
+
+**Text**
+
+* No type specified:
+
+  .. code-block:: json
+
+     {"role": "user", "content": "What's the capital of South Korea?"}
+
+* Explicit "text" type:
+
+  .. code-block:: json
+
+     {"role": "user", "content": [{"type": "text", "text": "What's the capital of South Korea?"}]}
+
+**Image**
+
+* Using "image_url" with URL:
+
+  .. code-block:: json
+
+     {"role": "user", "content": [
+       {"type": "text", "text": "What's in this image?"},
+       {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
+     ]}
+
+* Using "image_url" with base64-encoded data:
+
+  .. code-block:: json
+
+     {"role": "user", "content": [
+       {"type": "text", "text": "What's in this image?"},
+       {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{image_base64}"}}
+     ]}
+
+  .. note::
+     To convert images to base64-encoded format, use the utility function
+     :func:`tensorrt_llm.utils.load_base64_image`. Refer to the
+     `load_base64_image utility <https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/utils/load_base64_image.py>`__
+     for implementation details.
+
+**Video**
+
+* Using "video_url":
+
+  .. code-block:: json
+
+     {"role": "user", "content": [
+       {"type": "text", "text": "What's in this video?"},
+       {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}}
+     ]}
+
+**Audio**
+
+* Using "audio_url":
+
+  .. code-block:: json
+
+     {"role": "user", "content": [
+       {"type": "text", "text": "What's in this audio?"},
+       {"type": "audio_url", "audio_url": {"url": "https://example.com/audio.mp3"}}
+     ]}
+
 Benchmark
 ---------
 
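
The note in the hunk above points at a ``load_base64_image`` utility, but this diff does not show its signature. As a generic illustration of the ``data:image/jpeg;base64,{image_base64}`` form the base64 example expects, the following standard-library sketch builds such a URL; the linked utility may behave differently.

.. code-block:: python

   # Sketch only: build the data URL used in the base64 "image_url" example.
   # Standard library only; not the repository's load_base64_image helper.
   import base64
   from pathlib import Path

   def to_image_data_url(path: str, mime: str = "image/jpeg") -> str:
       """Read an image file and return it as a data: URL for the Chat API."""
       encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
       return f"data:{mime};base64,{encoded}"

   # Drop the result into the message content shown in the hunk above.
   image_part = {"type": "image_url", "image_url": {"url": to_image_data_url("example.jpg")}}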
