@@ -67,9 +67,14 @@ Another example uses ``curl``:
   :linenos:

Multimodal Serving
-~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~

-For multimodal models (e.g., Qwen2-VL), you'll need to create a configuration file and start the server with additional options:
+For multimodal models, you need to create a configuration file and start the server with additional options due to the following limitations:
+
+* TRT-LLM multimodal is currently not compatible with ``kv_cache_reuse``
+* Multimodal models require ``chat_template``, so only the Chat API is supported
+
+To set up multimodal models:

First, create a configuration file:

@@ -78,7 +83,6 @@ First, create a configuration file:
   cat > ./extra-llm-api-config.yml << EOF
   kv_cache_config:
     enable_block_reuse: false
-     free_gpu_memory_fraction: 0.6
   EOF

Then, start the server with the configuration file:
@@ -89,8 +93,8 @@ Then, start the server with the configuration file:
    --extra_llm_api_options ./extra-llm-api-config.yml \
    --backend pytorch

-Completions API
-~~~~~~~~~~~~~~~
+Multimodal Chat API
+~~~~~~~~~~~~~~~~~~~

You can query the Chat API with any HTTP client; a typical example is the OpenAI Python client:

@@ -104,6 +108,74 @@ Another example uses ``curl``:
   :language: bash
   :linenos:

+Multimodal Modality Coverage
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TRT-LLM multimodal supports the following modalities and data types (depending on the model):
+
+**Text**
+
+* No type specified:
+
+  .. code-block:: json
+
+     {"role": "user", "content": "What's the capital of South Korea?"}
+
+* Explicit "text" type:
+
+  .. code-block:: json
+
+     {"role": "user", "content": [{"type": "text", "text": "What's the capital of South Korea?"}]}
+
+**Image**
+
+* Using "image_url" with URL:
+
+  .. code-block:: json
+
+     {"role": "user", "content": [
+       {"type": "text", "text": "What's in this image?"},
+       {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}}
+     ]}
+
+* Using "image_url" with base64-encoded data:
+
+  .. code-block:: json
+
+     {"role": "user", "content": [
+       {"type": "text", "text": "What's in this image?"},
+       {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{image_base64}"}}
+     ]}
+
+.. note::
+   To convert images to base64-encoded format, use the utility function
+   :func:`tensorrt_llm.utils.load_base64_image`. Refer to the
+   `load_base64_image utility <https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/utils/load_base64_image.py>`__
+   for implementation details.
+
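+As an illustration of the encoding step itself (not a substitute for the bundled utility),
+a minimal standalone sketch using only the Python standard library might look like this;
+the file path is a placeholder:
+
+.. code-block:: python
+
+   import base64
+
+   def image_to_data_url(path: str, mime: str = "image/jpeg") -> str:
+       """Read a local image and return a data URL usable in an "image_url" entry."""
+       with open(path, "rb") as f:
+           image_base64 = base64.b64encode(f.read()).decode("utf-8")
+       return f"data:{mime};base64,{image_base64}"
+
+   # "./example.jpg" is a placeholder path.
+   message = {
+       "role": "user",
+       "content": [
+           {"type": "text", "text": "What's in this image?"},
+           {"type": "image_url", "image_url": {"url": image_to_data_url("./example.jpg")}},
+       ],
+   }
+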
+**Video**
+
+* Using "video_url":
+
+  .. code-block:: json
+
+     {"role": "user", "content": [
+       {"type": "text", "text": "What's in this video?"},
+       {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}}
+     ]}
+
+**Audio**
+
+* Using "audio_url":
+
+  .. code-block:: json
+
+     {"role": "user", "content": [
+       {"type": "text", "text": "What's in this audio?"},
+       {"type": "audio_url", "audio_url": {"url": "https://example.com/audio.mp3"}}
+     ]}
+
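+Putting the pieces together, the following is a minimal sketch of a multimodal Chat API
+request made with the OpenAI Python client; the base URL, API key, model name, and image
+URL are placeholders and should be adjusted to match your ``trtllm-serve`` deployment:
+
+.. code-block:: python
+
+   from openai import OpenAI
+
+   # Placeholder endpoint and model name; adjust to your trtllm-serve deployment.
+   client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
+
+   response = client.chat.completions.create(
+       model="Qwen2-VL-7B-Instruct",
+       messages=[
+           {
+               "role": "user",
+               "content": [
+                   {"type": "text", "text": "What's in this image?"},
+                   {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
+               ],
+           }
+       ],
+       max_tokens=64,
+   )
+   print(response.choices[0].message.content)
+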
Benchmark
---------
