Commit 79f64fd

add multimodal

Signed-off-by: Roger Wang <[email protected]>

1 parent 6a30c64 commit 79f64fd

1 file changed: +3 -0 lines changed

_posts/2025-04-05-llama4.md (3 additions, 0 deletions)
````diff
@@ -54,6 +54,9 @@ vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
 vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 \
     --tensor-parallel-size 8
 ```
+**Multimodality:**
+
+The Llama 4 models excel at image understanding with up to 8-10 images. By default, the vLLM server accepts one image per request; pass `--limit-mm-per-prompt image=10` to serve up to 10 images per request through the OpenAI-compatible API. We also recommend checking out our multi-image offline inference example with Llama 4 [here](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/vision_language_multi_image.py).
 
 **Performance:**
 
````
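To make the `--limit-mm-per-prompt image=10` flag in the added paragraph concrete, here is a minimal sketch of a multi-image request against the OpenAI-compatible server. It assumes the server was launched with the Scout model named in this post, the flag from the diff, and vLLM's default port 8000; the image URLs and the prompt are placeholders, not from the original.

```python
# A minimal sketch, assuming the server was started roughly as:
#   vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
#       --limit-mm-per-prompt image=10
from openai import OpenAI

# vLLM's OpenAI-compatible server does not require a real API key by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical image URLs; with image=10 the server accepts up to ten per request.
image_urls = [
    "https://example.com/photo1.jpg",
    "https://example.com/photo2.jpg",
]

# Build one user message mixing a text part with one image_url part per image.
content = [{"type": "text", "text": "What do these images have in common?"}]
content += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```

Each `image_url` part counts toward the per-request limit, so without the flag a two-image request like this one would be rejected by the server's default limit of one image.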