Note: we highly recommend turning on attn_temperature_tuning to improve accuracy for contexts longer than 32K tokens; VLLM_DISABLE_COMPILE_CACHE=1 is required when this setting is enabled.
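As a rough sketch, a serving command that enables both settings might look like the following; the model name, parallelism, and context-length flags are illustrative placeholders, and the exact flag that exposes `attn_temperature_tuning` may differ across vLLM versions:

```bash
# Sketch: enable attention temperature tuning for long-context accuracy.
# VLLM_DISABLE_COMPILE_CACHE=1 is required when this setting is on.
VLLM_DISABLE_COMPILE_CACHE=1 vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size 8 \
    --max-model-len 1000000 \
    --override-generation-config='{"attn_temperature_tuning": true}'
```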
**Multimodality:**
The Llama 4 models excel at image understanding with up to 8-10 images per prompt. By default, the vLLM server accepts one image per request. Pass `--limit-mm-per-prompt image=10` to serve up to 10 images per request with the OpenAI-compatible API. We also recommend checking out our multi-image offline inference example with Llama-4 [here](https://github.com/vllm-project/vllm/blob/v0.8.3/examples/offline_inference/vision_language_multi_image.py).
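For example, a multi-image serving setup might look like the sketch below; the model name, parallelism, and image URLs are illustrative placeholders, and the request body follows the standard OpenAI chat-completions format:

```bash
# Sketch: serve Llama 4 accepting up to 10 images per request.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size 8 \
    --limit-mm-per-prompt image=10

# Then send a multi-image request through the OpenAI-compatible API
# (placeholder image URLs).
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images."},
                {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/b.jpg"}}
            ]
        }]
    }'
```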
We extend our sincere thanks to the Meta team for their implementation of the model.
We also thank the AMD team for their support in enabling these models on MI300X: [Hongxia Yang](https://github.com/hongxiayang) and Weijun Jiang.
The vLLM team’s performance benchmarks were run on hardware generously provided by Nebius and NVIDIA.