We're excited to announce that vLLM now supports the [Llama 4 herd of models](https://ai.meta.com/blog/llama-4-multimodal-intelligence/): **Scout** (17B-16E) and **Maverick** (17B-128E). You can run these powerful long-context, natively multi-modal (up to 8-10 images with good results), mixture-of-experts models in vLLM today by updating to version v0.8.3 or later:
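A minimal sketch of the install-and-serve flow (the model ID is Scout's Hugging Face name; the `--tensor-parallel-size` value and context cap below are illustrative assumptions, not tuned settings, so size them to your hardware):

```sh
# Upgrade to a vLLM release with Llama 4 support.
pip install -U "vllm>=0.8.3"

# Launch an OpenAI-compatible server for Scout across 8 GPUs.
# The context cap is an example; raise or lower it to fit GPU memory.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size 8 \
    --max-model-len 1000000
```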
With the configurations above, we observe the following output tokens/s. Note that Scout is smaller but running in bfloat16, while Maverick is running in fp8.
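To reproduce this kind of number yourself, one hedged approach (assuming a vLLM source checkout for the benchmark script, with the server from above already running) is vLLM's serving benchmark:

```sh
# Drive the running server with synthetic requests and report
# throughput metrics, including output tokens/s.
python benchmarks/benchmark_serving.py \
    --backend vllm \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --dataset-name random \
    --random-input-len 1000 \
    --random-output-len 1000 \
    --num-prompts 200
```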
While more performance enhancements are on the way, we believe the Llama 4 models' efficient architecture and relatively small size make them practical for scaled usage today.
**Tips for Performance and Long Context:**
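As one illustrative long-context launch (the `attn_temperature_tuning` override follows Meta's guidance for prompts beyond roughly 32K tokens; the exact context cap here is an assumption, not a recommendation):

```sh
# Long-context sketch: raise the context window and enable Llama 4's
# attention temperature tuning for very long prompts.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size 8 \
    --max-model-len 1000000 \
    --override-generation-config '{"attn_temperature_tuning": true}'
```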
**Other Hardware Support & Quantizations:**
* A100: We have verified that the bf16 versions of the models work well on A100 GPUs; a sample launch command is sketched after this list.
* INT4: An INT4-quantized version of the Scout model checkpoint is currently a work in progress. Stay tuned for updates.
* AMD MI300X: You can run Llama 4 on AMD MI300X GPUs by building [vLLM from source](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=rocm) and using the same commands as above.
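For the A100 note above, a bf16 launch might look like the following sketch (`--dtype` is a standard vLLM option; the same command applies on MI300X once vLLM is built from source for ROCm):

```sh
# bf16 sketch for A100s (or MI300X with a ROCm source build):
# pin the dtype explicitly rather than relying on checkpoint defaults.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size 8 \
    --dtype bfloat16
```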