[Doc] Add usage of implicit text-only mode (#22561)

Roger Wang · sfeng33 · web-flow · commit 23472ff51cdf · 2025-08-08T23:04:19.000-07:00
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
diff --git a/docs/models/supported_models.md b/docs/models/supported_models.md
@@ -583,6 +583,9 @@ See [this page](../features/multimodal_inputs.md) on how to pass multi-modal inp
 
     **This is no longer required if you are using vLLM V1.**
 
+!!! tip
+    For hybrid-only models such as Llama-4, Step3 and Mistral-3, text-only mode can be enabled by setting all supported multimodal modalities to 0 (e.g., `--limit-mm-per-prompt '{"image":0}'`). Their multimodal modules are then not loaded, which frees up more GPU memory for the KV cache.
+
 !!! note
     vLLM currently only supports adding LoRA to the language backbone of multimodal models.
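
As a rough illustration of the tip added in this diff, the Python sketch below assembles the `--limit-mm-per-prompt` argument with every listed modality set to 0. The helper name `text_only_args` and the model placeholder `your-org/your-model` are hypothetical and not part of vLLM. Which modalities a given model supports is an assumption to check against that model's documentation.

```python
import json
import shlex


def text_only_args(modalities):
    """Build CLI arguments that set the limit of every listed modality to 0."""
    limits = {name: 0 for name in modalities}
    # Compact separators reproduce the form used in the tip: {"image":0}
    return ["--limit-mm-per-prompt", json.dumps(limits, separators=(",", ":"))]


# Hypothetical model placeholder; substitute a real model identifier.
MODEL = "your-org/your-model"

command = ["vllm", "serve", MODEL] + text_only_args(["image"])
# shlex.join quotes the JSON value so it can be pasted into a POSIX shell.
print(shlex.join(command))
```

Building the JSON with `json.dumps` rather than by hand avoids quoting mistakes such as an unbalanced single quote around the value.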