You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/source/models/vlm.rst
+2-2Lines changed: 2 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -20,9 +20,9 @@ The following :ref:`engine arguments <engine_args>` are specific to VLMs:
20
20
Currently, the support for vision language models on vLLM has the following limitations:
21
21
22
22
* Only single image input is supported per text prompt.
23
-
* Dynamic ``image_input_shape`` is not supported: the input image will be resized to the static ``image_input_shape``. This means model output might not exactly match the HuggingFace implementation.
23
+
* Dynamic ``image_input_shape`` is not supported: the input image will be resized to the static ``image_input_shape``. This means our LLaVA-NeXT output may not exactly match the huggingface implementation.
24
24
25
-
We are continuously improving user & developer experience for VLMs. Please raise an issue on GitHub if you have any feedback or feature requests.
25
+
We are continuously improving user & developer experience for VLMs. Please `open an issue on GitHub<https://github.com/vllm-project/vllm/issues/new/choose>`_ if you have any feedback or feature requests.
0 commit comments