docs/source/dev/multimodal/adding_multimodal_model.rst (+48 −20)
@@ -51,40 +51,68 @@ As usual, follow :ref:`these steps <adding_a_new_model>` to implement the model
2. Register input mappers
-------------------------

- For each modality type to support, decorate the model class with :meth:`MULTIMODAL_REGISTRY.register_input_mapper <vllm.multimodal.MultiModalRegistry.register_input_mapper>`.
+ For each modality type that the model accepts as input, decorate the model class with :meth:`MULTIMODAL_REGISTRY.register_input_mapper <vllm.multimodal.MultiModalRegistry.register_input_mapper>`.

This decorator accepts a function that maps multi-modal inputs to the keyword arguments you have previously defined in :meth:`~torch.nn.Module.forward`.
.. code-block:: diff

-     from vllm.model_executor.models.interfaces import SupportsVision
+     from vllm.model_executor.models.interfaces import SupportsVision
      class YourModelForImage2Seq(nn.Module, SupportsVision):
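To make the mapping step concrete, here is a minimal sketch of how the decorated class might look once the imports above are in place. The image-specific decorator variant ``register_image_input_mapper``, the mapper name ``map_your_image_inputs``, its ``(ctx, data)`` signature, and the returned dictionary are illustrative assumptions rather than a verbatim copy of the vLLM API; passing a custom mapper is optional, and calling the decorator with no arguments falls back to the registry's default mapper for that modality.

.. code-block:: python

   import torch
   from torch import nn

   from vllm.model_executor.models.interfaces import SupportsVision
   from vllm.multimodal import MULTIMODAL_REGISTRY


   # Hypothetical mapper: converts raw image data into the keyword arguments
   # that YourModelForImage2Seq.forward() expects (here, pixel_values).
   # The (ctx, data) signature and the plain-dict return value are assumptions;
   # check the multimodal registry in your installed vLLM version.
   def map_your_image_inputs(ctx, data):
       pixel_values = torch.as_tensor(data)
       return {"pixel_values": pixel_values}


   @MULTIMODAL_REGISTRY.register_image_input_mapper(map_your_image_inputs)
   class YourModelForImage2Seq(nn.Module, SupportsVision):

       def forward(self, input_ids, pixel_values=None, **kwargs):
           ...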
+ Here are some examples:
+
+ - Image inputs (static feature size): `LLaVA-1.5 Model <https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llava.py>`__
+ - Image inputs (dynamic feature size): `LLaVA-NeXT Model <https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llava_next.py>`__
+
+ .. seealso::
+    :ref:`input_processing_pipeline`
4. (Optional) Register dummy data
---------------------------------

During startup, dummy data is passed to the vLLM model to allocate memory. This only consists of text input by default, which may not be applicable to multi-modal models.
In such cases, you can define your own dummy data by registering a factory method via :meth:`INPUT_REGISTRY.register_dummy_data <vllm.inputs.registry.InputRegistry.register_dummy_data>`.
.. code-block:: diff

-     from vllm.inputs import INPUT_REGISTRY
-     from vllm.model_executor.models.interfaces import SupportsVision
-     from vllm.multimodal import MULTIMODAL_REGISTRY
+     from vllm.inputs import INPUT_REGISTRY
+     from vllm.model_executor.models.interfaces import SupportsVision
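To illustrate where the imports in the block above are headed, here is a hedged sketch of what registering a dummy data factory might look like. The factory name ``dummy_data_for_your_model``, its ``(ctx, seq_len)`` signature, and the stacked-decorator layout are assumptions drawn from the surrounding prose, not a verbatim copy of the vLLM API; consult ``InputRegistry.register_dummy_data`` in your installed version for the exact contract.

.. code-block:: python

   from torch import nn

   from vllm.inputs import INPUT_REGISTRY
   from vllm.model_executor.models.interfaces import SupportsVision
   from vllm.multimodal import MULTIMODAL_REGISTRY


   # Hypothetical factory: builds the dummy (text + image) input that vLLM
   # feeds through the model at startup to profile peak memory usage.
   # The (ctx, seq_len) signature and the expected return value are
   # assumptions; see InputRegistry.register_dummy_data for the real contract.
   def dummy_data_for_your_model(ctx, seq_len):
       ...


   @INPUT_REGISTRY.register_dummy_data(dummy_data_for_your_model)
   @MULTIMODAL_REGISTRY.register_image_input_mapper()
   class YourModelForImage2Seq(nn.Module, SupportsVision):
       ...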