````diff
 Make sure to review and adhere to the original code's copyright and licensing terms!
-```
+:::

 ## 2. Make your code compatible with vLLM

@@ -80,10 +80,10 @@ def forward(
     ...
 ```

-```{note}
+:::{note}
 Currently, vLLM supports the basic multi-head attention mechanism and its variant with rotary positional embeddings.
 If your model employs a different attention mechanism, you will need to implement a new attention layer in vLLM.
-```
+:::

 For reference, check out our [Llama implementation](gh-file:vllm/model_executor/models/llama.py). vLLM already supports a large number of models. It is recommended to find a model similar to yours and adapt it to your model's architecture. Check out <gh-dir:vllm/model_executor/models> for more examples.
````
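As background for the rotary-positional-embedding variant the note mentions, here is a minimal pure-Python sketch of the rotation RoPE applies to each consecutive channel pair. This is illustrative only: the function name `apply_rope` is hypothetical, and vLLM's actual implementation is a fused kernel operating on batched tensors.

```python
import math

def apply_rope(vec, pos, theta=10000.0):
    """Rotate consecutive channel pairs of `vec` by position-dependent angles.

    Core idea of rotary positional embeddings (RoPE): the channel pair
    (vec[i], vec[i+1]) is rotated by the angle pos * theta**(-i/d),
    so relative position information is encoded in the rotation.
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[i], vec[i + 1]
        # 2D rotation of the pair; norm is preserved.
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out
```

Since each pair is only rotated, position 0 leaves the vector unchanged and every position preserves the vector's norm, which is easy to verify by hand.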
`docs/source/contributing/model/multimodal.md` (16 additions, 16 deletions)
````diff
@@ -48,9 +48,9 @@ Further update the model as follows:
         return vision_embeddings
 ```

-```{important}
+:::{important}
 The returned `multimodal_embeddings` must be either a **3D {class}`torch.Tensor`** of shape `(num_items, feature_size, hidden_size)`, or a **list/tuple of 2D {class}`torch.Tensor`'s** of shape `(feature_size, hidden_size)`, so that `multimodal_embeddings[i]` retrieves the embeddings generated from the `i`-th multimodal data item (e.g. image) of the request.
-```
+:::

 - Implement {meth}`~vllm.model_executor.models.interfaces.SupportsMultiModal.get_input_embeddings` to merge `multimodal_embeddings` with text embeddings from the `input_ids`. If input processing for the model is implemented correctly (see sections below), then you can leverage the utility function we provide to easily merge the embeddings.

@@ -89,10 +89,10 @@ Further update the model as follows:
````
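To make the embedding contract and merge step above concrete, here is a simplified pure-Python sketch (plain lists stand in for `torch.Tensor`, and the helper name and `PLACEHOLDER_ID` value are hypothetical, only modeled on the utility the docs refer to). It consumes `multimodal_embeddings` in the documented list-of-2D form and substitutes its rows at placeholder-token positions.

```python
PLACEHOLDER_ID = -1  # hypothetical image-placeholder token id

def merge_multimodal_embeddings(input_ids, text_embeds, multimodal_embeddings):
    """Replace placeholder positions in `text_embeds` with multimodal rows.

    `multimodal_embeddings` follows the documented contract: a list of 2D
    (feature_size, hidden_size) matrices, one per multimodal item, where
    multimodal_embeddings[i] holds the embeddings of the i-th item.
    """
    # Flatten all items' rows in order; placeholders consume them left to right.
    flat = [row for item in multimodal_embeddings for row in item]
    it = iter(flat)
    merged = []
    for tok, emb in zip(input_ids, text_embeds):
        merged.append(next(it) if tok == PLACEHOLDER_ID else emb)
    return merged
```

For example, with `input_ids = [7, -1, -1, 9]` and one image item contributing two rows, the two `-1` positions receive that item's rows in order while the text embeddings elsewhere are kept.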
`docs/source/contributing/model/registration.md` (8 additions, 8 deletions)
````diff
@@ -17,17 +17,17 @@ After you have implemented your model (see [tutorial](#new-model-basic)), put it
 Then, add your model class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is automatically registered upon importing vLLM.
 Finally, update our [list of supported models](#supported-models) to promote your model!

-```{important}
+:::{important}
 The list of models in each section should be maintained in alphabetical order.
-```
+:::

 ## Out-of-tree models

 You can load an external model using a plugin without modifying the vLLM codebase.

-```{seealso}
+:::{seealso}
 [vLLM's Plugin System](#plugin-system)
-```
+:::

 To register the model, use the following code:

@@ -45,11 +45,11 @@ from vllm import ModelRegistry
 If your model is a multimodal model, ensure the model class implements the {class}`~vllm.model_executor.models.interfaces.SupportsMultiModal` interface.
 Read more about that [here](#supports-multimodal).
-```
+:::

-```{note}
+:::{note}
 Although you can directly put these code snippets in your script using `vllm.LLM`, the recommended way is to place these snippets in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server.
````
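As a rough illustration of what out-of-tree registration does conceptually, here is a toy stand-in, not vLLM's actual implementation: a registry maps an architecture name string to a model class so the engine can later resolve the architecture found in a model's config. The class names `SimpleModelRegistry` and `YourModelForCausalLM` are hypothetical.

```python
class SimpleModelRegistry:
    """Toy stand-in for a model registry: architecture name -> model class."""

    def __init__(self):
        self._models = {}

    def register_model(self, arch, model_cls):
        # Out-of-tree plugins call this at import time so the engine can
        # later resolve the architecture string from the model's config.
        self._models[arch] = model_cls

    def resolve_model_cls(self, arch):
        if arch not in self._models:
            raise ValueError(f"Model architecture {arch!r} is not registered")
        return self._models[arch]

class YourModelForCausalLM:  # hypothetical out-of-tree model class
    pass

registry = SimpleModelRegistry()
registry.register_model("YourModelForCausalLM", YourModelForCausalLM)
```

In real code the registration call is `ModelRegistry.register_model(...)` from `vllm`, as shown in the hunk above, and per the note it is best placed in a vLLM plugin rather than inline in a script.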