
Commit e69832e
ariG23498 committed: adding openai consumption and serving
Signed-off-by: ariG23498 <[email protected]>
1 parent c51dd61 · commit e69832e

File tree: 1 file changed, +38 −1 lines changed


_posts/2025-04-11-transformers-backend.md

Lines changed: 38 additions & 1 deletion
@@ -29,7 +29,44 @@ This section will hold all the updates that have taken place since the blog post
 vLLM with the transformers backend now supports **Vision Language Models**. When a user adds `model_impl="transformers"`,
 the correct class for text-only and multimodal models is deduced and loaded.
 
-Here is how one would use the API.
+Here is how one can serve a multimodal model using the transformers backend.
+```bash
+vllm serve llava-hf/llava-onevision-qwen2-0.5b-ov-hf \
+  --model_impl transformers \
+  --disable-mm-preprocessor-cache \
+  --no-enable-prefix-caching \
+  --no-enable-chunked-prefill
+```
+
+To consume the served model, one can use the `openai` Python client like so:
+```python
+from openai import OpenAI
+openai_api_key = "EMPTY"
+openai_api_base = "http://localhost:8000/v1"
+client = OpenAI(
+    api_key=openai_api_key,
+    base_url=openai_api_base,
+)
+chat_response = client.chat.completions.create(
+    model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
+    messages=[{
+        "role": "user",
+        "content": [
+            {"type": "text", "text": "What's in this image?"},
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": "http://images.cocodataset.org/val2017/000000039769.jpg",
+                },
+            },
+        ],
+    }],
+)
+print("Chat response:", chat_response)
+```
+
+You can also skip the server and initialize the vLLM engine directly. Here is the same model being
+used through the `LLM` API.
 
 ```python
 from vllm import LLM, SamplingParams
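
The hunk's trailing context stops at the import line, so the `LLM` example itself is not visible in this diff. For reference, here is a minimal sketch of what offline multimodal inference with the `LLM` class could look like for the same model; the `model_impl` keyword, the `LLM.chat` call, and the sampling settings below are assumptions based on vLLM's public API, not the snippet from the post.

```python
# Sketch only: offline multimodal inference with the LLM class.
# Assumes a vLLM version that exposes `model_impl` and `LLM.chat`.
from vllm import LLM, SamplingParams

llm = LLM(
    model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
    model_impl="transformers",  # route model loading through the transformers backend
)
sampling_params = SamplingParams(max_tokens=128)

# LLM.chat accepts OpenAI-style messages, including image URLs,
# and applies the model's chat template before generation.
outputs = llm.chat(
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "http://images.cocodataset.org/val2017/000000039769.jpg",
                },
            },
        ],
    }],
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```

As with the server example, the image is passed by URL inside chat-style messages; the difference is that generation happens in-process instead of over HTTP.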
