@@ -29,7 +29,44 @@ This section will hold all the updates that have taken place since the blog post
vLLM with the transformers backend now supports **Vision Language Models**. When a user adds `model_impl="transformers"`,
the correct class for text-only and multimodal models will be deduced and loaded.

- Here is how one would use the API.
+ Here is how one can serve a multimodal model using the transformers backend.
+ ```bash
+ vllm serve llava-hf/llava-onevision-qwen2-0.5b-ov-hf \
+     --model_impl transformers \
+     --disable-mm-preprocessor-cache \
+     --no-enable-prefix-caching \
+     --no-enable-chunked-prefill
+ ```
+
+ To consume the model, one can use the `openai` API like so:
+ ```python
+ from openai import OpenAI
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8000/v1"
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+ chat_response = client.chat.completions.create(
+     model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
+     messages=[{
+         "role": "user",
+         "content": [
+             {"type": "text", "text": "What's in this image?"},
+             {
+                 "type": "image_url",
+                 "image_url": {
+                     "url": "http://images.cocodataset.org/val2017/000000039769.jpg",
+                 },
+             },
+         ],
+     }],
+ )
+ print("Chat response:", chat_response)
+ ```
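+
+ The OpenAI-style `image_url` field can also carry a base64 data URI, which is handy when the image lives
+ on disk rather than behind a URL. Below is a minimal sketch of that variant, assuming the server started
+ above is still running and that a local `example.jpg` (a placeholder file name) is available.
+ ```python
+ import base64
+
+ from openai import OpenAI
+
+ client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
+
+ # Read a local image (placeholder file name) and encode it as a base64 string.
+ with open("example.jpg", "rb") as f:
+     image_b64 = base64.b64encode(f.read()).decode("utf-8")
+
+ chat_response = client.chat.completions.create(
+     model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
+     messages=[{
+         "role": "user",
+         "content": [
+             {"type": "text", "text": "What's in this image?"},
+             # Pass the image inline as a data URI instead of a remote URL.
+             {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
+         ],
+     }],
+ )
+ print("Chat response:", chat_response)
+ ```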
+
+ You can also directly initialize the vLLM engine using the `LLM` API. Here is the same model being
+ served with the `LLM` API.

```python
from vllm import LLM, SamplingParams