Commit cbaf508

wonjerry and wonjae.lee0 authored
[Docs] Add API usage examples for modality control in online serving (vllm-project#411)
Signed-off-by: wonjae.lee0 <wonjae.lee0@navercorp.com>
Co-authored-by: wonjae.lee0 <wonjae.lee0@navercorp.com>
1 parent cfc356f commit cbaf508

File tree

2 files changed: +148 −2 lines changed


docs/user_guide/examples/online_serving/qwen2_5_omni.md

Lines changed: 74 additions & 1 deletion
@@ -74,13 +74,86 @@ bash run_curl_multimodal_generation.sh mixed_modalities

## Modality control

You can control output modalities to specify which types of output the model should generate. This is useful when you only need text output and want to skip audio generation stages for better performance.

### Supported modalities

| Modalities | Output |
|------------|--------|
| `["text"]` | Text only |
| `["audio"]` | Text + Audio |
| `["text", "audio"]` | Text + Audio |
| Not specified | Text + Audio (default) |

### Using curl

#### Text only

```bash
curl http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-Omni-7B",
    "messages": [{"role": "user", "content": "Describe vLLM in brief."}],
    "modalities": ["text"]
  }'
```

#### Text + Audio

```bash
curl http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-Omni-7B",
    "messages": [{"role": "user", "content": "Describe vLLM in brief."}],
    "modalities": ["audio"]
  }'
```

### Using Python client

```bash
python openai_chat_completion_client_for_multimodal_generation.py \
    --query-type mixed_modalities \
    --modalities text
```

### Using OpenAI Python SDK

#### Text only

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-7B",
    messages=[{"role": "user", "content": "Describe vLLM in brief."}],
    modalities=["text"]
)
print(response.choices[0].message.content)
```

#### Text + Audio

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-7B",
    messages=[{"role": "user", "content": "Describe vLLM in brief."}],
    modalities=["audio"]
)
# The response contains two choices: one with text, one with audio
print(response.choices[0].message.content)  # Text response
print(response.choices[1].message.audio)  # Audio response
```
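The `audio` field printed above typically carries base64-encoded audio data. As a minimal sketch for saving it to disk (the base64 `data` attribute and WAV encoding are assumptions and may differ by server version):

```python
import base64

def save_audio(b64_data: str, path: str) -> int:
    """Decode base64-encoded audio and write it to a file; return bytes written."""
    audio_bytes = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)

# With a live server, the audio choice could (hypothetically) be saved like:
# save_audio(response.choices[1].message.audio.data, "output.wav")
```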
## Run Local Web UI Demo

This Web UI demo allows users to interact with the model through a web browser.

docs/user_guide/examples/online_serving/qwen3_omni.md

Lines changed: 74 additions & 1 deletion
@@ -82,13 +82,86 @@ sudo apt install ffmpeg

## Modality control

You can control output modalities to specify which types of output the model should generate. This is useful when you only need text output and want to skip audio generation stages for better performance.

### Supported modalities

| Modalities | Output |
|------------|--------|
| `["text"]` | Text only |
| `["audio"]` | Text + Audio |
| `["text", "audio"]` | Text + Audio |
| Not specified | Text + Audio (default) |

### Using curl

#### Text only

```bash
curl http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
    "messages": [{"role": "user", "content": "Describe vLLM in brief."}],
    "modalities": ["text"]
  }'
```

#### Text + Audio

```bash
curl http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
    "messages": [{"role": "user", "content": "Describe vLLM in brief."}],
    "modalities": ["audio"]
  }'
```

### Using Python client

```bash
python openai_chat_completion_client_for_multimodal_generation.py \
    --query-type use_image \
    --modalities text
```

### Using OpenAI Python SDK

#### Text only

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
    messages=[{"role": "user", "content": "Describe vLLM in brief."}],
    modalities=["text"]
)
print(response.choices[0].message.content)
```

#### Text + Audio

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
    messages=[{"role": "user", "content": "Describe vLLM in brief."}],
    modalities=["audio"]
)
# The response contains two choices: one with text, one with audio
print(response.choices[0].message.content)  # Text response
print(response.choices[1].message.audio)  # Audio response
```
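Indexing choices by position assumes a fixed ordering of the text and audio choices. As a defensive sketch (the `message.audio` attribute layout is an assumption carried over from the example above), the choices could instead be partitioned by whether they carry audio:

```python
def split_choices(choices):
    """Partition chat-completion choices into (text, audio) lists.

    A choice is treated as audio when its message carries a non-None
    `audio` attribute; everything else is treated as text.
    """
    text, audio = [], []
    for choice in choices:
        if getattr(choice.message, "audio", None) is not None:
            audio.append(choice)
        else:
            text.append(choice)
    return text, audio
```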
## Run Local Web UI Demo

This Web UI demo allows users to interact with the model through a web browser.

0 commit comments
