Commit 08bf6a4

Bounty-hunter (dongbo910220) authored and committed

[FEATURE] /v1/images/edit interface (#1101)
Signed-off-by: dengyunyang <584797741@qq.com>
1 parent 3cf2803 commit 08bf6a4

File tree

6 files changed
+974
-140
lines changed

docs/.nav.yml

Lines changed: 1 addition & 0 deletions

```diff
@@ -7,6 +7,7 @@ nav:
   - Serving:
     - OpenAI-Compatible API:
       - Image Generation: serving/image_generation_api.md
+      - Image Edit: serving/image_edit_api.md
   - Examples:
     - examples/README.md
   - Offline Inference:
```

docs/serving/image_edit_api.md

Lines changed: 205 additions & 0 deletions

# Image Edit API

vLLM-Omni provides an OpenAI DALL-E compatible API for image editing using diffusion models.

Each server instance runs a single model (specified at startup via `vllm serve <model> --omni`).

## Quick Start

### Start the Server

For example:

```bash
# Qwen-Image
vllm serve Qwen/Qwen-Image-Edit-2511 --omni --port 8000
```
### Generate Images

**Using curl:**

```bash
curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > gift-basket.png) \
  -X POST "http://localhost:8000/v1/images/edits" \
  -F "model=xxx" \
  -F "image=@./xx.png" \
  -F "prompt='This bear is wearing sportswear, holding a basketball, and bending one leg.'" \
  -F "size=1024x1024" \
  -F "output_format=png"
```
**Using OpenAI SDK:**

```python
import base64

from openai import OpenAI

client = OpenAI(
    api_key="None",
    base_url="http://localhost:8000/v1"
)

input_image_url = "https://vllm-public-assets.s3.us-west-2.amazonaws.com/omni-assets/qwen-bear.png"

result = client.images.edit(
    image=[],
    model="Qwen-Image-Edit-2511",
    prompt="Change the bears in the two input images so that they are walking together.",
    size='512x512',
    stream=False,
    output_format='jpeg',
    # Pass input images by URL instead of uploading files
    # (the same image is passed twice here for illustration)
    extra_body={
        "url": [input_image_url, input_image_url],
        "num_inference_steps": 50,
        "guidance_scale": 1,
        "seed": 777,
    }
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("edit_out_http.jpeg", "wb") as f:
    f.write(image_bytes)
```
## API Reference

### Endpoint

```
POST /v1/images/edits
Content-Type: multipart/form-data
```

### Request Parameters

#### OpenAI Standard Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | string | **required** | A text description of the desired edit |
| `model` | string | server's model | Model to use (optional; should match the server's model if specified) |
| `image` | string or array | **required** | The image(s) to edit |
| `n` | integer | 1 | Number of images to generate (1-10) |
| `size` | string | "auto" | Image dimensions in WxH format (e.g., "1024x1024", "512x512"); when set to "auto", the size is derived from the first input image |
| `response_format` | string | "b64_json" | Response format (only "b64_json" is supported) |
| `user` | string | null | User identifier for tracking |
| `output_format` | string | "png" | The format in which the generated images are returned. Must be one of "png", "jpg", "jpeg", "webp" |
| `output_compression` | integer | 100 | The compression level (0-100%) for the generated images |
| `background` | string or null | "auto" | Sets the transparency of the background of the generated image(s) |
#### vllm-omni Extension Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `url` | string or array | None | URL(s) of the image(s) to edit (alternative to uploading files via `image`) |
| `negative_prompt` | string | null | Text describing what to avoid in the image |
| `num_inference_steps` | integer | model defaults | Number of diffusion steps |
| `guidance_scale` | float | model defaults | Classifier-free guidance scale (typically 0.0-20.0) |
| `true_cfg_scale` | float | model defaults | True CFG scale (model-specific; may be ignored if not supported) |
| `seed` | integer | null | Random seed for reproducibility |
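If neither the OpenAI SDK nor `curl` is convenient, the multipart body can be assembled by hand with the standard library. The sketch below is illustrative, not part of vLLM-Omni; the helper name `build_multipart` is an assumption.

```python
# Illustrative sketch: encode text fields and file parts into a
# multipart/form-data body for POST /v1/images/edits.
import uuid


def build_multipart(fields: dict, files: dict) -> tuple:
    """Return (body, content_type) suitable for urllib.request.Request.

    `fields` maps form names to string values; `files` maps form names
    to (filename, payload_bytes) tuples.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f'{value}\r\n'
            ).encode()
        )
    for name, (filename, payload) in files.items():
        parts.append(
            (
                f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                f'Content-Type: application/octet-stream\r\n\r\n'
            ).encode() + payload + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The result can then be sent with `urllib.request.Request("http://localhost:8000/v1/images/edits", data=body, headers={"Content-Type": content_type}, method="POST")`.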
### Response Format

```json
{
  "created": 1701234567,
  "data": [
    {
      "b64_json": "<base64-encoded PNG>",
      "url": null,
      "revised_prompt": null
    }
  ],
  "output_format": null,
  "size": null
}
```
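Since each entry in `data` carries a base64-encoded image, a response with `n > 1` can be decoded in a loop. A minimal sketch; the helper name `save_images` is illustrative:

```python
# Decode every `b64_json` entry from the response shape above and
# write one file per image; returns the paths written.
import base64


def save_images(response: dict, prefix: str = "edit_out", ext: str = "png") -> list:
    """Write response["data"][i]["b64_json"] to f"{prefix}_{i}.{ext}" for each i."""
    paths = []
    for i, item in enumerate(response["data"]):
        path = f"{prefix}_{i}.{ext}"
        with open(path, "wb") as f:
            f.write(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths
```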
## Examples

### Multiple Image Inputs

```bash
curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > gift-basket.png) \
  -X POST "http://localhost:8000/v1/images/edits" \
  -F "model=xxx" \
  -F "image=@xx.png" \
  -F "image=@xx.png" \
  -F "prompt='This bear is wearing sportswear, holding a basketball, and bending one leg.'" \
  -F "size=1024x1024" \
  -F "output_format=png"
```
## Parameter Handling

The API passes parameters directly to the diffusion pipeline without model-specific transformation:

- **Default values**: When parameters are not specified, the underlying model uses its own defaults
- **Pass-through design**: User-provided values are forwarded directly to the diffusion engine
- **Minimal validation**: Only basic type checking and range validation happen at the API level

### Parameter Compatibility

Because parameters are forwarded without model-specific validation:

- Unsupported parameters may be silently ignored by the model
- Incompatible values will result in errors from the underlying pipeline
- Recommended values vary by model; consult the model's documentation

**Best Practice:** Start with the model's recommended parameters, then adjust based on your needs.
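The "basic range validation" for `size`, for example, amounts to checking the `WIDTHxHEIGHT` shape before anything reaches the pipeline. The sketch below illustrates that kind of check; it is not the server's actual implementation, and `parse_size` is an assumed name:

```python
# Illustrative sketch of WIDTHxHEIGHT validation for the `size` field;
# not the server's actual implementation.
import re

_SIZE_RE = re.compile(r"^(\d+)x(\d+)$")


def parse_size(size: str):
    """Return (width, height), None for "auto", or raise ValueError."""
    if size == "auto":
        return None  # defer to the first input image
    m = _SIZE_RE.match(size)
    if not m:
        raise ValueError(
            f"Invalid size format: '{size}'. "
            f"Expected format: 'WIDTHxHEIGHT' (e.g., '1024x1024')."
        )
    return int(m.group(1)), int(m.group(2))
```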
## Error Responses

### 400 Bad Request

Invalid parameters (e.g., a malformed size):

```json
{
  "detail": "Invalid size format: '1024x'. Expected format: 'WIDTHxHEIGHT' (e.g., '1024x1024')."
}
```

### 422 Unprocessable Entity

Validation errors (missing required fields):

```json
{
  "detail": "Field 'image' or 'url' is required"
}
```
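Both documented error shapes carry a `detail` field, so a client can surface them uniformly. A minimal sketch; the helper name `describe_error` is illustrative:

```python
# Format the documented 400/422 error payloads for display;
# the helper name is illustrative, not part of any SDK.


def describe_error(status_code: int, payload: dict) -> str:
    """Combine the HTTP status with the `detail` field from the error body."""
    kind = {400: "Bad Request", 422: "Unprocessable Entity"}.get(status_code, "Error")
    return f"{status_code} {kind}: {payload.get('detail', 'unknown error')}"
```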
## Troubleshooting

### Server Not Running

```bash
# Check if the server is responding
curl -X POST "http://localhost:8000/v1/images/edits" \
  -F "prompt='test'"
```

### Out of Memory

If you encounter OOM errors:

1. Reduce image size: `"size": "512x512"`
2. Reduce inference steps: `"num_inference_steps": 25`

## Development

Enable debug logging to see prompts and generation details:

```bash
vllm serve Qwen/Qwen-Image-Edit-2511 --omni \
  --uvicorn-log-level debug
```
