
Commit f4bd7a3

Image2image /images/edit endpoint (#3545)
🛠 Summary: `/v3/images/edit` endpoint handling via multipart fields rather than a JSON body. Documentation & demo.
1 parent fcfa3ab commit f4bd7a3

21 files changed (+912 −163 lines)

.gitignore

Lines changed: 2 additions & 1 deletion
```diff
@@ -36,4 +36,5 @@ out
 *.errors.txt
 tmp/
 *.zip
-*.tar.gz
+*.tar.gz
+models
```

demos/image_generation/README.md

Lines changed: 40 additions & 4 deletions
````diff
@@ -1,7 +1,7 @@
 # Image Generation with OpenAI API {#ovms_demos_image_generation}
 
-This demo shows how to deploy image generation models (Stable Diffusion/Stable Diffusion 3/Stable Diffusion XL/FLUX) in the OpenVINO Model Server.
-Image generation pipeline is exposed via [OpenAI API](https://platform.openai.com/docs/api-reference/images/create) `images/generations` endpoints.
+This demo shows how to deploy image generation models (Stable Diffusion/Stable Diffusion 3/Stable Diffusion XL/FLUX) to create and edit images with the OpenVINO Model Server.
+Image generation pipelines are exposed via [OpenAI API](https://platform.openai.com/docs/api-reference/images/create) `images/generations` and `images/edits` endpoints.
 
 > **Note:** This demo was tested on Intel® Xeon®, Intel® Core®, Intel® Arc™ A770, Intel® Arc™ B580 on Ubuntu 22/24, RedHat 9 and Windows 11.
@@ -375,9 +375,9 @@ curl http://localhost:8000/v1/config
 
 A single servable exposes following endpoints:
 - text to image: `images/generations`
+- image to image: `images/edits`
 
 Endpoints unsupported for now:
-- image to image: `images/edits`
 - inpainting: `images/edits` with `mask` field
 
 All requests are processed in unary format, with no streaming capabilities.
@@ -474,9 +474,45 @@ Output file (`output2.png`):
 ![output2](./output2.png)
 
+### Requesting image edit with OpenAI Python package
+
+```python
+from openai import OpenAI
+import base64
+from io import BytesIO
+from PIL import Image
+
+client = OpenAI(
+    base_url="http://localhost:8000/v3",
+    api_key="unused"
+)
+
+response = client.images.edit(
+    model="OpenVINO/FLUX.1-schnell-int4-ov",
+    image=open("output2.png", "rb"),
+    prompt="pink cats",
+    extra_body={
+        "rng_seed": 60,
+        "size": "512x512",
+        "num_inference_steps": 3,
+        "strength": 0.7
+    }
+)
+base64_image = response.data[0].b64_json
+
+image_data = base64.b64decode(base64_image)
+image = Image.open(BytesIO(image_data))
+image.save('edit_output.png')
+```
+
+Output file (`edit_output.png`):
+![edit_output](./edit_output.png)
 
 
 ## References
 - [Image Generation API](../../docs/model_server_rest_api_image_generation.md)
+- [Image Edit API](../../docs/model_server_rest_api_image_edit.md)
 - [Writing client code](../../docs/clients_genai.md)
-- [Image Generation calculator reference](../../docs/image_generation/reference.md)
+- [Image Generation/Edit calculator reference](../../docs/image_generation/reference.md)
````

docs/clients_genai.md

Lines changed: 2 additions & 0 deletions
````diff
@@ -11,6 +11,7 @@ Completions API <ovms_docs_rest_api_completion>
 Embeddings API <ovms_docs_rest_api_embeddings>
 Reranking API <ovms_docs_rest_api_rerank>
 Image generation API <ovms_docs_rest_api_image_generation>
+Image edit API <ovms_docs_rest_api_image_edit>
 ```
 ## Introduction
 Beside Tensorflow Serving API (`/v1`) and KServe API (`/v2`) frontends, the model server supports a range of endpoints for generative use cases (`v3`). They are extendible using MediaPipe graphs.
@@ -21,6 +22,7 @@ OpenAI compatible endpoints:
 - [completions](./model_server_rest_api_completions.md)
 - [embeddings](./model_server_rest_api_embeddings.md)
 - [images/generations](./model_server_rest_api_image_generation.md)
+- [images/edits](./model_server_rest_api_image_edit.md)
 
 Cohere Compatible endpoint:
 - [rerank](./model_server_rest_api_rerank.md)
````

docs/image-edit-zebras.png

571 KB

docs/image_generation/reference.md

Lines changed: 8 additions & 1 deletion
```diff
@@ -18,6 +18,8 @@ struct HttpPayload {
 
 The input JSON content should be compatible with the [Image Generation API](../model_server_rest_api_image_generation.md).
 
+The input multi-part content should be compatible with the [Image Edit API](../model_server_rest_api_image_edit.md).
+
 The input also includes a side packet with a reference to `IMAGE_GEN_NODE_RESOURCES` which is a shared object representing multiple OpenVINO GenAI pipelines built from OpenVINO models loaded into memory just once.
 
 **Every node based on Image Generation Calculator MUST have exactly that specification of this side packet:**
@@ -52,14 +54,18 @@ Above node configuration should be used as a template since user is not expected
 
 The calculator supports the following `node_options` for tuning the pipeline configuration:
 - `required string models_path` - location of the models and scheduler directory (can be relative);
-- `optional string device` - device to load models to. Supported values: "CPU", "GPU", "NPU" [default = "CPU"]
+- `optional string device` - device to load models to. Supported values: "CPU", "GPU", "NPU" [default = "CPU"] or mixed - space separated. Example: `CPU GPU NPU` equals to `text_encode=CPU denoise=GPU vae=NPU`
 - `optional string plugin_config` - [OpenVINO device plugin configuration](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes.html) and additional pipeline options. Should be provided in the same format for regular [models configuration](../parameters.md#model-configuration-options). The config is used for all models in the pipeline except for tokenizers (text encoders/decoders, unet, vae) [default = "{}"]
 - `optional string max_resolution` - maximum resolution allowed for generation. Requests exceeding this value will be rejected. [default = "4096x4096"];
 - `optional string default_resolution` - default resolution used for generation. If not specified, underlying model shape will determine final resolution.
 - `optional uint64 max_num_images_per_prompt` - maximum number of images generated per prompt. Requests exceeding this value will be rejected. [default = 10];
 - `optional uint64 default_num_inference_steps` - default number of inference steps used for generation, if not specified by the request [default = 50];
 - `optional uint64 max_num_inference_steps` - maximum number of inference steps allowed for generation. Requests exceeding this value will be rejected. [default = 100];
 
+Static model resolution settings:
+- `optional string resolution` - enforces static resolution for all requests. When specified, underlying models are reshaped to this resolution.
+- `optional uint64 num_images_per_prompt` - used together with max_resolution, to define batch size in static model shape.
+- `optional float guidance_scale` - used together with max_resolution
 
 ## Models Directory
```
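A mixed `device` value maps positionally onto the three pipeline stages named in the option description (`text_encode`, `denoise`, `vae`). A small hypothetical helper illustrating that convention — this is an illustration of the mapping, not OVMS code:

```python
def parse_device_option(device: str) -> dict:
    """Map a space-separated device string onto pipeline stages.

    "CPU GPU NPU" -> {"text_encode": "CPU", "denoise": "GPU", "vae": "NPU"};
    a single device name is reused for every stage.
    """
    stages = ("text_encode", "denoise", "vae")
    devices = device.split()
    if len(devices) == 1:
        devices = devices * len(stages)  # one device serves all stages
    if len(devices) != len(stages):
        raise ValueError(f"expected 1 or {len(stages)} devices, got {len(devices)}")
    return dict(zip(stages, devices))
```

For example, `parse_device_option("CPU GPU NPU")` assigns text encoding to CPU, denoising to GPU, and VAE to NPU, matching the `CPU GPU NPU` example above.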
```diff
@@ -161,4 +167,5 @@ Check [tested models](https://github.com/openvinotoolkit/openvino.genai/blob/mas
 
 ## References
 - [Image Generation API](../model_server_rest_api_image_generation.md)
+- [Image Edit API](../model_server_rest_api_image_edit.md)
 - Demos on [CPU/GPU](../../demos/image_generation/README.md)
```
Lines changed: 114 additions & 0 deletions
# OpenAI API image edit endpoint {#ovms_docs_rest_api_image_edit}

## API Reference
OpenVINO Model Server now includes the `images/edits` endpoint using the OpenAI API.
It is used to execute [image2image](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/cpp/image_generation) and [inpainting](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/cpp/image_generation) tasks with OpenVINO GenAI pipelines.
Please see the [OpenAI API Reference](https://platform.openai.com/docs/api-reference/images/createEdit) for more information on the API.
The endpoint is exposed via the path:

<b>http://server_name:port/v3/images/edits</b>

The request body must be in `multipart/form-data` format.

### Example request
```
curl -X POST http://localhost:8000/v3/images/edits \
  -F "model=OpenVINO/stable-diffusion-v1-5-fp16-ov" \
  -F "image=@three_cats.png" \
  -F "strength=0.7" \
  -F 'prompt=Three zebras' \
  | jq -r '.data[0].b64_json' | base64 --decode > edit.png
```
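The same multipart request can be sent from Python with the `requests` package. A minimal sketch, assuming the demo's model name and a server on `localhost:8000` (`build_form` and `edit_image` are hypothetical helper names, not part of any API):

```python
import base64
import requests

OVMS_URL = "http://localhost:8000/v3/images/edits"  # assumed server address

def build_form(model: str, prompt: str, **params) -> dict:
    """Assemble the text fields of a multipart/form-data edit request.

    A (None, value) tuple tells `requests` to send the field as a plain
    text part without a filename.
    """
    fields = {"model": (None, model), "prompt": (None, prompt)}
    fields.update({key: (None, str(value)) for key, value in params.items()})
    return fields

def edit_image(image_path: str, prompt: str, model: str, **params) -> bytes:
    """POST an edit request and return the decoded image bytes."""
    form = build_form(model, prompt, **params)
    with open(image_path, "rb") as f:
        form["image"] = (image_path, f, "image/png")  # the file part
        response = requests.post(OVMS_URL, files=form)
    response.raise_for_status()
    return base64.b64decode(response.json()["data"][0]["b64_json"])
```

Usage mirroring the curl example above: `open("edit.png", "wb").write(edit_image("three_cats.png", "Three zebras", "OpenVINO/stable-diffusion-v1-5-fp16-ov", strength=0.7))`.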
### Generate 4 variations of the edit
```
curl -X POST http://localhost:8000/v3/images/edits \
  -F "model=OpenVINO/stable-diffusion-v1-5-fp16-ov" \
  -F "image=@three_cats.png" \
  -F "strength=0.7" \
  -F "prompt=Three zebras" \
  -F "n=4" \
  | jq -r '.data[].b64_json' \
  | awk '{print > ("img" NR ".b64") } END { for(i=1;i<=NR;i++) system("base64 --decode img" i ".b64 > edit" i ".png && rm img" i ".b64") }'
```
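The jq/awk splitting above can also be done in Python. A short sketch that decodes every `b64_json` entry of the response body into a numbered PNG (`save_images` is a hypothetical helper name):

```python
import base64
import json

def save_images(response_json: str, prefix: str = "edit") -> list:
    """Decode each b64_json entry of an /images/edits response to a PNG file.

    Returns the list of written file paths, e.g. ["edit1.png", ..., "edit4.png"]
    for a request with n=4.
    """
    paths = []
    for i, item in enumerate(json.loads(response_json)["data"], start=1):
        path = f"{prefix}{i}.png"
        with open(path, "wb") as f:
            f.write(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths
```

After posting the request with any HTTP client, `save_images(response.text)` replaces the awk pipeline.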
![image edit](./image-edit-zebras.png)

### Example response

```json
{
  "data": [
    {
      "b64_json": "..."
    }
  ]
}
```

## Request

| Param | OpenVINO Model Server | OpenAI /images/edits API | Type | Description |
|-----|----------|----------|---------|-----|
| model | | | string (required) | Name of the model to use. Name assigned to a MediaPipe graph configured to schedule generation using the desired image generation model. **Note**: This can also be omitted to fall back to URI based routing. Read more on routing topic **TODO** |
| image | ⚠️ | | string or array of strings (required) | The image to edit. Must be a single image (⚠️**Note**: Array of strings is not supported for now.) |
| mask | | | string | Triggers inpainting pipeline type. An additional image whose fully transparent areas (e.g. where alpha is zero) indicate where `image` should be edited. Not supported for now. |
| prompt | | | string (required) | A text description of the desired image(s). |
| size | | | string or null (default: auto) | The size of the generated images. Must be in WxH format, example: `1024x768`. The default model W/H is used with `auto`. |
| n | | | integer or null (default: `1`) | The number of images to generate. If you want to generate multiple images for the same combination of generation parameters and text prompts, use this parameter for better performance: computations are performed in a batch for Unet / Transformer models, and text embedding tensors are computed only once. |
| input_fidelity | ⚠️ | | string or null | Controls how much effort the model will exert to match the style and features, especially facial features, of input images. (⚠️**Note**: Not supported for now - use the `strength` parameter.) |
| background | | | string or null (default: auto) | Sets transparency for the background of the generated image(s). Not supported for now. |
| stream | | | boolean or null (default: false) | Generate the image in streaming mode. Not supported for now. |
| style | | | string or null (default: vivid) | The style of the generated images. Recognized OpenAI settings, but not supported: vivid, natural. |
| moderation | | | string (default: auto) | Controls the content-moderation level for images generated by the endpoint. Either `low` or `auto`. Not supported for now. |
| output_compression | | | integer or null (default: `100`) | The compression level (0-100%) for the generated images. Not supported for now. |
| output_format | | | string or null | The format in which the generated images are returned. Not supported for now. |
| partial_images | | | integer or null | The number of partial images to generate, used for streaming responses that return partial images. Value must be between 0 and 3. When set to 0, the response is a single image sent in one streaming event. Not supported for now. |
| quality | | | string or null (default: auto) | Quality of the image that will be generated. Recognized OpenAI qualities, but currently not supported: auto, high, medium, low, hd, standard |
| response_format | ⚠️ | | string or null (default: b64_json) | The format of the images in the output. Recognized options: b64_json or url. Only b64_json is supported for now (default). |
| user | | | string (optional) | A unique identifier representing your end-user, used to detect abuse. Not supported for now. |

### Parameters supported via `extra_body` field

| Param | OpenVINO Model Server | OpenAI /images/generations API | Type | Description |
|-----|----------|----------|---------|-----|
| prompt_2 | | | string (optional) | Prompt 2 for models which have at least two text encoders (SDXL/SD3/FLUX). |
| prompt_3 | | | string (optional) | Prompt 3 for models which have at least three text encoders (SD3). |
| negative_prompt | | | string (optional) | Negative prompt for models which support negative prompt (SD/SDXL/SD3). |
| negative_prompt_2 | | | string (optional) | Negative prompt 2 for models which support negative prompt (SDXL/SD3). |
| negative_prompt_3 | | | string (optional) | Negative prompt 3 for models which support negative prompt (SD3). |
| num_images_per_prompt | | | integer (default: `1`) | The same as the base parameter `n`. |
| num_inference_steps | | | integer (default: `50`) | Defines the denoising iteration count. Higher values increase quality and generation time, lower values generate faster with less detail. |
| guidance_scale | | | float (optional) | Controls how closely the model sticks to the text embeddings generated by the text encoders within a pipeline. A higher guidance scale moves image generation towards the text embeddings, but the resulting image will be less natural and more augmented. |
| strength | | | float (optional) min: 0.0, max: 1.0 | Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a starting point and more noise is added the higher the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, added noise is maximum and the denoising process runs for the full number of iterations specified in `num_inference_steps`. A value of 1 essentially ignores the `image` parameter. |
| rng_seed | | | integer (optional) | Seed for the random generator. |
| max_sequence_length | | | integer (optional) | Limits the max sequence length for the T5 encoder for SD3 and FLUX models. The T5 tokenizer output is padded with pad tokens to `max_sequence_length` within a pipeline, so you can set this parameter to a lower value to speed up T5 encoder inference as well as inference of the transformer denoising model. For optimal performance it can be set to the number of tokens in `prompt_3` / `negative_prompt_3` for SD3 or `prompt_2` for FLUX. |
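As a rough mental model for `strength`: in diffusers-style image2image pipelines the early part of the denoising schedule is skipped, so the number of steps actually run is approximately `num_inference_steps * strength`. A sketch of that convention — an illustration only, not necessarily the exact OpenVINO GenAI implementation:

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate denoising iterations actually run for an image2image request.

    With strength=1.0 the full schedule runs and the input image is essentially
    ignored; with a small strength only the last few steps run, keeping the
    input mostly intact.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)
```

For example, `strength=0.7` with `num_inference_steps=50` runs roughly 35 denoising iterations.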
## Response

| Param | OpenVINO Model Server | OpenAI /images/generations API | Type | Description |
|-----|----------|----------|---------|-----|
| data | | | array | A list of generated images. |
| data.b64_json | | | string | The base64-encoded JSON of the generated image. |
| data.url | | | string | The URL of the generated image if `response_format` is set to url. Unsupported for now. |
| data.revised_prompt | | | string | The revised prompt that was used to generate the image. Unsupported for now. |
| usage | | | dictionary | Info about assessed tokens. Unsupported for now. |
| created | | | string | The Unix timestamp (in seconds) of when the image was created. Unsupported for now. |

## Currently unsupported endpoints:
- images/variations

## Error handling
The endpoint can raise an error for an incorrect request in the following conditions:
- Incorrect format of any of the fields based on the schema
- Tokenized prompt exceeds the maximum length of the model context
- Model does not support the requested width and height
- Administrator-defined min/max parameter value requirements are not met

## References

[End to end demo with image edit endpoint](../demos/image_generation/README.md)

[Code snippets](./clients_genai.md)
