This demo shows how to deploy image generation models (Stable Diffusion/Stable Diffusion 3/Stable Diffusion XL/FLUX) to create and edit images with the OpenVINO Model Server.
Image generation pipelines are exposed via the [OpenAI API](https://platform.openai.com/docs/api-reference/images/create) `images/generations` and `images/edits` endpoints.
> **Note:** This demo was tested on Intel® Xeon®, Intel® Core™, Intel® Arc™ A770 and Intel® Arc™ B580 on Ubuntu 22/24, RedHat 9 and Windows 11.
## Prerequisites
> **NOTE:** The model downloading feature is described in depth on a separate documentation page: [Pulling HuggingFace Models](../../docs/pull_hf_models.md).
This command pulls the `OpenVINO/stable-diffusion-v1-5-int8-ov` quantized model directly from HuggingFace and starts serving it. If the model already exists locally, the download is skipped and serving starts immediately.
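A minimal sketch of such a command (the `openvino/model_server` image name and the `--source_model`/`--task` flags are assumptions based on recent model server releases; adjust to your environment):

```bash
# Sketch: pull OpenVINO/stable-diffusion-v1-5-int8-ov from HuggingFace and serve it on port 8000.
mkdir -p models
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/models:rw openvino/model_server:latest \
  --source_model OpenVINO/stable-diffusion-v1-5-int8-ov --model_repository_path /models \
  --rest_port 8000 --task image_generation
```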
> **NOTE:** Optionally, to only download the model and skip the serving part, use the `--pull` parameter.
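For example (a sketch reusing the assumed flags from the command above; only `--pull` itself comes from the note):

```bash
# Download the model into ./models and exit without starting the server.
docker run --rm -v $(pwd)/models:/models:rw openvino/model_server:latest \
  --pull --source_model OpenVINO/stable-diffusion-v1-5-int8-ov --model_repository_path /models
```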
Image generation endpoints are served by three models: a text encoder, a denoising model and a VAE decoder. A device can be selected for each model separately. In this example, we will use NPU for text encoding and denoising, and GPU for the VAE decoder. This is useful when the model is too large to fit into NPU memory, but the NPU can still be used for the first two models.
::::{tab-set}
:::{tab-item} Docker (Linux)
In this specific case, we also need to use `--device /dev/dri`, because part of the pipeline runs on the GPU.

> **NOTE:** The NPU device requires the pipeline to be reshaped to a static shape, which is why the `--resolution` parameter is used to define the input resolution.
> **NOTE:** In case the model loading phase takes too long, consider caching the model with the `--cache_dir` parameter, as shown in the example below.
It can be applied using the commands below:
```bash
mkdir -p models
mkdir -p cache
# The remainder of this command is reconstructed; the devices, flags and the
# cache mount are assumptions — adjust them to your setup.
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/models/:rw -v $(pwd)/cache:/cache:rw \
  --device /dev/accel --device /dev/dri openvino/model_server:latest \
  --source_model OpenVINO/stable-diffusion-v1-5-int8-ov --model_repository_path /models \
  --rest_port 8000 --task image_generation --cache_dir /cache --resolution 512x512
```
:::
::::
Run the `export_model.py` script to download and quantize the model:

> **Note:** Before downloading the model, access must be requested. Follow the instructions on the [HuggingFace model page](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) to request access. Once access is granted, create an authentication token on the HuggingFace account -> Settings -> Access Tokens page, then authenticate via `huggingface-cli login` and enter the token when prompted.
> **Note:** Users in China need to set the environment variable `HF_ENDPOINT="https://hf-mirror.com"` before running the export script to connect to the HF Hub.
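The invocation might look like this (a sketch: the `image_generation` subcommand and flag names are assumptions modeled on other model server demos, so check `python export_model.py --help` for the actual interface):

```bash
# Download stable-diffusion-v1-5, quantize it and write the serving config.
python export_model.py image_generation \
  --source_model stable-diffusion-v1-5/stable-diffusion-v1-5 \
  --weight-format int8 --config_file_path models/config.json
```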
Image generation endpoints are served by three models: a text encoder, a denoising model and a VAE decoder. A device can be selected for each model separately. In this example, we will use NPU for all of them.
> **NOTE:** The NPU device requires the pipeline to be reshaped to a static shape, which is why the `--resolution` parameter is used to define the input resolution.
> **NOTE:** In case the model loading phase takes too long, consider caching the model with the `--cache_dir` parameter, as shown in the example below.
-d "{\"model\": \"OpenVINO/FLUX.1-schnell-int4-ov\", \"prompt\": \"three cute cats sitting on a bench\", \"rng_seed\": 45, \"num_inference_steps\": 3, \"size\": \"512x512\"}"
426
+
-d "{\"model\": \"OpenVINO/stable-diffusion-v1-5-int8-ov\", \"prompt\": \"Three astronauts on the moon, cold color palette, muted colors, detailed, 8k\", \"rng_seed\": 409, \"num_inference_steps\": 50, \"size\": \"512x512\"}"
417
427
```
Expected response (the shape matches what the Python example below reads via `response.data[0].b64_json`; the base64 payload is abridged here):

```json
{
  "data": [
    {
      "b64_json": "..."
    }
  ]
}
```
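To turn that JSON into a viewable file from the shell, the base64 payload can be decoded directly (a sketch assuming `jq` is installed and reusing the assumed endpoint above):

```bash
# Extract the base64 image from the response and decode it into a PNG.
curl http://localhost:8000/v3/images/generations -H "Content-Type: application/json" \
  -d "{\"model\": \"OpenVINO/stable-diffusion-v1-5-int8-ov\", \"prompt\": \"Three astronauts on the moon, cold color palette, muted colors, detailed, 8k\", \"rng_seed\": 409, \"num_inference_steps\": 50, \"size\": \"512x512\"}" \
  | jq -r '.data[0].b64_json' | base64 --decode > generate_output.png
```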
The commands above save the generated image to `generate_output.png`.

### Requesting image generation with OpenAI Python package
In the snippet below, the client setup is reconstructed from context; the `base_url` assumes the model server from the previous steps is listening on port 8000:

```python
import base64
from io import BytesIO

from openai import OpenAI
from PIL import Image

# base_url assumed: the model server started earlier on port 8000.
client = OpenAI(
    base_url="http://localhost:8000/v3",
    api_key="unused"
)

response = client.images.generate(
    model="OpenVINO/stable-diffusion-v1-5-int8-ov",
    prompt="Three astronauts on the moon, cold color palette, muted colors, detailed, 8k",
    extra_body={
        "rng_seed": 409,
        "size": "512x512",
        "num_inference_steps": 50
    }
)

# The server returns the image base64-encoded; decode and save it.
base64_image = response.data[0].b64_json
image_data = base64.b64decode(base64_image)
image = Image.open(BytesIO(image_data))
image.save('generate_output.png')
```
### Requesting image edit with OpenAI Python package
Example of editing the previously generated image with the prompt `Three astronauts in the jungle, vibrant color palette, live colors, detailed, 8k`:
```python
from openai import OpenAI
import base64
from io import BytesIO
from PIL import Image

# Client setup reconstructed from context; base_url assumes the model
# server started earlier on port 8000.
client = OpenAI(
    base_url="http://localhost:8000/v3",
    api_key="unused"
)

response = client.images.edit(
    model="OpenVINO/stable-diffusion-v1-5-int8-ov",
    image=open("generate_output.png", "rb"),
    prompt="Three astronauts in the jungle, vibrant color palette, live colors, detailed, 8k",
    extra_body={
        "rng_seed": 409,
        "size": "512x512",
        "num_inference_steps": 50,
        "strength": 0.67
    }
)

# Decode the base64 response and save the edited image.
base64_image = response.data[0].b64_json
image_data = base64.b64decode(base64_image)
image = Image.open(BytesIO(image_data))
image.save('edit_output.png')
```
Output file (`edit_output.png`):

### Strength influence on the final image

The `strength` parameter controls how far the edit may deviate from the input image: values close to 0 preserve the original, while values close to 1 give the prompt almost full control.

Please follow the [OpenVINO notebook](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/image-to-image-genai/image-to-image-genai.ipynb) to understand how other parameters affect editing.