@@ -41,30 +40,289 @@ You can try setting the `height` and `width` parameters to generate images with
 ```py
 import torch
 from diffusers import OmniGenPipeline
+
 pipe = OmniGenPipeline.from_pretrained(
     "Shitao/OmniGen-v1-diffusers",
     torch_dtype=torch.bfloat16
 )
+pipe.to("cuda")

-prompt ="A young woman sits on a sofa, holding a book and facing the camera. She wears delicate silver hoop earrings adorned with tiny, sparkling diamonds that catch the light, with her long chestnut hair cascading over her shoulders. Her eyes are focused and gentle, framed by long, dark lashes. She is dressed in a cozy cream sweater, which complements her warm, inviting smile. Behind her, there is a table with a cup of water in a sleek, minimalist blue mug. The background is a serene indoor setting with soft natural light filtering through a window, adorned with tasteful art and flowers, creating a cozy and peaceful ambiance. 4K, HD."
-pipe.enable_model_cpu_offload()
-
+prompt = "Realistic photo. A young woman sits on a sofa, holding a book and facing the camera. She wears delicate silver hoop earrings adorned with tiny, sparkling diamonds that catch the light, with her long chestnut hair cascading over her shoulders. Her eyes are focused and gentle, framed by long, dark lashes. She is dressed in a cozy cream sweater, which complements her warm, inviting smile. Behind her, there is a table with a cup of water in a sleek, minimalist blue mug. The background is a serene indoor setting with soft natural light filtering through a window, adorned with tasteful art and flowers, creating a cozy and peaceful ambiance. 4K, HD."
<img src="https://github.com/VectorSpaceLab/OmniGen/blob/main/imgs/demo_cases/t2i_woman_with_book.png" alt="generated image of a young woman sitting on a sofa and holding a book"/>
prompt="Generate a new photo using the following picture and text as conditions: <img><|image_1|></img>\n A young boy is sitting on a sofa in the library, holding a book. His hair is neatly combed, and a faint smile plays on his lips, with a few freckles scattered across his cheeks. The library is quiet, with rows of shelves filled with books stretching out behind him."
<figcaption class="mt-2 text-center text-sm text-gray-500">skeleton to image</figcaption>
+</div>
+</div>
+
+
+OmniGen can also directly use relevant information from input images to generate new images.
+```py
+import torch
+from diffusers import OmniGenPipeline
+from diffusers.utils import load_image
+
+pipe = OmniGenPipeline.from_pretrained(
+    "Shitao/OmniGen-v1-diffusers",
+    torch_dtype=torch.bfloat16
+)
+pipe.to("cuda")
+
+prompt="Following the pose of this image <img><|image_1|></img>, generate a new photo: A young boy is sitting on a sofa in the library, holding a book. His hair is neatly combed, and a faint smile plays on his lips, with a few freckles scattered across his cheeks. The library is quiet, with rows of shelves filled with books stretching out behind him."
OmniGen can generate multiple images based on the people and objects in the input image and supports inputting multiple images simultaneously.
+Additionally, OmniGen can extract desired objects from an image containing multiple objects based on instructions.
+
+```py
+import torch
+from diffusers import OmniGenPipeline
+from diffusers.utils import load_image
+
+pipe = OmniGenPipeline.from_pretrained(
+    "Shitao/OmniGen-v1-diffusers",
+    torch_dtype=torch.bfloat16
+)
+pipe.to("cuda")
+
+prompt="A man and a woman are sitting at a classroom desk. The man is the man with yellow hair in <img><|image_1|></img>. The woman is the woman on the left of <img><|image_2|></img>"
prompt="A woman is walking down the street, wearing a white long-sleeve blouse with lace details on the sleeves, paired with a blue pleated skirt. The woman is <img><|image_1|></img>. The long-sleeve blouse and a pleated skirt are <img><|image_2|></img>."
For text-to-image tasks, OmniGen requires relatively little memory and time (about 9 GB of memory and 31 s for a 1024x1024 image on an A800 GPU).
+However, when using input images, the computational cost increases.
+
+Here are some guidelines to help you reduce computational costs when using multiple input images. The experiments were conducted on an A800 GPU with two input images passed to OmniGen.

-## Optimization

 ### inference speed

-### Memory
+- `use_kv_cache=True`:
+  `use_kv_cache` stores the key and value states of the input conditions so that attention can be computed without redundant work.
+  The default value is `True`, and by default OmniGen offloads the KV cache to the CPU.
+  - `use_kv_cache=False`: the inference time is 3m21s.
+  - `use_kv_cache=True`: the inference time is 1m30s.
+
+- `max_input_image_size`:
+  The maximum size of an input image; larger input images are cropped to this size.
+  - `max_input_image_size=1024`: the inference time is 1m30s.
+  - `max_input_image_size=512`: the inference time is 58s.
+
+### Memory
+
+- `pipe.enable_model_cpu_offload()`:
+  - Without enabling cpu offloading, memory usage is `31 GB`
+  - With enabling cpu offloading, memory usage is `28 GB`
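
To put the reported measurements side by side, the following is a minimal sketch that converts the timings quoted in this diff into relative speedups and computes the memory saved by CPU offloading. The helper names `parse_duration` and `speedup` are hypothetical and are not part of diffusers or OmniGen; the numbers are the ones stated above (A800 GPU, two input images) and are not re-measured here.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration such as '3m21s' or '58s' into whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + seconds


def speedup(baseline: str, optimized: str) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return parse_duration(baseline) / parse_duration(optimized)


# Timings as reported in the text above.
kv_cache_speedup = speedup("3m21s", "1m30s")      # use_kv_cache False vs True
smaller_input_speedup = speedup("1m30s", "58s")   # max_input_image_size 1024 vs 512

# Memory usage as reported in the text above, in GB.
offload_saving_gb = 31 - 28

print(f"use_kv_cache=True:         {kv_cache_speedup:.2f}x faster")
print(f"max_input_image_size=512:  {smaller_input_speedup:.2f}x faster")
print(f"enable_model_cpu_offload:  {offload_saving_gb} GB less memory")
```

Read this way, the KV cache accounts for the larger share of the time saved, while CPU offloading gives a comparatively modest memory reduction in this two-image setting.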