Commit 6b52547

update docs
1 parent 236f14b commit 6b52547

7 files changed: 268 additions & 700 deletions

docs/source/en/api/pipelines/omnigen.md

Lines changed: 36 additions & 30 deletions
@@ -56,44 +56,50 @@ First, load the pipeline:
 
 ```python
 import torch
-from diffusers import CogVideoXPipeline, CogVideoXImageToVideoPipeline
-from diffusers.utils import export_to_video, load_image
-pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b").to("cuda") # or "THUDM/CogVideoX-2b"
+from diffusers import OmniGenPipeline
+from diffusers.utils import load_image
+pipe = OmniGenPipeline.from_pretrained(
+    "Shitao/OmniGen-v1-diffusers", torch_dtype=torch.bfloat16
+)
+pipe.to("cuda")
 ```
 
-If you are using the image-to-video pipeline, load it as follows:
-
-```python
-pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V").to("cuda")
+For text-to-image, pass a text prompt. By default, OmniGen generates a 1024x1024 image.
+You can set the `height` and `width` parameters to generate images of other sizes.
+
+```py
+prompt = "Realistic photo. A young woman sits on a sofa, holding a book and facing the camera. She wears delicate silver hoop earrings adorned with tiny, sparkling diamonds that catch the light, with her long chestnut hair cascading over her shoulders. Her eyes are focused and gentle, framed by long, dark lashes. She is dressed in a cozy cream sweater, which complements her warm, inviting smile. Behind her, there is a table with a cup of water in a sleek, minimalist blue mug. The background is a serene indoor setting with soft natural light filtering through a window, adorned with tasteful art and flowers, creating a cozy and peaceful ambiance. 4K, HD."
+image = pipe(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    guidance_scale=3,
+    generator=torch.Generator(device="cpu").manual_seed(111),
+).images[0]
+image
 ```
 
-Then change the memory layout of the pipeline's `transformer` component to `torch.channels_last`:
-
-```python
-pipe.transformer.to(memory_format=torch.channels_last)
-```
-
-Compile the components and run inference:
-
-```python
-pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)
-
-# CogVideoX works well with long and well-described prompts
-prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
-video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
-```
-
-The [T2V benchmark](https://gist.github.com/a-r-r-o-w/5183d75e452a368fd17448fcc810bd3f) results on an 80GB A100 machine are:
-
-```
-Without torch.compile(): Average inference time: 96.89 seconds.
-With torch.compile(): Average inference time: 76.27 seconds.
+OmniGen also supports multimodal inputs.
+When the input includes an image, add the placeholder `<img><|image_1|></img>` to the text prompt to represent that image.
+It is recommended to enable `use_input_image_size_as_output` so that the edited image keeps the same size as the input image.
+
+```py
+prompt = "<img><|image_1|></img> Remove the woman's earrings. Replace the mug with a clear glass filled with sparkling iced cola."
+input_images = [load_image("https://raw.githubusercontent.com/VectorSpaceLab/OmniGen/main/imgs/docs_img/t2i_woman_with_book.png")]
+image = pipe(
+    prompt=prompt,
+    input_images=input_images,
+    guidance_scale=2,
+    img_guidance_scale=1.6,
+    use_input_image_size_as_output=True,
+    generator=torch.Generator(device="cpu").manual_seed(222)).images[0]
+image
 ```
 
 
-## CogVideoXPipeline
+## OmniGenPipeline
 
-[[autodoc]] CogVideoXPipeline
+[[autodoc]] OmniGenPipeline
   - all
   - __call__
 
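The editing example marks its input image with a `<img><|image_1|></img>` placeholder in the prompt. Below is a small plain-Python sketch of how a script could build such prompts and check that they match `input_images` before calling the pipeline. `add_image_placeholders` and `check_placeholders` are hypothetical helpers, not part of Diffusers, and numbering further images as `<|image_2|>`, `<|image_3|>`, ... is an assumption extrapolated from the single-image example.

```python
import re

# Placeholder format from the OmniGen docs: <img><|image_N|></img>.
# ASSUMPTION: images are numbered from 1, in the order they appear in `input_images`.
PLACEHOLDER = "<img><|image_{}|></img>"
PLACEHOLDER_RE = re.compile(r"<img><\|image_(\d+)\|></img>")


def add_image_placeholders(instruction: str, num_images: int) -> str:
    """Hypothetical helper: prepend one placeholder per input image to an instruction."""
    tags = " ".join(PLACEHOLDER.format(i) for i in range(1, num_images + 1))
    return f"{tags} {instruction}" if tags else instruction


def check_placeholders(prompt: str, input_images: list) -> None:
    """Hypothetical check: the prompt must reference exactly images 1..len(input_images)."""
    found = sorted({int(n) for n in PLACEHOLDER_RE.findall(prompt)})
    expected = list(range(1, len(input_images) + 1))
    if found != expected:
        raise ValueError(
            f"prompt references images {found}, but {len(input_images)} image(s) were given"
        )


# Rebuild a single-image editing prompt like the one in the example and validate it.
prompt = add_image_placeholders("Remove the woman's earrings.", 1)
check_placeholders(prompt, ["t2i_woman_with_book.png"])  # one image, one placeholder: passes
print(prompt)  # <img><|image_1|></img> Remove the woman's earrings.
```

The resulting string would be passed as `prompt=` to `pipe(...)` alongside `input_images`, as in the editing example. Running the check before the pipeline call surfaces a prompt/image mismatch early, without loading the model.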
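The removed CogVideoX section reports two T2V averages but not the relative gain from `torch.compile()`. Working it out from those two numbers alone (they describe CogVideoX on an 80GB A100 and say nothing about OmniGen):

```python
# Averages from the removed CogVideoX T2V benchmark (80GB A100), in seconds.
without_compile_s = 96.89
with_compile_s = 76.27

speedup = without_compile_s / with_compile_s                          # times faster
time_saved = (without_compile_s - with_compile_s) / without_compile_s  # fraction of time saved

print(f"speedup: {speedup:.2f}x")       # speedup: 1.27x
print(f"time saved: {time_saved:.1%}")  # time saved: 21.3%
```

Both ways of stating the result describe the same measurement: about 1.27 times faster, or roughly a fifth less wall-clock time per video.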