For text-to-image, pass a text prompt. By default, OmniGen generates a 1024x1024 image.
+Set the `height` and `width` parameters to generate images of a different size.
+
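As a minimal sketch of the size override described above: the helper below is hypothetical (it is not part of any library) and assumes `pipe` is an already-loaded OmniGen pipeline that accepts `prompt`, `height`, and `width` as keyword arguments and returns an object with an `images` list. Neither the return shape nor the helper name comes from this diff.

```py
from typing import Any


def generate_image(pipe: Any, prompt: str, height: int = 1024, width: int = 1024) -> Any:
    """Run a text-to-image call with an explicit output size.

    Assumptions (not confirmed by the text above): `pipe` is callable with
    `prompt`, `height`, and `width` keyword arguments, and its result exposes
    the generated images as `result.images`.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    # The defaults mirror the 1024x1024 size mentioned in the text.
    result = pipe(prompt=prompt, height=height, width=width)
    return result.images[0]
```

With a loaded pipeline, a call such as `generate_image(pipe, prompt, height=512, width=768)` would request a non-square image. Whether a given size is supported depends on the model.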
+```py
+prompt = "Realistic photo. A young woman sits on a sofa, holding a book and facing the camera. She wears delicate silver hoop earrings adorned with tiny, sparkling diamonds that catch the light, with her long chestnut hair cascading over her shoulders. Her eyes are focused and gentle, framed by long, dark lashes. She is dressed in a cozy cream sweater, which complements her warm, inviting smile. Behind her, there is a table with a cup of water in a sleek, minimalist blue mug. The background is a serene indoor setting with soft natural light filtering through a window, adorned with tasteful art and flowers, creating a cozy and peaceful ambiance. 4K, HD."
# CogVideoX works well with long and well-described prompts
-prompt ="A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
-video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
-```
-
-The [T2V benchmark](https://gist.github.com/a-r-r-o-w/5183d75e452a368fd17448fcc810bd3f) results on an 80GB A100 machine are:
-
-```
-Without torch.compile(): Average inference time: 96.89 seconds.
-With torch.compile(): Average inference time: 76.27 seconds.
+OmniGen supports multimodal inputs.
+When the input includes an image, add the placeholder `<img><|image_1|></img>` to the text prompt to represent that image.
+It is recommended to enable `use_input_image_size_as_output` so that the edited image keeps the same size as the original image.
+
+```py
+prompt = "<img><|image_1|></img> Remove the woman's earrings. Replace the mug with a clear glass filled with sparkling iced cola."