Commit a9aa3e1

text to video synthesis bug fix (mindspore-lab#985)
Co-authored-by: jijiarong <jijiarong@huawei.com>
1 parent 49b439d commit a9aa3e1

File tree: 1 file changed (+51 −0 lines changed)


docs/diffusers/api/pipelines/text_to_video_zero.md

Lines changed: 51 additions & 0 deletions
@@ -91,6 +91,57 @@ imageio.mimsave("video.mp4", result, fps=4)
```

### Text-To-Video with Pose Control

To generate a video from a prompt with additional pose control:

1. Download a demo video:

```python
from huggingface_hub import hf_hub_download

filename = "__assets__/poses_skeleton_gifs/dance1_corr.mp4"
repo_id = "PAIR/Text2Video-Zero"
video_path = hf_hub_download(repo_type="space", repo_id=repo_id, filename=filename)
```

2. Read the video containing the extracted pose images:

```python
from PIL import Image
import imageio

reader = imageio.get_reader(video_path, "ffmpeg")
frame_count = 8
pose_images = [Image.fromarray(reader.get_data(i)) for i in range(frame_count)]
```
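
The snippet above simply takes the first eight frames of the clip. For a longer video it can be preferable to sample frames evenly across the whole clip instead; a small pure-Python sketch (the helper name `sample_indices` is illustrative, not part of the library):

```python
def sample_indices(total_frames: int, frame_count: int) -> list[int]:
    """Return evenly spaced frame indices covering the whole clip."""
    if frame_count >= total_frames:
        return list(range(total_frames))
    step = total_frames / frame_count
    return [int(i * step) for i in range(frame_count)]

print(sample_indices(64, 8))  # [0, 8, 16, 24, 32, 40, 48, 56]
```

The resulting indices can be passed to `reader.get_data(i)` in place of `range(frame_count)`.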

To extract poses from an actual video, read the [ControlNet documentation](controlnet).

3. Run `StableDiffusionControlNetPipeline` with our custom attention processor:

```python
import mindspore as ms
from mindone.diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from mindone.diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero import CrossFrameAttnProcessor

model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", mindspore_dtype=ms.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    model_id, controlnet=controlnet, mindspore_dtype=ms.float16
)

# Set the cross-frame attention processor on both the UNet and the ControlNet
pipe.unet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))
pipe.controlnet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))

# Fix the latents for all frames, so every frame is denoised from the same noise
latents = ms.ops.randn((1, 4, 64, 64), dtype=ms.float16).tile((len(pose_images), 1, 1, 1))

prompt = "Darth Vader dancing in a desert"
result = pipe(prompt=[prompt] * len(pose_images), image=pose_images, latents=latents).images
imageio.mimsave("video.mp4", result, fps=4)
```
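
The tiled latents are what keep the output temporally coherent: every frame starts from the same initial noise, and only the pose conditioning varies per frame. A toy sketch of the same shape logic, with NumPy standing in for the MindSpore tensor op (shapes are illustrative):

```python
import numpy as np

# One latent of shape (1, C, H, W), repeated along the batch axis for 8 frames
latent = np.random.randn(1, 4, 64, 64).astype(np.float16)
latents = np.tile(latent, (8, 1, 1, 1))

print(latents.shape)                            # (8, 4, 64, 64)
print(np.array_equal(latents[0], latents[7]))   # True: all frames share the noise
```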
### Text-To-Video with Edge Control
To generate a video from a prompt with additional Canny edge control, follow the same steps described above for pose-guided generation, using the [Canny edge ControlNet model](https://huggingface.co/lllyasviel/sd-controlnet-canny).
