
Commit fc63ebd

[Community Pipeline] Rerender-A-Video: Zero-Shot Video-to-Video Translation (#6332)
* upload codes and doc
* lint
* lint
* lint
* update code
* remove blank lines
* Fix load url
1 parent 8b6fae4 commit fc63ebd

2 files changed (+1265 lines, -1 line)


examples/community/README.md

Lines changed: 87 additions & 1 deletion
@@ -56,7 +56,7 @@ prompt-to-prompt | change parts of a prompt and retain image structure (see [pap
| AnimateDiff ControlNet Pipeline | Combines AnimateDiff with precise motion control using ControlNets | [AnimateDiff ControlNet Pipeline](#animatediff-controlnet-pipeline) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SKboYeGjEQmQPWoFC0aLYpBlYdHXkvAu?usp=sharing) | [Aryan V S](https://github.com/a-r-r-o-w) and [Edoardo Botta](https://github.com/EdoardoBotta) |
| DemoFusion Pipeline | Implementation of [DemoFusion: Democratising High-Resolution Image Generation With No $$$](https://arxiv.org/abs/2311.16973) | [DemoFusion Pipeline](#DemoFusion) | - | [Ruoyi Du](https://github.com/RuoyiDu) |
| Null-Text Inversion Pipeline | Implement [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://arxiv.org/abs/2211.09794) as a pipeline. | [Null-Text Inversion](https://github.com/google/prompt-to-prompt/) | - | [Junsheng Luan](https://github.com/Junsheng121) |
| Rerender A Video Pipeline | Implementation of [[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation](https://arxiv.org/abs/2306.07954) | [Rerender A Video Pipeline](#Rerender_A_Video) | - | [Yifan Zhou](https://github.com/SingleZombie) |

To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, set to the name of one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines; we will merge them quickly.
```py
@@ -3185,5 +3185,91 @@ pipeline = NullTextPipeline.from_pretrained(model_path, scheduler = scheduler, t
# Saves the inverted_latent to save time
inverted_latent, uncond = pipeline.invert(input_image, invert_prompt, num_inner_steps=10, early_stop_epsilon=1e-5, num_inference_steps=steps)
pipeline(prompt, uncond, inverted_latent, guidance_scale=7.5, num_inference_steps=steps).images[0].save(input_image+".output.jpg")
```

### Rerender_A_Video

This is the Diffusers implementation of the zero-shot video-to-video translation pipeline [Rerender_A_Video](https://github.com/williamyang1991/Rerender_A_Video) (without Ebsynth postprocessing). To run the code, please install gmflow first. Then modify the path in `examples/community/rerender_a_video.py`:

```py
gmflow_dir = "/path/to/gmflow"
```
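
For reference, gmflow is used for optical-flow estimation between frames. Below is a minimal sketch (not part of the pipeline) of how one might sanity-check this setup; it assumes gmflow is the repository at https://github.com/haofeixu/gmflow and uses a placeholder path that you replace with your actual clone location:

```py
# Sketch only: verify the gmflow clone exists before running the pipeline.
# Assumption: gmflow was cloned from https://github.com/haofeixu/gmflow and
# `gmflow_dir` matches the value set in examples/community/rerender_a_video.py.
import os

gmflow_dir = "/path/to/gmflow"  # placeholder, use your actual clone path

if not os.path.isdir(gmflow_dir):
    raise FileNotFoundError(
        f"gmflow not found at {gmflow_dir}; clone it first, e.g. "
        "`git clone https://github.com/haofeixu/gmflow`, then update gmflow_dir."
    )
```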

After that, you can run the pipeline with:

```py
import cv2
import numpy as np
import torch
from PIL import Image

from diffusers import AutoencoderKL, ControlNetModel, DDIMScheduler, DiffusionPipeline
from diffusers.utils import export_to_video


def video_to_frame(video_path: str, interval: int):
    # Sample every `interval`-th frame from the video and return them as RGB arrays.
    vidcap = cv2.VideoCapture(video_path)
    success = True

    count = 0
    res = []
    while success:
        count += 1
        success, image = vidcap.read()
        if count % interval != 1:
            continue
        if image is not None:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            res.append(image)

    vidcap.release()
    return res


input_video_path = 'path/to/video'
input_interval = 10
frames = video_to_frame(input_video_path, input_interval)

# Get a Canny edge map for each frame to use as the ControlNet condition
control_frames = []
for frame in frames:
    np_image = cv2.Canny(frame, 50, 100)
    np_image = np_image[:, :, None]
    np_image = np.concatenate([np_image, np_image, np_image], axis=2)
    canny_image = Image.fromarray(np_image)
    control_frames.append(canny_image)

# You can use any ControlNet here
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny").to('cuda')

# You can use any finetuned SD here
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, custom_pipeline='rerender_a_video').to('cuda')

# Optional: you can download vae-ft-mse-840000-ema-pruned.ckpt to enhance the results
# pipe.vae = AutoencoderKL.from_single_file(
#     "path/to/vae-ft-mse-840000-ema-pruned.ckpt").to('cuda')

pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

generator = torch.manual_seed(0)
frames = [Image.fromarray(frame) for frame in frames]
output_frames = pipe(
    "a beautiful woman in CG style, best quality, extremely detailed",
    frames,
    control_frames,
    num_inference_steps=20,
    strength=0.75,
    controlnet_conditioning_scale=0.7,
    generator=generator,
    warp_start=0.0,
    warp_end=0.1,
    mask_start=0.5,
    mask_end=0.8,
    mask_strength=0.5,
    negative_prompt='longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'
).frames

export_to_video(output_frames, "/path/to/video.mp4", 5)
```
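
If you also want to inspect the re-rendered frames individually, a small follow-up sketch like the one below should work. It only assumes `output_frames` is the list returned by the pipeline call above; the output directory name is an arbitrary choice for this example:

```py
# Follow-up sketch (illustrative only): dump each re-rendered frame to disk.
import os

import numpy as np
from PIL import Image

os.makedirs("rerender_frames", exist_ok=True)
for i, frame in enumerate(output_frames):
    if not isinstance(frame, Image.Image):
        # Frames may come back as arrays; convert to 8-bit RGB before saving.
        arr = np.asarray(frame)
        if arr.dtype != np.uint8:
            arr = (arr.clip(0, 1) * 255).round().astype(np.uint8)
        frame = Image.fromarray(arr)
    frame.save(f"rerender_frames/{i:04d}.png")
```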
