Description
Feature request
Introduce a new pipeline that extends the existing image inpainting capabilities (StableDiffusionInpaintPipeline) to video. The goal is a native, GPU-optimized API within Diffusers that performs temporally coherent video inpainting rather than processing each frame independently.
Motivation
Current community approaches to video inpainting typically loop over the frames and call the image inpainting pipeline once per frame.
This leads to:
- Temporal flicker and inconsistent textures between frames.
- Poor GPU utilization, since frames are processed one at a time, and high memory overhead.
- No tools for maintaining motion coherence or reusing diffusion latents across frames.
A built-in VideoInpaintPipeline would make it possible to remove objects, restore scenes, or creatively edit videos with diffusion models while keeping motion and lighting consistent across frames.
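To make "temporal flicker" measurable, here is a minimal sketch of one simple proxy. It is an assumption, not an established Diffusers metric: the mean absolute pixel change between consecutive frames, optionally restricted to the inpainted region. `temporal_flicker` is a hypothetical helper. It does not compensate for real motion, so it only makes sense for comparing methods (for example, the per-frame loop versus a temporally coherent pipeline) on the same clip.

```python
from typing import Optional

import numpy as np


def temporal_flicker(frames: np.ndarray, masks: Optional[np.ndarray] = None) -> float:
    """Mean absolute change between consecutive frames (lower = steadier).

    frames: (T, H, W, C) float array, values in [0, 1].
    masks:  optional (T, H, W) array with 1 where content was inpainted.
    Camera or object motion also raises this score, so it is only useful
    for comparing methods on the same clip.
    """
    diffs = np.abs(frames[1:] - frames[:-1])  # (T-1, H, W, C)
    if masks is None:
        return float(diffs.mean())
    # Count only pixels that are inside the mask in both frames of a pair.
    pair_mask = (masks[1:] > 0.5) & (masks[:-1] > 0.5)  # (T-1, H, W)
    if not pair_mask.any():
        return 0.0
    return float(diffs[pair_mask].mean())  # diffs[pair_mask] has shape (N, C)


# Example: a clip alternating between black and white frames flickers maximally.
static = np.full((4, 8, 8, 3), 0.5)
flashing = np.zeros((4, 8, 8, 3))
flashing[1::2] = 1.0
print(temporal_flicker(static), temporal_flicker(flashing))  # prints: 0.0 1.0
```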
Your contribution
I plan to:
- Implement VideoInpaintPipeline as a subclass of DiffusionPipeline that uses StableDiffusionInpaintPipeline under the hood.
- Add temporal consistency mechanisms, such as latent reuse between frames and optional optical-flow-guided warping (RAFT / GMFlow).
- Optimize performance through batched FP16 inference, scheduler noise reuse, and optional torch.compile acceleration.
- Provide a clean user API compatible with existing pipelines:
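```python
from diffusers import VideoInpaintPipeline  # proposed class, not yet in Diffusers

# Proposed loading API; the two extra flags below are part of this proposal.
pipe = VideoInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    use_optical_flow=True,  # optical-flow-guided warping (RAFT / GMFlow)
    compile=True,           # optional torch.compile acceleration
)

# mask.mp4 supplies one mask frame per frame of input.mp4, marking the regions to inpaint.
result = pipe(
    video_path="input.mp4",
    mask_path="mask.mp4",
    prompt="replace background with a snowy mountain",
    num_inference_steps=10,
)
result.video.save("output.mp4")
```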
- Contribute documentation and tests demonstrating temporal coherence, performance benchmarks, and example notebooks for real-world use.
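The optical-flow-guided warping in the plan could start from a standard backward warp of the previous frame's latent. Below is a minimal NumPy sketch, assuming the flow has already been estimated (for example by RAFT or GMFlow) and resized to the latent resolution. `warp_with_flow` and its (dx, dy) flow convention are hypothetical choices for illustration. In PyTorch the same step is usually written with `torch.nn.functional.grid_sample`.

```python
import numpy as np


def warp_with_flow(prev: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp `prev` (H, W, C) into the current frame.

    flow: (H, W, 2) where flow[y, x] = (dx, dy) means the current-frame
    position (x, y) came from (x + dx, y + dy) in the previous frame.
    Uses bilinear sampling; source coordinates are clamped to the border.
    """
    h, w = prev.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners of the sampling cell and fractional weights.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (src_x - x0)[..., None]  # (H, W, 1), broadcast over channels
    wy = (src_y - y0)[..., None]

    # Interpolate horizontally on the two rows, then vertically.
    top = prev[y0, x0] * (1 - wx) + prev[y0, x1] * wx
    bottom = prev[y1, x0] * (1 - wx) + prev[y1, x1] * wx
    return top * (1 - wy) + bottom * wy
```

This sketch does not handle occlusions: pixels that become visible only in the current frame receive whatever lies at the sampled location, so a real pipeline would also need a validity mask before reusing warped latents.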
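"Latent reuse between frames" can be read as starting each frame's denoising from the previous frame's (possibly warped) clean latent instead of from pure noise, the same idea image-to-image pipelines use. A minimal sketch under that assumption re-noises the previous latent with the DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The helper names are hypothetical, and the schedule assumes Stable Diffusion v1's "scaled_linear" betas (beta_start=0.00085, beta_end=0.012, 1000 steps).

```python
import numpy as np


def sd_alphas_cumprod(num_steps: int = 1000, beta_start: float = 0.00085,
                      beta_end: float = 0.012) -> np.ndarray:
    """Cumulative alpha products for a 'scaled_linear' beta schedule."""
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_steps) ** 2
    return np.cumprod(1.0 - betas)


def reuse_latent(prev_clean: np.ndarray, alpha_bar_t: float,
                 rng: np.random.Generator) -> np.ndarray:
    """Noise the previous frame's clean latent to timestep t.

    Denoising can then start at t instead of at pure noise, so the new
    frame inherits the previous frame's layout and textures.
    """
    if not 0.0 <= alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must be in [0, 1]")
    eps = rng.standard_normal(prev_clean.shape)
    return np.sqrt(alpha_bar_t) * prev_clean + np.sqrt(1.0 - alpha_bar_t) * eps


# Example: start the next frame halfway through the 1000-step schedule.
alphas_cumprod = sd_alphas_cumprod()
rng = np.random.default_rng(0)
prev_latent = rng.standard_normal((4, 64, 64))  # SD latents have 4 channels
x_t = reuse_latent(prev_latent, float(alphas_cumprod[500]), rng)
```

An earlier timestep (larger alpha_bar_t) keeps more of the previous frame and reduces flicker but limits how much the new frame can change. This is the trade-off the `strength` argument controls in the image-to-image pipelines. In Diffusers, the forward-noising step itself is available as `scheduler.add_noise(original_samples, noise, timesteps)`.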
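The proposal does not spell out "scheduler noise reuse". One possible reading is to let all frames share part of their initial noise, so the sampler starts every frame from correlated latents rather than independent ones. A minimal sketch of that assumption follows; `mixed_frame_noise` and `shared_ratio` are hypothetical names. Because the shared and per-frame terms are independent standard normals, each frame's noise stays N(0, 1), as the scheduler expects, while the correlation between any two frames equals `shared_ratio`.

```python
import numpy as np


def mixed_frame_noise(num_frames: int, latent_shape: tuple, shared_ratio: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Initial noise for a batch of frames with a shared component.

    noise_i = sqrt(r) * shared + sqrt(1 - r) * independent_i
    r = 0 gives independent per-frame noise; r = 1 gives identical noise.
    """
    if not 0.0 <= shared_ratio <= 1.0:
        raise ValueError("shared_ratio must be in [0, 1]")
    shared = rng.standard_normal((1, *latent_shape))  # broadcast over frames
    independent = rng.standard_normal((num_frames, *latent_shape))
    return np.sqrt(shared_ratio) * shared + np.sqrt(1.0 - shared_ratio) * independent


# Example: noise for a batch of 8 frames of 4x64x64 latents, half shared.
rng = np.random.default_rng(0)
noise = mixed_frame_noise(8, (4, 64, 64), 0.5, rng)
```

Generating the whole batch at once also fits the batched-inference item in the plan: the frames can go through the denoiser as one batch instead of one pipeline call per frame.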
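Because the proposed API takes a separate mask video, the pipeline also has to decide how generated frames are merged back into the input. A common inpainting post-processing step pastes generated content only inside the mask, so unmasked pixels stay identical to the input instead of being slightly altered by the VAE round trip. The sketch below assumes the Stable Diffusion inpainting convention (1 = repaint). `composite_inpainted` is a hypothetical helper, not part of the proposal.

```python
import numpy as np


def composite_inpainted(original: np.ndarray, generated: np.ndarray,
                        masks: np.ndarray) -> np.ndarray:
    """Keep original pixels outside the mask and generated pixels inside it.

    original, generated: (T, H, W, C) floats in [0, 1].
    masks: (T, H, W), 1 = inpainted region. Soft values act as blend
    weights, e.g. after feathering the mask edges.
    """
    if original.shape != generated.shape:
        raise ValueError("original and generated videos must have the same shape")
    if masks.shape != original.shape[:3]:
        raise ValueError("expected one (H, W) mask per frame")
    m = np.clip(masks, 0.0, 1.0)[..., None]  # (T, H, W, 1), broadcast over channels
    return m * generated + (1.0 - m) * original
```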