Serving Video Generation Models (Wan2.2) with Distributed Inference #2355

@jaideepsai-narayan

Description

Feature request

Description:

I would like to propose adding support for serving video generation models (such as Wan2.2) with distributed inference on Gaudi3 AI accelerators. Serving video generation models on AMD GPUs is already supported, as detailed in the AMD blog post; however, equivalent support for video model serving on Gaudi3 is lacking. Adding Gaudi3 support would significantly improve the performance and scalability of video generation tasks.

Motivation

The demand for high-performance video generation models is growing, and existing serving solutions rely heavily on GPUs for inference. Gaudi3 has the potential to dramatically improve both the scalability and efficiency of serving video generation models: its architecture is optimized for distributed workloads and parallel processing. The current model serving mechanism, however, could benefit from optimizations to ensure efficient utilization of resources and improved throughput when handling large-scale video generation tasks.

Your contribution

I am happy to contribute a PR for this 😊

Current Status: Distributed inference for video generation models on Gaudi3 is already in place, but there is room for improvement in the efficiency of serving and scaling the Wan models.
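To make the scaling discussion concrete, here is a minimal sketch of one way denoising work could be partitioned across Gaudi3 cards: assigning near-equal contiguous chunks of latent frames to each device. This is an illustration only, not an existing API; the helper name `partition_frames` and the frame count are assumptions (81 frames is a common default for Wan-style 5-second clips).

```python
def partition_frames(num_frames: int, num_devices: int) -> list[range]:
    """Split frame indices into near-equal contiguous chunks, one per device.

    Hypothetical helper for illustration only; not part of any Gaudi3 or
    Wan2.2 serving API. Devices with a lower index absorb the remainder,
    so chunk sizes differ by at most one frame.
    """
    base, extra = divmod(num_frames, num_devices)
    chunks, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

# Example: 81 latent frames spread over 8 devices.
shards = partition_frames(81, 8)
print([len(r) for r in shards])  # → [11, 10, 10, 10, 10, 10, 10, 10]
```

In a real serving pipeline the interesting part is everything this sketch omits: cross-device attention or sequence parallelism inside each denoising step, and collective communication between steps, which is where Gaudi3-specific tuning would matter most.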

Any suggestions, insights, or shared articles related to optimizing model serving on Gaudi3, especially for video generation tasks, would be incredibly helpful for improving the serving pipeline and performance. Resources from the community or expert insights would be greatly appreciated 👍
