Serving Video Generation Models (Wan2.2) with Distributed Inference #2355

@jaideepsai-narayan

Description

Feature request

Description:

I would like to propose adding support for serving video generation models (such as Wan2.2) with distributed inference on Gaudi3 AI accelerators. Serving video generation models on AMD GPUs is already supported, as detailed in the AMD blog post; however, equivalent support for video model serving on Gaudi3 is lacking. Adding Gaudi3 support would significantly improve the performance and scalability of video generation tasks.

Motivation

The demand for high-performance video generation models is growing, and existing serving solutions rely heavily on GPUs for inference. Gaudi3 has the potential to dramatically improve both the scalability and efficiency of serving video generation models: its architecture is optimized for distributed workloads and parallel processing. The current model serving mechanism, however, could benefit from optimizations to ensure efficient utilization of resources and improved throughput when handling large-scale video generation tasks.

Your contribution

I am happy to contribute a PR for this 😊

Current Status: Distributed inference for video generation models on Gaudi3 is already in place, but there is room for improvement in the efficiency of serving and scaling the Wan models.
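To make the scaling discussion concrete, here is a minimal sketch of one way denoising work could be partitioned across Gaudi3 cards: assigning near-equal contiguous chunks of latent frames to each device. This is an illustration only, not an existing API; the helper name `partition_frames` and the frame count are assumptions (81 frames is a common default for Wan-style 5-second clips).

```python
def partition_frames(num_frames: int, num_devices: int) -> list[range]:
    """Split frame indices into near-equal contiguous chunks, one per device.

    Hypothetical helper for illustration only; not part of any Gaudi3 or
    Wan2.2 serving API. Devices with a lower index absorb the remainder,
    so chunk sizes differ by at most one frame.
    """
    base, extra = divmod(num_frames, num_devices)
    chunks, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

# Example: 81 latent frames spread over 8 devices.
shards = partition_frames(81, 8)
print([len(r) for r in shards])  # → [11, 10, 10, 10, 10, 10, 10, 10]
```

In a real serving pipeline the interesting part is everything this sketch omits: cross-device attention or sequence parallelism inside each denoising step, and collective communication between steps, which is where Gaudi3-specific tuning would matter most.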

Any suggestions, insights, or shared articles related to optimizing model serving on Gaudi3, especially for video generation tasks, would be incredibly helpful for improving the serving pipeline and performance. Resources from the community or expert insights would be greatly appreciated 👍
