Feature request
Description:
I would like to propose adding support for serving video generation models with distributed inference on Gaudi3 AI accelerators. Serving video generation models is currently supported on AMD GPUs, as detailed in the AMD blog post, but equivalent support for Gaudi3 is lacking. Adding Gaudi3 support would significantly improve the performance and scalability of video generation serving.
Motivation
Demand for high-performance video generation models is growing, and existing serving solutions rely heavily on GPUs for inference. Gaudi3's architecture is optimized for distributed workloads and parallel processing, so it has the potential to improve both the scalability and efficiency of serving these models. However, the current serving mechanism would benefit from optimizations to ensure efficient resource utilization and higher throughput on large-scale video generation tasks.
Your contribution
I am happy to contribute to this PR 😊
Current status: Distributed inference for video generation models on Gaudi3 is already in place, but there is room to improve the efficiency of serving and scaling the Wan models.
Any suggestions, insights, or articles related to optimizing model serving on Gaudi3, especially for video generation tasks, would be very helpful for improving the serving pipeline and its performance. Resources from the community or expert insights would be greatly appreciated 👍