Description
Motivation
Cache Dit now supports Ascend NPUs as a backend. However, this support currently covers only basic capabilities; the acceleration features need to be extended to achieve an overall performance improvement.
MindIE SD is an acceleration suite for Ascend NPUs in the multimodal domain. It includes core acceleration operators (FA variants, DiTMoE (in planning)), dedicated multimodal fusion operators, and quantization capabilities. With these methods, the performance of flux.1-dev can be improved by a further 20%.
To realize these acceleration benefits, Cache Dit needs to support MindIE SD as an acceleration backend.
Proposed Change
Cache Dit itself provides hardware-agnostic acceleration capabilities such as caching and parallelism. For the Ascend backend, it can be further extended with hardware-specific quantization, FA backends, and operator fusion. These features can be supported as follows:
- Quantization: prioritize support for dynamic quantization capabilities (see the first sketch after this list).
- FA backend: Cache Dit will define standard interfaces for the different FA variants, and hardware vendors will implement these interfaces with their own kernels (see the second sketch after this list).
- Operator fusion: use the torch.compile mechanism to achieve automatic operator fusion. Since vendors support compile to varying degrees, custom backend extensions are required rather than using PyTorch's compile path directly (see the third sketch after this list).
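As a rough illustration of the dynamic-quantization item, the sketch below applies PyTorch's stock `torch.ao.quantization.quantize_dynamic` to the linear layers of a stand-in transformer block. The actual MindIE SD quantization path and any Cache Dit-facing API are assumptions here, not confirmed interfaces:

```python
import torch
import torch.nn as nn

# Stand-in for a DiT block; in practice this would be a block from the
# diffusers pipeline that Cache Dit wraps.
block = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
)

# Dynamic quantization: weights are quantized ahead of time, activations
# are quantized on the fly per batch, so no calibration dataset is needed.
quantized_block = torch.ao.quantization.quantize_dynamic(
    block,
    {nn.Linear},        # only quantize Linear modules
    dtype=torch.qint8,  # int8 weights
)

x = torch.randn(2, 1024)
print(quantized_block(x).shape)  # torch.Size([2, 1024])
```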
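For the FA-backend item, one possible shape for the vendor-pluggable interface is a small protocol plus a registry that a vendor package fills in at import time. The names `AttnBackend` and `register_attn_backend` are hypothetical, not an existing Cache Dit API:

```python
from typing import Protocol

import torch


class AttnBackend(Protocol):
    """Hypothetical standard interface Cache Dit could define for FA variants."""

    def attention(
        self,
        q: torch.Tensor,
        k: torch.Tensor,
        v: torch.Tensor,
        *,
        causal: bool = False,
    ) -> torch.Tensor: ...


_BACKENDS: dict[str, AttnBackend] = {}


def register_attn_backend(name: str, backend: AttnBackend) -> None:
    """Called by vendor packages (e.g. a MindIE SD plugin) at import time."""
    _BACKENDS[name] = backend


class SDPABackend:
    """Reference implementation on top of stock PyTorch SDPA."""

    def attention(self, q, k, v, *, causal=False):
        return torch.nn.functional.scaled_dot_product_attention(
            q, k, v, is_causal=causal
        )


register_attn_backend("sdpa", SDPABackend())

# A vendor plugin would register its own implementation the same way, e.g.
# register_attn_backend("mindie_fa", MindIEFlashAttention()).
q = k = v = torch.randn(1, 8, 128, 64)
out = _BACKENDS["sdpa"].attention(q, k, v)
```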
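For the operator-fusion item, one way to extend compile without relying on the default Inductor path is torch.compile's custom-backend hook: Dynamo hands the captured FX graph to a user-supplied callable, which can rewrite subgraphs into fused vendor kernels before returning a compiled callable. The pattern rewriting itself is elided below, and `npu_fusion_backend` is a hypothetical name:

```python
import torch
import torch.nn as nn


def npu_fusion_backend(gm: torch.fx.GraphModule, example_inputs):
    # Dynamo calls this with the captured FX graph. A real vendor backend
    # would pattern-match subgraphs (e.g. norm + attention + residual)
    # and replace them with fused NPU kernels before returning a callable.
    print(gm.graph)    # inspect the captured graph
    return gm.forward  # placeholder: run the graph unmodified


model = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
compiled = torch.compile(model, backend=npu_fusion_backend)
print(compiled(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```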