
[RFC] Cache Dit extension: support the MindIE SD acceleration suite to further accelerate the Ascend NPU backend #733

@blian6


Motivation.
We have found that Cache Dit now supports an Ascend NPU backend. However, it currently provides only basic capabilities, and its acceleration features need to be enhanced to achieve an overall performance improvement.

MindIE SD is an acceleration suite for Ascend NPUs in the multimodal domain. It includes core acceleration operators (FA variants; DiTMoE is in planning), dedicated multimodal fusion operators, and quantization capabilities. With these techniques, the performance of flux.1-dev can be further improved by 20%.

To realize these acceleration benefits, Cache Dit needs to support MindIE SD as an acceleration backend.

Proposed Change
Cache Dit itself provides hardware-agnostic acceleration capabilities such as caching and parallelism. For the Ascend backend, it can additionally be extended with hardware-specific quantization, FA backends, and operator fusion. These features can be supported through the following solutions:

  1. Quantization: Prioritize support for dynamic quantization capabilities (see the quantization sketch after this list).
  2. FA backend: Cache Dit will define standard interfaces for the different FA variants, and hardware vendors will implement these interfaces with their own kernels (see the registry sketch below).
  3. Operator fusion: Use the torch.compile mechanism to achieve automatic operator fusion. Since vendors support compile to varying degrees, this requires custom compile-backend extensions rather than directly using PyTorch's default compile path (see the backend stub below).
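
For item 1, here is a minimal sketch of what a dynamic-quantization entry point could look like, using only stock PyTorch (`torch.ao.quantization.quantize_dynamic`, which targets CPU) to show the shape of the API. A MindIE SD integration would swap in Ascend-specific quantized kernels; `apply_dynamic_quantization` is a hypothetical helper name, not an existing Cache Dit function.

```python
# Illustration only: stock PyTorch dynamic quantization of a transformer's
# Linear layers. Weights are quantized to int8 ahead of time; activations
# are quantized dynamically at runtime.
import torch
import torch.nn as nn


def apply_dynamic_quantization(transformer: nn.Module) -> nn.Module:
    # qconfig_spec={nn.Linear} restricts quantization to Linear modules,
    # which dominate DiT compute.
    return torch.ao.quantization.quantize_dynamic(
        transformer, {nn.Linear}, dtype=torch.qint8
    )
```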
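
For item 2, one way the "standard interfaces" could be shaped is a small registry keyed by backend name, with PyTorch SDPA as the default and vendor FA variants registered alongside it. Every name below (`AttentionBackend`, `register_attention_backend`, `mindie_fa`, and so on) is hypothetical, not an existing Cache Dit or MindIE SD API.

```python
# Sketch of a pluggable attention-backend registry.
from typing import Callable, Dict

import torch
import torch.nn.functional as F

# Contract every FA backend must satisfy: (query, key, value) -> output.
AttentionBackend = Callable[
    [torch.Tensor, torch.Tensor, torch.Tensor], torch.Tensor
]

_ATTENTION_BACKENDS: Dict[str, AttentionBackend] = {}


def register_attention_backend(name: str, fn: AttentionBackend) -> None:
    _ATTENTION_BACKENDS[name] = fn


def get_attention_backend(name: str) -> AttentionBackend:
    return _ATTENTION_BACKENDS[name]


# Default: PyTorch SDPA, which already dispatches to FlashAttention kernels
# where the hardware supports them.
register_attention_backend("sdpa", F.scaled_dot_product_attention)

# A vendor such as MindIE SD would register its own FA variant, e.g.:
# register_attention_backend("mindie_fa", mindie_sd.flash_attention)  # hypothetical
```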
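
For item 3, torch.compile already accepts a user-supplied backend callable that receives the traced FX graph, which is the natural hook for a vendor-specific fusion pass. The stub below is runnable but performs no fusion; it only marks where an Ascend backend would rewrite the graph to call fused MindIE SD operators (`ascend_fusion_backend` is a hypothetical name).

```python
# Minimal custom torch.compile backend: receives the traced FX graph and
# returns a callable that executes it.
from typing import List

import torch


def ascend_fusion_backend(gm: torch.fx.GraphModule,
                          example_inputs: List[torch.Tensor]):
    # Fusion passes over gm.graph would go here, e.g. matching
    # Linear -> GELU chains and swapping in a fused NPU kernel.
    return gm.forward  # no rewriting yet: run the traced graph as-is


model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU())
compiled = torch.compile(model, backend=ascend_fusion_backend)
out = compiled(torch.randn(2, 64))
```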
