Add AITER attention backend #12549
Conversation
Force-pushed from 7482105 to 89903c3
Thanks for this PR! Pardon my unwisdom, but for AMD devices, does this string not change? 👀
Very cool PR!
| attention family | main feature |
|---|---|
| FlashAttention | minimizes memory reads/writes through tiling and recomputation |
| AI Tensor Engine for ROCm | FlashAttention implementation optimized for AMD ROCm accelerators |
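For context on how the table's two rows surface to users, here is a minimal sketch of switching between them at runtime. It assumes the `attention_backend` context manager exported by recent diffusers releases and that this PR registers its backend under the key `"aiter"` (the actual registry key may differ):

```python
import torch
from diffusers import FluxPipeline, attention_backend

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")  # PyTorch exposes ROCm devices through the "cuda" device string

prompt = "A photo of a lighthouse at dusk"

# FlashAttention: minimizes memory reads/writes via tiling and recomputation.
with attention_backend("flash"):
    image = pipe(prompt).images[0]

# AITER: FlashAttention implementation optimized for ROCm accelerators.
# The "aiter" key is an assumption about this PR's registry name.
with attention_backend("aiter"):
    image = pipe(prompt).images[0]
```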
That's a great project and would also make for a good follow-up, though perhaps best handled via a separate issue/PR? If I understand correctly, the kernel would first need to make it into `kernels` before it can be integrated into diffusers.
100% not related.
Existing PyTorch code that uses …

Anecdotally, over the last months running …
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@bot /style
Style bot fixed some files and pushed the changes.
Let's go! Thanks a lot for adding this! |
What does this PR do?
AITER is AMD’s centralized repository for high-performance AI operators, such as attention kernels, for AMD ROCm-enabled accelerators. This PR adds support for FlashAttention through AITER by introducing a new attention backend.
We are interested in following up on this PR by eventually also enabling AITER backend support for context parallelism across multiple devices as the feature matures.

Test code for Flux inference is below; it requires installation of `aiter>=0.15.0` and a supported ROCm-enabled accelerator.
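The PR's exact snippet isn't reproduced here, so the following is a hedged sketch of such a test, assuming the backend is registered under the name `"aiter"` and selected via `set_attention_backend` (both names may differ from the merged code):

```python
# Minimal sketch of the Flux inference test described above (not the PR's
# exact script). Assumes aiter>=0.15.0 on a ROCm-enabled accelerator and
# that the new backend is registered as "aiter" (the key may differ).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Route the transformer's attention through the AITER FlashAttention backend.
pipe.transformer.set_attention_backend("aiter")

image = pipe(
    "A cat holding a sign that says hello world",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_aiter.png")
```

Reverting to the default dispatch should just be a matter of calling `set_attention_backend("native")`, again assuming the standard dispatcher keys.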
Before submitting
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
cc: @sayakpaul @DN6 for review and any comments