Commit 6ff6490

[FA2] Release flash-attn-mma split-kv/q🎉 (#161)
* Update README.md * Update README.md
1 parent 5afd8c1 commit 6ff6490

2 files changed, with 5 additions and 0 deletions

README.md

Lines changed: 3 additions & 0 deletions
@@ -46,6 +46,7 @@ I have also implemented **FlashAttention-2** using pure MMA PTX instructions, wh

![flash-attn-mma](https://github.com/user-attachments/assets/6f66796d-44d5-4ec1-b224-af997bd152b2)

+
|CUDA Cores|Sliced K (Loop over N/D)|Tile Block (Br, Bc, Bd)|MMA (m16n8k16)|
|:---:|:---:|:---:|:---:|
|✔️|✔️|✔️|✔️|
@@ -56,6 +57,8 @@ I have also implemented **FlashAttention-2** using pure MMA PTX instructions, wh
The `Split KV` and `Split Q` implementations are provided in [flash-attention-mma⚡️⚡️](./kernels/flash-attn) for performance comparison. The `Split KV` method, which splits all of Q, K, and V across MMA (Warps), is slower than the `Split Q` policy, which splits Q across MMA (Warps) and keeps K and V accessible to all MMA (Warps).
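
The difference between the two policies can be illustrated with a small host-side C++ sketch. It is not the repository's kernel code: `WarpSlice`, `tile_warps`, and `warps_sharing_row` are hypothetical names, and the model assumes a `warps_m x warps_n` grid of warps laid over a `Br x Bc` score tile, where `Split Q` corresponds to `warps_n == 1` and `Split KV` to `warps_n > 1`. The README does not state why `Split Q` is faster; the FlashAttention-2 paper attributes the gain to avoiding communication between warps, which is what the row count below is meant to show.

```cpp
#include <cassert>
#include <vector>

// Half-open ranges of the Br x Bc score tile S = Q * K^T owned by one warp.
// (Hypothetical type for illustration, not taken from the repository.)
struct WarpSlice {
  int row_begin, row_end;  // rows of Q handled by this warp
  int col_begin, col_end;  // K/V positions (columns of S) handled by this warp
};

// Lay out warps_m x warps_n warps over the tile.
// Assumption: Br is divisible by warps_m and Bc by warps_n.
std::vector<WarpSlice> tile_warps(int Br, int Bc, int warps_m, int warps_n) {
  assert(Br % warps_m == 0 && Bc % warps_n == 0);
  const int rows = Br / warps_m;
  const int cols = Bc / warps_n;
  std::vector<WarpSlice> slices;
  for (int wm = 0; wm < warps_m; ++wm) {
    for (int wn = 0; wn < warps_n; ++wn) {
      slices.push_back({wm * rows, (wm + 1) * rows, wn * cols, (wn + 1) * cols});
    }
  }
  return slices;
}

// Count the warps that hold part of a given row of S. If the count exceeds 1,
// the softmax row max and row sum for that row are spread over several warps
// and would have to be combined across them.
int warps_sharing_row(const std::vector<WarpSlice>& slices, int row) {
  int count = 0;
  for (const WarpSlice& s : slices) {
    if (row >= s.row_begin && row < s.row_end) ++count;
  }
  return count;
}
```

Under this model, `tile_warps(64, 64, 4, 1)` (Split Q with 4 warps) gives every row to exactly one warp, while `tile_warps(64, 64, 1, 4)` spreads each row over 4 warps, so the per-row softmax statistics would need a cross-warp reduction.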

+![flash-attn](https://github.com/user-attachments/assets/11490fbc-2a4a-4630-abe8-91a9d1251cba)
+
- 📚 Split KV (Basic, FlashAttention-1)

```C++

kernels/flash-attn/README.md

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,8 @@ This repository's implementation of FlashAttention is intended solely for learni

The `Split KV` and `Split Q` implementations are provided in [flash-attention-mma⚡️⚡️](.) for performance comparison. The `Split KV` method, which splits all of Q, K, and V across MMA (Warps) using a naive matmul (MMA) and Warp tiling policy, is slower than the `Split Q` policy, which splits Q across MMA (Warps) and keeps K and V accessible to all MMA (Warps).

+![flash-attn](https://github.com/user-attachments/assets/11490fbc-2a4a-4630-abe8-91a9d1251cba)
+
## 📖 Split KV (Basic, FlashAttention-1)
<div id="mma-split-kv"></div>
