Commit 58bfa87

Update README.md
1 parent 1442a80 commit 58bfa87

1 file changed: +2 -2 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -450,7 +450,7 @@ The kernels listed here will guide you through a step-by-step progression, rangi
 
 💡NOTE: 🤖[ffpa-attn-mma](https://github.com/xlite-dev/ffpa-attn-mma): 📚FFPA - Yet another Faster Flash Prefill Attention with O(1)🎉SRAM complexity for headdim > 256, **1.8x~3x**🎉faster than SDPA EA: [📈L20 ~1.9x↑🎉](https://github.com/xlite-dev/ffpa-attn-mma?tab=readme-ov-file#L1-bench-l20), [📈 A30 ~1.8x↑🎉](https://github.com/xlite-dev/ffpa-attn-mma?tab=readme-ov-file#L1-bench-a30), [📈3080 ~2.9x↑🎉](https://github.com/xlite-dev/ffpa-attn-mma?tab=readme-ov-file#L1-bench-3080), [📈4090 ~2.1x↑🎉](https://github.com/xlite-dev/ffpa-attn-mma?tab=readme-ov-file#L1-bench-4090).
 
-### 📚 Triton Kernel (OpenAI Triton) ⭐️⭐️⭐️ (©️back👆🏻)
+### 📚 Triton Kernel (OpenAI Triton) ([©️back👆🏻](#cuda-kernel))
 
 <div id="triton-kernel"></div>
 
@@ -460,7 +460,7 @@ The kernels listed here will guide you through a step-by-step progression, rangi
 | ✔️ [triton_merge_attn_states_kernel(w/ CUDA)](./kernels/openai-triton/merge-attn-states/)|f16/bf16/f32|f32|[link](./kernels/openai-triton/merge-attn-states/)|⭐️⭐️⭐️|
 
 
-### 📚 CUTLASS/CuTe Kernel ⭐️⭐️⭐️ (©️back👆🏻)
+### 📚 CUTLASS/CuTe Kernel ⭐️⭐️⭐️ ([©️back👆🏻](#cuda-kernel))
 
 <div id="cutlass-kernel"></div>
 