
Commit c03a56e

Update README.md
1 parent 1ecf8ab commit c03a56e

File tree

1 file changed: +1 addition, −1 deletion

1 file changed

+1
-1
lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -229,7 +229,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.11|🔥🔥[**SageAttention-2**] SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration(@thu-ml)|[[pdf]](https://arxiv.org/pdf/2411.10958)|[[SageAttention]](https://github.com/thu-ml/SageAttention) ![](https://img.shields.io/github/stars/thu-ml/SageAttention) | ⭐️⭐️ |
 |2024.11|🔥🔥[**Squeezed Attention**] SQUEEZED ATTENTION: Accelerating Long Context Length LLM Inference(@UC Berkeley) |[[pdf]](https://arxiv.org/pdf/2411.09688)|[[SqueezedAttention]](https://github.com/SqueezeAILab/SqueezedAttention) ![](https://img.shields.io/github/stars/SqueezeAILab/SqueezedAttention) | ⭐️⭐️ |
 |2024.12|🔥🔥[**TurboAttention**] TURBOATTENTION: EFFICIENT ATTENTION APPROXIMATION FOR HIGH THROUGHPUTS LLMS(@Microsoft)|[[pdf]](https://arxiv.org/pdf/2412.08585)| ⚠️ |⭐️⭐️ |
-|2025.01|🔥🔥[**FFPA**] FFPA: Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, ~1.5x faster than SDPA EA(@DefTruth)|[[docs]](https://github.com/DefTruth/cuffpa-py)| [[cuffpa-py]](https://github.com/DefTruth/cuffpa-py) ![](https://img.shields.io/github/stars/DefTruth/cuffpa-py)|⭐️⭐️ |
+|2025.01|🔥🔥[**FFPA**] FFPA: Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, ~1.5x faster than SDPA EA(@DefTruth)|[[docs]](https://github.com/DefTruth/ffpa-attn-mma)| [[ffpa-attn-mma]](https://github.com/DefTruth/ffpa-attn-mma) ![](https://img.shields.io/github/stars/DefTruth/ffpa-attn-mma)|⭐️⭐️ |