|2024.12|🔥🔥[**TurboAttention**] TURBOATTENTION: EFFICIENT ATTENTION APPROXIMATION FOR HIGH THROUGHPUTS LLMS(@Microsoft)|[[pdf]](https://arxiv.org/pdf/2412.08585)| ⚠️ |⭐️⭐️ |
|2025.01|🔥🔥[**FFPA**] FFPA: Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, ~1.5x faster than SDPA EA(@xlite-dev)|[[docs]](https://github.com/xlite-dev/ffpa-attn)|[[ffpa-attn]](https://github.com/xlite-dev/ffpa-attn)|⭐️⭐️ |
|2025.03|🔥🔥[**SpargeAttention**] SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference(@thu-ml)|[[pdf]](https://arxiv.org/pdf/2502.18137)|[[SpargeAttn]](https://github.com/thu-ml/SpargeAttn)| ⭐️⭐️ |
|2025.04|🔥🔥[**MMInference**] MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention(@microsoft) |[[pdf]](https://arxiv.org/pdf/2504.16083)|[[MInference]](https://github.com/microsoft/MInference/)| ⭐️⭐️ |
|2025.04|🔥🔥[**Sparse Frontier**] The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs (@Cohere) |[[pdf]](https://arxiv.org/pdf/2504.17768)|[[SparseFrontier]](https://github.com/PiotrNawrot/sparse-frontier)| ⭐️⭐️ |
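
Several of the kernels listed above (e.g. FFPA, SpargeAttn) report speedups relative to PyTorch's `scaled_dot_product_attention` (SDPA). As a point of reference, below is a minimal sketch of that SDPA baseline; the batch/head/sequence sizes and the `head_dim = 256` setting are illustrative assumptions only, not benchmark configurations taken from the papers.

```python
# Minimal SDPA baseline sketch (requires PyTorch >= 2.0).
# Shapes below are illustrative assumptions, not settings from the papers above.
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
batch, heads, seq_len, head_dim = 1, 8, 4096, 256  # headdim > 255 regime targeted by FFPA

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Standard fused attention baseline that the entries above compare against.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 4096, 256])
```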