
Commit 7d153bd

🔥[SageAttention-3] Microscaling FP4 Attention for Inference and An Exploration of 8-bit Training (#147)
1 parent c051564 commit 7d153bd

File tree

1 file changed: +2 -0 lines changed


README.md

Lines changed: 2 additions & 0 deletions
@@ -281,6 +281,8 @@ python3 download_pdfs.py # The code is generated by Doubao AI
|2024.12|🔥🔥[**Flex Attention**] FLEX ATTENTION: A PROGRAMMING MODEL FOR GENERATING OPTIMIZED ATTENTION KERNELS(@pytorch) | [[pdf]](https://arxiv.org/pdf/2412.05496)|[[attention-gym]](https://github.com/pytorch-labs/attention-gym) ![](https://img.shields.io/github/stars/pytorch-labs/attention-gym) | ⭐️⭐️ |
|2025.02| 🔥🔥🔥[**SeerAttention**] SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs(@microsoft) | [[pdf]](https://arxiv.org/abs/2410.13276) | [[SeerAttention]](https://github.com/microsoft/SeerAttention) ![](https://img.shields.io/github/stars/microsoft/SeerAttention.svg?style=social) | ⭐️⭐️⭐️ |
|2025.03| [**Slim attention**] Slim attention: cut your context memory in half without loss of accuracy, K-cache is all you need for MHA(@OpenMachine.ai) | [[pdf]](https://arxiv.org/pdf/2503.05840) | [[OpenMachine]](https://github.com/OpenMachine-ai/transformer-tricks) ![](https://img.shields.io/github/stars/OpenMachine-ai/transformer-tricks.svg?style=social) | ⭐️⭐️⭐️ |
+|2025.05|🔥🔥[**SageAttention-3**] SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-bit Training(@thu-ml)|[[pdf]](https://arxiv.org/pdf/2505.11594)|[[SageAttention]](https://github.com/thu-ml/SageAttention) ![](https://img.shields.io/github/stars/thu-ml/SageAttention) | ⭐️⭐️ |
+

### 📖KV Cache Scheduling/Quantize/Dropping ([©️back👆🏻](#paperlist))
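
For readers landing on the new SageAttention-3 row above: the linked thu-ml/SageAttention repo provides quantized attention kernels that are typically used as a drop-in replacement for PyTorch's `scaled_dot_product_attention`. The snippet below is a minimal sketch of that usage, not part of this commit; the `sageattn` import, its `is_causal` argument, and the tensor shapes/dtypes are assumptions based on the linked repository.

```python
# Minimal usage sketch (NOT part of this commit): calling a SageAttention kernel
# in place of torch.nn.functional.scaled_dot_product_attention.
# The `sageattn` entry point and its arguments are assumed from the linked
# thu-ml/SageAttention repo; shapes and dtypes below are illustrative only.
import torch
from sageattention import sageattn  # assumed import path from the linked repo

# (batch, heads, seq_len, head_dim) tensors in fp16 on GPU (assumed layout)
q = torch.randn(1, 32, 1024, 128, dtype=torch.float16, device="cuda")
k = torch.randn(1, 32, 1024, 128, dtype=torch.float16, device="cuda")
v = torch.randn(1, 32, 1024, 128, dtype=torch.float16, device="cuda")

# Quantized attention; output has the same shape as q
out = sageattn(q, k, v, is_causal=True)
```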

0 commit comments
