Commit 0525c4d

🔥[DeepSeek-NSA] Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (#119)
1 parent 1ddf093 · commit 0525c4d

File tree: 1 file changed, +3 -3 lines

README.md

Lines changed: 3 additions & 3 deletions
@@ -38,7 +38,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 
 ## 📖Contents
 * 📖[Trending LLM/VLM Topics](#Trending-LLM-VLM-Topics)🔥🔥🔥
-* 📖[Multi-head Latent Attention(MLA)](#mla)🔥🔥🔥
+* 📖[DeepSeek/Multi-head Latent Attention(MLA)](#mla)🔥🔥🔥
 * 📖[DP/MP/PP/TP/SP/CP Parallelism](#DP-MP-PP-TP-SP-CP)🔥🔥🔥
 * 📖[Disaggregating Prefill and Decoding](#P-D-Disaggregating)🔥🔥🔥
 * 📖[LLM Algorithmic/Eval Survey](#LLM-Algorithmic-Eval-Survey)
@@ -75,7 +75,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2025.01|🔥🔥🔥 [**MiniMax-Text-01**] MiniMax-01: Scaling Foundation Models with Lightning Attention | [[report]](https://filecdn.minimax.chat/_Arxiv_MiniMax_01_Report.pdf) | [[MiniMax-01]](https://github.com/MiniMax-AI/MiniMax-01) ![](https://img.shields.io/github/stars/MiniMax-AI/MiniMax-01.svg?style=social) | ⭐️⭐️ |
 |2025.01|🔥🔥🔥[**DeepSeek-R1**] DeepSeek-R1 Technical Report(@deepseek-ai) | [[pdf]](https://arxiv.org/pdf/2501.12948v1) | [[DeepSeek-R1]](https://github.com/deepseek-ai/DeepSeek-R1) ![](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-R1.svg?style=social) | ⭐️⭐️ |
 
-### 📖Multi-head Latent Attention(MLA) ([©️back👆🏻](#paperlist))
+### 📖DeepSeek/Multi-head Latent Attention(MLA) ([©️back👆🏻](#paperlist))
 <div id="mla"></div>
 
 |Date|Title|Paper|Code|Recom|
@@ -84,7 +84,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.12|🔥🔥🔥[**DeepSeek-V3**] DeepSeek-V3 Technical Report(@deepseek-ai) | [[pdf]](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf) | [[DeepSeek-V3]](https://github.com/deepseek-ai/DeepSeek-V3) ![](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-V3.svg?style=social) | ⭐️⭐️ |
 |2025.01|🔥🔥🔥[**DeepSeek-R1**] DeepSeek-R1 Technical Report(@deepseek-ai) | [[pdf]](https://arxiv.org/pdf/2501.12948v1) | [[DeepSeek-R1]](https://github.com/deepseek-ai/DeepSeek-R1) ![](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-R1.svg?style=social) | ⭐️⭐️ |
 |2025.02|🔥🔥🔥[**TransMLA**] TransMLA: Multi-head Latent Attention Is All You Need(@PKU)|[[pdf]](https://arxiv.org/pdf/2502.07864)|[[TransMLA]](https://github.com/fxmeng/TransMLA) ![](https://img.shields.io/github/stars/fxmeng/TransMLA.svg?style=social) | ⭐️⭐️ |
-
+|2025.02|🔥🔥🔥[**DeepSeek-NSA**] Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention(@deepseek-ai)| [[pdf]](https://arxiv.org/pdf/2502.11089)| ⚠️|⭐️⭐️ |
 
 ### 📖DP/MP/PP/TP/SP/CP Parallelism ([©️back👆🏻](#paperlist))
 <div id="DP-MP-PP-TP-SP-CP"></div>
