Commit 1ddf093

Add Multi-head Latent Attention(MLA) topic (#118)
1 parent d7914c0 commit 1ddf093

1 file changed: +12 -0 lines changed

README.md

Lines changed: 12 additions & 0 deletions
@@ -38,6 +38,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 
 ## 📖Contents
 * 📖[Trending LLM/VLM Topics](#Trending-LLM-VLM-Topics)🔥🔥🔥
+* 📖[Multi-head Latent Attention(MLA)](#mla)🔥🔥🔥
 * 📖[DP/MP/PP/TP/SP/CP Parallelism](#DP-MP-PP-TP-SP-CP)🔥🔥🔥
 * 📖[Disaggregating Prefill and Decoding](#P-D-Disaggregating)🔥🔥🔥
 * 📖[LLM Algorithmic/Eval Survey](#LLM-Algorithmic-Eval-Survey)
@@ -74,6 +75,17 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2025.01|🔥🔥🔥 [**MiniMax-Text-01**] MiniMax-01: Scaling Foundation Models with Lightning Attention | [[report]](https://filecdn.minimax.chat/_Arxiv_MiniMax_01_Report.pdf) | [[MiniMax-01]](https://github.com/MiniMax-AI/MiniMax-01) ![](https://img.shields.io/github/stars/MiniMax-AI/MiniMax-01.svg?style=social) | ⭐️⭐️ |
 |2025.01|🔥🔥🔥[**DeepSeek-R1**] DeepSeek-R1 Technical Report(@deepseek-ai) | [[pdf]](https://arxiv.org/pdf/2501.12948v1) | [[DeepSeek-R1]](https://github.com/deepseek-ai/DeepSeek-R1) ![](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-R1.svg?style=social) | ⭐️⭐️ |
 
+### 📖Multi-head Latent Attention(MLA) ([©️back👆🏻](#paperlist))
+<div id="mla"></div>
+
+|Date|Title|Paper|Code|Recom|
+|:---:|:---:|:---:|:---:|:---:|
+|2024.05| 🔥🔥🔥[DeepSeek-V2] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model(@DeepSeek-AI)|[[pdf]](https://arxiv.org/pdf/2405.04434) | [[DeepSeek-V2]](https://github.com/deepseek-ai/DeepSeek-V2) ![](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-V2.svg?style=social)| ⭐️⭐️ |
+|2024.12|🔥🔥🔥[**DeepSeek-V3**] DeepSeek-V3 Technical Report(@deepseek-ai) | [[pdf]](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf) | [[DeepSeek-V3]](https://github.com/deepseek-ai/DeepSeek-V3) ![](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-V3.svg?style=social) | ⭐️⭐️ |
+|2025.01|🔥🔥🔥[**DeepSeek-R1**] DeepSeek-R1 Technical Report(@deepseek-ai) | [[pdf]](https://arxiv.org/pdf/2501.12948v1) | [[DeepSeek-R1]](https://github.com/deepseek-ai/DeepSeek-R1) ![](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-R1.svg?style=social) | ⭐️⭐️ |
+|2025.02|🔥🔥🔥[**TransMLA**] TransMLA: Multi-head Latent Attention Is All You Need(@PKU)|[[pdf]](https://arxiv.org/pdf/2502.07864)|[[TransMLA]](https://github.com/fxmeng/TransMLA) ![](https://img.shields.io/github/stars/fxmeng/TransMLA.svg?style=social) | ⭐️⭐️ |
+
+
 ### 📖DP/MP/PP/TP/SP/CP Parallelism ([©️back👆🏻](#paperlist))
 <div id="DP-MP-PP-TP-SP-CP"></div>
 
