Commit 8a0ae90

🔥[KV Cache Prefetch] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching (#133)
1 parent d419c4c commit 8a0ae90

File tree

1 file changed: +3 −2 lines changed


README.md

Lines changed: 3 additions & 2 deletions
@@ -325,9 +325,10 @@ python3 download_pdfs.py # The code is generated by Doubao AI
 |2024.10|🔥[**AdaKV**] Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference (@USTC)|[[pdf]](https://arxiv.org/abs/2407.11550)|[[AdaKV]](https://github.com/FFY0/AdaKV) ![](https://img.shields.io/github/stars/FFY0/AdaKV.svg?style=social&label=Star)|⭐️⭐️|
 |2024.11|🔥[**KV Cache Recomputation**] Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation(@University of Southern California)|[[pdf]](https://arxiv.org/pdf/2411.17089)|⚠️|⭐️⭐️ |
 |2024.12|🔥[**ClusterKV**] ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression(@sjtu)|[[pdf]](https://arxiv.org/pdf/2412.03213)|⚠️|⭐️⭐️ |
-|2024.12| 🔥[**DynamicKV**] DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs(@xiabinzhou0625 etc)|[[pdf]](https://arxiv.org/pdf/2412.14838)|⚠️|⭐️⭐️ |
-|2025.02| 🔥[**DynamicLLaVA**] [ICLR2025] Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification (@ECNU, Xiaohongshu)|[[pdf]](https://arxiv.org/pdf/2412.00876)|[[DynamicLLaVA]](https://github.com/Osilly/dynamic_llava) ![](https://img.shields.io/github/stars/Osilly/dynamic_llava.svg?style=social&label=Star)|⭐️⭐️|
+|2024.12|🔥[**DynamicKV**] DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs(@xiabinzhou0625 etc)|[[pdf]](https://arxiv.org/pdf/2412.14838)|⚠️|⭐️⭐️ |
+|2025.02|🔥[**DynamicLLaVA**] [ICLR2025] Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification (@ECNU, Xiaohongshu)|[[pdf]](https://arxiv.org/pdf/2412.00876)|[[DynamicLLaVA]](https://github.com/Osilly/dynamic_llava) ![](https://img.shields.io/github/stars/Osilly/dynamic_llava.svg?style=social&label=Star)|⭐️⭐️|
 |2025.02|🔥[**CacheCraft**] Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation(@Adobe Research)|[[pdf]](https://www.arxiv.org/pdf/2502.15734)|⚠️|⭐️⭐️ |
+|2025.04|🔥[**KV Cache Prefetch**] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching(@Alibaba)|[[pdf]](https://arxiv.org/pdf/2504.06319)|⚠️|⭐️⭐️ |
 
 ### 📖Prompt/Context/KV Compression ([©️back👆🏻](#paperlist))
 <div id="Context-Compression"></div>

0 commit comments
