Commit 8a0ae90

🔥[KV Cache Prefetch] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching (#133)
1 parent d419c4c commit 8a0ae90

File tree

1 file changed: +3 −2 lines changed


README.md

Lines changed: 3 additions & 2 deletions
@@ -325,9 +325,10 @@ python3 download_pdfs.py # The code is generated by Doubao AI
 |2024.10|🔥[**AdaKV**] Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference (@USTC)|[[pdf]](https://arxiv.org/abs/2407.11550)|[[AdaKV]](https://github.com/FFY0/AdaKV) ![](https://img.shields.io/github/stars/FFY0/AdaKV.svg?style=social&label=Star)|⭐️⭐️|
 |2024.11|🔥[**KV Cache Recomputation**] Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation(@University of Southern California)|[[pdf]](https://arxiv.org/pdf/2411.17089)|⚠️|⭐️⭐️ |
 |2024.12|🔥[**ClusterKV**] ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression(@sjtu)|[[pdf]](https://arxiv.org/pdf/2412.03213)|⚠️|⭐️⭐️ |
-|2024.12| 🔥[**DynamicKV**] DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs(@xiabinzhou0625 etc)|[[pdf]](https://arxiv.org/pdf/2412.14838)|⚠️|⭐️⭐️ |
-|2025.02| 🔥[**DynamicLLaVA**] [ICLR2025] Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification (@ECNU, Xiaohongshu)|[[pdf]](https://arxiv.org/pdf/2412.00876)|[[DynamicLLaVA]](https://github.com/Osilly/dynamic_llava) ![](https://img.shields.io/github/stars/Osilly/dynamic_llava.svg?style=social&label=Star)|⭐️⭐️|
+|2024.12|🔥[**DynamicKV**] DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs(@xiabinzhou0625 etc)|[[pdf]](https://arxiv.org/pdf/2412.14838)|⚠️|⭐️⭐️ |
+|2025.02|🔥[**DynamicLLaVA**] [ICLR2025] Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification (@ECNU, Xiaohongshu)|[[pdf]](https://arxiv.org/pdf/2412.00876)|[[DynamicLLaVA]](https://github.com/Osilly/dynamic_llava) ![](https://img.shields.io/github/stars/Osilly/dynamic_llava.svg?style=social&label=Star)|⭐️⭐️|
 |2025.02|🔥[**CacheCraft**] Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation(@Adobe Research)|[[pdf]](https://www.arxiv.org/pdf/2502.15734)|⚠️|⭐️⭐️ |
+|2025.04|🔥[**KV Cache Prefetch**] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching(@Alibaba)|[[pdf]](https://arxiv.org/pdf/2504.06319)|⚠️|⭐️⭐️ |
 
 ### 📖Prompt/Context/KV Compression ([©️back👆🏻](#paperlist))
 <div id="Context-Compression"></div>

0 commit comments
