
Commit 1250b60

Update README.md (#152)
Adding Inference-Time Hyper-Scaling with KV Cache Compression
1 parent ecd9a45 commit 1250b60

1 file changed: +1 −0 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -351,6 +351,7 @@ python3 download_pdfs.py # The code is generated by Doubao AI
 |2025.02|🔥[**CacheCraft**] Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation(@Adobe Research)|[[pdf]](https://www.arxiv.org/pdf/2502.15734)|⚠️|⭐️⭐️ |
 |2025.04|🔥[**KV Cache Prefetch**] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching(@Alibaba)|[[pdf]](https://arxiv.org/pdf/2504.06319)|⚠️|⭐️⭐️ |
 |2025.05|🔥[**KVzip**] KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction (@SNU)|[[pdf]](https://arxiv.org/abs/2505.23416)|[[KVzip]](https://github.com/snu-mllab/KVzip) ![](https://img.shields.io/github/stars/snu-mllab/KVzip.svg?style=social&label=Star)|⭐️⭐️|
+|2025.06|🔥🔥[**Inference-Time Hyper-Scaling**] Inference-Time Hyper-Scaling with KV Cache Compression (@NVIDIA)|[[pdf]](https://arxiv.org/pdf/2506.05345)|⚠️|⭐️⭐️ |
 
 ### 📖Prompt/Context/KV Compression ([©️back👆🏻](#paperlist))
 <div id="Context-Compression"></div>
