
Commit 1250b60

Update README.md (#152)
Adding Inference-Time Hyper-Scaling with KV Cache Compression
1 parent ecd9a45 commit 1250b60

1 file changed: +1 −0 lines changed


README.md

Lines changed: 1 addition & 0 deletions
@@ -351,6 +351,7 @@ python3 download_pdfs.py # The code is generated by Doubao AI
 |2025.02|🔥[**CacheCraft**] Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation(@Adobe Research)|[[pdf]](https://www.arxiv.org/pdf/2502.15734)|⚠️|⭐️⭐️ |
 |2025.04|🔥[**KV Cache Prefetch**] Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching(@Alibaba)|[[pdf]](https://arxiv.org/pdf/2504.06319)|⚠️|⭐️⭐️ |
 |2025.05|🔥[**KVzip**] KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction (@SNU)|[[pdf]](https://arxiv.org/abs/2505.23416)|[[KVzip]](https://github.com/snu-mllab/KVzip) ![](https://img.shields.io/github/stars/snu-mllab/KVzip.svg?style=social&label=Star)|⭐️⭐️|
+|2025.06|🔥🔥[**Inference-Time Hyper-Scaling**] Inference-Time Hyper-Scaling with KV Cache Compression (@NVIDIA)|[[pdf]](https://arxiv.org/pdf/2506.05345)|⚠️|⭐️⭐️ |
 
 ### 📖Prompt/Context/KV Compression ([©️back👆🏻](#paperlist))
 <div id="Context-Compression"></div>
