README.md: 1 addition & 0 deletions
@@ -253,6 +253,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.08|🔥[Zero-Delay QKV Compression] Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference(@University of Virginia)|[[pdf]](https://arxiv.org/pdf/2408.04107)|⚠️|⭐️⭐️ |
 |2024.09|🔥[**AlignedKV**] AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization(@Tsinghua University)|[[pdf]](https://arxiv.org/pdf/2409.16546)|[[AlignedKV]](https://github.com/AlignedQuant/AlignedKV)|⭐️ |
 |2024.10|🔥[**LayerKV**] Optimizing Large Language Model Serving with Layer-wise KV Cache Management(@Ant Group)|[[pdf]](https://arxiv.org/pdf/2410.00428)|⚠️|⭐️⭐️ |
+|2024.10|🔥[**AdaKV**] Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference (@USTC)|[[pdf]](https://arxiv.org/abs/2407.11550)|[[AdaKV]](https://github.com/FFY0/AdaKV)|⭐️⭐️|
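
For readers skimming the new entry: Ada-KV's core idea is to allocate a shared KV-cache eviction budget across attention heads adaptively rather than uniformly. The sketch below is a minimal illustration of that idea, not the paper's implementation; the joint top-k pooling heuristic and all names (`adaptive_kv_budgets`, the averaged `attn_weights` input) are assumptions for illustration.

```python
import torch

def adaptive_kv_budgets(attn_weights: torch.Tensor, total_budget: int) -> torch.Tensor:
    """Split one shared KV-cache budget across heads (illustrative sketch).

    Instead of giving every head total_budget // num_heads slots, rank all
    (head, key) attention weights jointly and count, per head, how many
    entries survive a single global top-k cut: heads whose attention mass
    is spread over many keys receive larger budgets.

    attn_weights: [num_heads, seq_len] non-negative scores, e.g. softmax
        attention weights averaged over a recent window of queries.
    Returns: [num_heads] integer budgets summing to min(total_budget, numel).
    """
    num_heads, seq_len = attn_weights.shape
    flat = attn_weights.reshape(-1)              # pool all heads together
    k = min(total_budget, flat.numel())
    kept = flat.topk(k).indices                  # global top-k entries
    head_of_kept = kept // seq_len               # owning head of each kept entry
    return torch.bincount(head_of_kept, minlength=num_heads)

# Toy usage: 4 heads, 16 cached keys each, keep 24 of the 64 entries in total.
weights = torch.rand(4, 16).softmax(dim=-1)
budgets = adaptive_kv_budgets(weights, total_budget=24)
print(budgets, budgets.sum())                    # per-head budgets; sum == 24
```

Each head would then evict all but its top-`budgets[h]` cache entries; the adaptive split lets concentrated heads shrink so dispersed heads can keep more context under the same total memory.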