README.md: 3 additions & 2 deletions
@@ -206,8 +206,9 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.04|[SqueezeAttention] SQUEEZEATTENTION: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget(@lzu.edu.cn etc)|[[pdf]](https://arxiv.org/pdf/2404.04793.pdf)|[[SqueezeAttention]](https://github.com/hetailang/SqueezeAttention)|⭐️⭐️ |
 |2024.04|[SnapKV] SnapKV: LLM Knows What You are Looking for Before Generation(@UIUC)|[[pdf]](https://arxiv.org/pdf/2404.14469)|[[SnapKV]](https://github.com/FasterDecoding/SnapKV)|⭐️ |
 |2024.05|🔥[vAttention] vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention(@Microsoft Research India)|[[pdf]](https://arxiv.org/pdf/2405.04437)|⚠️|⭐️⭐️ |
-|2024.05|[KVCache-1Bit] KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization(@Rice University)|[[pdf]](https://arxiv.org/pdf/2405.03917)|⚠️|⭐️⭐️ |
+|2024.05|🔥[KVCache-1Bit] KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization(@Rice University)|[[pdf]](https://arxiv.org/pdf/2405.03917)|⚠️|⭐️⭐️ |
+|2024.05|🔥[ZipCache] ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification(@Zhejiang University etc)|[[pdf]](https://arxiv.org/pdf/2405.14256)|⚠️|⭐️⭐️ |