Commit 32fdb84

🔥[BatchLLM] BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching (#104)
1 parent 9bb3f6a commit 32fdb84

File tree: 1 file changed (+2, −1 lines)


README.md

Lines changed: 2 additions & 1 deletion
@@ -151,8 +151,9 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2023.10|[LightSeq] LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers(@UC Berkeley etc)|[[pdf]](https://arxiv.org/pdf/2310.03294.pdf)|[[LightSeq]](https://github.com/RulinShao/LightSeq) ![](https://img.shields.io/github/stars/RulinShao/LightSeq.svg?style=social)|⭐️ |
 |2024.05|🔥[vAttention] vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention(@Microsoft Research India)|[[pdf]](https://arxiv.org/pdf/2405.04437)|[[vAttention]](https://github.com/microsoft/vattention) ![](https://img.shields.io/github/stars/microsoft/vattention.svg?style=social)|⭐️⭐️ |
 |2024.07|🔥🔥[**vTensor**] vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving(@Shanghai Jiao Tong University etc)|[[pdf]](https://arxiv.org/pdf/2407.15309)|[[vTensor]](https://github.com/intelligent-machine-learning/glake/tree/master/GLakeServe) ![](https://img.shields.io/github/stars/intelligent-machine-learning/glake.svg?style=social)|⭐️⭐️ |
-|2024.08| 🔥[Automatic Inference Engine Tuning] Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning(@Nanjing University etc)|[[pdf]](https://arxiv.org/pdf/2408.04323)|⚠️|⭐️⭐️ |
+|2024.08|🔥[Automatic Inference Engine Tuning] Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning(@Nanjing University etc)|[[pdf]](https://arxiv.org/pdf/2408.04323)|⚠️|⭐️⭐️ |
 |2024.08|🔥[**SJF Scheduling**] Efficient LLM Scheduling by Learning to Rank(@UCSD etc)|[[pdf]](https://arxiv.org/pdf/2408.15792)|⚠️|⭐️⭐️ |
+|2024.12|🔥[**BatchLLM**] BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching(@Microsoft)|[[pdf]](https://arxiv.org/pdf/2412.03594)|⚠️|⭐️⭐️ |
 
 ### 📖Weight/Activation Quantize/Compress ([©️back👆🏻](#paperlist))
 <div id="Weight-Activation-Quantize-Compress"></div>
