Commit d26c24a

[Token Recycling] Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling (#38)
Parent commit: 7994773

File tree

README.md — 1 file changed: 1 addition, 0 deletions
@@ -319,6 +319,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 |2024.05|🔥[S3D] S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs(@lge.com)|[[pdf]](https://arxiv.org/pdf/2405.20314)| ⚠️ |⭐️|
 |2024.06|🔥[**Parallel Decoding**] Exploring and Improving Drafts in Blockwise Parallel Decoding(@KAIST&Google Research)| [[pdf]](https://arxiv.org/pdf/2404.09221)|⚠️ |⭐️⭐️ |
 |2024.07| [Multi-Token Speculative Decoding] Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference(@University of California, etc)| [[pdf]](https://arxiv.org/pdf/2404.09221)|⚠️ |⭐️⭐️ |
+|2024.08| [Token Recycling] Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling(@ir.hit.edu.cn etc) | [[pdf]](https://arxiv.org/pdf/2408.08696)|⚠️ |⭐️⭐️ |
 
 
 ### 📖Structured Prune/KD/Weight Sparse ([©️back👆🏻](#paperlist))
