
Commit 53ca177

early exit of LLM inference (#85)
* add some early exit work
1 parent 7ba03a6 · commit 53ca177

File tree

1 file changed (+11, -8 lines)

README.md

Lines changed: 11 additions & 8 deletions
```diff
@@ -308,14 +308,17 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
 <div id="Early-Exit"></div>
 
 |Date|Title|Paper|Code|Recom|
-|:---:|:---:|:---:|:---:|:---:|
-|2020.04|[DeeBERT] DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference(@uwaterloo.ca)|[[pdf]](https://arxiv.org/pdf/2004.12993.pdf)|⚠️|⭐️ |
-|2021.06|[BERxiT] BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression(@uwaterloo.ca)|[[pdf]](https://aclanthology.org/2021.eacl-main.8.pdf)|[[berxit]](https://github.com/castorini/berxit) ![](https://img.shields.io/github/stars/castorini/berxit.svg?style=social)|⭐️ |
-|2023.10|🔥[**LITE**] Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE(@Arizona State University) | [[pdf]](https://arxiv.org/pdf/2310.18581v2.pdf)|⚠️|⭐️⭐️ |
-|2023.12|🔥🔥[**EE-LLM**] EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism(@alibaba-inc.com) | [[pdf]](https://arxiv.org/pdf/2312.04916.pdf)| [[EE-LLM]](https://github.com/pan-x-c/EE-LLM) ![](https://img.shields.io/github/stars/pan-x-c/EE-LLM.svg?style=social) |⭐️⭐️ |
-|2023.10|🔥[**FREE**] Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding(@KAIST AI&AWS AI)|[[pdf]](https://arxiv.org/pdf/2310.05424.pdf)| [[fast_robust_early_exit]](https://github.com/raymin0223/fast_robust_early_exit) ![](https://img.shields.io/github/stars/raymin0223/fast_robust_early_exit.svg?style=social) |⭐️⭐️ |
-|2024.07| [Skip Attention] Attention Is All You Need But You Don’t Need All Of It For Inference of Large Language Models(@University College London)| [[pdf]](https://arxiv.org/pdf/2407.15516)|⚠️|⭐️⭐️ |
-|2024.08| [**KOALA**] KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning(@Dalian University)| [[pdf]](https://arxiv.org/pdf/2408.08146)|⚠️|⭐️⭐️ |
+|:---:|:---:|:---:|:---:|:---:|
+|2020.04|[DeeBERT] DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference(@uwaterloo.ca)|[[pdf]](https://arxiv.org/pdf/2004.12993.pdf)|⚠️|⭐️ |
+|2020.04|[FastBERT] FastBERT: a Self-distilling BERT with Adaptive Inference Time(@PKU)|[[pdf]](https://aclanthology.org/2020.acl-main.537.pdf)|[[FastBERT]](https://github.com/autoliuweijie/FastBERT) ![](https://img.shields.io/github/stars/autoliuweijie/FastBERT.svg?style=social)|⭐️ |
+|2021.06|[BERxiT] BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression(@uwaterloo.ca)|[[pdf]](https://aclanthology.org/2021.eacl-main.8.pdf)|[[berxit]](https://github.com/castorini/berxit) ![](https://img.shields.io/github/stars/castorini/berxit.svg?style=social)|⭐️ |
+|2023.06|🔥[**SkipDecode**] SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference(@Microsoft) | [[pdf]](https://arxiv.org/pdf/2307.02628) |⚠️|⭐️ |
+|2023.10|🔥[**LITE**] Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE(@Arizona State University) | [[pdf]](https://arxiv.org/pdf/2310.18581v2.pdf)|⚠️|⭐️⭐️ |
+|2023.12|🔥🔥[**EE-LLM**] EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism(@alibaba-inc.com) | [[pdf]](https://arxiv.org/pdf/2312.04916.pdf)| [[EE-LLM]](https://github.com/pan-x-c/EE-LLM) ![](https://img.shields.io/github/stars/pan-x-c/EE-LLM.svg?style=social) |⭐️⭐️ |
+|2023.10|🔥[**FREE**] Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding(@KAIST AI&AWS AI)|[[pdf]](https://arxiv.org/pdf/2310.05424.pdf)| [[fast_robust_early_exit]](https://github.com/raymin0223/fast_robust_early_exit) ![](https://img.shields.io/github/stars/raymin0223/fast_robust_early_exit.svg?style=social) |⭐️⭐️ |
+|2024.02|🔥[**EE-Tuning**] EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models(@alibaba-inc.com)|[[pdf]](https://arxiv.org/pdf/2402.00518)| [[EE-Tuning]](https://github.com/pan-x-c/EE-LLM) ![](https://img.shields.io/github/stars/pan-x-c/EE-LLM.svg?style=social) |⭐️⭐️ |
+|2024.07| [Skip Attention] Attention Is All You Need But You Don’t Need All Of It For Inference of Large Language Models(@University College London)| [[pdf]](https://arxiv.org/pdf/2407.15516)|⚠️|⭐️⭐️ |
+|2024.08| [**KOALA**] KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning(@Dalian University)| [[pdf]](https://arxiv.org/pdf/2408.08146)|⚠️|⭐️⭐️ |
 
 ### 📖Parallel Decoding/Sampling ([©️back👆🏻](#paperlist))
 <div id="Parallel-Decoding-Sampling"></div>
```
