README.md: 11 additions & 8 deletions
@@ -308,14 +308,17 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with

<div id="Early-Exit"></div>

|Date|Title|Paper|Code|Recom|
|:---:|:---:|:---:|:---:|:---:|
|2020.04|[DeeBERT] DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference(@uwaterloo.ca)|[[pdf]](https://arxiv.org/pdf/2004.12993.pdf)|⚠️|⭐️ |
|2020.04|[FastBERT] FastBERT: a Self-distilling BERT with Adaptive Inference Time(@PKU)|[[pdf]](https://aclanthology.org/2020.acl-main.537.pdf)|[[FastBERT]](https://github.com/autoliuweijie/FastBERT)|⭐️ |
|2021.06|[BERxiT] BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression(@uwaterloo.ca)|[[pdf]](https://aclanthology.org/2021.eacl-main.8.pdf)|[[berxit]](https://github.com/castorini/berxit)|⭐️ |
|2023.06|🔥[**SkipDecode**] SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference(@Microsoft)|[[pdf]](https://arxiv.org/pdf/2307.02628)|⚠️|⭐️ |
|2023.10|🔥[**LITE**] Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE(@Arizona State University)|[[pdf]](https://arxiv.org/pdf/2310.18581v2.pdf)|⚠️|⭐️⭐️ |
|2023.10|🔥[**FREE**] Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding(@KAIST AI&AWS AI)|[[pdf]](https://arxiv.org/pdf/2310.05424.pdf)|[[fast_robust_early_exit]](https://github.com/raymin0223/fast_robust_early_exit)|⭐️⭐️ |
|2023.12|🔥🔥[**EE-LLM**] EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism(@alibaba-inc.com)|[[pdf]](https://arxiv.org/pdf/2312.04916.pdf)|[[EE-LLM]](https://github.com/pan-x-c/EE-LLM)|⭐️⭐️ |
|2024.02|🔥[**EE-Tuning**] EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models(@alibaba-inc.com)|[[pdf]](https://arxiv.org/pdf/2402.00518)|[[EE-Tuning]](https://github.com/pan-x-c/EE-LLM)|⭐️⭐️ |
|2024.07|[Skip Attention] Attention Is All You Need But You Don’t Need All Of It For Inference of Large Language Models(@University College London)|[[pdf]](https://arxiv.org/pdf/2407.15516)|⚠️|⭐️⭐️ |
|2024.08|[**KOALA**] KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning(@Dalian University)|[[pdf]](https://arxiv.org/pdf/2408.08146)|⚠️|⭐️⭐️ |
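
The papers above differ in training recipe and target model, but most share one core mechanism: attach a lightweight classifier head to intermediate layers and stop the forward pass as soon as a head is confident enough. Below is a minimal, framework-free sketch of that confidence-based early exit; the toy `layers`, `heads`, and the 0.9 threshold are illustrative assumptions, not taken from any listed paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_forward(x, layers, heads, threshold=0.9):
    """Run `layers` in order; after each layer, ask that layer's
    classifier head for a prediction and exit as soon as its top
    softmax probability clears `threshold`.

    Returns (predicted_class, layers_actually_executed)."""
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        x = layer(x)
        probs = softmax(head(x))
        confidence = max(probs)
        if confidence >= threshold:
            return probs.index(confidence), depth  # early exit
    # Fell through: every layer ran; answer from the last head.
    return probs.index(confidence), depth

# Toy demo: each "layer" nudges the hidden state so the head grows
# more confident with depth; the model exits at layer 3 of 4.
pred, depth = early_exit_forward(
    x=0.0,
    layers=[lambda x: x + 1.0] * 4,
    heads=[lambda x: [x, 0.0]] * 4,  # 2-class logits from the state
    threshold=0.9,
)
```

The trade-off the listed papers negotiate is visible even in this sketch: a lower `threshold` saves more layers per token but risks committing to a wrong prediction, which is why methods like FREE and EE-LLM pair the exit rule with verification or specialized training.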