- 🌀 [24/07/03] We're excited to announce the release of [MInference](https://aka.ms/MInference) to speed up inference for long-context LLMs, reducing inference latency by up to **10X** for pre-filling on an A100 while maintaining accuracy on a **1M-token** prompt! For more information, check out our [paper](https://arxiv.org/abs/2407.02490) and visit the [project page](https://aka.ms/MInference).
- 🧩 LLMLingua has been integrated into [Prompt flow](https://microsoft.github.io/promptflow/integrations/tools/llmlingua-prompt-compression-tool.html), a streamlined tool framework for LLM-based AI applications.
-- 🦚 We're excited to announce the release of **LLMLingua-2**, boasting a 3x-6x speed improvement over LLMLingua! For more information, check out our [paper](https://arxiv.org/abs/2403.12968), visit the [project page](https://llmlingua.com/llmlingua2.html), and explore our [demo](https://huggingface.co/spaces/microsoft/LLMLingua-2).
+- 🦚 We're excited to announce the release of **LLMLingua-2**, boasting a 3x-6x speed improvement over LLMLingua! For more information, check out our [paper](https://aclanthology.org/2024.findings-acl.57/), visit the [project page](https://llmlingua.com/llmlingua2.html), and explore our [demo](https://huggingface.co/spaces/microsoft/LLMLingua-2).
- 👾 LLMLingua has been integrated into [LangChain](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/retrievers/llmlingua.ipynb) and [LlamaIndex](https://github.com/run-llama/llama_index/blob/main/docs/examples/node_postprocessor/LongLLMLingua.ipynb), two widely used RAG frameworks.
- 🤳 Talk slides are available in [AI Time Jan, 24](https://drive.google.com/file/d/1fzK3wOvy2boF7XzaYuq2bQ3jFeP1WMk3/view?usp=sharing).
- 🖥 EMNLP'23 slides are available in [Session 5](https://drive.google.com/file/d/1GxQLAEN8bBB2yiEdQdW4UKoJzZc0es9t/view) and [BoF-6](https://drive.google.com/file/d/1LJBUfJrKxbpdkwo13SgPOqugk-UjLVIF/view).
@@ -38,12 +38,12 @@ LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLa
LongLLMLingua mitigates the 'lost in the middle' issue in LLMs, enhancing long-context information processing. It reduces costs and boosts efficiency with prompt compression, improving RAG performance by up to 21.4% using only 1/4 of the tokens.
-[LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression](https://arxiv.org/abs/2310.06839) (ACL 2024 and ICLR ME-FoMo 2024)<br>
+[LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression](https://aclanthology.org/2024.acl-long.91/) (ACL 2024 and ICLR ME-FoMo 2024)<br>
_Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
LLMLingua-2, a small-size yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in task-agnostic compression. It surpasses LLMLingua in handling out-of-domain data, offering 3x-6x faster performance.
-[LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression](https://arxiv.org/abs/2403.12968) (ACL 2024 Findings)<br>
+[LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression](https://aclanthology.org/2024.findings-acl.57/) (ACL 2024 Findings)<br>
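As a point of reference for the entry above, the compressor is exposed through the `llmlingua` Python package. The following is a minimal sketch only; the model name, the `use_llmlingua2` flag, and the `compress_prompt` arguments follow the project's published examples and should be treated as assumptions if your installed version differs.

```python
# Minimal LLMLingua-2 sketch (assumes `pip install llmlingua`); argument names
# follow the project's published examples and may vary between versions.
from llmlingua import PromptCompressor

# Load the LLMLingua-2 compressor: a BERT-level encoder trained via data
# distillation from GPT-4 to classify which tokens to keep.
llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,  # use the LLMLingua-2 token-classification path
)

# Any long prompt: meeting transcript, retrieved context, few-shot examples.
prompt = "Speaker 1: ... (long context to be compressed) ..."

# Task-agnostic compression: keep roughly one third of the tokens, while
# forcing newlines and question marks to survive compression.
result = llm_lingua.compress_prompt(
    prompt,
    rate=0.33,
    force_tokens=["\n", "?"],
)
print(result["compressed_prompt"])
```

The call returns a dictionary whose `compressed_prompt` field can be passed directly to the downstream LLM.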
title = "{LLML}ingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression",
author = "Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Menglin Xia and Xufang Luo and Jue Zhang and Qingwei Lin and Victor Ruhle and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Dongmei Zhang",
-url = "https://arxiv.org/abs/2403.12968",
-journal = "ArXiv preprint",
-volume = "abs/2403.12968",
+editor = "Ku, Lun-Wei and
+  Martins, Andre and
+  Srikumar, Vivek",
+booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
+month = aug,
year = "2024",
+address = "Bangkok, Thailand and virtual meeting",
+publisher = "Association for Computational Linguistics",
examples/LLMLingua2.ipynb (1 addition & 1 deletion)
@@ -20,7 +20,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"<a target=\"_blank\" href=\"https://arxiv.org/abs/2403.12968\">LLMLingua-2</a> focuses on task-agnostic prompt compression for better generalizability and efficiency. It is a small-size yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in <b>task-agnostic compression</b>. It surpasses LLMLingua in handling <b>out-of-domain data</b>, offering <b>3x-6x faster</b> performance.\n",
+"<a target=\"_blank\" href=\"https://aclanthology.org/2024.findings-acl.57/\">LLMLingua-2</a> focuses on task-agnostic prompt compression for better generalizability and efficiency. It is a small-size yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in <b>task-agnostic compression</b>. It surpasses LLMLingua in handling <b>out-of-domain data</b>, offering <b>3x-6x faster</b> performance.\n",
"\n",
"Below, We showcase the usage and compression results of <i>LLMLingua-2</i> on both <b>in-domain</b> and <b>out-of-domain</b> datasets, including various tasks such as single-document QA, multi-document QA, summarization and in-context learning.\n"
examples/RAG.ipynb (1 addition & 1 deletion)
@@ -39,7 +39,7 @@
"id": "0b39b33f-5860-4825-8f00-d60aed0dce86",
"metadata": {},
"source": [
-"To address this, we propose [**LongLLMLingua**](https://arxiv.org/abs/2310.06839), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the <font color='red'>**red line**</font>, which significantly improves the original curve:\n",
+"To address this, we propose [**LongLLMLingua**](https://aclanthology.org/2024.acl-long.91/), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the <font color='red'>**red line**</font>, which significantly improves the original curve:\n",
"\n",
"- Coarse-grained compression through document-level perplexity;\n",
"- Fine-grained compression of the remaining text using token perplexity;"
examples/RAGLlamaIndex.ipynb (1 addition & 1 deletion)
@@ -82,7 +82,7 @@
"id": "0b39b33f-5860-4825-8f00-d60aed0dce86",
"metadata": {},
"source": [
-"To address this, we propose [**LongLLMLingua**](https://arxiv.org/abs/2310.06839), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the <font color='red'>**red line**</font>, which significantly improves the original curve:\n",
+"To address this, we propose [**LongLLMLingua**](https://aclanthology.org/2024.acl-long.91/), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the <font color='red'>**red line**</font>, which significantly improves the original curve:\n",
"\n",
"- Coarse-grained compression through document-level perplexity;\n",
"- Fine-grained compression of the remaining text using token perplexity;"
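The two-stage, coarse-to-fine compression described in the RAG notebook cells above corresponds to a single question-aware `compress_prompt` call in the `llmlingua` package. The sketch below is illustrative only: the parameter names (`rank_method`, `condition_in_question`, `reorder_context`, `dynamic_context_compression_ratio`) are taken from the project's published LongLLMLingua examples and may differ between versions.

```python
# Illustrative LongLLMLingua sketch (assumes `pip install llmlingua`); parameter
# names follow the project's published examples and may vary by version.
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()  # by default uses a small causal LM to score perplexity

question = "What did the committee decide about the budget?"  # hypothetical query
retrieved_docs = ["<doc 1 text>", "<doc 2 text>", "<doc 3 text>"]  # placeholder contexts

result = llm_lingua.compress_prompt(
    retrieved_docs,                      # list of documents: coarse, document-level stage
    question=question,
    rate=0.55,                           # overall compression target
    rank_method="longllmlingua",         # question-aware document ranking (coarse stage)
    condition_in_question="after_condition",
    reorder_context="sort",              # reorder to counter the "lost in the middle" effect
    dynamic_context_compression_ratio=0.3,
    condition_compare=True,
    context_budget="+100",
    # the surviving text is then compressed token by token (fine-grained stage)
)
print(result["compressed_prompt"])
```

The compressed string can then be placed into the RAG prompt template in place of the raw retrieved documents.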