
Commit 2dbdbd3

Authored by iofu728, QianhuiWu, XufangLuo, pzs19, and mydmdm
Feature(LLMLingua): update ACL links (#176)
* Feature(LLMLingua): update ACL links
* Fix(LLMLingua): fix the nltk in unittest
* Fix(LLMLingua): fix unittest

Co-authored-by: Qianhui Wu <[email protected]>
Co-authored-by: Xufang Luo <[email protected]>
Co-authored-by: panzs <[email protected]>
Co-authored-by: Yuqing Yang <[email protected]>
1 parent 9814309 commit 2dbdbd3

File tree

- .github/workflows/unittest.yml
- README.md
- examples/LLMLingua2.ipynb
- examples/RAG.ipynb
- examples/RAGLlamaIndex.ipynb

5 files changed: +36 -19 lines changed


.github/workflows/unittest.yml

Lines changed: 2 additions & 1 deletion

@@ -32,7 +32,8 @@ jobs:
       - name: Install packages and dependencies for all tests
         run: |
           python -m pip install --upgrade pip wheel
-          pip install pytest pytest-xdist
+          pip install pytest pytest-xdist nltk
+          python -c "import nltk; nltk.download('punkt_tab')"
 
       - name: Install packages
         run: |
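
For context on the nltk fix: recent nltk releases ship the Punkt sentence-tokenizer data as a separate `punkt_tab` resource, so the workflow now installs nltk and pre-downloads that resource before the tests run. A minimal sketch of what the new step enables (the exact call sites inside the test suite are our assumption, not shown in this commit):

```python
import nltk

# Pre-download the Punkt tokenizer tables, exactly as the new CI step does;
# without this, sent_tokenize raises a LookupError on a fresh runner.
nltk.download("punkt_tab")

# Sentence splitting of the kind the unit tests presumably rely on nltk for.
print(nltk.sent_tokenize("LLMLingua compresses prompts. It keeps the informative tokens."))
```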

README.md

Lines changed: 31 additions & 15 deletions

@@ -10,8 +10,8 @@
 <p align="center">
     | <a href="https://llmlingua.com/"><b>Project Page</b></a> |
     <a href="https://aclanthology.org/2023.emnlp-main.825/"><b>LLMLingua</b></a> |
-    <a href="https://arxiv.org/abs/2310.06839"><b>LongLLMLingua</b></a> |
-    <a href="https://arxiv.org/abs/2403.12968"><b>LLMLingua-2</b></a> |
+    <a href="https://aclanthology.org/2024.acl-long.91/"><b>LongLLMLingua</b></a> |
+    <a href="https://aclanthology.org/2024.findings-acl.57/"><b>LLMLingua-2</b></a> |
     <a href="https://huggingface.co/spaces/microsoft/LLMLingua"><b>LLMLingua Demo</b></a> |
     <a href="https://huggingface.co/spaces/microsoft/LLMLingua-2"><b>LLMLingua-2 Demo</b></a> |
 </p>

@@ -21,7 +21,7 @@ https://github.com/microsoft/LLMLingua/assets/30883354/eb0ea70d-6d4c-4aa7-8977-6
 ## News
 - 🌀 [24/07/03] We're excited to announce the release of [MInference](https://aka.ms/MInference) to speed up Long-context LLMs' inference, reduces inference latency by up to **10X** for pre-filling on an A100 while maintaining accuracy in **1M tokens prompt**! For more information, check out our [paper](https://arxiv.org/abs/2407.02490), visit the [project page](https://aka.ms/MInference).
 - 🧩 LLMLingua has been integrated into [Prompt flow](https://microsoft.github.io/promptflow/integrations/tools/llmlingua-prompt-compression-tool.html), a streamlined tool framework for LLM-based AI applications.
-- 🦚 We're excited to announce the release of **LLMLingua-2**, boasting a 3x-6x speed improvement over LLMLingua! For more information, check out our [paper](https://arxiv.org/abs/2403.12968), visit the [project page](https://llmlingua.com/llmlingua2.html), and explore our [demo](https://huggingface.co/spaces/microsoft/LLMLingua-2).
+- 🦚 We're excited to announce the release of **LLMLingua-2**, boasting a 3x-6x speed improvement over LLMLingua! For more information, check out our [paper](https://aclanthology.org/2024.findings-acl.57/), visit the [project page](https://llmlingua.com/llmlingua2.html), and explore our [demo](https://huggingface.co/spaces/microsoft/LLMLingua-2).
 - 👾 LLMLingua has been integrated into [LangChain](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/retrievers/llmlingua.ipynb) and [LlamaIndex](https://github.com/run-llama/llama_index/blob/main/docs/examples/node_postprocessor/LongLLMLingua.ipynb), two widely-used RAG frameworks.
 - 🤳 Talk slides are available in [AI Time Jan, 24](https://drive.google.com/file/d/1fzK3wOvy2boF7XzaYuq2bQ3jFeP1WMk3/view?usp=sharing).
 - 🖥 EMNLP'23 slides are available in [Session 5](https://drive.google.com/file/d/1GxQLAEN8bBB2yiEdQdW4UKoJzZc0es9t/view) and [BoF-6](https://drive.google.com/file/d/1LJBUfJrKxbpdkwo13SgPOqugk-UjLVIF/view).

@@ -38,12 +38,12 @@ LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLa
 
 LongLLMLingua mitigates the 'lost in the middle' issue in LLMs, enhancing long-context information processing. It reduces costs and boosts efficiency with prompt compression, improving RAG performance by up to 21.4% using only 1/4 of the tokens.
 
-- [LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression](https://arxiv.org/abs/2310.06839) (ACL 2024 and ICLR ME-FoMo 2024)<br>
+- [LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression](https://aclanthology.org/2024.acl-long.91/) (ACL 2024 and ICLR ME-FoMo 2024)<br>
   _Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
 
 LLMLingua-2, a small-size yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in task-agnostic compression. It surpasses LLMLingua in handling out-of-domain data, offering 3x-6x faster performance.
 
-- [LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression](https://arxiv.org/abs/2403.12968) (ACL 2024 Findings)<br>
+- [LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression](https://aclanthology.org/2024.findings-acl.57/) (ACL 2024 Findings)<br>
   _Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruhle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang_
 
 ## 🎥 Overview

@@ -83,9 +83,13 @@ If you find this repo helpful, please cite the following papers:
 @inproceedings{jiang-etal-2023-llmlingua,
     title = "{LLML}ingua: Compressing Prompts for Accelerated Inference of Large Language Models",
     author = "Huiqiang Jiang and Qianhui Wu and Chin-Yew Lin and Yuqing Yang and Lili Qiu",
+    editor = "Bouamor, Houda and
+      Pino, Juan and
+      Bali, Kalika",
     booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
     month = dec,
     year = "2023",
+    address = "Singapore",
     publisher = "Association for Computational Linguistics",
     url = "https://aclanthology.org/2023.emnlp-main.825",
     doi = "10.18653/v1/2023.emnlp-main.825",

@@ -94,24 +98,36 @@ If you find this repo helpful, please cite the following papers:
 ```
 
 ```bibtex
-@article{jiang-etal-2023-longllmlingua,
-    title = "{L}ong{LLML}ingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression",
+@inproceedings{jiang-etal-2024-longllmlingua,
+    title = "{L}ong{LLML}ingua: Accelerating and Enhancing {LLM}s in Long Context Scenarios via Prompt Compression",
     author = "Huiqiang Jiang and Qianhui Wu and and Xufang Luo and Dongsheng Li and Chin-Yew Lin and Yuqing Yang and Lili Qiu",
-    url = "https://arxiv.org/abs/2310.06839",
-    journal = "ArXiv preprint",
-    volume = "abs/2310.06839",
-    year = "2023",
+    editor = "Ku, Lun-Wei and
+      Martins, Andre and
+      Srikumar, Vivek",
+    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
+    month = aug,
+    year = "2024",
+    address = "Bangkok, Thailand",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2024.acl-long.91",
+    pages = "1658--1677",
 }
 ```
 
 ```bibtex
-@article{wu2024llmlingua2,
+@inproceedings{pan-etal-2024-llmlingua,
     title = "{LLML}ingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression",
     author = "Zhuoshi Pan and Qianhui Wu and Huiqiang Jiang and Menglin Xia and Xufang Luo and Jue Zhang and Qingwei Lin and Victor Ruhle and Yuqing Yang and Chin-Yew Lin and H. Vicky Zhao and Lili Qiu and Dongmei Zhang",
-    url = "https://arxiv.org/abs/2403.12968",
-    journal = "ArXiv preprint",
-    volume = "abs/2403.12968",
+    editor = "Ku, Lun-Wei and
+      Martins, Andre and
+      Srikumar, Vivek",
+    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
+    month = aug,
     year = "2024",
+    address = "Bangkok, Thailand and virtual meeting",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2024.findings-acl.57",
+    pages = "963--981",
 }
 ```
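
For readers mapping these papers onto the library itself, here is a minimal usage sketch of the `PromptCompressor` API the README documents (the placeholder prompt, token budget, and empty instruction/question are illustrative, not part of this commit):

```python
from llmlingua import PromptCompressor

# Wraps the compact language model (a LLaMA-family checkpoint by default)
# that scores token informativeness; pass model_name=... to swap it, or
# device_map="cpu" if no GPU is available.
llm_lingua = PromptCompressor()

long_prompt = "..."  # the long context to compress, e.g. retrieved documents

result = llm_lingua.compress_prompt(
    long_prompt,
    instruction="",    # optional instruction kept at the head of the prompt
    question="",       # optional question for question-aware compression
    target_token=200,  # illustrative compression budget
)
print(result["compressed_prompt"])
```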

examples/LLMLingua2.ipynb

Lines changed: 1 addition & 1 deletion

@@ -20,7 +20,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<a target=\"_blank\" href=\"https://arxiv.org/abs/2403.12968\">LLMLingua-2</a> focuses on task-agnostic prompt compression for better generalizability and efficiency. It is a small-size yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in <b>task-agnostic compression</b>. It surpasses LLMLingua in handling <b>out-of-domain data</b>, offering <b>3x-6x faster</b> performance.\n",
+    "<a target=\"_blank\" href=\"https://aclanthology.org/2024.findings-acl.57/\">LLMLingua-2</a> focuses on task-agnostic prompt compression for better generalizability and efficiency. It is a small-size yet powerful prompt compression method trained via data distillation from GPT-4 for token classification with a BERT-level encoder, excels in <b>task-agnostic compression</b>. It surpasses LLMLingua in handling <b>out-of-domain data</b>, offering <b>3x-6x faster</b> performance.\n",
     "\n",
     "Below, We showcase the usage and compression results of <i>LLMLingua-2</i> on both <b>in-domain</b> and <b>out-of-domain</b> datasets, including various tasks such as single-document QA, multi-document QA, summarization and in-context learning.\n"
    ]
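
As a companion to the notebook cell above, a minimal LLMLingua-2 sketch following the usage the repo documents (the model name, rate, and force_tokens mirror the documented example; the placeholder prompt is ours):

```python
from llmlingua import PromptCompressor

# LLMLingua-2 replaces the causal-LM scorer with a BERT-level encoder
# trained via data distillation from GPT-4 for token classification.
llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,  # select the LLMLingua-2 compression path
)

prompt = "..."  # in-domain (MeetingBank) or out-of-domain text to compress

result = llm_lingua.compress_prompt(
    prompt,
    rate=0.33,                 # keep roughly one third of the tokens
    force_tokens=["\n", "?"],  # tokens that must survive compression
)
print(result["compressed_prompt"])
```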

examples/RAG.ipynb

Lines changed: 1 addition & 1 deletion

@@ -39,7 +39,7 @@
    "id": "0b39b33f-5860-4825-8f00-d60aed0dce86",
    "metadata": {},
    "source": [
-    "To address this, we propose [**LongLLMLingua**](https://arxiv.org/abs/2310.06839), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the <font color='red'>**red line**</font>, which significantly improves the original curve:\n",
+    "To address this, we propose [**LongLLMLingua**](https://aclanthology.org/2024.acl-long.91/), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the <font color='red'>**red line**</font>, which significantly improves the original curve:\n",
     "\n",
     "- Coarse-grained compression through document-level perplexity;\n",
     "- Fine-grained compression of the remaining text using token perplexity;"

examples/RAGLlamaIndex.ipynb

Lines changed: 1 addition & 1 deletion

@@ -82,7 +82,7 @@
    "id": "0b39b33f-5860-4825-8f00-d60aed0dce86",
    "metadata": {},
    "source": [
-    "To address this, we propose [**LongLLMLingua**](https://arxiv.org/abs/2310.06839), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the <font color='red'>**red line**</font>, which significantly improves the original curve:\n",
+    "To address this, we propose [**LongLLMLingua**](https://aclanthology.org/2024.acl-long.91/), which specifically tackles the low information density problem in long context scenarios via prompt compression, making it particularly suitable for RAG tasks. The main ideas involve a two-stage compression process, as shown by the <font color='red'>**red line**</font>, which significantly improves the original curve:\n",
     "\n",
     "- Coarse-grained compression through document-level perplexity;\n",
     "- Fine-grained compression of the remaining text using token perplexity;"
