
Commit 1f11a37

update README
1 parent 636f874 commit 1f11a37


50 files changed: +22 -8580 lines

recipes/experimental/long-context/H2O/README.md

Lines changed: 10 additions & 10 deletions
@@ -8,11 +8,9 @@ Besides, LLMs usually have poor generation to long sequence during inference. H2
 
 Current implementation supports Llama-1/2/3, from 7B to 70B. Since H2O only maintains the most important KV pairs, it might miss some important information in the middle of the content for some knowledge-intensive tasks.
 
-More details please refer to Paper: https://arxiv.org/pdf/2306.14048; Blog: https://allenz.work/?p=11.
+For more details, please refer to the paper: **https://arxiv.org/pdf/2306.14048**; blog: **https://allenz.work/?p=11**.
 
-### Environments:
-
-transformers == 4.39.0
+**Note: this implementation is tested with transformers == 4.39.0**
 
 ### Evaluation on Summarization Tasks
 
@@ -28,20 +26,22 @@ python run_summarization.py \
 
 ##### **Results**
 
-Expected results on XSUM (Rouge-2 score) from the above scripts on Llama-2/3 models. The sequence length of inputs are ~2k, thus KV cache size larger than 2048 represents the full cache performance.
+Expected results on XSUM (Rouge-2 score, the higher the better) from the above scripts on Llama-2/3 models. The sequence length of the inputs is ~2k. Here we constrain the size of the KV cache, allowing only n KV pairs to be written/read after the prefilling stage, where n ranges from **64** to **full** (all KV pairs maintained). With 128 KVs, performance matches the full baseline (~2k KVs), while degradation is observed with 64 KVs. Also, maintaining a smaller KV cache reduces the I/O cost of the KVs, so we can achieve better throughput.
 
-| KV Cache Size | 64     | 128    | 256    | 512    | 1024   | 2048   | 4096   | 8192   |
-| ------------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
-| Llama-2-7B    | 0.0439 | 0.1127 | 0.1148 | 0.1182 | 0.1170 | 0.1164 | 0.1164 | 0.1164 |
-| Llama-2-13B   | 0.1180 | 0.1217 | 0.1243 | 0.1291 | 0.1302 | 0.1332 | 0.1332 | 0.1332 |
-| Llama-3-8B    | 0.1107 | 0.1189 | 0.1200 | 0.1347 | 0.1290 | 0.1311 | 0.1311 | 0.1311 |
+| KV Cache Size | 64     | 128    | 256    | 512    | 1024   | Full   |
+| ------------- | ------ | ------ | ------ | ------ | ------ | ------ |
+| Llama-2-7B    | 0.0439 | 0.1127 | 0.1148 | 0.1182 | 0.1170 | 0.1164 |
+| Llama-2-13B   | 0.1180 | 0.1217 | 0.1243 | 0.1291 | 0.1302 | 0.1332 |
+| Llama-3-8B    | 0.1107 | 0.1189 | 0.1200 | 0.1347 | 0.1290 | 0.1311 |
 
 
 ### Evaluation on "Needle in a Haystack" Analysis
 
 The following example runs inference of Llama-3-8b-instruct on the "Needle in a Haystack" test. The test is modified from [https://github.com/gkamradt/LLMTest_NeedleInAHaystack](https://github.com/gkamradt/LLMTest_NeedleInAHaystack). Please follow the original repository to install the necessary packages. We use `--enable_h2o_generation` to enable the H2O algorithm, which keeps only the heavy-hitter and the local KV pairs. Use `--num_heavy_hitter_tokens` to set the number of heavy-hitter KV pairs and `--num_window_length` for the KV cache size; the number of local KV pairs equals num_window_length - num_heavy_hitter_tokens. Also, use `--enable_position_rolling` to assign positions based on the slots in the KV cache instead of the positions in the original sequence. Enabling position rolling is important when the sequence length exceeds the pretrained context window, e.g., 4K in Llama-2.
 
 ```
 # step 1: obtain prompts for evaluation
+# download the dataset from https://github.com/gkamradt/LLMTest_NeedleInAHaystack/tree/main/needlehaystack/PaulGrahamEssays
+# modify the data-path in utils/needle_test/config-prompt.yaml (line 3: haystack_dir: "data/PaulGrahamEssays")
 python utils/needle_test/prompt.py --model_name meta-llama/Meta-Llama-3-8B-Instruct
 # modify utils/needle_test/config-prompt.yaml to adjust the min/max sequence length for the test
 
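The README diff above describes a KV cache budget split between heavy-hitter and local (most recent) KV pairs. A minimal sketch of such an eviction rule, assuming accumulated attention scores are tracked per cached token (function and variable names here are illustrative, not the repository's actual API):

```python
# Illustrative H2O-style KV cache eviction: keep the top "heavy hitter"
# positions by accumulated attention score, plus a local window of the
# most recent tokens; evict everything else.

def h2o_keep_indices(attn_scores, num_heavy_hitters, window_length):
    """Return sorted indices of the KV pairs to keep.

    attn_scores: accumulated attention score per cached token.
    num_heavy_hitters: number of heavy-hitter KV pairs to retain.
    window_length: total KV cache budget (heavy hitters + local tokens).
    """
    n = len(attn_scores)
    if n <= window_length:
        return list(range(n))  # cache under budget: keep everything
    num_local = window_length - num_heavy_hitters
    local = set(range(n - num_local, n))  # most recent tokens
    # rank the remaining (non-local) positions by accumulated attention
    candidates = [i for i in range(n) if i not in local]
    candidates.sort(key=lambda i: attn_scores[i], reverse=True)
    heavy = set(candidates[:num_heavy_hitters])
    return sorted(heavy | local)

scores = [5.0, 0.1, 3.2, 0.2, 0.3, 1.9, 0.4, 0.5]
print(h2o_keep_indices(scores, num_heavy_hitters=2, window_length=4))
# → [0, 2, 6, 7]
```

With a budget of 4 and 2 heavy hitters, the two highest-scoring older tokens (indices 0 and 2) survive alongside the two most recent ones (6 and 7), which is the behavior `--num_heavy_hitter_tokens` and `--num_window_length` control in the scripts above.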

recipes/experimental/long-context/H2O/data/PaulGrahamEssays/addiction.txt

Lines changed: 0 additions & 116 deletions
This file was deleted.

recipes/experimental/long-context/H2O/data/PaulGrahamEssays/aord.txt

Lines changed: 0 additions & 126 deletions
This file was deleted.
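On the I/O claim in the README diff above: KV cache size grows linearly with the number of retained tokens, so capping the cache at 128 entries shrinks per-step reads and writes substantially. A back-of-envelope estimate, assuming Llama-2-7B's standard shape (32 layers, 32 KV heads, head dim 128, fp16) rather than any repository-specific accounting:

```python
def kv_cache_bytes(num_tokens, num_layers=32, num_kv_heads=32,
                   head_dim=128, bytes_per_elem=2):
    # Two cached tensors per layer (K and V), each of shape
    # [num_kv_heads, num_tokens, head_dim], stored in fp16 (2 bytes).
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens

print(kv_cache_bytes(1))              # → 524288 bytes, i.e. 0.5 MiB per token
print(kv_cache_bytes(2048) // 2**20)  # full ~2k-token cache: → 1024 (MiB, 1 GiB)
print(kv_cache_bytes(128) // 2**20)   # H2O with 128 KVs: → 64 (MiB)
```

Under these assumptions, dropping from the full ~2k-entry cache to 128 entries cuts KV traffic by 16x, consistent with the throughput gains the README attributes to smaller caches.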

0 commit comments
