
Commit 34e43a5

remove haystack
1 parent 3677dac commit 34e43a5

7 files changed: +0 −745 lines

recipes/experimental/long-context/H2O/README.md

Lines changed: 0 additions & 41 deletions
@@ -34,47 +34,6 @@ Expected results on XSUM (Rouge-2 score, the higher the better) from the above
| Llama-2-13B | 0.1180 | 0.1217 | 0.1243 | 0.1291 | 0.1302 | 0.1332 |
| Llama-3-8B | 0.1107 | 0.1189 | 0.1200 | 0.1347 | 0.1290 | 0.1311 |

### Evaluation on "Needle in a Haystack" Analysis

The following example runs inference with Llama-3-8B-Instruct on the "Needle in a Haystack" test. The test is modified from [LLMTest_NeedleInAHaystack](https://github.com/gkamradt/LLMTest_NeedleInAHaystack); please follow the original repository to install the necessary packages. We use `--enable_h2o_generation` to enable the H2O algorithm, which keeps only the heavy-hitter and local KV pairs. Use `--num_window_length` to set the KV cache size, and `--enable_position_rolling` to enable position rolling, which assigns positions based on a token's slot in the KV cache rather than its index in the original sequence. Enabling position rolling is important when the sequence length exceeds the pretrained context window, e.g., 8K in Llama-3.
```
# step 1: obtain prompts for evaluation
# download the dataset from https://github.com/gkamradt/LLMTest_NeedleInAHaystack/tree/main/needlehaystack/PaulGrahamEssays
# modify the data path in utils/needle_test/config-prompt.yaml (line 3: haystack_dir: "data/PaulGrahamEssays")
# modify utils/needle_test/config-prompt.yaml to adjust the min/max sequence length for the test
python utils/needle_test/prompt.py --model_name meta-llama/Meta-Llama-3-8B-Instruct


# step 2: generate predictions for each prompt
# full model
python run_needle_haystack_test.py \
    --input-path data/needle_test/Huggingface \
    --output-path needle_test_results/huggingface/llama-3-8b-instruct/ \
    --model-name meta-llama/Meta-Llama-3-8B-Instruct

# H2O with a 4096-entry KV cache (2048 heavy-hitter tokens)
python run_needle_haystack_test.py \
    --input-path data/needle_test/Huggingface \
    --output-path needle_test_results/huggingface/llama-3-8b-instruct-h2o-4096/ \
    --model-name meta-llama/Meta-Llama-3-8B-Instruct \
    --enable_h2o_generation \
    --num_window_length 4096 \
    --num_heavy_hitter_tokens 2048


# step 3: scoring with GPT-4
export OPENAI_API_KEY=YOUR_API_KEY
# --input-path points at the prediction results from step 2
python utils/needle_test/eval.py \
    --input-path needle_test_results/huggingface/llama-3-8b-instruct-h2o-4096 \
    --output-path needle_test_results/huggingface/llama-3-8b-instruct-h2o-4096_eval


# step 4: visualization
python utils/needle_test/vis.py \
    --input-path needle_test_results/huggingface/llama-3-8b-instruct-h2o-4096_eval
```
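For intuition, the sketch below illustrates the eviction policy that `--num_window_length` and `--num_heavy_hitter_tokens` control: keep the most recent (local) tokens plus the older tokens that have accumulated the most attention. This is a minimal sketch assuming per-position accumulated attention scores are available; `h2o_keep_indices` is a hypothetical helper, not the recipe's actual implementation.

```python
import torch

def h2o_keep_indices(acc_attn: torch.Tensor,
                     num_window_length: int = 4096,
                     num_heavy_hitter_tokens: int = 2048) -> torch.Tensor:
    """Pick which KV-cache positions to keep (hypothetical helper).

    acc_attn: accumulated attention score each cached position has
    received so far, shape (seq_len,).
    """
    seq_len = acc_attn.shape[0]
    if seq_len <= num_window_length:
        # the cache still fits: evict nothing
        return torch.arange(seq_len)

    # always keep the most recent (local) tokens ...
    num_local = num_window_length - num_heavy_hitter_tokens
    local = torch.arange(seq_len - num_local, seq_len)

    # ... plus the heavy hitters: older positions with the highest
    # accumulated attention
    heavy = torch.topk(acc_attn[: seq_len - num_local],
                       num_heavy_hitter_tokens).indices

    return torch.sort(torch.cat([heavy, local])).values
```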
### One Demo on Streaming to "Infinite" Context Length

The following example demonstrates generation at "infinite" sequence length. We use MT-Bench data and generate the context sample by sample. The KV cache keeps the KV pairs from the previous samples while maintaining a fixed size. Results can be found in the [Demo](https://allenz.work/?p=11) (Video 1).
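As a rough illustration of why the cache can stay fixed-size over an unbounded stream, here is a minimal sketch of the position rolling mentioned above, assuming RoPE-style integer position ids; `rolling_position_ids` is a hypothetical name, not the recipe's API.

```python
import torch

def rolling_position_ids(num_cached: int,
                         num_window_length: int = 4096) -> torch.Tensor:
    """Assign position ids by slot in the KV cache rather than by absolute
    index in the original sequence (hypothetical helper). However long the
    stream gets, positions stay in [0, num_window_length), i.e., inside
    the pretrained context window (e.g., 8K for Llama-3).
    """
    assert num_cached <= num_window_length
    return torch.arange(num_cached)

# After streaming 100k tokens through a 4096-entry cache, the surviving
# KV pairs are addressed with positions 0..4095, not 95904..99999.
```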

recipes/experimental/long-context/H2O/run_needle_haystack_test.py

Lines changed: 0 additions & 116 deletions
This file was deleted.

recipes/experimental/long-context/H2O/utils/needle_test/config-eval.yaml

Lines changed: 0 additions & 7 deletions
This file was deleted.

recipes/experimental/long-context/H2O/utils/needle_test/config-prompt.yaml

Lines changed: 0 additions & 22 deletions
This file was deleted.

recipes/experimental/long-context/H2O/utils/needle_test/eval.py

Lines changed: 0 additions & 136 deletions
This file was deleted.
