The following example runs inference of Llama-3-8b-instruct on the "Needle in a haystack" test. The test is adapted from [https://github.com/gkamradt/LLMTest_NeedleInAHaystack](https://github.com/gkamradt/LLMTest_NeedleInAHaystack); please follow the original repository to install the necessary packages. We use `--enable_h2o_generation` to enable the H2O algorithm, which keeps only the heavy-hitter and local KV pairs. Use `--num_window_length` to set the KV cache size. Also, use `--enable_position_rolling` to enable position rolling, which assigns positions based on a token's slot in the KV cache rather than its position in the original sequence. Enabling position rolling is important when the sequence length exceeds the pretrained context window, e.g., 8K in Llama-3.
```
# step 1: obtain prompts for evaluation
# download the dataset from https://github.com/gkamradt/LLMTest_NeedleInAHaystack/tree/main/needlehaystack/PaulGrahamEssays
# modify the data-path in utils/needle_test/config-prompt.yaml (line 3: haystack_dir: "data/PaulGrahamEssays")
```
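Once the prompts are prepared, the test can be launched with the flags described above. The command below is only a sketch: the script name and model identifier are assumptions for illustration, not the repository's exact entry point; the three flags themselves come from the description above.

```
# hypothetical launch command; script name, model id, and the cache size of
# 2048 are illustrative assumptions, the three H2O flags are as documented above
python -u run_needle_test.py \
    --model_name meta-llama/Meta-Llama-3-8B-Instruct \
    --enable_h2o_generation \
    --num_window_length 2048 \
    --enable_position_rolling
```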
### One Demo on Streaming to "Infinite" Context Length
The following example demonstrates generation over an "infinite" sequence length. We use MT-Bench data and generate the context sample by sample. The KV cache keeps the KV pairs from previous samples while maintaining a fixed size. Results can be found in the [Demo](https://allenz.work/?p=11) (Video 1).
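To make the fixed-size behavior concrete, the sketch below shows one way such a cache could work, assuming the usual H2O policy: each cached token accumulates an attention score, and once the budget is exceeded, the lowest-scoring token outside the local window is evicted. All class and method names here are illustrative, not the repository's API.

```
import torch

class H2OCacheSketch:
    """Minimal, illustrative fixed-size KV cache: heavy hitters + local window.

    Assumption (not the repository's API): a token's score is the attention
    mass it has accumulated; eviction drops the weakest non-local token.
    """

    def __init__(self, budget: int, num_local: int):
        self.budget = budget        # total KV pairs kept (cf. --num_window_length)
        self.num_local = num_local  # most recent tokens, never evicted
        self.keys, self.values = [], []
        self.scores = []            # accumulated attention mass per cached token

    def append(self, k: torch.Tensor, v: torch.Tensor, attn_row: torch.Tensor):
        # attn_row: the new token's attention weights over the current cache
        for i in range(len(self.scores)):
            self.scores[i] += float(attn_row[i])
        self.keys.append(k)
        self.values.append(v)
        self.scores.append(0.0)
        if len(self.keys) > self.budget:
            # evict the weakest candidate outside the local window
            candidates = range(len(self.keys) - self.num_local)
            evict = min(candidates, key=lambda i: self.scores[i])
            for buf in (self.keys, self.values, self.scores):
                del buf[evict]

    def position_ids(self) -> torch.Tensor:
        # position rolling: positions follow cache slots, not original indices,
        # so they never exceed the pretrained context window
        return torch.arange(len(self.keys))
```

Because each eviction frees a slot for the next token, memory stays constant no matter how many samples are streamed, which is what allows generation to continue across the whole MT-Bench set in the demo.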