Commit 2f4641f

Update on "add eval for attention sink"
This PR adds a function to evaluate the model's perplexity when AttentionSink is enabled. It is mostly copied from https://github.com/mit-han-lab/streaming-llm/blob/main/examples/eval_long_ppl.py, the script the AttentionSink paper uses for its own perplexity evaluation.

Differential Revision: [D66474732](https://our.internmc.facebook.com/intern/diff/D66474732/)

Perplexity measured for the llama 3.2 1B and 1B_Instruct models up to 40k tokens with AttentionSink enabled:

<img width="966" alt="Screenshot 2024-11-25 at 2 46 04 PM" src="https://github.com/user-attachments/assets/ba7118f9-b5d7-4de8-b1fa-7d2ba0646515">

[ghstack-poisoned]
2 parents 38d9e1c + 493607e commit 2f4641f
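For context, the streaming-llm script referenced in the commit message measures long-context perplexity by feeding tokens one at a time and averaging the per-token negative log-likelihood. A minimal sketch of that computation follows; the single-step `model(token, past)` API and all names here are hypothetical illustrations, not ExecuTorch's actual interfaces:

```python
import math

def perplexity(nlls):
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(nlls) / len(nlls))

def eval_long_ppl(model, token_ids):
    """Feed tokens one at a time, reusing the (sink-augmented) KV cache,
    and accumulate the NLL of each next token under the model.

    `model` is a hypothetical step function: (token, past) -> (logits, past).
    """
    nlls = []
    past = None
    for i in range(len(token_ids) - 1):
        logits, past = model(token_ids[i], past)
        # log-softmax over the vocabulary, then take the target token's NLL
        m = max(logits)
        logsumexp = m + math.log(sum(math.exp(x - m) for x in logits))
        nlls.append(logsumexp - logits[token_ids[i + 1]])
    return perplexity(nlls)
```

For example, a model that always emits uniform logits over a vocabulary of size V yields a perplexity of exactly V, which is a quick sanity check on the loop.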

File tree

1 file changed (+1, −3 lines)

examples/models/llama/eval_llama_lib.py

Lines changed: 1 addition & 3 deletions
```diff
@@ -318,9 +318,7 @@ def eval_llama(
         print(f"{task}: {res}")


-def eval_llama_with_attention_sink(
-    model_name: str, args: argparse.ArgumentParser
-):
+def eval_llama_with_attention_sink(model_name: str, args: argparse.ArgumentParser):
     """
     Evaluate the model's perplexity when AttentionSink is enabled.
```
