
Commit 42f2282

Update on "add eval for attention sink"
This PR adds a function to evaluate the model's perplexity when AttentionSink is enabled. It is mostly copied from https://github.com/mit-han-lab/streaming-llm/blob/main/examples/eval_long_ppl.py, the script the AttentionSink paper uses for the same evaluation.

Differential Revision: [D66474732](https://our.internmc.facebook.com/intern/diff/D66474732/)

Perplexity was measured for the Llama 3.2 1B and 1B_Instruct models up to 40k tokens with AttentionSink enabled:

[Screenshot: https://github.com/user-attachments/assets/ba7118f9-b5d7-4de8-b1fa-7d2ba0646515]

[ghstack-poisoned]
2 parents 2f4641f + 174771f commit 42f2282
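For readers unfamiliar with the upstream script, here is a minimal sketch of the per-token perplexity loop it implements. The names and forward signature are illustrative assumptions, not the PR's actual API: it assumes a causal LM whose stateful KV cache (sink tokens plus a sliding window) is updated as each token is fed in, and whose forward call returns logits for that token.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def eval_long_ppl(model, input_ids: torch.Tensor, max_tokens: int = 40_000) -> float:
    """Sketch of a streaming-llm style perplexity loop (names are illustrative).

    Tokens are fed one at a time so the attention-sink KV cache is exercised
    the same way it would be during generation.
    """
    nlls = []
    num_tokens = min(input_ids.size(1), max_tokens)
    for pos in range(num_tokens - 1):
        # Feed a single token; the model's cache retains only the sink tokens
        # plus the most recent window of tokens.
        logits = model(input_ids[:, pos : pos + 1])  # shape: [1, 1, vocab_size]
        target = input_ids[:, pos + 1]
        nll = F.cross_entropy(logits.view(-1, logits.size(-1)), target.view(-1))
        nlls.append(nll)
    # Perplexity is the exponential of the mean negative log-likelihood.
    return torch.exp(torch.stack(nlls).mean()).item()
```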

File tree: 1 file changed (+1 −1 lines changed)


examples/models/llama/source_transformation/attention_sink.py

Lines changed: 1 addition & 1 deletion
@@ -266,7 +266,7 @@ def _replace_attention(
     for _, child_module in module._modules.items():
         if len(list(child_module.children())) > 0:  # pyre-ignore [16]
             _replace_attention(
-                module=child_module,
+                module=child_module,  # pyre-ignore [6]
                 rope_with_attention_sink=rope_with_attention_sink,
                 sink_size=sink_size,
                 window_size=window_size,
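The one-line change adds a Pyre suppression on the recursive call. A plausible reading, offered as an assumption rather than something stated in the PR: `nn.Module._modules` is typed `Dict[str, Optional[Module]]`, so `child_module` is statically `Optional[Module]`, which trips error [16] (undefined attribute) on the `.children()` call and error [6] (incompatible parameter type) when it is passed to a parameter annotated as `Module`. A minimal sketch of the same traversal with an explicit `None` check that would avoid both suppressions (`_walk_modules` is a hypothetical name, not the repo's function):

```python
import torch.nn as nn


def _walk_modules(module: nn.Module) -> None:
    # nn.Module._modules maps names to Optional[Module], so each value is
    # statically Optional even though it is never None in practice.
    for _, child in module._modules.items():
        if child is not None and len(list(child.children())) > 0:
            # Narrowing with an explicit None check lets the type checker
            # accept the recursive call without suppressions.
            _walk_modules(child)
```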

0 commit comments
