
Commit 42f2282

Update on "add eval for attention sink"
This PR adds a function to evaluate the model's perplexity when AttentionSink is enabled. It is mostly copied from https://github.com/mit-han-lab/streaming-llm/blob/main/examples/eval_long_ppl.py, the script the AttentionSink paper uses for the same evaluation.

Differential Revision: [D66474732](https://our.internmc.facebook.com/intern/diff/D66474732/)

Perplexity was measured for the Llama 3.2 1B and 1B_Instruct models up to 40k tokens with AttentionSink enabled:

[Screenshot: https://github.com/user-attachments/assets/ba7118f9-b5d7-4de8-b1fa-7d2ba0646515]

[ghstack-poisoned]
2 parents 2f4641f + 174771f commit 42f2282
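For readers unfamiliar with the upstream script, here is a minimal sketch of the per-token perplexity loop it implements. The names and forward signature are illustrative assumptions, not the PR's actual API: it assumes a causal LM whose stateful KV cache (sink tokens plus a sliding window) is updated as each token is fed in, and whose forward call returns logits for that token.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def eval_long_ppl(model, input_ids: torch.Tensor, max_tokens: int = 40_000) -> float:
    """Sketch of a streaming-llm style perplexity loop (names are illustrative).

    Tokens are fed one at a time so the attention-sink KV cache is exercised
    the same way it would be during generation.
    """
    nlls = []
    num_tokens = min(input_ids.size(1), max_tokens)
    for pos in range(num_tokens - 1):
        # Feed a single token; the model's cache retains only the sink tokens
        # plus the most recent window of tokens.
        logits = model(input_ids[:, pos : pos + 1])  # shape: [1, 1, vocab_size]
        target = input_ids[:, pos + 1]
        nll = F.cross_entropy(logits.view(-1, logits.size(-1)), target.view(-1))
        nlls.append(nll)
    # Perplexity is the exponential of the mean negative log-likelihood.
    return torch.exp(torch.stack(nlls).mean()).item()
```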

File tree: 1 file changed (+1 −1 lines changed)


examples/models/llama/source_transformation/attention_sink.py

Lines changed: 1 addition & 1 deletion
@@ -266,7 +266,7 @@ def _replace_attention(
     for _, child_module in module._modules.items():
         if len(list(child_module.children())) > 0:  # pyre-ignore [16]
             _replace_attention(
-                module=child_module,
+                module=child_module,  # pyre-ignore [6]
                 rope_with_attention_sink=rope_with_attention_sink,
                 sink_size=sink_size,
                 window_size=window_size,
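The one-line change adds a Pyre suppression on the recursive call. A plausible reading, offered as an assumption rather than something stated in the PR: `nn.Module._modules` is typed `Dict[str, Optional[Module]]`, so `child_module` is statically `Optional[Module]`, which trips error [16] (undefined attribute) on the `.children()` call and error [6] (incompatible parameter type) when it is passed to a parameter annotated as `Module`. A minimal sketch of the same traversal with an explicit `None` check that would avoid both suppressions (`_walk_modules` is a hypothetical name, not the repo's function):

```python
import torch.nn as nn


def _walk_modules(module: nn.Module) -> None:
    # nn.Module._modules maps names to Optional[Module], so each value is
    # statically Optional even though it is never None in practice.
    for _, child in module._modules.items():
        if child is not None and len(list(child.children())) > 0:
            # Narrowing with an explicit None check lets the type checker
            # accept the recursive call without suppressions.
            _walk_modules(child)
```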

0 commit comments
