
Commit 58245af

Update on "[Executorch][Llama] Decouple input sequence length from kv cache context length"
Decouple the maximum sequence length used for shape dynamism in torch.export from the context length used to size the KV cache. Differential Revision: [D68448334](https://our.internmc.facebook.com/intern/diff/D68448334/) [ghstack-poisoned]
2 parents (90f75b3 + 67bd5d9) · commit 58245af
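
The idea behind this stack is easiest to see with a small sketch: the sequence-length dimension that torch.export treats as dynamic is bounded by a maximum input sequence length, while the KV cache buffers are sized by a separate, typically larger, context length. The toy module, dimensions, and concrete bounds below are illustrative assumptions, not the ExecuTorch Llama code.

```python
import torch
from torch.export import Dim, export


class ToyKVCacheModel(torch.nn.Module):
    """Toy stand-in for a decoder block with a fixed-size KV cache (assumption, not the real model)."""

    def __init__(self, dim: int, max_context_len: int):
        super().__init__()
        # The cache buffer is sized by the context length, independent of the
        # maximum input sequence length used at export time.
        self.register_buffer("k_cache", torch.zeros(1, max_context_len, dim))
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, dim]; seq_len is the dynamic dimension.
        # A real model would write new keys/values into the cache at the
        # current position; here we only read from it to keep the sketch small.
        return self.proj(x) + self.k_cache[:, :1, :]


max_seq_len = 128       # bound on the dynamic input length (assumed value)
max_context_len = 2048  # KV cache capacity, chosen independently (assumed value)

model = ToyKVCacheModel(dim=64, max_context_len=max_context_len)
seq_dim = Dim("seq_len", max=max_seq_len)
ep = export(model, (torch.randn(1, 16, 64),), dynamic_shapes={"x": {1: seq_dim}})
print(ep)
```

With the two lengths decoupled, the exported program can accept short dynamic inputs (bounded by max_seq_len) while the cache still covers the full generation context.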

1 file changed (+1, -1)


examples/models/llama/source_transformation/attention_sink.py

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ def __init__(
         else:
             self.apply_rotary_emb_to_k = apply_rotary_emb_to_k
         self.max_context_length = window_size + sink_size
-        assert self.max_context_length == self.params.max_context_length
+        assert self.max_context_length == self.params.max_context_len
         self.eviction_batch_size = eviction_batch_size
         self.position_shift = 0
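
The one-line fix corrects the name of the field read from self.params (it is max_context_len, as the updated assert shows). The invariant being enforced is that the attention-sink cache, which keeps sink_size initial tokens plus a sliding window of window_size tokens, spans exactly the model's configured context length. A minimal numeric illustration, with assumed values:

```python
# Assumed example values, not taken from the commit.
sink_size = 4
window_size = 2044
params_max_context_len = 2048  # stands in for self.params.max_context_len

# Mirrors the lines in the diff above: the attention-sink cache length is
# window_size + sink_size, and it must equal the model's context length.
max_context_length = window_size + sink_size
assert max_context_length == params_max_context_len
```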

0 commit comments
