Commit f187574

Update on "Don't quantize the current token for attention"
Differential Revision: [D63497872](https://our.internmc.facebook.com/intern/diff/D63497872/) [ghstack-poisoned]
2 parents: 622e928 + 7e7edaf

File tree: 1 file changed (+0 −1)
examples/models/llama/source_transformation/quantized_kv_cache.py

Lines changed: 0 additions & 1 deletion
```diff
@@ -198,7 +198,6 @@ def update(self, input_pos, k_val, v_val):
             seq_length = k_val.size(dim_to_slice)
             narrowed_k = k_out.narrow(dim_to_slice, start_pos, seq_length)
             narrowed_k.copy_(k_val)
-            # pyre-ignore: Incompatible parameter type [6]
             narrowed_v = v_out.narrow(dim_to_slice, start_pos, seq_length)
             narrowed_v.copy_(v_val)
         else:
```
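For context, the `update` method touched by this hunk writes the incoming key/value tensors into a preallocated cache by taking a `narrow()` view at the current position and copying in place. Below is a minimal runnable sketch of that narrow-and-copy pattern; `TinyKVCache`, `max_seq_len`, `n_heads`, and `head_dim` are illustrative assumptions, not the actual `QuantizedKVCache` API from this file.

```python
import torch

# Minimal sketch of the narrow + copy_ KV-cache update pattern.
# Names and shapes here are assumptions for illustration only.
class TinyKVCache:
    def __init__(self, max_seq_len: int, n_heads: int, head_dim: int):
        # Preallocated caches; dim 1 is the sequence dimension we slice.
        self.k_cache = torch.zeros(1, max_seq_len, n_heads, head_dim)
        self.v_cache = torch.zeros(1, max_seq_len, n_heads, head_dim)

    def update(self, start_pos: int, k_val: torch.Tensor, v_val: torch.Tensor):
        dim_to_slice = 1
        seq_length = k_val.size(dim_to_slice)
        # narrow() returns a view into the cache; copy_() writes the
        # new tokens in place, so no per-step reallocation occurs.
        self.k_cache.narrow(dim_to_slice, start_pos, seq_length).copy_(k_val)
        self.v_cache.narrow(dim_to_slice, start_pos, seq_length).copy_(v_val)
        return self.k_cache, self.v_cache

# Usage: append a 3-token chunk starting at position 5.
cache = TinyKVCache(max_seq_len=16, n_heads=2, head_dim=4)
k = torch.randn(1, 3, 2, 4)
v = torch.randn(1, 3, 2, 4)
k_out, v_out = cache.update(5, k, v)
```

Because `narrow()` yields a view rather than a copy, `copy_()` mutates the cache storage directly, which is what makes the deleted `# pyre-ignore` comment a pure type-checker annotation with no runtime effect.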
