# maybe average the compressed attention across each group of queries (per key / value head)
if self.query_heads_share_selected_kv:
@@ -383,13 +386,16 @@ def forward(
# cannot parse their equation, so will just improvise
# first we expand all the compressed scores to the full sequence length, then average within each fine / selection block size - pad on the right with 0s, which should be fine as the sliding window covers the local context anyway
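
The expand-then-average step described in the comment above can be sketched in plain Python. This is only an illustration, not the code in this commit: the function name `expand_and_average` and its parameters are hypothetical, scores are plain lists rather than tensors, and compressed blocks are assumed not to overlap.

```python
def expand_and_average(compressed_scores, compress_block_size, selection_block_size):
    """Map per-compressed-block scores to per-selection-block scores.

    Hypothetical helper for illustration only; assumes non-overlapping
    compressed blocks and a single query position.
    """
    # 1. expand: repeat each compressed score over the positions its block covers
    expanded = [
        score
        for score in compressed_scores
        for _ in range(compress_block_size)
    ]

    # 2. pad on the right with zeros so the length divides evenly
    remainder = len(expanded) % selection_block_size
    if remainder:
        expanded += [0.0] * (selection_block_size - remainder)

    # 3. average within each selection block
    return [
        sum(expanded[start:start + selection_block_size]) / selection_block_size
        for start in range(0, len(expanded), selection_block_size)
    ]


# three compressed blocks of size 2, regrouped into selection blocks of size 4
importance = expand_and_average([1.0, 0.5, 0.25], 2, 4)
```

Here the last selection block is half padding, so its average is pulled toward zero. That matches the comment's reasoning: the trailing positions are the most local ones, which the sliding window branch attends to regardless.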