Reworks dynamic masking to consume a precomputed attention bias plus an optional boolean causal mask and a window size, using top‑k selection within the window while honoring causality. Removes the ZOH/dt_proj/A dependency to simplify masking and reduce coupling.
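The reworked masking path described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the function name `dynamic_topk_mask` and the `[batch, heads, q_len, k_len]` layout are assumptions; the causal mask is taken to be boolean with `True` marking positions that may be attended.

```python
import torch

def dynamic_topk_mask(attn_bias, causal_mask=None, window_size=128):
    # attn_bias: [batch, heads, q_len, k_len] precomputed attention bias
    # causal_mask: optional bool tensor, True where attention is allowed
    scores = attn_bias
    if causal_mask is not None:
        # honor causality before selection: disallowed keys can never win top-k
        scores = scores.masked_fill(~causal_mask, float("-inf"))
    k_len = scores.shape[-1]
    if k_len <= window_size:
        # window covers every key; keep all positions not masked out
        return torch.isfinite(scores)
    # keep the window_size highest-bias keys per query
    topk_idx = scores.topk(window_size, dim=-1).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(-1, topk_idx, True)
    if causal_mask is not None:
        # re-apply causality in case -inf entries were selected as filler
        mask &= causal_mask
    return mask
```

Selecting on the bias alone (rather than on data-dependent projections such as ZOH/dt_proj/A) is what decouples the mask from the rest of the model state.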
Aligns CUDA, Triton, Flex, and SDPA wrapper to a unified interface, adds GQA support via KV repetition, and ensures consistent tensor layout. Detaches top‑k selection to avoid unintended gradients.
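GQA support via KV repetition, as mentioned above, typically means expanding each key/value head to match the query-head count before the attention kernel. A minimal sketch under that assumption (the helper name `repeat_kv` is illustrative, not necessarily the PR's):

```python
import torch

def repeat_kv(x, n_rep):
    # x: [batch, num_kv_heads, seq_len, head_dim]
    # Repeat each KV head n_rep times so num_kv_heads * n_rep == num_query_heads.
    if n_rep == 1:
        return x
    b, h_kv, s, d = x.shape
    # expand inserts a repeat axis without copying; reshape materializes the layout
    return x[:, :, None].expand(b, h_kv, n_rep, s, d).reshape(b, h_kv * n_rep, s, d)
```

Keeping the repetition in the wrapper lets the CUDA, Triton, Flex, and SDPA backends all see the same `[batch, heads, seq, dim]` layout; detaching the top‑k selection similarly ensures no backend accidentally backpropagates through mask construction.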
Updates the benchmarks to generate an attention bias and a boolean causal mask, renames keep_window_size to window_size, and adjusts configs and loops accordingly for consistent evaluation across backends.
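The benchmark inputs described above might be generated along these lines. This is a hypothetical sketch, assuming the common `[batch, heads, q_len, k_len]` bias layout and a causal mask offset by `k_len - q_len` to handle cached keys; the function name is invented for illustration:

```python
import torch

def make_benchmark_inputs(batch, heads, q_len, k_len, dtype=torch.float32):
    # random precomputed attention bias, one score per (query, key) pair
    attn_bias = torch.randn(batch, heads, q_len, k_len, dtype=dtype)
    # boolean causal mask: query i may attend to key j iff j <= i + (k_len - q_len),
    # which reduces to standard lower-triangular masking when q_len == k_len
    causal_mask = torch.ones(q_len, k_len, dtype=torch.bool).tril(k_len - q_len)
    return attn_bias, causal_mask.expand(batch, heads, q_len, k_len)
```

Feeding every backend the same bias and mask is what makes the cross-backend timings comparable.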
Improves clarity, consistency, and extensibility of the attention backward benchmarks.