Commit 1cbd2f9

Unifies attention kernels with bias+mask windowing
- Refactors attention paths to accept an external attention bias and a boolean causal mask, replacing zoh/dt-based masking and cache-position logic.
- Introduces a generic mask preparer that applies top-k windowing (optionally causal-aware) and standardizes interfaces across the SDPA, Flash, Triton, and Flex implementations.
- Removes the zoh/dt projection and related params, repeats KV artifacts for GQA, and consistently applies additive masks.
- Updates benchmarks to generate bias/mask inputs, renames keep_window_size to window_size, adjusts head dims, and harmonizes result handling and output labeling.
- Improves API consistency, simplifies experimentation with custom biases, and aligns masking semantics across kernels for more reliable benchmarking.
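The commit itself is not shown here, so as a rough illustration only: a "generic mask preparer that applies top-k windowing (optionally causal-aware)" might look like the NumPy sketch below. The function name `prepare_window_mask` and the `-1e9` stand-in for negative infinity are hypothetical; `window_size` and the additive-mask semantics come from the commit message.

```python
import numpy as np

NEG_INF = np.float32(-1e9)  # large negative stand-in for -inf in additive masks

def prepare_window_mask(bias, window_size, causal=False):
    """Build an additive mask that keeps only the top-`window_size` keys
    per query, scored by the external attention bias.

    bias: float array of shape (B, H, Q, K) -- additive attention bias.
    Returns an array of the same shape: 0.0 where a key is kept,
    NEG_INF where it is windowed out. Assumes Q == K when causal=True.
    """
    B, H, Q, K = bias.shape
    scores = bias.astype(np.float32).copy()

    if causal:
        # Key j is visible to query i only when j <= i (Q == K assumed here).
        allowed = np.arange(K)[None, :] <= np.arange(Q)[:, None]
        scores = np.where(allowed, scores, NEG_INF)

    if window_size >= K:
        keep = scores > NEG_INF / 2  # window covers everything still visible
    else:
        # Mark the top-`window_size` bias entries in each query row.
        kth = K - window_size
        top_idx = np.argpartition(scores, kth, axis=-1)[..., kth:]
        keep = np.zeros(scores.shape, dtype=bool)
        np.put_along_axis(keep, top_idx, True, axis=-1)
        if causal:
            # Early rows can have fewer than window_size visible keys, so
            # top-k may select masked slots; re-apply the causal constraint.
            keep &= allowed
    return np.where(keep, np.float32(0.0), NEG_INF).astype(bias.dtype)
```

A kernel would then apply both terms additively to the raw scores, e.g. `softmax(q @ k.T / sqrt(d) + bias + mask)`, consistent with the "consistently applies additive masks" wording above.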
1 parent bcdf7ad commit 1cbd2f9

File tree

1 file changed (+199, -251 lines)


0 commit comments
