Unifies attention kernels with bias+mask windowing
Refactors attention paths to accept external attention bias and boolean causal mask, replacing zoh/dt-based masking and cache-position logic. Introduces a generic mask preparer that applies top-k windowing (optionally causal-aware), and standardizes interfaces across SDPA, Flash, Triton, and Flex implementations.
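For illustration, a minimal sketch of what such a top-k windowing mask preparer could look like, assuming a [batch, heads, q_len, k_len] additive bias; the name prepare_window_mask and its signature are hypothetical, not the PR's actual API:

```python
import torch

def prepare_window_mask(attn_bias: torch.Tensor,
                        window_size: int,
                        causal: bool = False) -> torch.Tensor:
    """Return an additive mask that keeps only the top-`window_size` keys
    per query (ranked by bias value) and sets everything else to -inf.
    Hypothetical sketch; shapes assumed [batch, heads, q_len, k_len]."""
    b, h, q_len, k_len = attn_bias.shape
    neg_inf = torch.finfo(attn_bias.dtype).min
    bias = attn_bias
    if causal:
        # Mask future positions before ranking so the window is causal-aware.
        causal_mask = torch.ones(q_len, k_len, dtype=torch.bool,
                                 device=attn_bias.device).tril()
        bias = bias.masked_fill(~causal_mask, neg_inf)
    if window_size >= k_len:
        return bias
    # Keep the indices of the top-`window_size` bias values per query row.
    topk_idx = bias.topk(window_size, dim=-1).indices
    keep = torch.zeros_like(bias, dtype=torch.bool).scatter_(-1, topk_idx, True)
    return bias.masked_fill(~keep, neg_inf)
```

The resulting tensor can then be passed as the additive mask to any of the SDPA, Flash, Triton, or Flex paths, which is what makes the interfaces interchangeable.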
Removes the zoh/dt projection and related parameters, repeats KV-side tensors for GQA (see the sketch below), and applies additive masks consistently. Updates the benchmarks to generate bias/mask inputs, renames keep_window_size to window_size, adjusts head dims, and harmonizes result handling and output labeling.
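The GQA repetition amounts to expanding the KV-head dimension so each query head sees a matching key/value (and bias) head. A hypothetical helper, not code from this PR:

```python
import torch

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand [batch, kv_heads, seq, dim] to [batch, kv_heads * n_rep, seq, dim]
    by repeating each KV head n_rep times. Assumed shape convention."""
    if n_rep == 1:
        return x
    return x.repeat_interleave(n_rep, dim=1)

# Example: 32 query heads over 8 KV heads -> repeat K, V, and the bias 4x.
# k, v: [B, 8, S, D]; bias: [B, 8, Q, S]
# k, v, bias = repeat_kv(k, 4), repeat_kv(v, 4), repeat_kv(bias, 4)
```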
Improves API consistency, simplifies experimentation with custom biases, and aligns masking semantics across kernels for more reliable benchmarking.