Commit c1815ca

Adds varlen FDMA and padding-free attention
- Introduces lazy resolution of attention kernels and padding helpers, plus a compile-friendly kwarg processor that adapts to each kernel's feature support.
- Enables variable-length execution via unpad/repad when masks are 2D, and padding-free/packed flows using position ids or precomputed sequence offsets.
- Adjusts is_causal for single-token queries and supports windowed attention with bias-safe top-k selection.
- Improves compatibility across kernel versions and torch.compile, adds deterministic control via an env var, handles PEFT dtype quirks, and includes minor device safeguards.
- Raises a clear error when incompatible mask/bias shapes are mixed.
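The unpad/repad flow and the single-token is_causal adjustment described above can be sketched roughly as follows. This is a minimal illustration, not the commit's actual code: the helper names (`unpad_for_varlen`, `repad`, `resolve_is_causal`) are hypothetical, and it only shows how a 2D attention mask is flattened into the packed layout plus cumulative sequence offsets (`cu_seqlens`) that varlen kernels consume.

```python
import torch
import torch.nn.functional as F

def unpad_for_varlen(hidden_states, attention_mask):
    """Flatten padded [batch, seq, hidden] states into a packed
    [total_tokens, hidden] layout plus cu_seqlens offsets (hypothetical helper)."""
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)            # [batch]
    # Positions of real (non-padding) tokens in the flattened batch.
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    # cu_seqlens = [0, len_0, len_0 + len_1, ...] as varlen kernels expect.
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
    max_seqlen = int(seqlens.max())
    batch, seq = attention_mask.shape
    packed = hidden_states.reshape(batch * seq, -1)[indices]
    return packed, indices, cu_seqlens, max_seqlen

def repad(packed, indices, batch, seq):
    """Scatter packed tokens back into a padded [batch, seq, hidden] tensor."""
    out = packed.new_zeros(batch * seq, packed.shape[-1])
    out[indices] = packed
    return out.reshape(batch, seq, -1)

def resolve_is_causal(query_len, is_causal):
    # A single-token (decode-step) query attends to every cached key anyway,
    # so causal masking is a no-op; disabling it sidesteps kernel edge cases.
    return is_causal and query_len > 1
```

A real implementation would hand `packed`, `cu_seqlens`, and `max_seqlen` to a varlen attention kernel and then `repad` its output; padding-free flows skip the unpad step entirely by deriving the offsets from position ids.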
1 parent 3488d06 commit c1815ca

File tree

1 file changed: +569 −44 lines


0 commit comments
