Introduces lazy resolution of attention kernels and padding helpers, plus a compile-friendly kwarg processor that adapts to kernel feature support.
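The lazy-resolution and kwarg-processing idea can be sketched as follows. This is an illustrative mock, not the PR's actual implementation: the kernel names, registry, and helper functions (`resolve_kernel`, `supported_kwargs`, `call_kernel`) are hypothetical. The key points are that kernels are looked up only on first use, and that signature inspection is cached so the per-call path stays cheap and compile-friendly.

```python
import inspect
from functools import lru_cache

# Hypothetical attention backends with different feature sets
# (names and signatures are illustrative, not any library's API).
def eager_attention(q, k, v, scale=None):
    return "eager"

def flash_attention(q, k, v, scale=None, window_size=None, deterministic=None):
    return "flash"

_KERNELS = {"eager": eager_attention, "flash": flash_attention}

@lru_cache(maxsize=None)
def resolve_kernel(name):
    # Lazy resolution: the kernel is fetched (and could be imported)
    # only the first time it is requested, then cached.
    return _KERNELS[name]

@lru_cache(maxsize=None)
def supported_kwargs(name):
    # Inspect the kernel signature once and cache the result, so the
    # hot path does no per-call reflection.
    return frozenset(inspect.signature(resolve_kernel(name)).parameters)

def call_kernel(name, q, k, v, **kwargs):
    # Silently drop kwargs the chosen kernel does not support,
    # adapting one call site to kernels with differing features.
    fn = resolve_kernel(name)
    allowed = supported_kwargs(name)
    return fn(q, k, v, **{key: val for key, val in kwargs.items() if key in allowed})
```

For example, passing `window_size` to the eager backend is harmless: the processor filters it out before the call.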
Enables variable-length execution via unpad/repad when masks are 2D, and padding-free/packed flows using position ids or precomputed sequence offsets. Adjusts is_causal for single-token queries and supports windowed attention with bias-safe top-k selection.
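A minimal sketch of the unpad/repad flow and the single-token `is_causal` adjustment, assuming a 2D attention mask of shape `(batch, seq)` with 1 = keep. The helper names (`unpad`, `repad`, `effective_is_causal`) and the `cu_seqlens` construction here are illustrative, not the PR's exact code.

```python
import torch

def unpad(x, attention_mask):
    # x: (batch, seq, hidden); attention_mask: (batch, seq) of 0/1.
    # Flattens out padding tokens and returns cumulative sequence
    # offsets (cu_seqlens) as expected by varlen-style kernels.
    seqlens = attention_mask.sum(dim=1, dtype=torch.int32)
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    cu_seqlens = torch.nn.functional.pad(
        torch.cumsum(seqlens, 0, dtype=torch.int32), (1, 0)
    )
    flat = x.reshape(-1, x.shape[-1])[indices]
    return flat, indices, cu_seqlens

def repad(flat, indices, batch, seqlen, hidden):
    # Inverse of unpad: scatter the packed tokens back, zeros elsewhere.
    out = flat.new_zeros(batch * seqlen, hidden)
    out[indices] = flat
    return out.reshape(batch, seqlen, hidden)

def effective_is_causal(is_causal, q_len):
    # A single-token query attends to the whole prefix anyway, so causal
    # masking is redundant there (and some kernels reject it).
    return is_causal and q_len > 1
```

In a padding-free/packed flow, the same `cu_seqlens` offsets could instead be derived from position ids (each position-id reset marks a new sequence) or passed in precomputed.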
Improves compatibility across kernel versions and torch.compile, adds deterministic control via env var, handles PEFT dtype quirks, and includes minor device safeguards. Raises a clear error when incompatible mask/bias shapes are mixed.
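The env-var determinism control could look like the sketch below. The variable name `ATTN_DETERMINISTIC` and the resolution order (explicit argument wins, then environment, then a non-deterministic default) are assumptions for illustration.

```python
import os

def resolve_deterministic(explicit=None):
    # An explicit argument takes precedence; otherwise fall back to an
    # environment variable so determinism can be toggled without code
    # changes (variable name is illustrative).
    if explicit is not None:
        return explicit
    return os.environ.get("ATTN_DETERMINISTIC", "0") == "1"
```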