Introduces lazy resolution of attention kernels and padding helpers, plus a compile-friendly kwarg processor that adapts to kernel feature support.
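The lazy-resolution and kwarg-processing idea can be sketched as follows. This is an illustrative mock, not the PR's actual implementation: the kernel names, registry, and helper functions (`resolve_kernel`, `supported_kwargs`, `call_kernel`) are hypothetical. The key points are that kernels are looked up only on first use, and that signature inspection is cached so the per-call path stays cheap and compile-friendly.

```python
import inspect
from functools import lru_cache

# Hypothetical attention backends with different feature sets
# (names and signatures are illustrative, not any library's API).
def eager_attention(q, k, v, scale=None):
    return "eager"

def flash_attention(q, k, v, scale=None, window_size=None, deterministic=None):
    return "flash"

_KERNELS = {"eager": eager_attention, "flash": flash_attention}

@lru_cache(maxsize=None)
def resolve_kernel(name):
    # Lazy resolution: the kernel is fetched (and could be imported)
    # only the first time it is requested, then cached.
    return _KERNELS[name]

@lru_cache(maxsize=None)
def supported_kwargs(name):
    # Inspect the kernel signature once and cache the result, so the
    # hot path does no per-call reflection.
    return frozenset(inspect.signature(resolve_kernel(name)).parameters)

def call_kernel(name, q, k, v, **kwargs):
    # Silently drop kwargs the chosen kernel does not support,
    # adapting one call site to kernels with differing features.
    fn = resolve_kernel(name)
    allowed = supported_kwargs(name)
    return fn(q, k, v, **{key: val for key, val in kwargs.items() if key in allowed})
```

For example, passing `window_size` to the eager backend is harmless: the processor filters it out before the call.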
Enables variable-length execution via unpad/repad when masks are 2D, and padding-free/packed flows using position ids or precomputed sequence offsets. Adjusts is_causal for single-token queries and supports windowed attention with bias-safe top-k selection.
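A minimal sketch of the unpad/repad flow and the single-token `is_causal` adjustment, assuming a 2D attention mask of shape `(batch, seq)` with 1 = keep. The helper names (`unpad`, `repad`, `effective_is_causal`) and the `cu_seqlens` construction here are illustrative, not the PR's exact code.

```python
import torch

def unpad(x, attention_mask):
    # x: (batch, seq, hidden); attention_mask: (batch, seq) of 0/1.
    # Flattens out padding tokens and returns cumulative sequence
    # offsets (cu_seqlens) as expected by varlen-style kernels.
    seqlens = attention_mask.sum(dim=1, dtype=torch.int32)
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    cu_seqlens = torch.nn.functional.pad(
        torch.cumsum(seqlens, 0, dtype=torch.int32), (1, 0)
    )
    flat = x.reshape(-1, x.shape[-1])[indices]
    return flat, indices, cu_seqlens

def repad(flat, indices, batch, seqlen, hidden):
    # Inverse of unpad: scatter the packed tokens back, zeros elsewhere.
    out = flat.new_zeros(batch * seqlen, hidden)
    out[indices] = flat
    return out.reshape(batch, seqlen, hidden)

def effective_is_causal(is_causal, q_len):
    # A single-token query attends to the whole prefix anyway, so causal
    # masking is redundant there (and some kernels reject it).
    return is_causal and q_len > 1
```

In a padding-free/packed flow, the same `cu_seqlens` offsets could instead be derived from position ids (each position-id reset marks a new sequence) or passed in precomputed.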
Improves compatibility across kernel versions and torch.compile, adds deterministic control via env var, handles PEFT dtype quirks, and includes minor device safeguards. Raises a clear error when incompatible mask/bias shapes are mixed.
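The env-var determinism control could look like the sketch below. The variable name `ATTN_DETERMINISTIC` and the resolution order (explicit argument wins, then environment, then a non-deterministic default) are assumptions for illustration.

```python
import os

def resolve_deterministic(explicit=None):
    # An explicit argument takes precedence; otherwise fall back to an
    # environment variable so determinism can be toggled without code
    # changes (variable name is illustrative).
    if explicit is not None:
        return explicit
    return os.environ.get("ATTN_DETERMINISTIC", "0") == "1"
```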