What's Changed
- Add selectable masking strategies for attention by @LoserCheems in #204
- Refactor attention block smoothing for consistency by @LoserCheems in #205
- Optimize triton version: GQA, mask/bias broadcasting, skip inactive tiles, and stability fixes by @LoserCheems in #200
- [FEATURE SUPPORT] Triton special compact dynamic-mask attention: 1.6× faster fwd+bwd, numerically equivalent by @LoserCheems in #206
- Fix documentation and references for Flash Sparse Attention by @LoserCheems in #207
Full Changelog: v1.2.2...v1.2.3