Last attn_mask version
To alleviate the memory bottleneck caused by materializing attn_mask, we will adopt a new strategy in upcoming releases. This is the last version that accepts attn_mask; future versions will no longer pass it.
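As a rough illustration of why a materialized attn_mask is a memory bottleneck (this sketch is not this library's API; the shapes and the on-the-fly causal check are assumptions for demonstration): a dense `[batch, heads, q_len, kv_len]` boolean mask grows quadratically with sequence length, whereas a causal constraint can be derived from indices without storing any mask tensor.

```python
import torch

# Hypothetical numbers, chosen only to show the quadratic growth:
batch, heads, seq = 2, 16, 8192
# torch.bool uses 1 byte per element, so a dense mask of this shape
# would occupy batch * heads * seq * seq bytes if materialized.
dense_mask_bytes = batch * heads * seq * seq
print(f"dense attn_mask would take {dense_mask_bytes / 2**20:.0f} MiB")  # 2048 MiB

# The causal structure can instead be computed from indices on the fly
# (e.g. inside a fused kernel), so no mask tensor needs to be passed:
q_idx = torch.arange(8)
kv_idx = torch.arange(8)
causal = q_idx[:, None] >= kv_idx[None, :]  # query attends only to past keys
```

Dropping the explicit attn_mask argument lets the kernel apply such index-based masking per tile instead of reading a quadratic-size tensor from memory.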
What's Changed
- Chore/sync after move by @LoserCheems in #208
- [BUG FIX] Unify masking utilities and improve performance by @LoserCheems in #209
- Corrects issue links in README guides by @LoserCheems in #212
- [BUG FIX] Correct causal mask handling for longer KV pairs by @LoserCheems in #213
- Add gradient computation for bias and token-level KV sparsity support by @LoserCheems in #214
- Add rotary-aware attention modules for improved inference by @LoserCheems in #215
- Improve code readability and linting workflow by @LoserCheems in #216
- Simplify attention mechanisms by @LoserCheems in #217
- Refactor create_mask function parameters by @LoserCheems in #218
Full Changelog: v1.2.3...v1.2.4