Last attn_mask version
To alleviate the memory bottleneck caused by materializing attn_mask, we will adopt a new strategy in upcoming releases. This is the last version that accepts attn_mask; future versions will no longer pass it.
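As a rough illustration of why a materialized attn_mask is a memory bottleneck (this sketch is not this library's API; the shapes and the on-the-fly causal check are assumptions for demonstration): a dense `[batch, heads, q_len, kv_len]` boolean mask grows quadratically with sequence length, whereas a causal constraint can be derived from indices without storing any mask tensor.

```python
import torch

# Hypothetical numbers, chosen only to show the quadratic growth:
batch, heads, seq = 2, 16, 8192
# torch.bool uses 1 byte per element, so a dense mask of this shape
# would occupy batch * heads * seq * seq bytes if materialized.
dense_mask_bytes = batch * heads * seq * seq
print(f"dense attn_mask would take {dense_mask_bytes / 2**20:.0f} MiB")  # 2048 MiB

# The causal structure can instead be computed from indices on the fly
# (e.g. inside a fused kernel), so no mask tensor needs to be passed:
q_idx = torch.arange(8)
kv_idx = torch.arange(8)
causal = q_idx[:, None] >= kv_idx[None, :]  # query attends only to past keys
```

Dropping the explicit attn_mask argument lets the kernel apply such index-based masking per tile instead of reading a quadratic-size tensor from memory.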
What's Changed
- Chore/sync after move by @LoserCheems in #208
- [BUG FIX] Unify masking utilities and improve performance by @LoserCheems in #209
- Corrects issue links in README guides by @LoserCheems in #212
- [BUG FIX] Correct causal mask handling for longer KV pairs by @LoserCheems in #213
- Add gradient computation for bias and token-level KV sparsity support by @LoserCheems in #214
- Add rotary-aware attention modules for improved inference by @LoserCheems in #215
- Improve code readability and linting workflow by @LoserCheems in #216
- Simplify attention mechanisms by @LoserCheems in #217
- Refactor create_mask function parameters by @LoserCheems in #218
Full Changelog: v1.2.3...v1.2.4