Releases · lucidrains/native-sparse-attention-pytorch
0.0.23
first take care of the block diagonal causal mask in fine attention
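For context, a block diagonal causal mask restricts each query to attend only to keys within its own fine block, and only at positions at or before its own. A minimal sketch of such a mask (the function name and parameters are illustrative, not this repository's API):

```python
import torch

def block_diagonal_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
    # position index for every token in the sequence
    idx = torch.arange(seq_len)
    # block-diagonal structure: query and key must fall in the same block
    same_block = (idx[:, None] // block_size) == (idx[None, :] // block_size)
    # causal structure: key position must not exceed query position
    causal = idx[None, :] <= idx[:, None]
    return same_block & causal

mask = block_diagonal_causal_mask(seq_len = 8, block_size = 4)
print(mask.int())  # two 4x4 lower-triangular blocks along the diagonal
```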
0.0.22
make a decision to deviate from their diagram, where the last token o…
0.0.21
start accumulating some different compression network ideas
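In the NSA design, the compression branch reduces each block of tokens to a single compressed token. One simple illustrative sketch of what such a compression network could look like (a learned projection over a flattened block; the class name and the projection choice are assumptions, not necessarily one of the ideas collected in this release):

```python
import torch
from torch import nn
from einops import rearrange

class BlockCompress(nn.Module):
    def __init__(self, dim: int, block_size: int):
        super().__init__()
        self.block_size = block_size
        # project a flattened block of tokens down to one compressed token
        self.to_compressed = nn.Linear(dim * block_size, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq, dim), with seq divisible by block_size
        blocks = rearrange(tokens, 'b (n bs) d -> b n (bs d)', bs = self.block_size)
        return self.to_compressed(blocks)

compress = BlockCompress(dim = 64, block_size = 4)
out = compress(torch.randn(2, 16, 64))  # -> (2, 4, 64)
```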
0.0.20
just improvise a solution for compress and selection block sizes not …
0.0.19
make sure the DeepSeek proposal can be compared to attention with GQA (grouped query attention)
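GQA shares each key/value head across a group of query heads, shrinking the KV cache while keeping full query-head capacity. A minimal sketch of a GQA attention step for comparison purposes (the function name and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def gqa_attention(q, k, v):
    # q: (batch, q_heads, seq, dim_head); k, v: (batch, kv_heads, seq, dim_head)
    groups = q.shape[1] // k.shape[1]
    # each kv head is shared across its group of query heads
    k = k.repeat_interleave(groups, dim = 1)
    v = v.repeat_interleave(groups, dim = 1)
    return F.scaled_dot_product_attention(q, k, v, is_causal = True)

q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)  # 2 kv heads shared across 8 query heads
v = torch.randn(1, 2, 16, 64)
out = gqa_attention(q, k, v)   # -> (1, 8, 16, 64)
```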
0.0.18
Full Changelog: 0.0.17...0.0.18
0.0.17
Full Changelog: 0.0.16...0.0.17
0.0.16
Full Changelog: 0.0.15...0.0.16
0.0.15
handle rotary embeddings for sliding windows explicitly
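Rotating queries and keys by their absolute positions makes attention scores depend only on relative position, which sliding window attention preserves. A sketch of applying rotary embeddings explicitly before windowed attention, using lucidrains' rotary-embedding-torch package (the exact integration in this release may differ):

```python
import torch
from rotary_embedding_torch import RotaryEmbedding

rotary = RotaryEmbedding(dim = 32)  # rotate half of a 64-dim head

q = torch.randn(1, 8, 16, 64)  # (batch, heads, seq, dim_head)
k = torch.randn(1, 8, 16, 64)

# apply the rotations explicitly to both queries and keys, so the
# sliding window branch sees consistent relative positions
q = rotary.rotate_queries_or_keys(q)
k = rotary.rotate_queries_or_keys(k)
```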