
Commit 8a652aa

Merge pull request #214 from flash-algo:update-feature-decs

Add gradient computation for bias and token-level KV sparsity support

2 parents 25edcc1 + ab83408

File tree

2 files changed: +4 −2 lines changed

README.md

Lines changed: 2 additions & 1 deletion

@@ -35,7 +35,8 @@ Thus, a more effective approach is sparse attention: interacting each query with
 - Grouped Query Attention and Multi Query Attention
 - Flexible Mask and Bias
 - Skipping memory access and computation for masked regions
-- Gradient computation for bias
+- Gradient computation for bias to support learnable attention sink
+- Token-level KV sparsity for each Q

 ### Features We Aim to Support
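The first added feature pairs a learnable additive attention bias with its gradient, which is what a trainable attention sink needs. A minimal numpy sketch of that math (not the flash-algo kernel; the names `q`, `k`, `v`, and `bias` are illustrative assumptions) computes attention with a bias term, derives dL/dbias via the standard softmax backward rule, and checks it against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
Lq, Lk, d = 3, 4, 8
q = rng.standard_normal((Lq, d))
k = rng.standard_normal((Lk, d))
v = rng.standard_normal((Lk, d))
bias = rng.standard_normal((Lq, Lk))  # learnable bias, e.g. an attention-sink term

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attn(b):
    # S = QK^T / sqrt(d) + B, O = softmax(S) V
    s = q @ k.T / np.sqrt(d) + b
    return softmax(s) @ v

# Analytic gradient of L = sum(O) w.r.t. bias: with dO = 1,
# dP = dO @ V^T, dS = P * (dP - rowsum(dP * P)), and dB = dS
# because S depends on B additively.
p = softmax(q @ k.T / np.sqrt(d) + bias)
dp = np.ones((Lq, d)) @ v.T
ds = p * (dp - (dp * p).sum(axis=-1, keepdims=True))

# Finite-difference check of the analytic bias gradient.
eps = 1e-6
num = np.zeros_like(bias)
for i in range(Lq):
    for j in range(Lk):
        b_hi = bias.copy(); b_hi[i, j] += eps
        b_lo = bias.copy(); b_lo[i, j] -= eps
        num[i, j] = (attn(b_hi).sum() - attn(b_lo).sum()) / (2 * eps)

assert np.allclose(ds, num, atol=1e-4)
```

A fused kernel would produce `ds` directly in the backward pass rather than materializing `p`, but the gradient it must return for the bias is exactly this quantity.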

README_zh.md

Lines changed: 2 additions & 1 deletion

@@ -35,7 +35,8 @@ Flash-Sparse-Attention 是一个高性能的可训练稀疏注意力实现, 将
 - 分组查询注意力和多查询注意力
 - 灵活的掩码与偏置
 - 跳过掩码区域的访存与计算
-- 偏置的梯度计算
+- 偏置的梯度计算以支持可学习 attention sink
+- 对于每个 Q 有 token 级别的 KV 稀疏性

 ### 我们想要支持的功能
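The second added feature, token-level KV sparsity for each Q, means every query row attends only to its own subset of KV tokens, skipping memory access and compute for the rest. A hedged numpy sketch of the semantics (the per-query index layout `kv_idx` is an assumption for illustration, not the library's API) gathers only the selected tokens per query and verifies the result against a dense masked reference:

```python
import numpy as np

rng = np.random.default_rng(1)
Lq, Lk, d = 2, 6, 4
q = rng.standard_normal((Lq, d))
k = rng.standard_normal((Lk, d))
v = rng.standard_normal((Lk, d))

# Per-query KV index lists (variable length): query 0 sees 3 tokens, query 1 sees 2.
kv_idx = [np.array([0, 2, 5]), np.array([1, 4])]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Sparse path: gather only the selected KV tokens for each query.
out_sparse = np.zeros((Lq, d))
for i, idx in enumerate(kv_idx):
    s = q[i] @ k[idx].T / np.sqrt(d)
    out_sparse[i] = softmax(s) @ v[idx]

# Dense reference: full scores with -inf on non-selected tokens.
s_full = q @ k.T / np.sqrt(d)
mask = np.full((Lq, Lk), -np.inf)
for i, idx in enumerate(kv_idx):
    mask[i, idx] = 0.0
masked = s_full + mask
p = np.exp(masked - masked.max(axis=-1, keepdims=True))
p /= p.sum(axis=-1, keepdims=True)
out_dense = p @ v

assert np.allclose(out_sparse, out_dense)
```

The point of the gather formulation is that the sparse path never touches the masked KV rows at all, which is where the memory-access and compute savings come from.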
