
Commit 5ef4591

Refines README to clarify mask and bias support for attention tensors
1 parent 268e657

File tree: README.md, README_zh.md

2 files changed: +4 −4 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ Flash-DMA is a high-performance attention implementation that integrates Flash A
  ## Key Features

  ### 🎯 Core Kernel Advantages
- - **4D Mask & Bias Support**: Native support for `(batch_size, num_kv_heads, query_len, key_len)` shaped attention mask and attention bias tensors
+ - **Mask & Bias Support**: Native support for `(batch_size, {1|num_kv_heads|num_heads}, {0|query_len}, key_len)` shaped attention mask and attention bias tensors
  - **Intelligent Computation Skipping**: Block-level automatic skipping mechanism based on masks, completely bypassing computation and memory access for zero-mask blocks
  - **Complete Gradient Support**: Built-in full gradient computation path for attention bias, supporting end-to-end training
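The updated bullet describes tensors that may broadcast over the head dimension (`1`, `num_kv_heads`, or `num_heads`) and, per the `{0|query_len}` notation, over the query dimension. As a minimal sketch of those shapes, the snippet below builds a head-broadcast boolean mask and a per-head additive bias and runs them through PyTorch SDPA as a reference, including a backward pass for the bias; the `flash_dma_attention` name in the final comment is hypothetical, not the repository's actual API.

```python
import torch
import torch.nn.functional as F

# Shapes follow the README convention: (batch_size, num_heads, seq_len, head_dim).
batch_size, num_heads, query_len, key_len, head_dim = 2, 8, 128, 256, 64

q = torch.randn(batch_size, num_heads, query_len, head_dim)
k = torch.randn(batch_size, num_heads, key_len, head_dim)
v = torch.randn(batch_size, num_heads, key_len, head_dim)

# Boolean keep-mask broadcast over the head dimension: (batch_size, 1, query_len, key_len).
attn_mask = torch.rand(batch_size, 1, query_len, key_len) > 0.1
# Additive bias with a full head dimension: (batch_size, num_heads, query_len, key_len).
attn_bias = torch.randn(batch_size, num_heads, query_len, key_len, requires_grad=True)

# SDPA reference: fold mask and bias into a single additive attn_mask argument.
additive = attn_bias.masked_fill(~attn_mask, float("-inf"))
out_ref = F.scaled_dot_product_attention(q, k, v, attn_mask=additive)

# Bias gradients flow end to end, mirroring the "Complete Gradient Support" bullet.
out_ref.sum().backward()
print(out_ref.shape, attn_bias.grad.shape)

# A Flash-DMA call would take the mask and bias tensors directly; the function
# name below is illustrative only, not the repository's actual API:
# out = flash_dma_attention(q, k, v, attn_mask=attn_mask, attn_bias=attn_bias)
```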

@@ -31,7 +31,7 @@ Flash-DMA is a high-performance attention implementation that integrates Flash A
  ## Performance

- We present expected speedup of Flash-DMA over standard PyTorch SDPA.
+ We present the expected speedup of Flash-DMA over standard PyTorch SDPA under mask and bias conditions.

  ---

README_zh.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ Flash-DMA is a high-performance attention implementation that integrates Flash Attention's memory
  ## Key Features

  ### 🎯 Core Kernel Advantages
- - **4D Mask & Bias Support**: Native support for attention_mask and attention_bias tensors of shape `(batch_size, num_kv_heads, query_len, key_len)`
+ - **Mask & Bias Support**: Native support for attention_mask and attention_bias tensors of shape `(batch_size, {1|num_kv_heads|num_heads}, {0|query_len}, key_len)`
  - **Intelligent Computation Skipping**: Block-level automatic skipping mechanism based on attention_mask, completely skipping computation and memory access for all-zero mask blocks
  - **Complete Gradient Support**: Built-in full gradient computation path for attention_bias, supporting end-to-end training

@@ -31,7 +31,7 @@ Flash-DMA is a high-performance attention implementation that integrates Flash Attention's memory
  ## Performance

- We present the expected speedup of Flash-DMA over standard PyTorch SDPA.
+ We present the expected speedup of Flash-DMA over standard PyTorch SDPA under mask and bias conditions.

  ---
