
Commit 0594911

Refines README to update mask and bias support description for clarity
1 parent 5ef4591 commit 0594911

2 files changed: +2 -2 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -234,7 +234,7 @@ Flash-DMA integrates the efficient memory access patterns of Flash Attention wit
 
 ### Core Technology Integration
 
-- **🎯 Native 4D Mask & Bias Support**: Kernels directly process `(batch_size, num_kv_heads, query_len, key_len)` shaped tensors
+- **🎯 Native Mask & Bias Support**: Kernels directly process `(batch_size, {1|num_kv_heads|num_heads}, {0|query_len}, key_len)` shaped tensors
 - **⚡ Block-level Intelligent Skipping**: Unified OR-reduction skipping logic based on masks, completely avoiding computation and memory access for zero blocks
 - **🔄 Complete Gradient Chain**: Built-in attention bias gradient computation (dbias) supporting end-to-end differentiable training
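
The changed line replaces the fixed 4D layout with a broadcastable shape contract for the mask and bias. The sketch below shows, in plain PyTorch, what inputs following that convention could look like. It is illustrative only: the `flash_dmattn_func` entry point, its argument names, and the `(batch, seqlen, heads, head_dim)` query/key/value layout are assumptions, not the library's confirmed API; only the mask/bias shapes follow the `(batch_size, {1|num_kv_heads|num_heads}, {0|query_len}, key_len)` convention quoted in the diff.

```python
# Illustrative sketch: mask/bias tensors shaped per the broadcastable convention
# (batch_size, {1|num_kv_heads|num_heads}, {0|query_len}, key_len) from the line above.
# `flash_dmattn_func` and the q/k/v layout are hypothetical placeholders.
import torch

batch_size, num_heads, num_kv_heads = 2, 16, 4
query_len, key_len, head_dim = 128, 1024, 64

# Assumed (batch, seqlen, heads, head_dim) layout; move to CUDA for the real kernels.
q = torch.randn(batch_size, query_len, num_heads, head_dim, dtype=torch.bfloat16)
k = torch.randn(batch_size, key_len, num_kv_heads, head_dim, dtype=torch.bfloat16)
v = torch.randn(batch_size, key_len, num_kv_heads, head_dim, dtype=torch.bfloat16)

# Mask broadcast across heads (head dim = 1); the convention also allows
# num_kv_heads or num_heads there, and 0 or query_len in the query dimension.
attn_mask = torch.ones(batch_size, 1, query_len, key_len, dtype=torch.bool)
attn_mask[..., key_len // 2 :] = False  # e.g. hide the second half of the keys

# Bias kept per KV head; requires_grad so the built-in dbias path described
# in the bullets above can return a gradient for it.
attn_bias = torch.zeros(
    batch_size, num_kv_heads, query_len, key_len,
    dtype=torch.bfloat16, requires_grad=True,
)

# out = flash_dmattn_func(q, k, v, attn_mask=attn_mask, attn_bias=attn_bias)  # hypothetical call
# out.sum().backward()  # attn_bias.grad would then hold the dbias gradient
```

Broadcasting the head (and, per the convention, query) dimension lets a single key-length mask be shared across heads and query rows without materializing a full 4D tensor per head.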

README_zh.md

Lines changed: 1 addition & 1 deletion
@@ -234,7 +234,7 @@ Flash-DMA integrates the efficient memory access patterns of Flash Attention with dynamic mask
 
 ### Core Technology Integration
 
-- **🎯 Native 4D Mask & Bias Support**: Kernels directly process tensors shaped `(batch_size, num_kv_heads, query_len, key_len)`
+- **🎯 Native Mask & Bias Support**: Kernels directly process tensors shaped `(batch_size, {1|num_kv_heads|num_heads}, {0|query_len}, key_len)`
 - **⚡ Block-level Intelligent Skipping**: Unified OR-reduction skipping logic based on masks, completely avoiding computation and memory access for all-zero blocks
 - **🔄 Complete Gradient Chain**: Built-in attention bias gradient computation, supporting end-to-end differentiable training
