
Commit 0594911

Refines README to update mask and bias support description for clarity
1 parent 5ef4591 commit 0594911

2 files changed: +2 -2 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -234,7 +234,7 @@ Flash-DMA integrates the efficient memory access patterns of Flash Attention wit
 
 ### Core Technology Integration
 
-- **🎯 Native 4D Mask & Bias Support**: Kernels directly process `(batch_size, num_kv_heads, query_len, key_len)` shaped tensors
+- **🎯 Native Mask & Bias Support**: Kernels directly process `(batch_size, {1|num_kv_heads|num_heads}, {0|query_len}, key_len)` shaped tensors
 - **⚡ Block-level Intelligent Skipping**: Unified OR-reduction skipping logic based on masks, completely avoiding computation and memory access for zero blocks
 - **🔄 Complete Gradient Chain**: Built-in attention bias gradient computation (dbias) supporting end-to-end differentiable training
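
The changed line replaces the fixed 4D layout with a broadcastable shape contract for the mask and bias. The sketch below shows, in plain PyTorch, what inputs following that convention could look like. It is illustrative only: the `flash_dmattn_func` entry point, its argument names, and the `(batch, seqlen, heads, head_dim)` query/key/value layout are assumptions, not the library's confirmed API; only the mask/bias shapes follow the `(batch_size, {1|num_kv_heads|num_heads}, {0|query_len}, key_len)` convention quoted in the diff.

```python
# Illustrative sketch: mask/bias tensors shaped per the broadcastable convention
# (batch_size, {1|num_kv_heads|num_heads}, {0|query_len}, key_len) from the line above.
# `flash_dmattn_func` and the q/k/v layout are hypothetical placeholders.
import torch

batch_size, num_heads, num_kv_heads = 2, 16, 4
query_len, key_len, head_dim = 128, 1024, 64

# Assumed (batch, seqlen, heads, head_dim) layout; move to CUDA for the real kernels.
q = torch.randn(batch_size, query_len, num_heads, head_dim, dtype=torch.bfloat16)
k = torch.randn(batch_size, key_len, num_kv_heads, head_dim, dtype=torch.bfloat16)
v = torch.randn(batch_size, key_len, num_kv_heads, head_dim, dtype=torch.bfloat16)

# Mask broadcast across heads (head dim = 1); the convention also allows
# num_kv_heads or num_heads there, and 0 or query_len in the query dimension.
attn_mask = torch.ones(batch_size, 1, query_len, key_len, dtype=torch.bool)
attn_mask[..., key_len // 2 :] = False  # e.g. hide the second half of the keys

# Bias kept per KV head; requires_grad so the built-in dbias path described
# in the bullets above can return a gradient for it.
attn_bias = torch.zeros(
    batch_size, num_kv_heads, query_len, key_len,
    dtype=torch.bfloat16, requires_grad=True,
)

# out = flash_dmattn_func(q, k, v, attn_mask=attn_mask, attn_bias=attn_bias)  # hypothetical call
# out.sum().backward()  # attn_bias.grad would then hold the dbias gradient
```

Broadcasting the head (and, per the convention, query) dimension lets a single key-length mask be shared across heads and query rows without materializing a full 4D tensor per head.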

README_zh.md

Lines changed: 1 addition & 1 deletion
@@ -234,7 +234,7 @@ Flash-DMA integrates the efficient memory access patterns of Flash Attention with dynamic mask
 
 ### Core Technology Integration
 
-- **🎯 Native 4D Mask & Bias Support**: Kernels directly process tensors shaped `(batch_size, num_kv_heads, query_len, key_len)`
+- **🎯 Native Mask & Bias Support**: Kernels directly process tensors shaped `(batch_size, {1|num_kv_heads|num_heads}, {0|query_len}, key_len)`
 - **⚡ Block-level Intelligent Skipping**: Unified OR-reduction skipping logic based on masks, completely avoiding computation and memory access for all-zero blocks
 - **🔄 Complete Gradient Chain**: Built-in attention bias gradient computation, supporting end-to-end differentiable training
