@@ -30,10 +30,10 @@ def flash_dynamic_mask_attention_forward(
         query (torch.Tensor): The query tensor of shape (batch_size, num_heads, query_len, head_dim).
         key (torch.Tensor): The key tensor of shape (batch_size, num_kv_heads, key_len, head_dim).
         value (torch.Tensor): The value tensor of shape (batch_size, num_kv_heads, key_len, head_dim).
-        attention_mask (Optional[torch.Tensor]): The attention mask boolean tensor of shape
-            (batch_size, seq_len) or (batch_size, {num_heads|num_kv_heads|1}, {query_len|0}, key_len).
+        attention_mask (Optional[torch.Tensor]): The attention mask boolean tensor of shape
+            (batch_size, seq_len) or ({batch_size|1}, {num_heads|num_kv_heads|1}, {query_len|1}, {key_len|1}).
         attention_bias (Optional[torch.Tensor]): The attention bias float tensor of shape
-            (batch_size, {num_heads|num_kv_heads|1}, {query_len|0}, key_len).
+            ({batch_size|1}, {num_heads|num_kv_heads|1}, {query_len|1}, {key_len|1}).
         scaling (Optional[float]): The scaling factor for the attention scores.
         window_size (Optional[int]): The size of the window to keep.
         softcap (Optional[float]): The softcap value for the attention scores.
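
For context, a minimal sketch of the broadcastable mask/bias shapes the updated docstring allows. All sizes are illustrative assumptions; flash_dynamic_mask_attention_forward itself is the patched function and is not redefined here.

import torch

# Illustrative sizes only (assumptions, not values from the patch).
batch_size, num_heads, num_kv_heads = 2, 8, 2
query_len, key_len, head_dim = 4, 16, 64

query = torch.randn(batch_size, num_heads, query_len, head_dim)
key = torch.randn(batch_size, num_kv_heads, key_len, head_dim)
value = torch.randn(batch_size, num_kv_heads, key_len, head_dim)

# Under the updated convention, any of the four mask/bias dimensions may be 1
# and broadcast; here one mask/bias row is shared across the batch and all queries.
attention_mask = torch.ones(1, num_kv_heads, 1, key_len, dtype=torch.bool)
attention_bias = torch.zeros(1, num_kv_heads, 1, key_len)

The old convention pinned the batch and key dimensions to their full sizes; the new `{dim|1}` notation makes every dimension independently broadcastable.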