Update docstrings in attention functions for consistency #259
LoserCheems merged 1 commit into main from
Conversation
Pull request overview
This PR updates the docstrings of the Triton attention interface functions to make the return-value descriptions more consistent, especially around the optional return_lse behavior.
Changes:
- Replaced separate `:return out:`/`:return lse:` entries with a single conditional `:returns:` description for multiple attention wrapper functions.
- Standardized the wording describing the `return_lse` tuple return behavior across the dense/sparse/gated variants.
```diff
- :return out: Attention output tensor of shape [total_seqlen_q, num_heads_q, head_dim].
- :return lse: Logsumexp tensor of shape [total_seqlen_q, num_heads_q] if return_lse is True. Otherwise, not returned.
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].
```
In varlen mode, the returned lse shape is documented as [total_seqlen_q, num_heads_q], but _flash_dense_attn_varlen_base_forward allocates/returns it as (num_heads_q, total_seqlen_q). Please update the docstring so the axes match the actual return value (or transpose lse before returning, but that would be an API change).
Suggested change:

```diff
- :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [num_heads_q, total_seqlen_q].
```
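The axis order matters to callers, since a consumer that trusts the docstring would index lse the wrong way around. A minimal NumPy sketch of the documented varlen contract, using a hypothetical stand-in function (the name, the zero-filled tensors, and the example sizes are illustrative, not the library's actual API), including the `(num_heads_q, total_seqlen_q)` lse layout the reviewer describes:

```python
import numpy as np

def fake_varlen_attn(total_seqlen_q, num_heads_q, head_dim, return_lse=False):
    """Hypothetical stand-in mimicking the varlen return contract under review."""
    out = np.zeros((total_seqlen_q, num_heads_q, head_dim), dtype=np.float32)
    # In varlen mode the kernel allocates lse as (num_heads_q, total_seqlen_q),
    # not (total_seqlen_q, num_heads_q) as the original docstring claimed.
    lse = np.zeros((num_heads_q, total_seqlen_q), dtype=np.float32)
    return (out, lse) if return_lse else out

# return_lse=False: a single tensor comes back.
out = fake_varlen_attn(128, 8, 64)

# return_lse=True: a tuple comes back, with lse in head-major layout.
out, lse = fake_varlen_attn(128, 8, 64, return_lse=True)

# Callers wanting a [total_seqlen_q, num_heads_q] view must transpose themselves:
lse_tq = lse.T
```

This also illustrates why fixing the docstring (rather than transposing before returning) is the non-breaking option: existing callers already depend on the head-major layout.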
```diff
- :return out: Attention output tensor of shape [total_seqlen_q, num_heads_q, head_dim].
- :return lse: Logsumexp tensor of shape [total_seqlen_q, num_heads_q] if return_lse is True. Otherwise, not returned.
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].
```
In varlen mode, the docstring states lse has shape [total_seqlen_q, num_heads_q], but _flash_sparse_attn_varlen_base_forward returns lse with shape (num_heads_q, total_seqlen_q). Please correct the documented shape to match the actual output.
Suggested change:

```diff
- :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [num_heads_q, total_seqlen_q].
```
```diff
- :return out: Attention output tensor of shape [total_seqlen_q, num_heads_q, head_dim].
- :return lse: Logsumexp tensor of shape [total_seqlen_q, num_heads_q] if return_lse is True. Otherwise, not returned.
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].
```
The varlen gated attention path returns lse with shape (num_heads_q, total_seqlen_q) (see _flash_gated_attn_varlen_base_forward), but the docstring documents it as [total_seqlen_q, num_heads_q]. Please update the return description so consumers don’t swap axes.
Suggested change:

```diff
- :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [num_heads_q, total_seqlen_q].
```
```diff
- :return out: Attention output tensor of shape [batch_size, seqlen_q, num_heads, head_dim].
- :return lse: Logsumexp tensor of shape [batch_size, num_heads, seqlen_q] if return_lse is True. Otherwise, not returned.
+ :returns: If return_lse is False, returns out with shape [batch_size, seqlen_q, num_heads, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [batch_size, num_heads, seqlen_q].
```
This file (and most of the repo) uses reST field lists like `:return ...:` (e.g., flash_sparse_attn/ops/triton/activations.py:33-38), but these docstrings now introduce `:returns:`. Unless there is a doc build requirement for `:returns:`, consider keeping `:return:`/`:return <name>:` to stay consistent with the existing documentation style across the codebase.
Suggested change:

```diff
- :returns: If return_lse is False, returns out with shape [batch_size, seqlen_q, num_heads, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [batch_size, num_heads, seqlen_q].
+ :return: If return_lse is False, returns out with shape [batch_size, seqlen_q, num_heads, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [batch_size, num_heads, seqlen_q].
```