
Update docstrings in attention functions for consistency#259

Merged
LoserCheems merged 1 commit into main from optim_triton_version on Mar 23, 2026

Conversation

@LoserCheems
Collaborator

Summary

  • Improve consistency in return value descriptions across attention functions.

Root Cause

  • Inconsistent documentation regarding return values led to confusion.

Changes

  • Standardized return value descriptions in the docstrings of various attention functions.

Reproduction

  • Review the docstrings in the attention function implementations.

Tests

  • No new tests required; documentation change only.

Compatibility

  • No backward compatibility issues.

Checklist

  • Linked issue provided
  • Adds or updates tests
  • Updates docs if needed
  • No perf regressions

Copilot AI review requested due to automatic review settings March 23, 2026 04:27
@LoserCheems LoserCheems merged commit 24968d2 into main Mar 23, 2026
2 checks passed
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the docstrings of the Triton attention interface functions to make the return-value descriptions more consistent, especially around the optional return_lse behavior.

Changes:

  • Replaced separate :return out: / :return lse: entries with a single conditional :returns: description for multiple attention wrapper functions.
  • Standardized wording describing the return_lse tuple return behavior across dense/sparse/gated variants.
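The standardized conditional-return contract can be sketched with a toy dense attention forward in NumPy. This is illustrative only; `attn_forward` and its shapes are assumptions modeled on the docstrings quoted in this review, not the repo's actual Triton kernels:

```python
import numpy as np

def attn_forward(q, k, v, return_lse=False):
    """Toy dense attention forward (illustration of the documented contract).

    :param q: Query tensor of shape [batch_size, seqlen_q, num_heads, head_dim].
    :param k: Key tensor of shape [batch_size, seqlen_k, num_heads, head_dim].
    :param v: Value tensor of shape [batch_size, seqlen_k, num_heads, head_dim].
    :param return_lse: If True, also return the logsumexp of the attention scores.
    :returns: If return_lse is False, returns out with shape
        [batch_size, seqlen_q, num_heads, head_dim]. If return_lse is True,
        returns a tuple (out, lse), where lse has shape
        [batch_size, num_heads, seqlen_q].
    """
    # Attention scores, laid out [batch, heads, seqlen_q, seqlen_k].
    scores = np.einsum("bqhd,bkhd->bhqk", q, k) / np.sqrt(q.shape[-1])
    # Logsumexp over the key axis; reused to compute a numerically stable softmax.
    lse = np.log(np.sum(np.exp(scores), axis=-1))   # [batch, heads, seqlen_q]
    probs = np.exp(scores - lse[..., None])
    out = np.einsum("bhqk,bkhd->bqhd", probs, v)    # [batch, seqlen_q, heads, head_dim]
    return (out, lse) if return_lse else out
```

The single `:returns:` field covers both branches, which is the consistency goal of this PR: callers see one description of the `return_lse`-dependent return type instead of a separate `:return lse:` entry that only sometimes applies.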


- :return out: Attention output tensor of shape [total_seqlen_q, num_heads_q, head_dim].
- :return lse: Logsumexp tensor of shape [total_seqlen_q, num_heads_q] if return_lse is True. Otherwise, not returned.
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].

Copilot AI Mar 23, 2026


In varlen mode, the returned lse shape is documented as [total_seqlen_q, num_heads_q], but _flash_dense_attn_varlen_base_forward allocates/returns it as (num_heads_q, total_seqlen_q). Please update the docstring so the axes match the actual return value (or transpose lse before returning, but that would be an API change).

Suggested change
- :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [num_heads_q, total_seqlen_q].
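The axis-order mismatch the reviewer flags can be made concrete with a minimal sketch. The allocation below is hypothetical (`varlen_forward_lse` stands in for the repo's `_flash_dense_attn_varlen_base_forward`, whose actual code is not reproduced here), but it shows why a caller trusting the old docstring would index the wrong axis:

```python
import numpy as np

def varlen_forward_lse(total_seqlen_q, num_heads_q):
    # The kernel-side lse buffer is laid out head-major, as the review notes:
    # shape [num_heads_q, total_seqlen_q], not [total_seqlen_q, num_heads_q].
    return np.zeros((num_heads_q, total_seqlen_q), dtype=np.float32)

lse = varlen_forward_lse(total_seqlen_q=128, num_heads_q=8)
# A caller that wants token-major lse must transpose explicitly;
# transposing inside the kernel wrapper instead would be an API change.
lse_tokens_major = lse.T
```

Fixing the docstring (as the suggestion does) is the non-breaking option; transposing before returning would silently change shapes for existing callers.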


- :return out: Attention output tensor of shape [total_seqlen_q, num_heads_q, head_dim].
- :return lse: Logsumexp tensor of shape [total_seqlen_q, num_heads_q] if return_lse is True. Otherwise, not returned.
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].

Copilot AI Mar 23, 2026


In varlen mode, the docstring states lse has shape [total_seqlen_q, num_heads_q], but _flash_sparse_attn_varlen_base_forward returns lse with shape (num_heads_q, total_seqlen_q). Please correct the documented shape to match the actual output.

Suggested change
- :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [num_heads_q, total_seqlen_q].


- :return out: Attention output tensor of shape [total_seqlen_q, num_heads_q, head_dim].
- :return lse: Logsumexp tensor of shape [total_seqlen_q, num_heads_q] if return_lse is True. Otherwise, not returned.
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].

Copilot AI Mar 23, 2026


The varlen gated attention path returns lse with shape (num_heads_q, total_seqlen_q) (see _flash_gated_attn_varlen_base_forward), but the docstring documents it as [total_seqlen_q, num_heads_q]. Please update the return description so consumers don’t swap axes.

Suggested change
- :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [total_seqlen_q, num_heads_q].
+ :returns: If return_lse is False, returns out with shape [total_seqlen_q, num_heads_q, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [num_heads_q, total_seqlen_q].


- :return out: Attention output tensor of shape [batch_size, seqlen_q, num_heads, head_dim].
- :return lse: Logsumexp tensor of shape [batch_size, num_heads, seqlen_q] if return_lse is True. Otherwise, not returned.
+ :returns: If return_lse is False, returns out with shape [batch_size, seqlen_q, num_heads, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [batch_size, num_heads, seqlen_q].

Copilot AI Mar 23, 2026


This file (and most of the repo) uses reST field lists like :return ...: (e.g., flash_sparse_attn/ops/triton/activations.py:33-38), but these docstrings now introduce :returns:. Unless there’s a doc build requirement for :returns:, consider keeping :return:/:return <name>: to stay consistent with existing documentation style across the codebase.

Suggested change
- :returns: If return_lse is False, returns out with shape [batch_size, seqlen_q, num_heads, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [batch_size, num_heads, seqlen_q].
+ :return: If return_lse is False, returns out with shape [batch_size, seqlen_q, num_heads, head_dim]. If return_lse is True, returns a tuple (out, lse), where lse has shape [batch_size, num_heads, seqlen_q].
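For comparison, the repo-wide reST field-list style the reviewer references looks like the sketch below. The function is a hypothetical activation for illustration; the actual contents of flash_sparse_attn/ops/triton/activations.py are not reproduced here:

```python
import math

def gelu(x):
    """Apply the exact (erf-based) GELU activation to a scalar.

    :param x: Input value.
    :return: 0.5 * x * (1 + erf(x / sqrt(2))).
    """
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Both `:return:` and `:returns:` are recognized by Sphinx's Python domain as aliases of the same field, so the suggestion is purely about keeping one spelling consistent across the codebase rather than about doc-build correctness.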

