Add gradient computation for bias and token-level KV sparsity support#214
LoserCheems merged 1 commit into `main`
Conversation
Pull request overview
This PR updates the documentation in both English and Chinese README files to provide more detailed descriptions of two existing features in the "Supported Features" section. The changes clarify that gradient computation for bias supports learnable attention sink mechanisms and add explicit mention of token-level KV sparsity for each query.
Key Changes
- Enhanced feature description for bias gradient computation to specify its use case (learnable attention sink)
- Added explicit documentation of token-level KV sparsity capability for each query
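The two documented features can be illustrated with a minimal NumPy sketch of attention with an additive, trainable score bias. This is not this repository's API; all names and shapes here are hypothetical. Because the bias enters the forward pass additively, training it requires a gradient with respect to the bias, which is the "gradient computation for bias" the feature list refers to; a finite-difference probe below confirms the output depends on it.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: one head, 4 queries, 4 key/value tokens, dim 8.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))

# A learnable additive bias on the attention scores (an "attention sink"
# can be realized this way: a bias column that soaks up probability mass).
bias = rng.normal(size=(4, 4))

scores = q @ k.T / np.sqrt(q.shape[-1]) + bias
probs = softmax(scores)
out = probs @ v

# Finite-difference probe: perturbing one bias entry changes the output,
# i.e. a nonzero gradient flows back to the bias during training.
eps = 1e-5
bias2 = bias.copy()
bias2[0, 0] += eps
out2 = softmax(q @ k.T / np.sqrt(q.shape[-1]) + bias2) @ v
```

In an autograd framework the same effect is obtained by declaring the bias a trainable parameter; the kernel-level work this PR's feature list describes is computing that bias gradient inside the fused attention backward pass.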
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| README.md | Updated lines 38-39 to clarify bias gradient computation purpose and add token-level KV sparsity feature |
| README_zh.md | Corresponding Chinese translation updates for the same feature descriptions |
```diff
+- Gradient computation for bias to support learnable attention sink
+- Token-level KV sparsity for each Q

 ### Features We Aim to Support
```
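"Token-level KV sparsity for each Q" means every query row keeps its own subset of key/value tokens rather than sharing one global mask. A minimal NumPy sketch of the idea (names and the `keep` mask are illustrative, not this repository's interface):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
q = rng.normal(size=(3, 8))   # 3 queries
k = rng.normal(size=(5, 8))   # 5 key/value tokens
v = rng.normal(size=(5, 8))

# Per-query KV sparsity: keep[i, j] == True means query i may attend to
# key/value token j. Each query gets a different subset.
keep = np.array([
    [True,  True,  False, False, False],
    [False, True,  True,  True,  False],
    [True,  False, False, True,  True ],
])

scores = q @ k.T / np.sqrt(q.shape[-1])
scores = np.where(keep, scores, -np.inf)  # dropped tokens get -inf
probs = softmax(scores)                   # -> exactly zero weight there
out = probs @ v
```

A fused kernel exploits this by skipping the masked KV tokens entirely instead of materializing `-inf` scores, which is where the performance benefit comes from.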
The PR description claims "New tests added to validate the gradient computation and sparsity features," but only README files are being modified in this PR. No test files or actual implementation code changes are included. If these features were implemented and tested in a previous commit, the PR description should be updated to accurately reflect that this PR only updates documentation. If the implementation and tests are planned for the future, these features should be moved to the "Features We Aim to Support" section instead.
Suggested change:

```diff
-- Gradient computation for bias to support learnable attention sink
-- Token-level KV sparsity for each Q
-### Features We Aim to Support
+### Features We Aim to Support
+- Gradient computation for bias to support learnable attention sink
+- Token-level KV sparsity for each Q
```