Merged
Changes from 49 commits
Commits
52 commits
4198523
Removes mask path from sparse attn
LoserCheems Dec 20, 2025
7410441
Fix delta_states calculation by using key_states instead of value_states
LoserCheems Dec 20, 2025
7184c6b
Stabilizes softmax normalization
LoserCheems Dec 21, 2025
84d06ac
Refactor Flash Sparse Attention Kernel Instantiations
LoserCheems Dec 21, 2025
fbda069
Format return statement in triton_sparse_attn_func for improved reada…
LoserCheems Dec 21, 2025
f96e5da
Streamlines sparse attn bias handling
LoserCheems Dec 21, 2025
1cd621d
Fix formatting of FLASH_NAMESPACE_SCOPE macro for consistency
LoserCheems Dec 21, 2025
ff118ff
Refactor CHECK_CUDA macro for improved readability and consistency
LoserCheems Dec 21, 2025
9a387c7
Improve formatting of macro definitions for enhanced readability
LoserCheems Dec 21, 2025
f916ee1
Streamlines BlockInfo formatting
LoserCheems Dec 21, 2025
b43d8b0
Unifies attention params and templates
LoserCheems Dec 21, 2025
a4abb55
Simplifies BSP buffers in kernel traits
LoserCheems Dec 21, 2025
2094435
Removes explicit mask tensor handling
LoserCheems Dec 21, 2025
69dca0d
Cleans up softmax helper formatting
LoserCheems Dec 21, 2025
2755de7
Cleans up CUDA utils formatting
LoserCheems Dec 21, 2025
99a7fa9
Collapses mask/bias handling into single path
LoserCheems Dec 21, 2025
50942ae
Streamlines flash fwd kernel dispatch
LoserCheems Dec 21, 2025
831b149
Polishes flash backward helpers
LoserCheems Dec 21, 2025
f636446
Aligns bwd kernel with BSP layout
LoserCheems Dec 21, 2025
53a039c
Simplifies flash bwd template args
LoserCheems Dec 21, 2025
d919cac
Simplifies kernel generation params
LoserCheems Dec 21, 2025
ca5b529
Simplifies flash attention bias handling
LoserCheems Dec 21, 2025
d8f29a7
Renames flash sparse attention module
LoserCheems Dec 21, 2025
b3ded16
Aligns bwd kernel with new QKV traits
LoserCheems Dec 23, 2025
9dd8a8f
Fix import statement in bug report template
LoserCheems Dec 23, 2025
e99d465
Fix placeholder text in bug report template for flash_sparse_attn
LoserCheems Dec 23, 2025
8757aa0
Fix placeholder text in feature request template for implementation d…
LoserCheems Dec 23, 2025
b631b41
Fix placeholder text in bug fix template for flash_sparse_attn_interface
LoserCheems Dec 23, 2025
fc4f400
Fix placeholder text in feature support template for flash_sparse_attn
LoserCheems Dec 23, 2025
8ee17d2
Fix environment variable name for skipping CUDA build in publish work…
LoserCheems Dec 23, 2025
d95c5e7
Fix title in CITATION.cff for consistency with project name
LoserCheems Dec 23, 2025
1625220
Fix description in pyproject.toml for clarity
LoserCheems Dec 23, 2025
615ba7e
Fix project name references in CONTRIBUTING.md for consistency
LoserCheems Dec 23, 2025
36fd600
Initializes softmax accumulators
LoserCheems Jan 8, 2026
cfe092d
Refactor attention bias calculation in FlashSparseAttention
LoserCheems Jan 8, 2026
4ccecaa
Fix attention bias calculation in FlashSparseAttention
LoserCheems Jan 9, 2026
46b3f49
Fix attention bias calculation in FlashSparseAttention
LoserCheems Jan 10, 2026
a8153a5
Adds Triton masking helper
LoserCheems Jan 13, 2026
0da98a7
Adds docstrings to mask function
LoserCheems Jan 13, 2026
7e5f36d
Adds Triton online softmax helpers
LoserCheems Jan 14, 2026
9ca3fde
Simplifies online softmax state
LoserCheems Jan 14, 2026
38befd7
Adds Triton block boundary helpers
LoserCheems Jan 14, 2026
5719558
Uses host min/max for block bounds
LoserCheems Jan 15, 2026
0f83c40
Adds Triton flash fwd kernel
LoserCheems Jan 15, 2026
3ba2e7f
Updates docstring param annotations
LoserCheems Jan 15, 2026
fb39b29
Adopts param-style docstrings
LoserCheems Jan 15, 2026
b9dfb6e
Adds Triton seq len utilities
LoserCheems Jan 15, 2026
d372e43
Enables variable-length flash forward
LoserCheems Jan 15, 2026
392a5f2
Enables local windowed flash forward
LoserCheems Jan 15, 2026
d8bc7c7
Merge branch 'main' into fsa
LoserCheems Jan 16, 2026
fabf60b
Bump version to 2.0.0
LoserCheems Jan 16, 2026
da48ce6
Merge branch 'fsa' of https://github.com/SmallDoges/flash-dmattn into…
LoserCheems Jan 16, 2026
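Several of the commits above ("Stabilizes softmax normalization", "Adds Triton online softmax helpers", "Simplifies online softmax state") revolve around online softmax. As a rough NumPy sketch of that technique under invented names — this is not the PR's Triton code — the running max/sum update looks like:

```python
# Illustrative online (streaming) softmax over score blocks.
# A plain NumPy sketch of the technique; names are hypothetical,
# not the PR's actual Triton helpers.
import numpy as np

def online_softmax(scores_blocks):
    """One-pass softmax over a sequence of score blocks, keeping only
    a running max (m) and a running sum (l) of rescaled exponentials."""
    m = -np.inf   # running max seen so far
    l = 0.0       # running sum of exp(scores - m)
    seen = []
    for blk in scores_blocks:
        m_new = max(m, float(blk.max()))
        # rescale the old sum to the new max before adding this block
        l = l * np.exp(m - m_new) + np.exp(blk - m_new).sum()
        m = m_new
        seen.append(blk)
    scores = np.concatenate(seen)
    return np.exp(scores - m) / l

x = np.array([1.0, 2.0, 3.0, 4.0])
blocked = online_softmax([x[:2], x[2:]])
reference = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
assert np.allclose(blocked, reference)
```

The point of the rescaling step is numerical stability: no exponent ever exceeds zero, so fp16/bf16 accumulators cannot overflow mid-stream.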
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -20,7 +20,7 @@ A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
-1. Import flash_dmattn
+1. Import flash_sparse_attn
2. Run the following code:
```python
# Paste your code here
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -22,7 +22,7 @@ body:
attributes:
label: Describe the bug
description: Provide a concise description of the incorrect behaviour.
-placeholder: Unexpected error when calling flash_dmattn(...)
+placeholder: Unexpected error when calling flash_sparse_attn(...)
validations:
required: true
- type: textarea
@@ -31,7 +31,7 @@
label: Steps to reproduce
description: Share the minimal steps or code necessary for us to see the failure.
placeholder: |
-1. Import flash_dmattn
+1. Import flash_sparse_attn
2. Run the snippet below
3. Observe the error
render: python
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/feature_request.yml
@@ -44,7 +44,7 @@ body:
attributes:
label: Implementation details
description: Call out potential CUDA/Python changes, performance implications, or compatibility considerations.
-placeholder: Requires updates to flash_dmattn_interface and CUDA op...
+placeholder: Requires updates to flash_sparse_attn_interface and CUDA op...
- type: textarea
id: use-case
attributes:
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE/bug_fix.yml
@@ -27,7 +27,7 @@ body:
attributes:
label: Changes
description: Highlight the notable code-level modifications.
-placeholder: Updated flash_dmattn_interface to...
+placeholder: Updated flash_sparse_attn_interface to...
validations:
required: true
- type: textarea
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE/feature_support.yml
@@ -27,7 +27,7 @@ body:
attributes:
label: Changes
description: Describe new or changed public APIs, configuration, or CLI behaviour.
-placeholder: Adds flash_dmattn.feature_flag...
+placeholder: Adds flash_sparse_attn.feature_flag...
validations:
required: true
- type: textarea
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -93,7 +93,7 @@ jobs:
pip install torch --index-url https://download.pytorch.org/whl/cpu
- name: Build core package
env:
-FLASH_DMATTN_SKIP_CUDA_BUILD: "TRUE"
+FLASH_SPARSE_ATTENTION_SKIP_CUDA_BUILD: "TRUE"
run: |
python setup.py sdist --dist-dir=dist
- name: Deploy
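For context on the env-var rename above: a setup.py typically gates its CUDA extension list on such a flag so `python setup.py sdist` works on machines without a CUDA toolchain. The sketch below is a hypothetical illustration of that common pattern, not this project's actual setup.py; the helper name is invented.

```python
# Hypothetical sketch of honoring the FLASH_SPARSE_ATTENTION_SKIP_CUDA_BUILD
# flag set by the publish workflow. Not the project's real setup.py.
import os

def should_skip_cuda_build(env=os.environ):
    # The workflow sets the variable to the string "TRUE"; accept a few
    # common truthy spellings to be safe.
    value = env.get("FLASH_SPARSE_ATTENTION_SKIP_CUDA_BUILD", "FALSE")
    return value.upper() in ("TRUE", "1", "YES")

ext_modules = []
if not should_skip_cuda_build():
    # Only here would CUDAExtension targets be appended; the sdist build
    # in the workflow skips this branch entirely.
    pass
```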
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -1,7 +1,7 @@
cff-version: "1.2.0"
date-released: 2025-06
message: "If you use this software, please cite it using these metadata."
-title: "Flash Sparse Attention: Trainable Dynamic Mask Sparse Attention"
+title: "Trainable Flash Sparse Attention"
url: "https://github.com/flash-algo/flash-sparse-attention"
authors:
- family-names: Shi
@@ -42,7 +42,7 @@ preferred-citation:
given-names: Guang
- family-names: Luo
given-names: Yuyu
-title: "Trainable Dynamic Mask Sparse Attention"
+title: "Trainable Flash Sparse Attention"
year: 2025
url: "https://arxiv.org/abs/2508.02124"
doi: "10.48550/arXiv.2508.02124"
12 changes: 6 additions & 6 deletions CONTRIBUTING.md
@@ -1,4 +1,4 @@
-# Contributing to Flash Dynamic Mask Attention
+# Contributing to Flash Sparse Attention

Everyone is welcome to contribute, and we value everybody's contribution. Code contributions are not the only way to help the community. Answering questions, helping others, and improving the documentation are also immensely valuable.

@@ -8,7 +8,7 @@ However you choose to contribute, please be mindful and respect our [code of con

## Ways to contribute

-There are several ways you can contribute to Flash-DMA:
+There are several ways you can contribute to FSA:

* Fix outstanding issues with the existing code.
* Submit issues related to bugs or desired new features.
@@ -30,7 +30,7 @@ Do your best to follow these guidelines when submitting a bug-related issue or a

### Did you find a bug?

-The Flash-DMA library is robust and reliable thanks to users who report the problems they encounter.
+The FSA library is robust and reliable thanks to users who report the problems they encounter.

Before you report an issue, we would really appreciate it if you could **make sure the bug was not already reported** (use the search bar on GitHub under Issues). Your issue should also be related to bugs in the library itself, and not your code.

@@ -50,7 +50,7 @@ python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {

### Do you want a new feature?

-If there is a new feature you'd like to see in Flash-DMA, please open an issue and describe:
+If there is a new feature you'd like to see in FSA, please open an issue and describe:

1. What is the *motivation* behind this feature? Is it related to performance optimization, memory efficiency, or new attention mechanisms?

@@ -77,7 +77,7 @@ We're always looking for improvements to the documentation that make it more cle

Before writing any code, we strongly advise you to search through the existing PRs or issues to make sure nobody is already working on the same thing.

-You will need basic `git` proficiency to contribute to Flash-DMA. You'll need **Python 3.8+** and **CUDA 11.8+** to contribute.
+You will need basic `git` proficiency to contribute to FSA. You'll need **Python 3.8+** and **CUDA 11.8+** to contribute.

### Development Setup

@@ -120,7 +120,7 @@ You will need basic `git` proficiency to contribute to Flash-DMA. You'll need **
python -m pytest tests/ -v
```

-Flash-DMA also includes performance benchmarks. Run them to ensure your changes don't regress performance:
+FSA also includes performance benchmarks. Run them to ensure your changes don't regress performance:

```bash
python benchmarks/forward_performance.py
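The benchmark advice in the diff above boils down to timing the same entry point before and after a change. A minimal, hypothetical timing harness is sketched below; the project's real measurements come from the scripts under benchmarks/, and the timed function here is only a stand-in.

```python
# Minimal before/after timing sketch. The workload is a placeholder,
# not one of the project's benchmark scripts.
import time

def time_fn(fn, iters=100):
    """Return average wall-clock seconds per call over `iters` runs."""
    fn()  # warm-up run, excluded from the measurement
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

avg = time_fn(lambda: sum(range(10_000)))
assert avg > 0.0
```

For CUDA kernels the same idea applies, but device-side events (e.g. torch.cuda.Event with synchronization) should replace the wall clock to avoid measuring launch latency alone.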