SpargeAttn fork for Windows wheels and easy installation

This repo makes it easy to build SpargeAttention (also known as SparseSageAttention, and the package name is spas_sage_attn) for multiple Python, PyTorch, and CUDA versions, then distribute the wheels to other people.

The latest wheels support GTX 16xx, RTX 20xx/30xx/40xx/50xx, V100, A100, H100, AGX Orin (sm70/75/80/86/87/89/90/120). There are also reports that it works with B200 (sm100) and DGX Spark (sm121), but I did not bundle these kernels in the wheels, and you need to build from source.

Installation

It's similar to SageAttention. Install a wheel in the release page: https://github.com/woct0rdho/SpargeAttn/releases

Use notes

Before using SpargeAttention in larger projects like ComfyUI, please run test_spargeattn.py to test if SpargeAttention itself works.

SpargeAttention is usually not used directly, and RadialAttention is built on top of it. To apply RadialAttention in ComfyUI, you can use my node ComfyUI-RadialAttn, or WanVideoSetRadialAttention node in kijai's ComfyUI-WanVideoWrapper.

SLA (Sparse-Linear Attention) is also built on top of it.

Build from source

(This is for developers)

If you need to build and run SpargeAttention on your own machine:

Install Visual Studio (MSVC and Windows SDK), and CUDA toolkit
Clone this repo
- Checkout abi3_stable branch if you want ABI3 and libtorch stable ABI, which supports PyTorch >= 2.9
- Checkout abi3 branch if you want ABI3, which supports PyTorch >= 2.4
- The purpose of ABI3 and libtorch stable ABI is to avoid building many wheels. I've also added torch.compile support in these branches. There is no other functional difference from the main branch
Install the dependencies in pyproject.toml, including the desired torch version such as torch 2.7.1+cu128
Run python setup.py install --verbose to install directly, or python setup.py bdist_wheel --verbose to build a wheel. This avoids the environment checks of pip

Dev notes

The wheels are built using the workflow
Block-Sparse-SageAttention-2.0 is included, which is used in RadialAttention
Compared to the official repo, I've modified the API so we can use the same functions spas_sage2_attn_meansim_cuda, spas_sage2_attn_meansim_topk_cuda, block_sparse_sage2_attn_cuda for sm >= 80
For RTX 20xx, you need to use the Triton kernel spas_sage_attn_meansim in spas_sage_attn/triton_kernel_example.py

Name		Name	Last commit message	Last commit date
Latest commit History 133 Commits
.github/workflows		.github/workflows
assets		assets
csrc		csrc
evaluate		evaluate
inference_examples		inference_examples
spas_sage_attn		spas_sage_attn
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements-evaluate.txt		requirements-evaluate.txt
setup.py		setup.py
simpleindex.toml		simpleindex.toml
update_pyproject.py		update_pyproject.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SpargeAttn fork for Windows wheels and easy installation

Installation

Use notes

Build from source

Dev notes

About

Uh oh!

Releases 5

Languages

License

woct0rdho/SpargeAttn

Folders and files

Latest commit

History

Repository files navigation

SpargeAttn fork for Windows wheels and easy installation

Installation

Use notes

Build from source

Dev notes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 5

Languages