Sparse-Linear Attention (SLA) nodes for ComfyUI - Accelerate your diffusion models with trainable sparse-linear attention!
This custom node package brings the power of SLA (Sparse-Linear Attention) to ComfyUI, enabling faster inference for Stable Diffusion and other diffusion transformer models.
- SLA Attention Node: Full-featured sparse-linear attention with configurable parameters
- SageSLA Attention Node: Optimized version for maximum speed
- Easy Integration: Drop-in replacement for standard attention mechanisms
- GPU Accelerated: CUDA-optimized for best performance
- ComfyUI Native: Designed specifically for ComfyUI workflows
SLA (Sparse-Linear Attention) is a novel attention mechanism that combines sparse and linear attention to accelerate diffusion transformer models. It selectively attends to the most relevant tokens rather than computing full attention across all positions, resulting in:
- ⚡ Faster inference - Reduced computational complexity
- 🎯 Maintained quality - Minimal impact on generation quality
- 🔧 Configurable - Adjust sparsity levels to balance speed vs quality
- 📚 Trainable - Can be fine-tuned for specific models
1. Navigate to your ComfyUI custom nodes directory:

   ```
   cd ComfyUI/custom_nodes/
   ```

2. Clone this repository:

   ```
   git clone https://github.com/marduk191/comfyui_SLA-sage.git
   ```

3. Install dependencies:

   ```
   cd comfyui_SLA-sage
   python install.py
   ```

   Or manually:

   ```
   pip install -r requirements.txt
   ```

4. Restart ComfyUI
- PyTorch >= 2.0.0
- CUDA-compatible GPU
- ComfyUI (latest version recommended)
- SLA
- SpargeAttn (optional, for the SageSLA node; see below)
For the best performance with the SageSLA node, you can install the optimized SpargeAttn backend:
```
pip install git+https://github.com/marduk191/SpargeAttn.git --no-build-isolation
```

Note:
- Without SpargeAttn, the SageSLA node will automatically fall back to using standard SLA
- The standard SLA implementation works well for most use cases
- SageSLA provides additional speed optimizations based on the SageAttention framework
- Installation requires a CUDA development environment
The main SLA node with full configurability:
Inputs:
`model` (MODEL): Input model to patch
Parameters:
- `topk` (float, 0.0-1.0): Sparsity ratio; 0.2 keeps the top 20% of tokens (recommended starting point)
  - Lower values = more sparse = faster, but may reduce quality
  - Higher values = less sparse = slower, but better quality
- `feature_map` (dropdown): Kernel type for attention
  - `softmax`: Standard attention-like kernel (recommended)
  - `elu`: ELU-based kernel
  - `relu`: ReLU-based kernel
- `block_size_q` (int, 16-256): Block size for query processing
  - Default: 64
  - Affects memory usage and performance
- `block_size_k` (int, 16-256): Block size for key processing
  - Default: 64
  - Affects memory usage and performance
- `enabled` (boolean): Enable/disable SLA
  - Useful for A/B testing
Outputs:
MODEL: Patched model with SLA attention
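To make the effect of `topk` concrete, here is a small illustrative NumPy sketch of top-k block selection. This is hypothetical code, not the node's actual implementation; `select_topk_blocks` is an invented name.

```python
import numpy as np

def select_topk_blocks(scores: np.ndarray, topk_ratio: float) -> np.ndarray:
    """Keep only the highest-scoring fraction of key blocks per query block.

    scores: (num_q_blocks, num_k_blocks) block-level attention scores.
    topk_ratio: fraction of key blocks to keep, e.g. 0.2 keeps the top 20%.
    """
    num_k_blocks = scores.shape[1]
    keep = max(1, int(round(topk_ratio * num_k_blocks)))
    # Per-row indices of the `keep` largest scores.
    idx = np.argpartition(scores, -keep, axis=1)[:, -keep:]
    mask = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

scores = np.random.rand(4, 10)          # 4 query blocks x 10 key blocks
mask = select_topk_blocks(scores, 0.2)  # keeps 2 of 10 key blocks per row
```

Lowering `topk_ratio` shrinks `keep`, so fewer blocks are attended to: faster, but potentially lossier.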
Optimized implementation using fixed block sizes (BLKQ=128, BLKK=64) for best performance:
Inputs:
`model` (MODEL): Input model to patch
Parameters:
- `topk` (float, 0.0-1.0): Sparsity ratio
- `feature_map` (dropdown): Kernel type (softmax, elu, relu)
- `enabled` (boolean): Enable/disable SageSLA
Outputs:
MODEL: Patched model with SageSLA attention
Note: Automatically falls back to standard SLA if SpargeAttn is not installed. For optimal performance, install SpargeAttn (see Optional Installation above).
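The fallback described above can be sketched as a simple import probe. This is hypothetical code, not the actual node logic, and SpargeAttn's import name is assumed here to be `spas_sage_attn`.

```python
import importlib.util

def pick_attention_backend() -> str:
    """Prefer the optimized SpargeAttn backend; otherwise use standard SLA."""
    # Module name assumed; probe whichever package SpargeAttn installs.
    if importlib.util.find_spec("spas_sage_attn") is not None:
        return "spargeattn"  # SageAttention-based optimized kernels
    return "sla"  # standard SLA implementation, always available here
```

Either branch yields a working attention patch; only the speed differs.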
A basic workflow to use SLA:
CheckpointLoader → SLA Attention → KSampler → VAEDecode → SaveImage
1. Load your model with `CheckpointLoaderSimple`
2. Connect the MODEL output to the `SLA Attention` node
3. Connect the SLA Attention output to your `KSampler`
4. Continue your workflow as normal
See example_workflow.json for a complete example you can drag into ComfyUI.
For best quality with good speedup:
- topk: 0.2
- feature_map: softmax
- block sizes: 64 (default)
For maximum speed:
- topk: 0.1
- feature_map: relu
- block sizes: 32
For minimal quality loss:
- topk: 0.3
- feature_map: softmax
- block sizes: 64
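For convenience, the three recommendation sets above can be mirrored as plain preset dictionaries. The `PRESETS` name and dict layout are illustrative, not part of the node package; the values are taken from the recommendations above.

```python
# Presets mirroring the recommended settings above (names are illustrative).
PRESETS = {
    "balanced":         {"topk": 0.2, "feature_map": "softmax", "block_size_q": 64, "block_size_k": 64},
    "max_speed":        {"topk": 0.1, "feature_map": "relu",    "block_size_q": 32, "block_size_k": 32},
    "min_quality_loss": {"topk": 0.3, "feature_map": "softmax", "block_size_q": 64, "block_size_k": 64},
}
```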
1. Start with defaults: The default settings (topk=0.2, softmax) work well for most models
2. Experiment with topk: This is the most impactful parameter
   - Lower = faster but may affect quality
   - Higher = slower but better quality
3. Use SageSLA for simplicity: If you don't need fine control, SageSLA provides good defaults
4. Test before production: Always compare outputs with and without SLA to ensure quality meets your needs
5. GPU Memory: SLA may use different memory patterns than standard attention. If you encounter OOM errors, try:
   - Reducing block sizes
   - Increasing topk (less sparse)
   - Reducing batch size
Make sure you've installed the SLA package:
```
pip install git+https://github.com/marduk191/SLA.git
```

This is an informational message, not an error. It means:
- You're using the SageSLA node without the SpargeAttn package installed
- The node automatically falls back to standard SLA implementation
- Everything will work fine, but you may not get the maximum performance boost
To get the optimized SageSLA implementation:
```
pip install git+https://github.com/marduk191/SpargeAttn.git --no-build-isolation
```

Note: if you cannot get this to build on Windows, wheels are available here: https://github.com/woct0rdho/SpargeAttn/releases/tag/v0.1.0-windows.post3
- Ensure you're using a CUDA GPU
- Try lower topk values (0.1-0.2)
- Some models may benefit more than others
- Increase topk value (0.3-0.4)
- Try different feature_map options
- Use SLA Attention with softmax kernel
- Reduce block_size_q and block_size_k
- Increase topk (a less sparse configuration has lower memory overhead)
- Reduce batch size in your workflow
SLA replaces the standard attention mechanism in diffusion models with a hybrid sparse-linear attention:
- Sparse Selection: Identifies most relevant tokens using topk selection
- Linear Attention: Applies efficient linear attention kernels
- Hybrid Output: Combines sparse and linear components
This approach maintains quality while significantly reducing computational cost, especially for long sequences.
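A toy NumPy sketch of the three steps above for a single attention head. This is purely illustrative: real SLA operates on token blocks with fused CUDA kernels, and the exact way the two branches are combined differs; the fixed 50/50 blend here is an assumption.

```python
import numpy as np

def relu_feature_map(x):
    return np.maximum(x, 0.0)

def sparse_linear_attention(q, k, v, topk_ratio=0.2):
    """Toy single-head hybrid: exact attention on the top-k keys per query,
    linear (kernelized) attention as the cheap global component."""
    n = q.shape[0]
    scores = q @ k.T  # (n, n) raw attention scores

    # 1. Sparse selection: keep the top-k keys for each query.
    keep = max(1, int(round(topk_ratio * n)))
    idx = np.argpartition(scores, -keep, axis=1)[:, -keep:]
    mask = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)

    # Exact softmax attention restricted to the selected keys.
    masked = np.where(mask, scores, -np.inf)
    masked -= masked.max(axis=1, keepdims=True)
    w = np.exp(masked)
    w /= w.sum(axis=1, keepdims=True)
    sparse_out = w @ v

    # 2. Linear attention: feature maps make the cost O(n*d^2), not O(n^2*d).
    qf, kf = relu_feature_map(q), relu_feature_map(k)
    linear_out = (qf @ (kf.T @ v)) / (qf @ kf.sum(axis=0) + 1e-6)[:, None]

    # 3. Hybrid output: blend the two branches (fixed 50/50 here, an assumption).
    return 0.5 * sparse_out + 0.5 * linear_out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
out = sparse_linear_attention(q, k, v, topk_ratio=0.25)
```

The sparse branch preserves the sharp, exact attention where it matters most, while the linear branch covers the remaining context at near-linear cost.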
- Speedup: Typically 1.5-3x faster inference depending on settings
- Memory: Similar or slightly lower memory usage compared to standard attention
- Quality: Minimal quality loss with proper parameter tuning (topk ≥ 0.2)
- SLA Implementation: thu-ml/SLA
- Original Paper: "SLA: Sparse-Linear Attention for Accelerating Diffusion Models" (2025)
- Authors: Tsinghua University & UC Berkeley researchers
This project follows the license of the original SLA repository. Please refer to the SLA repository for license details.
Contributions are welcome! Please feel free to:
- Report bugs
- Suggest features
- Submit pull requests
- Share your results
If you encounter issues:
- Check the Troubleshooting section
- Review example_workflow.json
- Open an issue on GitHub with:
- ComfyUI version
- GPU model
- Error messages
- Your workflow/settings
Made with ❤️ for the ComfyUI community
Accelerate your diffusion models with SLA!