[2025-08-04] Radial Attention now supports Lightx2v, a 4-step LoRA, as well as SageAttention2++ for FP8 MatMul accumulation on the 4090. With Radial Attention, SageAttention, and the Lightx2v LoRA combined, generating a high-fidelity Wan2.1 video takes only 33/90 seconds on a single H100/4090 GPU, respectively!
[2025-07-22] Radial Attention is now compatible with SageAttention version 2!
[2025-07-14] Radial Attention is now compatible with SageAttention version 1!
[2025-07-03] Radial Attention now supports Wan2.1_14B_FusionX LoRA! You can get high-quality videos within just 8 steps (90 seconds on a single H100 GPU)!
[2025-06-24] Radial Attention is open-sourced! Wan2.1-14B, HunyuanVideo, and Mochi-1 are supported for fast, high-quality video generation at 1-4× the default video length.
We present Radial Attention, a sparse attention mechanism with $\mathcal{O}(n \log n)$ computational complexity for long video generation.
Radial Attention: $\mathcal{O}(n \log n)$ Sparse Attention with Energy Decay for Long Video Generation
Xingyang Li*, Muyang Li*, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, and Song Han
MIT, NVIDIA, Princeton, UC Berkeley, Stanford, and First Intelligence
Radial Attention is a scalable sparse attention mechanism for video diffusion models that translates Spatiotemporal Energy Decay, observed in attention score distributions, into exponentially decaying compute density. Unlike $\mathcal{O}(n^2)$ dense attention, it runs in $\mathcal{O}(n \log n)$ time while preserving generation quality for long videos.
- Physics-Inspired Sparsity: Static masks enforce spatially local and temporally decaying attention, mirroring energy dissipation in physical systems.
- Efficient Length Extension: Pre-trained models (e.g., Wan2.1-14B, HunyuanVideo) scale to 4× longer videos via lightweight LoRA tuning, avoiding full-model retraining.
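The length-extension point above relies on ordinary low-rank adaptation: only small low-rank factors attached to the pre-trained projections are trained at the longer video length. As a rough, generic illustration (not the repository's actual implementation; the rank and layer size below are arbitrary), a LoRA-wrapped linear layer looks like this:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA wrapper: y = W x + (alpha / r) * B(A x), with W frozen."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                 # freeze the pre-trained projection
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)          # the low-rank update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Only the low-rank factors are trainable, which is why length extension
# is cheap compared with retraining the full model.
proj = LoRALinear(nn.Linear(3072, 3072))
trainable = sum(p.numel() for p in proj.parameters() if p.requires_grad)
total = sum(p.numel() for p in proj.parameters())
print(f"trainable params: {trainable} / {total}")   # roughly 1% of the dense layer
```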
(a) The compute density pattern. The attention map is divided into frame-to-frame blocks, and the compute density assigned to each block decays exponentially with the temporal distance between its query frame and key frame.
(b) The corresponding attention mask for (a). The compute density is reflected in the width of the computed diagonal in each frame-to-frame block. When the diagonal width would drop below 1, we instead reduce the frequency of the diagonals. We additionally add an attention sink.
(c) An example mask used in HunyuanVideo, illustrating the final sparsity pattern in practice.
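To make the pattern concrete, the sketch below builds a toy version of such a mask. It is a reconstruction from the description above, not the repository's mask code: the halving schedule, the power-of-two thinning of distant diagonals, and the sink size are all illustrative assumptions.

```python
import math
import torch

def radial_mask(num_frames: int, tokens_per_frame: int,
                init_width: int = 8, sink_tokens: int = 16) -> torch.Tensor:
    """Toy boolean mask (True = computed). The map is split into frame-to-frame
    blocks, the attended diagonal band in each block narrows as temporal distance
    grows, distant blocks whose band would fall below one token are kept only at
    power-of-two distances, and an attention sink keeps the first tokens visible
    to every query. All schedules here are illustrative assumptions."""
    n = num_frames * tokens_per_frame
    mask = torch.zeros(n, n, dtype=torch.bool)

    for qf in range(num_frames):
        for kf in range(num_frames):
            dist = abs(qf - kf)
            if dist == 0:
                width = tokens_per_frame                 # same-frame block stays dense
            else:
                width = init_width >> int(math.log2(dist))  # halve as distance doubles
            if width < 1:
                if (dist & (dist - 1)) != 0:             # thin out: power-of-two distances only
                    continue
                width = 1
            q0, k0 = qf * tokens_per_frame, kf * tokens_per_frame
            for i in range(tokens_per_frame):
                lo = max(i - width + 1, 0)
                hi = min(i + width, tokens_per_frame)
                mask[q0 + i, k0 + lo:k0 + hi] = True     # diagonal band of this block

    mask[:, :sink_tokens] = True                         # attention sink on the first tokens
    return mask

m = radial_mask(num_frames=16, tokens_per_frame=32)
print(f"mask density: {m.float().mean():.2%}")           # well below dense attention's 100%
```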
Radial Attention reduces the computational complexity of attention from $\mathcal{O}(n^2)$ to $\mathcal{O}(n \log n)$, where $n$ is the total number of video tokens.
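As a back-of-the-envelope check on that bound (ignoring constants and the same-frame blocks, and assuming, as in the toy mask above, that the band width halves every time the frame distance doubles): with $f$ frames and initial width $w$, the blocks at frame distances $d \in [2^k, 2^{k+1})$ contribute about $2^k \cdot w/2^k = w$ attended keys per query, so each query attends to

$$\sum_{k=0}^{\lfloor \log_2 f \rfloor} 2^k \cdot \frac{w}{2^k} \;=\; w\left(\lfloor \log_2 f \rfloor + 1\right) \;=\; \mathcal{O}(\log n)$$

keys in total, and the $n$ queries together cost $\mathcal{O}(n \log n)$ rather than the $\mathcal{O}(n^2)$ of dense attention.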
Radial Attention delivers nearly identical quality to Wan2.1-14B at default video length, while offering 1.8× speedup.
Radial Attention enables 4× longer video generation with LoRA tuning, outperforming dense attention in vision rewards, while achieving 3.7× speedup and 4.4× lower tuning costs.
Fully compatible with existing style LoRAs. On HunyuanVideo, Radial Attention LoRA enables 4× video length extension while preserving vision quality.
We start with cloning the repository:
git clone git@github.com:mit-han-lab/radial-attention --recursive
cd radial-attention
We recommend using CUDA 12.4 with PyTorch 2.5.1:
# 1. Create and activate conda environment
conda create -n radial python==3.12 -y
conda activate radial
# 2. Install PyTorch
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
# 3. Install pip dependencies from CogVideoX and HunyuanVideo
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
# 4. Install FlashInfer for fast and hardware-friendly inference
pip install flashinfer-python -i https://flashinfer.ai/whl/cu124/torch2.5/
# 5. Install the latest Diffusers to try Lightx2v features and Wan2.2
pip install git+https://github.com/huggingface/diffusers
# 6. (Optional) Install Sparse_SageAttention for further acceleration
cd third_party/SageAttention/ # install SageAttention
export EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 # parallel compiling (Optional)
python setup.py install # or pip install -e .
cd ../..
cd third_party/sparse_sageattn # if you want to use Radial Attention with SageAttention v1 backend
python setup.py install
cd ../..
cd third_party/sparse_sageattn_2 # if you want to use Radial Attention with SageAttention v2 backend
pip install ninja # for parallel compilation
python setup.py install # or pip install -e .
cd ../..
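After the optional backends are built, a quick import check helps confirm the environment before launching the inference scripts. This is a small sketch; the import names for the optional SageAttention backends are assumptions, so skip or adjust them if you did not build those.

```python
# Quick post-install sanity check (run inside the `radial` conda env).
# The last two import names are assumptions for the optional SageAttention
# backends; adjust or remove them if you did not build those.
import importlib.util
import torch

print("torch", torch.__version__, "| CUDA", torch.version.cuda,
      "| GPU visible:", torch.cuda.is_available())

for name in ["diffusers", "flash_attn", "flashinfer", "sageattention", "sparse_sageattn"]:
    status = "OK" if importlib.util.find_spec(name) is not None else "MISSING"
    print(f"{name:<16} {status}")
```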
We support Text-to-Video inference with Wan2.1-14B. Run it with:
bash scripts/wan_t2v_inference.sh
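The script bundles prompt handling, the radial attention patch, and video export. For orientation only, a plain diffusers call for Wan2.1 text-to-video looks roughly like the sketch below; it does not apply the radial attention backend that the script enables, and the model id, resolution, and step count are assumptions.

```python
# Plain-diffusers sketch of Wan2.1 text-to-video, for orientation only.
# It does NOT apply the radial attention patch that scripts/wan_t2v_inference.sh sets up;
# the model id, resolution, and step count below are assumptions.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps on GPUs with less memory than an H100

frames = pipe(
    prompt="A cat walking through a snowy forest, cinematic lighting",
    height=480, width=832, num_frames=81, num_inference_steps=50,
).frames[0]
export_to_video(frames, "wan_t2v.mp4", fps=16)
```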
We support Text-to-Video inference with HunyuanVideo. Run it with:
bash scripts/hunyuan_t2v_inference.sh
- Integrate Wan2.1_14B_FusionX LoRA for high-quality few-step generation
- Adopt Sparse-VideoGen's fused kernels for further speedup
- ComfyUI integration (in ComfyUI-nunchaku)
- Support Mochi-1
- Support Multi-GPU inference
- Release LoRA checkpoints for longer-video generation
If you find Radial Attention useful or relevant to your research, please cite our paper:
@article{li2025radial,
  title={Radial Attention: $\mathcal{O}(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation},
  author={Li*, Xingyang and Li*, Muyang and Cai, Tianle and Xi, Haocheng and Yang, Shuo and Lin, Yujun and Zhang, Lvmin and Yang, Songlin and Hu, Jinbo and Peng, Kelly and Agrawala, Maneesh and Stoica, Ion and Keutzer, Kurt and Han, Song},
  journal={arXiv preprint arXiv:2506.19852},
  year={2025}
}
We thank Sparse-VideoGen for insights on code design.
We thank MIT-IBM Watson AI Lab, National Science Foundation, Hyundai, and Amazon for supporting this research.