
Radial Attention

[2025-08-04] Radial Attention now supports Lightx2v, a 4-step LoRA. Radial Attention also supports SageAttention2++ for FP8 Matmul accumulation on the RTX 4090. With the joint effort of Radial Attention, SageAttention, and the Lightx2v LoRA, it now takes only 33/90 seconds to generate a high-fidelity video with Wan2.1 on a single H100/4090 GPU, respectively!

[2025-07-22] Radial Attention is now compatible with SageAttention version 2!

[2025-07-14] Radial Attention is now compatible with SageAttention version 1!

[2025-07-03] Radial Attention now supports Wan2.1_14B_FusionX LoRA! You can get high-quality videos within just 8 steps (90 seconds on a single H100 GPU)!

[2025-06-24] Radial Attention is open-sourced! Wan2.1-14B, HunyuanVideo, and Mochi-1 are supported for fast, high-quality video generation at 1-4⨉ the default video length.

Demo video: demo_final_with_audio2.mp4

We present Radial Attention, a sparse attention mechanism with $\mathcal{O}(n\log n)$ computational complexity. Radial Attention accelerates pre-trained HunyuanVideo by 1.9× at its default video length while maintaining comparable video quality. When generating 4× longer videos, it reduces tuning costs by up to 4.4× and speeds up inference by up to 3.7× versus dense attention.

Radial Attention: $\mathcal{O}(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

Xingyang Li*, Muyang Li*, Tianle Cai, Haocheng Xi, Shuo Yang, Yujun Lin, Lvmin Zhang, Songlin Yang, Jinbo Hu, Kelly Peng, Maneesh Agrawala, Ion Stoica, Kurt Keutzer, and Song Han

MIT, NVIDIA, Princeton, UC Berkeley, Stanford, and First Intelligence

📖Overview

[Figure: teaser]

Radial Attention is a scalable sparse attention mechanism for video diffusion models that translates Spatiotemporal Energy Decay—observed in attention score distributions—into exponentially decaying compute density. Unlike $\mathcal{O}(n^2)$ dense attention or linear approximations, Radial Attention achieves $\mathcal{O}(n \log n)$ complexity while preserving expressive power for long videos. Here are our core contributions.

  • Physics-Inspired Sparsity: Static masks enforce spatially local and temporally decaying attention, mirroring energy dissipation in physical systems.
  • Efficient Length Extension: Pre-trained models (e.g., Wan2.1-14B, HunyuanVideo) scale to 4× longer videos via lightweight LoRA tuning, avoiding full-model retraining.
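
To make the second bullet concrete: "lightweight LoRA tuning" means that only small low-rank adapters are trained on top of the frozen pre-trained weights. Below is a generic, hypothetical sketch using the peft library; the module names, dimensions, and rank are illustrative assumptions, not this repository's actual training configuration.

# Hypothetical illustration of LoRA-based tuning: only low-rank adapters on the
# attention projections are trainable; the base weights stay frozen.
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class ToyAttention(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)  # names mirror common diffusers conventions (assumption)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, x):
        return self.to_q(x), self.to_k(x), self.to_v(x)

config = LoraConfig(r=16, lora_alpha=16, target_modules=["to_q", "to_k", "to_v"])
model = get_peft_model(ToyAttention(), config)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable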

🔍Sparsity Pattern Design

[Figure: sparsity patterns]

(a) The compute density pattern. The attention map is divided into $2\lceil\log_2(\max(f, 2))\rceil - 1$ bands (here, the number of frames $f = 12$) based on the temporal distance between tokens. The central band has full compute density, while each successive outer band has half the density of the previous one. Except for band $\pm1$, each band also doubles the diagonal width of its predecessor.
(b) The corresponding attention mask for (a). The compute density is reflected in the diagonal width of each frame-to-frame block. When the diagonal width drops below 1, we instead reduce the frequency of the diagonals. We additionally add an attention sink.
(c) An example mask used in HunyuanVideo, illustrating the final sparsity pattern in practice.
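
To make the band construction concrete, here is a minimal frame-level sketch. The band boundaries (a central band for temporal distance ≤ 1, band ±k for distances in [2^k, 2^(k+1))) are our reading of the description above together with the band-count formula; the repository's actual mask code additionally handles the per-block diagonal widths and the attention sink.

import math

# Simplified frame-level sketch of the radial band layout (illustration only).
f = 12                                               # number of frames
num_bands = 2 * math.ceil(math.log2(max(f, 2))) - 1  # 7 bands for f = 12
print(f"number of bands: {num_bands}")

def band_index(i, j):
    """Signed band index from the temporal distance d = j - i.
    The central band covers |d| <= 1; band +/-k covers 2**k <= |d| < 2**(k+1),
    so each outer band spans twice the distance range of its predecessor."""
    d = j - i
    if abs(d) <= 1:
        return 0
    k = int(math.floor(math.log2(abs(d))))
    return k if d > 0 else -k

def compute_density(band):
    """Full density on the central band; each successive outer band halves it."""
    return 0.5 ** abs(band)

# Frame-level density map: rows are query frames, columns are key frames.
for i in range(f):
    print(" ".join(f"{compute_density(band_index(i, j)):.2f}" for j in range(f)))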

📊Performance

[Figure: results]

Radial Attention reduces the computational complexity of attention from $\mathcal{O}(n^2)$ to $\mathcal{O}(n \log n)$. When generating a 500-frame 720p video with HunyuanVideo, it reduces the attention computation by 9×, achieves a 3.7× speedup, and cuts tuning costs by 4.6×.

🎥Visual Results

🔹Accelerating Pre-trained Models

[Figure] Radial Attention delivers nearly identical quality to Wan2.1-14B at its default video length, while offering a 1.8× speedup.

🔹Long Video Generation

[Figure] Radial Attention enables 4× longer video generation with LoRA tuning, outperforming dense attention in vision rewards, while achieving 3.7× speedup and 4.4× lower tuning costs.

🔹LoRA Compatibility

[Figure] Fully compatible with existing style LoRAs. On HunyuanVideo, the Radial Attention LoRA enables 4× video length extension while preserving vision quality.


🔧Installation

We start by cloning the repository:

git clone git@github.com:mit-han-lab/radial-attention --recursive
cd radial-attention

We recommend CUDA 12.4 and PyTorch 2.5.1.

# 1. Create and activate conda environment
conda create -n radial python==3.12 -y
conda activate radial

# 2. Install PyTorch
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124

# 3. Install pip dependencies from CogVideoX and HunyuanVideo
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

# 4. Install FlashInfer for fast and hardware-friendly inference
pip install flashinfer-python -i https://flashinfer.ai/whl/cu124/torch2.5/

# 5. Install Latest Diffusers to try lightx2v features and Wan2.2
pip install git+https://github.com/huggingface/diffusers

# 6. (Optional) Install Sparse_SageAttention for further acceleration
cd third_party/SageAttention/ # install SageAttention
export EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 # parallel compiling (Optional)
python setup.py install  # or pip install -e .
cd ../..
cd third_party/sparse_sageattn # if you want to use Radial Attention with SageAttention v1 backend
python setup.py install
cd ../..
cd third_party/sparse_sageattn_2 # if you want to use Radial Attention with SageAttention v2 backend
pip install ninja   # for parallel compilation
python setup.py install   # or pip install -e .
cd ../..
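
After installation, a quick sanity check like the one below (a minimal sketch; the import names correspond to the packages installed above, and sageattention is only present if you built the optional step 6 backends) confirms the environment is usable. Run it with python inside the "radial" conda environment:

# Post-install sanity check: verify PyTorch sees the GPU and key dependencies import.
import torch

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

for name in ("flash_attn", "flashinfer", "diffusers", "sageattention"):
    try:
        __import__(name)
        print(f"{name}: OK")
    except ImportError:
        print(f"{name}: not installed")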

🚀Inference Examples

Wan2.1-14B

We support Text-to-Video inference for Wan2.1-14B. Run the inference script:

bash scripts/wan_t2v_inference.sh

HunyuanVideo

We support Text-to-Video inference for HunyuanVideo. Run the inference script:

bash scripts/hunyuan_t2v_inference.sh

📕Open-source Plan

📚Citation

If you find Radial Attention useful or relevant to your research, please cite our paper:

@article{li2025radial,
  title={Radial Attention: $\mathcal{O}(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation},
  author={Li*, Xingyang and Li*, Muyang and Cai, Tianle and Xi, Haocheng and Yang, Shuo and Lin, Yujun and Zhang, Lvmin and Yang, Songlin and Hu, Jinbo and Peng, Kelly and Agrawala, Maneesh and Stoica, Ion and Keutzer, Kurt and Han, Song},
  journal={arXiv preprint arXiv:2506.19852},
  year={2025}
}

Acknowledgements

We thank Sparse-VideoGen for insights on code design.

We thank MIT-IBM Watson AI Lab, National Science Foundation, Hyundai, and Amazon for supporting this research.