Elastic Diffusion Transformer

Jiangshan Wang1,2, Zeqiang Lai2,3†, Jiarui Chen2,4, Jiayi Guo1,
Hang Guo1, Xiu Li1, Xiangyu Yue3*, Chunchao Guo2*

1 Tsinghua University, 2 Tencent Hunyuan, 3 CUHK MMLab, 4 HITSZ

† Project Lead * Corresponding Author

arXiv

Diffusion Transformers (DiT) have demonstrated remarkable generative capabilities but remain highly computationally expensive. Previous acceleration methods, such as pruning and distillation, typically rely on a fixed computational capacity, leading to insufficient acceleration and degraded generation quality. To address this limitation, we propose the Elastic Diffusion Transformer (E-DiT), an adaptive acceleration framework for DiT that improves efficiency while maintaining generation quality. Specifically, we observe that the generative process of DiT exhibits substantial sparsity (i.e., some computations can be skipped with minimal impact on quality) and that this sparsity varies significantly across samples. Motivated by this observation, E-DiT equips each DiT block with a lightweight router that dynamically identifies sample-dependent sparsity from the input latent. Each router adaptively determines whether its block can be skipped; if the block is kept, the router predicts the optimal MLP width-reduction ratio within it. During inference, we further introduce a block-level feature caching mechanism that leverages the router predictions to eliminate redundant computation in a training-free manner. Extensive experiments on 2D image generation (Qwen-Image and FLUX) and 3D asset generation (Hunyuan3D-3.0) demonstrate the effectiveness of E-DiT, achieving up to ∼2× speedup with negligible loss in generation quality.
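The per-block routing decision described above can be pictured with a minimal sketch. Everything here is illustrative: the function and weight names (`route_block`, `w_skip`, `w_ratio`) and the candidate ratio set are assumptions for exposition, not the paper's actual heads or hyperparameters.

```python
import numpy as np

def route_block(latent, w_skip, w_ratio, ratios=(1.0, 0.75, 0.5, 0.25)):
    """Hypothetical routing decision for one DiT block (illustrative only).

    latent:  (tokens, dim) input features of the block
    w_skip:  (dim,) weights of a tiny linear skip head
    w_ratio: (dim, len(ratios)) weights of a tiny width-ratio head
    """
    pooled = latent.mean(axis=0)                        # pool tokens -> (dim,)
    p_skip = 1.0 / (1.0 + np.exp(-(pooled @ w_skip)))   # sigmoid skip score
    if p_skip > 0.5:
        return True, None                               # skip the whole block
    ratio = ratios[int(np.argmax(pooled @ w_ratio))]    # keep block, shrink MLP
    return False, ratio
```

Because the decision is a function of the input latent, different samples (and different blocks) naturally receive different amounts of compute, which is the "elastic" behavior the abstract refers to.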

🔥 News

  • [2026.2.25] Inference code for image generation is released!
  • [2026.2.15] Paper released!

💻 Code

🛠️ Setup

  • Run the following commands to set up the environment. We use DiffSynth-Studio as the codebase to develop E-DiT; you can also refer to their official repo if you run into any environment issues.

```shell
git clone https://github.com/wangjiangshan0725/Elastic-DiT.git
cd Elastic-DiT
pip install -e .
```

  • Download the Qwen-Image weights from their official Hugging Face repo and put them at ckpt/Qwen-Image.

  • Download the E-DiT weights here and put them at ckpt/model.safetensors.

🚀 Inference

```shell
python infer.py
```
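The training-free block-level feature cache used at inference can be sketched roughly as below. The class and method names (`BlockFeatureCache`, `run_block`) are assumptions for illustration, not the repo's actual API; the idea is simply that when a router flags a block as skippable, that block's cached output from an earlier denoising step is reused instead of being recomputed.

```python
class BlockFeatureCache:
    """Illustrative training-free cache keyed by block index (hypothetical)."""

    def __init__(self):
        self._store = {}

    def run_block(self, block_id, x, compute_fn, skip):
        # If the router marked this block skippable and we already hold a
        # cached output from a previous step, reuse it without recomputing.
        if skip and block_id in self._store:
            return self._store[block_id]
        out = compute_fn(x)           # full (or width-reduced) block compute
        self._store[block_id] = out   # remember features for later steps
        return out
```

Since no parameters are updated and the cache only reuses features the model already produced, this mechanism requires no additional training.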

🎨 Gallery

Image Generation

3D Asset Generation

🖋️ Citation

If you find our work helpful, please star 🌟 this repo and cite 📑 our paper. Thanks for your support!

@article{wang2026elastic,
  title={Elastic Diffusion Transformer},
  author={Wang, Jiangshan and Lai, Zeqiang and Chen, Jiarui and Guo, Jiayi and Guo, Hang and Li, Xiu and Yue, Xiangyu and Guo, Chunchao},
  journal={arXiv preprint arXiv:2602.13993},
  year={2026}
}

Acknowledgements

We thank Qwen-Image and DiffSynth-Studio for their clean codebase.

Contact

The code in this repository is still being reorganized; errors introduced during this process may cause malfunctions or discrepancies from the original research results. If you have any questions or concerns, please email wjs23@mails.tsinghua.edu.cn.
