Jiangshan Wang1,2,
Zeqiang Lai2,3†,
Jiarui Chen2,4,
Jiayi Guo1,
Hang Guo1,
Xiu Li1,
Xiangyu Yue3*,
Chunchao Guo2*
1 Tsinghua University, 2 Tencent Hunyuan, 3 CUHK MMLab, 4 HITSZ
† Project Lead * Corresponding Author
Diffusion Transformers (DiT) have demonstrated remarkable generative capabilities but remain highly computationally expensive. Previous acceleration methods, such as pruning and distillation, typically rely on a fixed computational capacity, leading to insufficient acceleration and degraded generation quality. To address this limitation, we propose the Elastic Diffusion Transformer (E-DiT), an adaptive acceleration framework for DiT that effectively improves efficiency while maintaining generation quality. Specifically, we observe that the generative process of DiT exhibits substantial sparsity (i.e., some computations can be skipped with minimal impact on quality), and that this sparsity varies significantly across samples. Motivated by this observation, E-DiT equips each DiT block with a lightweight router that dynamically identifies sample-dependent sparsity from the input latent. Each router adaptively determines whether its block can be skipped; if not, it predicts the optimal MLP width reduction ratio within the block. During inference, we further introduce a block-level feature caching mechanism that leverages the router predictions to eliminate redundant computations in a training-free manner. Extensive experiments on 2D image generation (Qwen-Image and FLUX) and 3D asset generation (Hunyuan3D-3.0) demonstrate the effectiveness of E-DiT, achieving up to ∼2× speedup with negligible loss in generation quality.
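To make the routing idea concrete, here is a minimal numpy sketch of one "elastic" block: a lightweight router pools the input latent, decides whether the block can be skipped (in which case a cached output is reused), and otherwise predicts a width ratio that narrows the MLP. All names, shapes, and the router parameterization are illustrative assumptions for exposition, not the released implementation.

```python
import numpy as np

def router(latent, w_skip, w_width):
    """Illustrative lightweight router: from the pooled input latent,
    predict (a) whether to skip the block and (b) the fraction of MLP
    width to keep if the block runs. Not the paper's actual router."""
    h = latent.mean(axis=0)                          # pool tokens -> (d,)
    skip = 1.0 / (1.0 + np.exp(-(h @ w_skip))) > 0.5 # skip probability > 0.5?
    width_ratio = 1.0 / (1.0 + np.exp(-(h @ w_width)))  # in (0, 1)
    return skip, width_ratio

def elastic_block(x, cache, w_skip, w_width, w1, w2):
    """One DiT-like MLP block with sample-dependent skipping and
    width reduction (hypothetical sketch; attention omitted)."""
    skip, ratio = router(x, w_skip, w_width)
    if skip and cache is not None:
        # training-free caching: reuse the previously computed output
        return cache, cache
    k = max(1, int(ratio * w1.shape[1]))             # reduced hidden width
    h = np.maximum(x @ w1[:, :k], 0.0)               # narrowed MLP (ReLU)
    out = x + h @ w2[:k, :]                          # residual connection
    return out, out                                  # output doubles as cache

# Toy usage with random weights.
rng = np.random.default_rng(0)
n_tokens, d, hidden = 4, 8, 32
x = rng.standard_normal((n_tokens, d))
out, cache = elastic_block(
    x, None,
    rng.standard_normal(d), rng.standard_normal(d),
    rng.standard_normal((d, hidden)), rng.standard_normal((hidden, d)),
)
print(out.shape)  # (4, 8): block preserves token and channel dimensions
```

The key property the sketch captures is that the compute spent per block is an input-dependent decision, so different samples traverse the network at different effective capacities.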
- [2026.2.25] Inference code for image generation is released!
- [2026.2.15] Paper released!
- Run the following commands to set up the environment. We use DiffSynth-Studio as the codebase for E-DiT; if you run into any environment issues, please also refer to their official repo.
```shell
git clone https://github.com/wangjiangshan0725/Elastic-DiT.git
cd Elastic-DiT
pip install -e .
```
- Download the weights of Qwen-Image from their official Hugging Face repo and put them at `ckpt/Qwen-Image`.
- Download the weights of E-DiT here and put them at `ckpt/model.safetensors`.
Then run inference:

```shell
python infer.py
```
If you find our work helpful, please star 🌟 this repo and cite 📑 our paper. Thanks for your support!
@article{wang2026elastic,
title={Elastic Diffusion Transformer},
author={Wang, Jiangshan and Lai, Zeqiang and Chen, Jiarui and Guo, Jiayi and Guo, Hang and Li, Xiu and Yue, Xiangyu and Guo, Chunchao},
journal={arXiv preprint arXiv:2602.13993},
year={2026}
}
We thank Qwen-Image and DiffSynth-Studio for their clean codebase.
The code in this repository is still being reorganized. Errors introduced during this process may cause the code to malfunction or produce results that differ from those in the paper. If you have any questions or concerns, please email wjs23@mails.tsinghua.edu.cn.