TreeQ: Pushing the Quantization Boundary of Diffusion Transformer via Tree-Structured Mixed-Precision Search
Kaicheng Yang, Kaisen Yang, Baiting Wu, Xun Zhang, Qianrui Yang, Haotong Qin, He Zhang, and Yulun Zhang. [arXiv] [supplementary material]
- 2025-12-06: Initial repository release.
Abstract: Diffusion Transformers (DiTs) have emerged as a highly scalable and effective backbone for image generation, outperforming U-Net architectures in both scalability and performance. However, their real-world deployment remains challenging due to high computational and memory demands. Mixed-Precision Quantization (MPQ), designed to push the limits of quantization, has demonstrated remarkable success in advancing U-Net quantization to sub-4-bit settings while significantly reducing computational and memory overhead. Nevertheless, its application to DiT architectures remains limited and underexplored. In this work, we propose TreeQ, a unified framework addressing key challenges in DiT quantization. First, to tackle inefficient search and proxy misalignment, we introduce Tree-Structured Search (TSS). This DiT-specific approach leverages the architecture's linear properties to traverse the solution space in
$\mathcal{O}(n)$ time while improving objective accuracy through comparison-based pruning. Second, to unify optimization objectives, we propose Environmental Noise Guidance (ENG), which aligns Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) configurations using a single hyperparameter. Third, to mitigate information bottlenecks in ultra-low-bit regimes, we design the General Monarch Branch (GMB). This structured sparse branch prevents irreversible information loss, enabling finer detail generation. Through extensive experiments, our TreeQ framework demonstrates state-of-the-art performance on DiT-XL/2 under W3A3 and W4A4 PTQ/PEFT settings. Notably, our work is the first to achieve near-lossless 4-bit PTQ performance on DiT models.
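To make the search idea concrete, below is a minimal, hypothetical sketch of comparison-based per-layer bit allocation, not the paper's actual TSS algorithm. Each layer independently tries candidate bit-widths from cheapest to most expensive and keeps the first one whose proxy quantization error stays under an error budget, pruning all remaining branches. With $n$ layers and a constant number of candidates, this visits $\mathcal{O}(n)$ configurations instead of the exponential number of full mixed-precision assignments. All names (`fake_quantize`, `tree_search`, `budget`) are illustrative assumptions.

```python
# Hypothetical sketch (NOT the paper's implementation): greedy, comparison-based
# per-layer bit allocation. n layers * k candidate bit-widths = O(n) work,
# versus k**n exhaustive mixed-precision configurations.

def fake_quantize(values, bits):
    """Uniform symmetric fake-quantization of a list of floats to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = max(max(abs(v) for v in values), 1e-8) / levels
    return [round(v / scale) * scale for v in values]

def tree_search(layer_outputs, candidate_bits, budget):
    """Pick one bit-width per layer: the lowest candidate whose mean absolute
    quantization error on that layer's outputs stays within `budget`."""
    choices = []
    for outputs in layer_outputs:
        best = max(candidate_bits)  # fallback: highest available precision
        for bits in sorted(candidate_bits):  # try the cheapest branch first
            quantized = fake_quantize(outputs, bits)
            err = sum(abs(o - q) for o, q in zip(outputs, quantized)) / len(outputs)
            if err <= budget:
                best = bits
                break  # comparison-based pruning: drop all costlier branches
        choices.append(best)
    return choices
```

A real search would use a calibrated proxy (e.g., end-to-end generation quality rather than per-layer error), but the pruning structure, committing to one branch per layer after local comparisons, is what keeps the traversal linear in depth.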
Fig. 1: TreeQ (right) achieves better generation than the baseline (left) under low-bit PTQ on DiT-XL/2.
Fig. 2: Visualization comparison between TSS (right) and the traditional Integer Programming method (left).
Fig. 3: GMB provides more detail for low-bit quantized DiT4SR (left) and FLUX-Schnell (right), advancing practical applications.
- Release checkpoints, training, and inference code
- Release inference engine
- Release more quantized DiTs
This code is built on Diffusion Transformer.


