A curated list of papers, docs, and code about diffusion model quantization. This repo collects quantization methods for diffusion models. PRs adding works (papers, repositories) the repo has missed are welcome.
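Most of the post-training quantization (PTQ) works below build on the same primitive: mapping floating-point weights to low-bit integers with a per-channel scale. As a minimal, illustrative sketch (not taken from any specific paper in this list), symmetric per-output-channel INT8 fake quantization of a weight matrix looks like:

```python
import numpy as np

def quantize_per_channel(w, num_bits=8):
    """Symmetric per-output-channel fake quantization (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for INT8
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)    # guard against all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integers back to approximate floating-point weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)  # toy weight matrix
q, s = quantize_per_channel(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"max abs reconstruction error: {err:.4f}")
```

The papers below refine this basic recipe for diffusion models, e.g. by handling activation outliers, timestep-dependent distributions, or pushing below 4 bits.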
- [ICLR] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation [code]
- [ICLR] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models [code]
- [ICLR] BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models [code]
- [ICLR] SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration [code]
- [CVPR] Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers [code]
- [CVPR] CacheQuant: Comprehensively Accelerated Diffusion Models [code]
- [CVPR] PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution [code]
- [ICML] Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers [code]
- [ICML] SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization [code]
- [ICCV] Text Embedding Knows How to Quantize Text-Guided Diffusion Models
- [ICCV] QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning [code]
- [ICCV] DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization [code]
- [ICCV] QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [code]
- [WACV] DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing [code]
- [ISCAS] CDM-QTA: Quantized Training Acceleration for Efficient LoRA Fine-Tuning of Diffusion Model
- [arXiv] QVGen: Pushing the Limit of Quantized Video Generative Models
- [arXiv] TR-DQ: Time-Rotation Diffusion Quantization
- [arXiv] Post-Training Quantization for Diffusion Transformer via Hierarchical Timestep Grouping
- [arXiv] TQ-DiT: Efficient Time-Aware Quantization for Diffusion Transformers
- [arXiv] FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers [code]
- [arXiv] Quantizing Diffusion Models from a Sampling-Aware Perspective
- [arXiv] QArtSR: Quantization via Reverse-Module and Timestep-Retraining in One-Step Diffusion based Image Super-Resolution [code]
- [arXiv] Q&C: When Quantization Meets Cache in Efficient Image Generation
- [arXiv] Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
- [arXiv] DVD-Quant: Data-free Video Diffusion Transformers Quantization [code]
- [arXiv] MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation
- [arXiv] PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
- [ICLR] EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models [code]
- [CVPR] TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models [code]
- [CVPR] Towards Accurate Post-training Quantization for Diffusion Models [code]
- [ECCV] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization [code]
- [ECCV] Timestep-Aware Correction for Quantized Diffusion Models
- [ECCV] Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [code]
- [ECCV] Memory-Efficient Fine-Tuning for Quantized Diffusion Model [code]
- [NeurIPS] PTQ4DiT: Post-training Quantization for Diffusion Transformers [code]
- [NeurIPS] BitsFusion: 1.99 bits Weight Quantization of Diffusion Model [code]
- [NeurIPS] TerDiT: Ternary Diffusion Models with Transformers [code]
- [NeurIPS] Binarized Diffusion Model for Image Super-Resolution [code]
- [NeurIPS] BiDM: Pushing the Limit of Quantization for Diffusion Models [code]
- [NeurIPS] StepbaQ: Stepping backward as Correction for Quantized Diffusion Models
- [AAAI] MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models [code]
- [AAAI] Qua2SeDiMo: Quantifiable Quantization Sensitivity of Diffusion Models [code]
- [AAAI] TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models
- [AAAI] Optimizing Quantized Diffusion Models via Distillation with Cross-Timestep Error Correction
- [arXiv] HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization
- [arXiv] VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
- [arXiv] TaQ-DiT: Time-aware Quantization for Diffusion Transformers [code]
- [ICCV] Q-Diffusion: Quantizing Diffusion Models [code]
- [CVPR] Post-training Quantization on Diffusion Models [code]
- [NeurIPS] PTQD: Accurate Post-Training Quantization for Diffusion Models [code]
- [NeurIPS] Q-DM: An Efficient Low-bit Quantized Diffusion Model
- [NeurIPS] Temporal Dynamic Quantization for Diffusion Models