Based on ADM, the downstream works are listed as follows:
| Paper | Task | Date | Conference/Journal |
|---|---|---|---|
| CycleDiff | Cycle Generation | 2026-01-17 | TIP 2026 |
| DISCO | Combinatorial optimization | 2025-08-13 | CAD/Graphics 2025 Best Paper >>Graphical Models |
| LaDi-WM | World modeling | 2025-08-02 | CoRL 2025 |
| EA6D | 6D Pose estimation | 2025-06-26 | ICCV 2025 |
| PASG | Shape generation | 2025-04-18 | IEEE TVCG |
| DiffMOT | Multiple object tracking | 2024-02-27 | CVPR 2024 |
| DiffusionEdge | Edge detection | 2023-12-09 | AAAI 2024 |
- Update ddm_const_2, replacing the noise scheduler \sqrt(t) with t.
- We now update training for text-2-img, please refer to text-2-img.
- We now modify the two-branch UNet, resulting a single-decoder UNet architecture.
You can use the single-decoder UNet in uncond-unet-sd and cond-unet-sd.
- install torch
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
- install other packages.
pip install -r requirement.txt
- prepare accelerate config.
accelerate config
The file structure should look like:
(a) unconditional cifar10:
cifar-10-python
|-- cifar-10-batches-py
| |-- data_batch_1
| |-- data_batch_2
| |-- XXX
(b) unconditional Celeb-AHQ:
celebahq
|-- celeba_hq_256
| |-- 00000.jpg
| |-- 00001.jpg
| |-- XXXXX.jpg
(c) conditional DIV2K:
DIV2K
|-- DIV2K_train_HR
| |-- 0001.png
| |-- 0002.png
| |-- XXXX.png
|-- DIV2K_valid_HR
| |-- 0801.png
| |-- 0802.png
| |-- XXXX.png
(d) conditional DUTS:
DUTS
|-- DUTS-TR
| |-- DUTS-TR-Image
| | |-- XXX.jpg
| |-- DUTS-TR-Mask
| | |-- XXX.png
|-- DUTS-TE
| |-- DUTS-TE-Image
| | |-- XXX.jpg
| |-- DUTS-TE-Mask
| | |-- XXX.png
accelerate launch train_uncond_dpm.py --cfg ./configs/cifar10/ddm_uncond_const_uncond_unet.yaml
- training auto-encoder:
accelerate launch train_vae.py --cfg ./configs/celebahq/celeb_ae_kl_256x256_d4.yaml
- you should add the model weights in the first step to config file
./configs/celebahq/celeb_uncond_ddm_const_uncond_unet_ldm.yaml(line 41), then train latent diffusion model:
accelerate launch train_uncond_ldm.py --cfg ./configs/celebahq/celeb_uncond_ddm_const_uncond_unet_ldm.yaml
- training auto-encoder:
accelerate launch train_vae.py --cfg ./configs/super-resolution/div2k_ae_kl_512x512_d4.yaml
- training latent diffusion model:
accelerate launch train_cond_ldm.py --cfg ./configs/super-resolution/div2k_cond_ddm_const_ldm.yaml
accelerate launch train_cond_dpm.py --cfg ./configs/saliency/DUTS_ddm_const_dpm_114.yaml
change the sampling steps "sampling_timesteps" in the config file
- unconditional generation:
python sample_uncond.py --cfg ./configs/cifar10/ddm_uncond_const_uncond_unet.yaml
python sample_uncond.py --cfg ./configs/celebahq/celeb_uncond_ddm_const_uncond_unet_ldm.yaml
- conditional generation (Latent space model):
- Super-resolution:
python ./eval_downstream/eval_sr.py --cfg ./configs/super-resolution/div2k_sample.yaml
- Inpainting:
python ./eval_downstream/sample_inpainting.py --cfg ./configs/celebahq/celeb_uncond_ddm_const_uncond_unet_ldm_sample.yaml
- Saliency:
python ./eval_downstream/eval_saliency.py --cfg ./configs/saliency/DUTS_sample_114.yaml
- download laion data from laion.
- download metadata using
img2dataset, please refer to here. - install clip.
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
- The final data structure looks like:
|-- laion
| |-- 00000.tar
| |-- 00001.tar
| |-- XXXXX.tar
- training with config file text-2-img.
accelerate launch train_cond_ldm.py --cfg ./configs/text2img/ddm_uncond_const.yaml
Note that the pretrained weight of the AutoEncoder is downloaded from here, and you should unzip the file.
| Task | Weight | Config |
|---|
| Uncond-Cifar10 | None | url | | Uncond-Celeb | None | url |
If you have some questions, please contact huangai@nudt.edu.cn.
Thanks to the public repos: DDPM and LDM for providing the base code.
@article{huang2023decoupled,
title={Simultaneous Image to Zero and Zero to Noise: Diffusion Models with Analytical Image Attenuation},
author={Huang, Yuhang and Qin, Zheng and Liu, Xinwang and Xu, Kai},
journal={arXiv preprint arXiv:2306.13720},
year={2023}
}

