Project completed on May 3, 2025.
This project explores denoising diffusion models by training various UNet-based architectures to generate handwritten digits from noise. The models are trained and evaluated on the MNIST dataset, progressively improving generation quality by learning to reverse a diffusion (noising) process. The final models include both time-conditioned and class-conditioned versions of DDPMs, along with visualization of generation steps.
The primary goal was to understand the diffusion process, implement it step-by-step in PyTorch, and generate interpretable results using GIFs and sampled outputs.
- Single-Step Denoising UNet:
  - A UNet trained to denoise noisy MNIST images at a fixed noise level (σ = 0.5).
  - Visualizations of denoising results and out-of-distribution noise testing.
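The single-step objective can be sketched numerically; this is a minimal NumPy stand-in (the actual UNet and training loop are in the notebook), just to show the σ = 0.5 noising and the L2 target:

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.random((28, 28)).astype(np.float32)  # stand-in for an MNIST digit in [0, 1]
sigma = 0.5
noisy = clean + sigma * rng.standard_normal(clean.shape).astype(np.float32)

# The UNet (not shown) is trained to map `noisy` back to `clean`
# under a mean-squared-error objective.
def l2_loss(pred, target):
    return float(np.mean((pred - target) ** 2))

# A perfect denoiser reaches zero loss; doing nothing (identity map)
# is left with roughly sigma^2 = 0.25 mean squared error.
print(l2_loss(clean, clean))
print(l2_loss(noisy, clean))
```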
- Time-Conditioned UNet (DDPM):
  - Implements full DDPM training with time embeddings injected into the UNet.
  - Samples generated by iteratively denoising a noise image over 300 timesteps.
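The forward (noising) process that the 300-step sampler reverses has a closed form. A minimal NumPy sketch, assuming a linear β schedule from 1e-4 to 0.02 (the standard DDPM defaults, compressed here to 300 steps; the notebook's exact schedule may differ):

```python
import numpy as np

T = 300  # number of diffusion timesteps
betas = np.linspace(1e-4, 0.02, T)  # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)     # cumulative product ᾱ_t

rng = np.random.default_rng(0)
x0 = rng.random((28, 28))           # stand-in for a clean digit
t = 150
eps = rng.standard_normal(x0.shape)

# Closed-form forward process: x_t = sqrt(ᾱ_t)·x0 + sqrt(1 - ᾱ_t)·ε,
# so any timestep can be sampled directly without iterating.
xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
print(alpha_bars[-1])  # near zero: x_T is almost pure noise
```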
- Class-Conditioned DDPM:
  - Adds class-conditioning to guide the generation of specific digits (0–9).
  - Implements classifier-free guidance for more controllable outputs.
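Classifier-free guidance combines an unconditional and a class-conditional noise prediction at each sampling step. A sketch of that combination (the guidance scale of 5.0 is an illustrative assumption, not necessarily the value used in the notebook):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the class-conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_u = rng.standard_normal((28, 28))  # stand-in unconditional prediction
eps_c = rng.standard_normal((28, 28))  # stand-in conditional prediction

# Scale 1.0 recovers the purely conditional prediction; larger scales
# push samples more strongly toward the requested digit class.
guided = cfg_combine(eps_u, eps_c, 5.0)
print(np.allclose(cfg_combine(eps_u, eps_c, 1.0), eps_c))
```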
- Training Visualizations & GIFs:
  - Model outputs tracked at multiple epochs (1, 5, 10, 15, 20).
  - GIFs show the full reverse diffusion process frame by frame.
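The reverse diffusion shown in the GIFs iterates the DDPM ancestral sampling step starting from pure noise. A NumPy sketch with a dummy noise predictor standing in for the trained UNet (the every-30-steps frame cadence is an assumption for illustration):

```python
import numpy as np

T = 300
betas = np.linspace(1e-4, 0.02, T)  # linear schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x, t):
    # Stand-in for the trained UNet's noise prediction.
    return np.zeros_like(x)

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28))  # start from pure noise x_T
frames = []
for t in range(T - 1, -1, -1):
    eps = eps_model(x, t)
    # DDPM posterior mean: (x_t - β_t/sqrt(1-ᾱ_t)·ε_θ) / sqrt(α_t)
    mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    z = rng.standard_normal(x.shape) if t > 0 else 0.0  # no noise at the last step
    x = mean + np.sqrt(betas[t]) * z
    if t % 30 == 0:
        frames.append(x.copy())  # capture one GIF frame every 30 steps
print(len(frames))
```

In the notebook, each captured frame would be rendered and stitched into the GIFs shown above.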
- Time-conditioned DDPM:
  - epoch_1.mp4
  - epoch_20.mp4
- Class-conditioned DDPM:
  - epoch_1.mp4
  - epoch_20.mp4
As shown in the samples, class-conditioning significantly improves the denoising process, enabling clearer and more controlled digit generation by epoch 20.
- ddpm-implementation.ipynb -- Main Jupyter Notebook with the full implementation.
- Report.pdf -- Summary of training progress and results.