
GenDiff-PEFT

Parameter-Efficient Optimization of Conditional Diffusion Models


📌 Overview

GenDiff-PEFT is a research-focused implementation and optimization study of conditional diffusion models on CIFAR-10.

This project investigates architectural and sampling strategies to improve image quality while significantly reducing training cost.

Key Contributions

  • Multi-resolution attention (8×8 and 16×16)
  • Classifier-Free Guidance (CFG) ablation study
  • DDIM sampling optimization (50 → 100 steps)
  • Parameter-efficient fine-tuning
  • Quantitative evaluation using FID and Inception Score

Results

  • 17.1% FID improvement (290.19 → 240.48)
  • 85% reduction in training time
  • Empirical validation of the quality–diversity tradeoff

🧠 Motivation

Diffusion models achieve state-of-the-art generative performance but are computationally expensive and sensitive to architectural design.

This project explores:

  • The effect of attention resolution on semantic coherence
  • The tradeoff between fidelity and diversity via CFG scaling
  • The impact of sampling steps on output refinement
  • Transfer learning for efficient experimentation

🏗 Model Architecture

Baseline Model

  • Conditional UNet
  • CIFAR-10 (32×32×3)
  • Attention at 8×8 resolution
  • 50-step DDIM sampling
  • 95 training epochs
  • AdamW optimizer
  • Mixed precision (FP16)

Enhanced Model

  • Added 16×16 multi-head self-attention
  • Extended DDIM sampling (100 steps)
  • Cosine learning rate scheduling
  • Exponential Moving Average (EMA)
  • Fine-tuned from baseline checkpoint (10 epochs)
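
For orientation, here is a minimal sketch of how such a class-conditional UNet with attention at both the 16×16 and 8×8 feature maps could be configured with the Hugging Face diffusers UNet2DModel. The channel widths and block layout are assumptions for illustration, not the repository's actual model definition.

```python
# Illustrative sketch only -- assumed configuration, not the repository's code.
from diffusers import UNet2DModel

# Class-conditional UNet for 32x32 CIFAR-10. With three resolution levels
# (32 -> 16 -> 8), placing attention in the second and third blocks puts
# self-attention on the 16x16 and 8x8 feature maps.
model = UNet2DModel(
    sample_size=32,                      # CIFAR-10 images are 32x32
    in_channels=3,
    out_channels=3,
    layers_per_block=2,
    block_out_channels=(128, 256, 256),  # assumed channel widths
    down_block_types=(
        "DownBlock2D",      # 32x32, no attention
        "AttnDownBlock2D",  # 16x16 self-attention (added in the enhanced model)
        "AttnDownBlock2D",  # 8x8 self-attention (as in the baseline)
    ),
    up_block_types=(
        "AttnUpBlock2D",    # 8x8
        "AttnUpBlock2D",    # 16x16
        "UpBlock2D",        # 32x32
    ),
    num_class_embeds=10,                 # CIFAR-10 class conditioning
)
```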

📊 Quantitative Results

| Configuration    | FID ↓  | IS ↑  | Steps | CFG Scale |
|------------------|--------|-------|-------|-----------|
| Baseline         | 290.19 | 2.240 | 50    | 5.0       |
| Improved (w=7.5) | 240.48 | 1.563 | 100   | 7.5       |
| Improved (w=3.5) | 292.46 | 1.856 | 100   | 3.5       |

Observations

  • Higher CFG improves fidelity but reduces diversity.
  • Multi-resolution attention improves structural coherence.
  • Increasing sampling steps significantly refines generation quality.
  • Fine-tuning reduces training time from ~8.5 hours to ~1 hour.

🔬 Ablation Studies

Sampling Steps

Increasing DDIM steps from 50 to 100 improves FID by approximately 8–12 points.
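
For context, a single deterministic DDIM update (η = 0) for an ε-prediction model looks roughly like the sketch below; the variable names are placeholders and this is the textbook DDIM step, not necessarily the notebook's implementation. Because the number of steps only changes the timestep grid, moving from 50 to 100 steps requires no retraining.

```python
# Illustrative sketch only: one deterministic DDIM step (eta = 0).
import torch

@torch.no_grad()
def ddim_step(eps, x_t, alpha_bar_t, alpha_bar_prev):
    """Move from timestep t to the previous timestep without added noise.

    eps            -- model's noise prediction at timestep t
    x_t            -- current noisy sample
    alpha_bar_t    -- cumulative alpha product at t (scalar tensor)
    alpha_bar_prev -- cumulative alpha product at the earlier timestep (scalar tensor)
    """
    # Estimate the clean image x0 implied by the noise prediction.
    x0_pred = (x_t - (1 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
    # Deterministic update: eta = 0 means no stochastic term is injected.
    return alpha_bar_prev.sqrt() * x0_pred + (1 - alpha_bar_prev).sqrt() * eps
```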

Multi-Resolution Attention

Adding 16×16 attention improves mid-level feature modeling and reduces FID by 5–7 points.
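
The added block is ordinary multi-head self-attention applied over the flattened 16×16 spatial grid; the sketch below (with an assumed channel count) shows the general shape of such a layer, not the exact module from the notebook.

```python
# Illustrative sketch only: spatial self-attention over a 16x16 feature map.
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, channels: int = 256, num_heads: int = 8):  # assumed sizes
        super().__init__()
        self.norm = nn.GroupNorm(32, channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):                                  # x: (B, C, 16, 16)
        b, c, h, w = x.shape
        tokens = self.norm(x).flatten(2).transpose(1, 2)   # (B, 256, C) tokens
        out, _ = self.attn(tokens, tokens, tokens)         # attend across all 16x16 positions
        out = out.transpose(1, 2).reshape(b, c, h, w)
        return x + out                                     # residual connection
```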

Classifier-Free Guidance

CFG scale analysis reveals a fundamental quality–diversity tradeoff:

  • Higher guidance → better fidelity
  • Lower guidance → better diversity
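
The ablation varies the guidance weight w in the standard classifier-free guidance rule, ε = ε_uncond + w · (ε_cond − ε_uncond). A minimal sketch (with placeholder model and label arguments) is shown below.

```python
# Illustrative sketch only: classifier-free guidance at sampling time.
import torch

@torch.no_grad()
def guided_eps(model, x_t, t, class_labels, null_labels, w=7.5):
    """Blend conditional and unconditional noise predictions with guidance scale w."""
    eps_cond = model(x_t, t, class_labels)    # prediction conditioned on the class label
    eps_uncond = model(x_t, t, null_labels)   # prediction with the dropped ("null") label
    # Larger w pushes samples toward the condition (fidelity up, diversity down).
    return eps_uncond + w * (eps_cond - eps_uncond)
```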

⚙️ Training Configuration

  • Optimizer: AdamW
  • Learning Rate: 2e-4 (baseline), 1e-4 (fine-tune)
  • Scheduler: Cosine Annealing
  • Batch Size: 128
  • Mixed Precision (FP16)
  • EMA decay: 0.9999
  • Deterministic DDIM sampling (η = 0)
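
A minimal sketch of how this configuration maps onto standard PyTorch components; `model`, `dataloader`, `num_steps`, and `diffusion_loss` are placeholders, and the actual training loop in the notebook may differ.

```python
# Illustrative sketch only: optimizer, scheduler, FP16, and EMA wiring (placeholder model/data).
import copy
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

optimizer = AdamW(model.parameters(), lr=2e-4)        # 1e-4 for the fine-tuning stage
scheduler = CosineAnnealingLR(optimizer, T_max=num_steps)
scaler = torch.cuda.amp.GradScaler()                  # FP16 mixed precision
ema_model = copy.deepcopy(model).eval()               # EMA copy of the weights
ema_decay = 0.9999

for images, labels in dataloader:                     # batch size 128
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = diffusion_loss(model, images, labels)  # placeholder noise-prediction loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()
    # Update the EMA weights after each optimizer step.
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(ema_decay).add_(p, alpha=1 - ema_decay)
```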

📁 Repository Structure

GenDiff-PEFT/
├── notebooks/
│   └── diffusion_unet_experiments.ipynb
├── reports/
│   └── Improving_Image_Quality_and_Training_Efficiency_in_Conditional_Diffusion_Models.pdf
├── models/
│   ├── baseline/
│   └── finetuned/
├── results/
└── README.md


🚀 Getting Started

Install Dependencies

pip install -r requirements.txt

Train Baseline Model

python train.py

Fine-Tune Enhanced Model

python finetune.py

Generate Samples

python sample.py --cfg 7.5 --steps 100


📈 Evaluation Metrics

  • Fréchet Inception Distance (FID)
  • Inception Score (IS)
  • 10,000 generated samples per configuration
  • Deterministic sampling for reproducibility
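
As one way to reproduce these metrics, the torchmetrics implementations of FID and IS can be driven as sketched below; the image tensors are placeholders and the original report may have used a different evaluation stack.

```python
# Illustrative sketch only: FID and Inception Score via torchmetrics (placeholder tensors).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

fid = FrechetInceptionDistance(feature=2048, normalize=True)  # expects float images in [0, 1]
inception = InceptionScore(normalize=True)

# real_images, fake_images: (N, 3, 32, 32) float tensors in [0, 1] -- placeholders
# for the CIFAR-10 reference set and the 10,000 generated samples.
fid.update(real_images, real=True)
fid.update(fake_images, real=False)
inception.update(fake_images)

print("FID:", fid.compute().item())
is_mean, is_std = inception.compute()
print("IS :", f"{is_mean.item():.3f} +/- {is_std.item():.3f}")
```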

🔮 Future Work

  • Dynamic CFG scheduling
  • Efficient attention variants (linear or sparse attention)
  • Diversity-preserving guidance strategies
  • Higher resolution datasets (256×256+)
  • Sampling acceleration via distillation or consistency models

👤 Author

Krishna Koushik Unnam
M.S. Computer Science & Engineering
University of South Florida

