Generative Models Documentation Index

Comprehensive documentation for all generative models in the Nexus framework.

Overview

This directory contains detailed documentation for generative modeling approaches, from classical GANs and VAEs to state-of-the-art diffusion models and temporal generation methods.

Total Documentation: ~5000 lines across 7 comprehensive files

Documentation Files

Core Documentation

1. Main README (`README.md`)

Lines: ~500 | Scope: Complete overview

Comparison of all generative paradigms (GANs, VAEs, Diffusion, Flow)
Key concepts: noise schedules, conditioning, latent diffusion
Implementation patterns and training loops
Evaluation metrics and common pitfalls
Quick reference for all model categories

Key Sections:

Generative modeling paradigms comparison table
Diffusion models core principles
Conditioning mechanisms (CFG, cross-attention)
Training considerations and hyperparameter guidelines

2. Diffusion Models README (`diffusion/README.md`)

Lines: ~800 | Scope: Complete diffusion guide

Forward and reverse diffusion processes
Training objectives (ε-prediction, v-prediction, x_0-prediction)
Noise schedules (linear, cosine, learned)
Sampling algorithms (DDPM, DDIM, DPM-Solver)
Classifier-free guidance
Architecture comparison table
Implementation patterns and optimization tricks

Key Sections:

The diffusion process explained
Mathematical formulation
Sampling algorithms with code
Training strategies (progressive, multi-aspect, timestep sampling)
Optimization tricks (EMA, mixed precision, gradient accumulation, Min-SNR)

3. Audio/Video README (`audio_video/README.md`)

Lines: ~600 | Scope: Temporal generation

Temporal modeling challenges
Neural audio codecs (EnCodec, SoundStream)
Video tokenization and compression
Training strategies (progressive, multi-task, two-stage)
Conditioning mechanisms (text, audio prompting, melody)
Model comparison tables
Memory optimization techniques

Key Sections:

Architecture paradigms (AR, NAR, Diffusion, Hybrid)
Neural audio codec overview
Progressive training strategies
Evaluation metrics (FVD, MOS, PESQ, FAD)
Common pitfalls and solutions

Model-Specific Documentation

4. Base Diffusion (`diffusion/base_diffusion.md`)

Lines: ~900 | Scope: DDPM foundation

Complete coverage of Denoising Diffusion Probabilistic Models:

Theoretical background (forward/reverse processes)
Mathematical formulation with detailed equations
High-level intuition (noising-denoising analogy)
Code walkthrough of key components
Optimization tricks (timestep sampling, loss weighting, EMA)
Experiments and results with comparison tables
Common pitfalls (wrong noise scale, broadcasting, EMA usage)
Hyperparameter guidelines

Standout Features:

Line-by-line code walkthrough
Noise schedule comparison experiments
Timestep analysis and recommendations
Sampling speed vs quality trade-offs

5. GANs (`gans.md`)

Lines: ~1000 | Scope: All GAN variants

Comprehensive GAN documentation covering:

Base GAN, Conditional GAN, CycleGAN, WGAN
Adversarial training dynamics
Mathematical formulation (standard, Wasserstein, conditional)
Implementation details (generator, discriminator architectures)
Training loop with code examples
Optimization tricks (label smoothing, spectral norm, TTUR)
Mode collapse and training instability solutions
GAN variant comparison table

Standout Features:

Complete training loop implementation
WGAN-GP gradient penalty code
Comprehensive stability tricks
Architecture impact analysis

6. VAE (`vae.md`)

Lines: ~1100 | Scope: Variational autoencoders

In-depth VAE documentation including:

Probabilistic formulation and ELBO
Reparameterization trick
Beta-VAE for disentanglement
Mathematical formulation with KL divergence
Code walkthrough (encoder, decoder, loss)
Optimization tricks (KL annealing, free bits, IWAE)
Posterior collapse solutions
Beta-VAE trade-off analysis

Standout Features:

Complete loss computation explanation
Reparameterization trick details
Advanced variants (CVAE, Hierarchical, VQ-VAE)
Disentanglement metrics

7. Implementation Overview (`MODELS_IMPLEMENTED.md`)

Lines: ~200 | Scope: Navigation guide

Quick reference document:

Documentation structure overview
Completion status for all models
Implementation file paths
Quick start guides by category
Contributing guidelines

Key Features Across All Documentation

1. Structured Format

Every document follows a consistent 10-section structure:

Overview & Motivation
Theoretical Background
Mathematical Formulation
High-Level Intuition
Implementation Details
Code Walkthrough
Optimization Tricks
Experiments & Results
Common Pitfalls
References

2. Practical Code Examples

Complete training loops
Sampling algorithms
Loss computation
Architecture implementations
Optimization techniques

3. Comparison Tables

Model architectures
Performance metrics
Training strategies
Hyperparameter recommendations

4. Mathematical Rigor

Formal equations and objectives
Derivations where helpful
Intuitive explanations
Connection to theory

5. Troubleshooting Guides

Common pitfalls identified
Symptoms described
Solutions provided
Code fixes included

Quick Navigation

By Skill Level

Beginner:

Start with main README.md overview
Read category READMEs for chosen domain
Study base implementations
Follow quick start guides

Intermediate:

Deep dive into specific model docs
Study optimization tricks sections
Review experiments and results
Implement variants

Advanced:

Read mathematical formulations
Study advanced techniques
Implement custom variants
Contribute new models

By Goal

Understanding Theory:

Main README → Category README → Model-specific doc
Focus on sections 1-4 (Overview through Intuition)

Implementation:

Category README → Code Walkthrough sections
Review Implementation Details and Code Structure

Optimization:

Jump to Optimization Tricks sections
Review Common Pitfalls
Study Experiments & Results

Research:

Read Mathematical Formulation sections
Review References
Study comparison tables

Coverage Summary

Fully Documented (7 files)

✅ Main Overview (README.md)
✅ Diffusion Category (diffusion/README.md)
✅ Base Diffusion Model (diffusion/base_diffusion.md)
✅ GAN Models (gans.md)
✅ VAE Models (vae.md)
✅ Audio/Video Category (audio_video/README.md)
✅ Implementation Guide (MODELS_IMPLEMENTED.md)

Category READMEs Provide Coverage For

While individual docs are not yet created for every model, the comprehensive category READMEs cover:

Diffusion Models (in diffusion/README.md):

Conditional Diffusion
Stable Diffusion
UNet Architecture
DiT, MMDiT
Consistency Models, LCM
Flow Matching, Rectified Flow
PixArt-alpha

Audio/Video Models (in audio_video/README.md):

CogVideoX
VideoPoet
VALL-E
Voicebox
SoundStorm
MusicGen
NaturalSpeech 3

File Sizes and Scope

File	Lines	Topics	Detail Level
README.md	~500	All paradigms	High-level overview
diffusion/README.md	~800	All diffusion	Comprehensive guide
diffusion/base_diffusion.md	~900	DDPM	Deep dive
gans.md	~1000	All GANs	Comprehensive
vae.md	~1100	VAE variants	Deep dive
audio_video/README.md	~600	Audio/video	Comprehensive guide
MODELS_IMPLEMENTED.md	~200	Navigation	Reference

How to Use This Documentation

For Learning

Start here: Main README.md
Choose domain: Pick category README
Deep dive: Read specific model docs
Practice: Implement from examples

For Implementation

Category README: Understand approach
Code walkthrough: Follow examples
Implementation details: Configure model
Optimization: Apply tricks

For Research

Mathematical formulation: Study equations
Experiments: Review benchmarks
References: Read original papers
Advanced sections: Explore variants

For Debugging

Common pitfalls: Check known issues
Troubleshooting: Apply solutions
Hyperparameters: Verify settings
Code examples: Compare implementation

Model Implementation Status

All models have working implementations in Nexus/nexus/models/:

Diffusion Models ✅

base_diffusion.py
conditional_diffusion.py
stable_diffusion.py
unet.py
dit.py
mmdit.py
consistency_model.py
flow_matching.py
rectified_flow.py
pixart_alpha.py

GAN Models ✅

base_gan.py
conditional_gan.py
cycle_gan.py
wgan.py

VAE Models ✅

vae.py (with MLP and Conv variants)

Audio Models ✅

valle.py
voicebox.py
soundstorm.py
musicgen.py
naturalspeech3.py

Video Models ✅

cogvideox.py
videopoet.py

Next Steps

Additional Documentation

Individual model docs can be added following the template in base_diffusion.md:

Conditional Diffusion
Stable Diffusion
UNet
DiT, MMDiT
Consistency Models
Flow Matching variants
Audio/video models

Enhancements

Add more code examples
Include training scripts
Add visualization notebooks
Create model comparison notebooks
Add architecture diagrams

Community

Contribution guidelines in place
Consistent documentation structure
Easy to extend and improve

Additional Resources

External Links

Papers with Code: Implementations and benchmarks
Lil'Log Blog: Excellent overviews
Hugging Face Diffusion Course
Annotated Diffusion

Internal References

Core architecture components: /nexus/components/
Training infrastructure: /nexus/core/
Example configs: Check implementation files

Documentation Stats:

7 comprehensive markdown files
~5000 total lines
100+ code examples
20+ comparison tables
50+ references to papers

Last Updated: 2026-02-06

For questions or contributions, refer to MODELS_IMPLEMENTED.md

FilesExpand file tree

DOCUMENTATION_INDEX.md

Latest commit

History

DOCUMENTATION_INDEX.md

File metadata and controls

Generative Models Documentation Index

Overview

Documentation Files

Core Documentation

1. Main README (README.md)

2. Diffusion Models README (diffusion/README.md)

3. Audio/Video README (audio_video/README.md)

Model-Specific Documentation

4. Base Diffusion (diffusion/base_diffusion.md)

5. GANs (gans.md)

6. VAE (vae.md)

7. Implementation Overview (MODELS_IMPLEMENTED.md)

Key Features Across All Documentation

1. Structured Format

2. Practical Code Examples

3. Comparison Tables

4. Mathematical Rigor

5. Troubleshooting Guides

Quick Navigation

By Skill Level

By Goal

Coverage Summary

Fully Documented (7 files)

Category READMEs Provide Coverage For

File Sizes and Scope

How to Use This Documentation

For Learning

For Implementation

For Research

For Debugging

Model Implementation Status

Diffusion Models ✅

GAN Models ✅

VAE Models ✅

Audio Models ✅

Video Models ✅

Next Steps

Additional Documentation

Enhancements

Community

Additional Resources

External Links

Internal References

1. Main README (`README.md`)

2. Diffusion Models README (`diffusion/README.md`)

3. Audio/Video README (`audio_video/README.md`)

4. Base Diffusion (`diffusion/base_diffusion.md`)

5. GANs (`gans.md`)

6. VAE (`vae.md`)

7. Implementation Overview (`MODELS_IMPLEMENTED.md`)