This document indexes the generative models in this repository and their documentation.
Each model's documentation includes:
- Overview & Motivation - Why this model matters
- Theoretical Background - Core principles and theory
- Mathematical Formulation - Loss functions, equations
- High-Level Intuition - Conceptual understanding
- Implementation Details - Architecture and config
- Code Walkthrough - Key implementation sections
- Optimization Tricks - Training improvements
- Experiments & Results - Benchmarks and comparisons
- Common Pitfalls - Issues and solutions
- References - Original papers and resources
- Main README (README.md) - Overview of all generative models
- Diffusion Models (diffusion/README.md) - Comprehensive diffusion guide
- GANs (gans.md) - All GAN variants
- VAE (vae.md) - Variational autoencoders
- Audio/Video (audio_video/README.md) - Temporal generation
- Base Diffusion (base_diffusion.md) - DDPM foundation
- Conditional Diffusion - Conditioning mechanisms
- Stable Diffusion - Latent diffusion models
- UNet - Architecture for diffusion
- DiT - Diffusion transformers
- MMDiT - Multimodal transformers (SD3/FLUX)
- Consistency Models - Single-step generation
- LCM - Latent consistency models
- Flow Matching - Continuous normalizing flows
- Rectified Flow - Straight probability paths
- PixArt-alpha - Efficient high-res generation
- CogVideoX - Text-to-video generation
- VideoPoet - LLM for video
- VALL-E - Neural codec TTS
- Voicebox - Flow-based speech
- SoundStorm - Parallel audio generation
- MusicGen - Text-to-music
- NaturalSpeech 3 - Factorized diffusion TTS
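The Flow Matching and Rectified Flow entries above refer to models trained on straight probability paths between noise and data. As a minimal illustration (a generic NumPy sketch, not the repo's API; `rectified_flow_pair` is a hypothetical name), the rectified-flow training pair is a linear interpolation plus its constant target velocity:

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Straight-line path from noise x0 to data x1 at time t in [0, 1]."""
    xt = (1.0 - t) * x0 + t * x1   # interpolated sample fed to the network
    v_target = x1 - x0             # constant velocity along the straight path
    return xt, v_target
```

The velocity network is regressed onto `v_target`; because the path is straight, few (even single) integration steps suffice at sampling time.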
All implementations are in `Nexus/nexus/models/`:

```
nexus/models/
├── diffusion/              # Diffusion model implementations
│   ├── base_diffusion.py
│   ├── conditional_diffusion.py
│   ├── stable_diffusion.py
│   ├── unet.py
│   ├── dit.py
│   ├── mmdit.py
│   ├── consistency_model.py
│   ├── flow_matching.py
│   ├── rectified_flow.py
│   └── pixart_alpha.py
├── video/                  # Video generation
│   ├── cogvideox.py
│   └── videopoet.py
├── audio/                  # Audio generation
│   ├── valle.py
│   ├── voicebox.py
│   ├── soundstorm.py
│   ├── musicgen.py
│   └── naturalspeech3.py
├── gan/                    # GAN models
│   ├── base_gan.py
│   ├── conditional_gan.py
│   ├── cycle_gan.py
│   └── wgan.py
└── cv/vae/                 # VAE models
    └── vae.py
```
- Start with Base Diffusion to understand DDPM
- Learn Conditional Diffusion for control
- Study Stable Diffusion for latent space
- Explore Fast Sampling methods
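The first step of this path, the DDPM forward process, can be sketched in a few lines of NumPy (a generic illustration, not the `base_diffusion.py` API; `make_alpha_bar` and `q_sample` are hypothetical names):

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention schedule alpha_bar_t for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0): scaled clean signal plus Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps  # the denoiser is trained to predict eps from (xt, t)
```

Training then minimizes the mean-squared error between the network's noise prediction and `eps`, which is the simplified DDPM objective.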
- Read GANs documentation
- Start with base GAN implementation
- Try WGAN-GP for stable training
- Experiment with conditional variants
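For the WGAN-GP step above, the critic objective combines a Wasserstein distance estimate with a gradient penalty. The arithmetic, given critic scores and interpolate gradient norms, looks like this (a NumPy sketch of the loss terms only, not the `wgan.py` API; in practice the gradient norms come from autograd on real/fake interpolates):

```python
import numpy as np

def wgan_gp_critic_loss(d_real, d_fake, grad_norms, lam=10.0):
    """Critic objective: Wasserstein estimate plus gradient penalty."""
    wasserstein = d_fake.mean() - d_real.mean()   # critic drives this down
    gp = lam * ((grad_norms - 1.0) ** 2).mean()   # pushes gradients to unit norm
    return wasserstein + gp
```

The penalty (weight `lam`, typically 10) replaces weight clipping and is the main reason WGAN-GP trains more stably than the original GAN objective.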
- Study VAE documentation
- Understand beta-VAE trade-offs
- Try different architectures (MLP vs Conv)
- Experiment with disentanglement
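The beta-VAE trade-off mentioned above is a single scalar in the loss: reconstruction plus a beta-weighted KL divergence to a standard normal prior. A minimal NumPy sketch of the objective for a diagonal-Gaussian encoder (a generic illustration, not the `vae.py` API; `beta_vae_loss` is a hypothetical name):

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Per-batch reconstruction term plus beta-weighted KL(q(z|x) || N(0, I))."""
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    kl = 0.5 * np.mean(np.sum(mu**2 + np.exp(logvar) - logvar - 1.0, axis=1))
    return recon + beta * kl
```

With `beta=1` this is the standard VAE ELBO; `beta > 1` trades reconstruction fidelity for a more factorized (disentangled) latent space.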
- Review Audio/Video README
- Start with codec-based models (VALL-E)
- Try diffusion-based (NaturalSpeech 3)
- Explore video generation (CogVideoX)
For models without individual docs, refer to:
- Category READMEs for comprehensive overviews
- Implementation files for code details
- Main README for general concepts
Each implementation file includes:
- Detailed docstrings
- Architecture descriptions
- Key innovations explained
- Usage examples in comments
To add documentation for remaining models:
- Follow the 10-section structure listed above
- Include code examples from implementations
- Add mathematical formulations where relevant
- Reference original papers
- Provide practical optimization tips
- Document common pitfalls and solutions
See individual documentation files for complete reference lists.
Key resources:
- Lil'Log Blog - Excellent overviews
- Hugging Face Diffusion Course
- Papers with Code - Implementations and benchmarks
Last updated: 2026-02-06