Comprehensive documentation for all generative models in the Nexus framework.
This directory contains detailed documentation for generative modeling approaches, from classical GANs and VAEs to state-of-the-art diffusion models and temporal generation methods.
Total Documentation: ~5000 lines across 7 comprehensive files
Lines: ~500 | Scope: Complete overview
- Comparison of all generative paradigms (GANs, VAEs, Diffusion, Flow)
- Key concepts: noise schedules, conditioning, latent diffusion
- Implementation patterns and training loops
- Evaluation metrics and common pitfalls
- Quick reference for all model categories
Key Sections:
- Generative modeling paradigms comparison table
- Diffusion models core principles
- Conditioning mechanisms (CFG, cross-attention)
- Training considerations and hyperparameter guidelines
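As a rough illustration of the classifier-free guidance (CFG) mechanism listed above, the sketch below blends an unconditional and a conditional noise prediction. The function name `cfg_combine` and the list-based inputs are illustrative assumptions, not the Nexus API; a real model would operate on tensors.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Blend two noise predictions element-wise (hypothetical helper).

    guidance_scale = 0 returns the unconditional prediction,
    guidance_scale = 1 returns the conditional one, and values
    above 1 extrapolate further toward the conditional direction.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [1.0, 1.0, 0.0]
guided = cfg_combine(eps_uncond, eps_cond, guidance_scale=2.0)
print(guided)  # [2.0, 1.0, 1.0]
```

Larger guidance scales typically trade sample diversity for closer adherence to the condition, so the scale is usually tuned per model.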
Lines: ~800 | Scope: Complete diffusion guide
- Forward and reverse diffusion processes
- Training objectives (ε-prediction, v-prediction, x_0-prediction)
- Noise schedules (linear, cosine, learned)
- Sampling algorithms (DDPM, DDIM, DPM-Solver)
- Classifier-free guidance
- Architecture comparison table
- Implementation patterns and optimization tricks
Key Sections:
- The diffusion process explained
- Mathematical formulation
- Sampling algorithms with code
- Training strategies (progressive, multi-aspect, timestep sampling)
- Optimization tricks (EMA, mixed precision, gradient accumulation, Min-SNR)
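To make the linear and cosine noise schedules mentioned above concrete, here is a minimal sketch that computes the cumulative signal level (often written as alpha-bar) for each. The function names are hypothetical, and the defaults (`beta_start=1e-4`, `beta_end=0.02`, `s=0.008`) are values commonly quoted in the diffusion literature, assumed here rather than taken from the Nexus code.

```python
import math


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances spaced evenly between the two endpoints."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars_from_betas(betas):
    """Cumulative product of (1 - beta_t): fraction of signal kept at step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def cosine_alpha_bars(num_steps, s=0.008):
    """Cosine schedule defined directly on alpha-bar, normalized by f(0)."""
    def f(t):
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t) / f(0) for t in range(1, num_steps + 1)]


linear = alpha_bars_from_betas(linear_betas(1000))
cosine = cosine_alpha_bars(1000)
# Compare how much signal survives halfway through each schedule.
print(round(linear[499], 3), round(cosine[499], 3))
```

In this sketch the cosine schedule retains more signal at the midpoint than the linear one, which is the usual motivation for choosing it.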
Lines: ~600 | Scope: Temporal generation
- Temporal modeling challenges
- Neural audio codecs (EnCodec, SoundStream)
- Video tokenization and compression
- Training strategies (progressive, multi-task, two-stage)
- Conditioning mechanisms (text, audio prompting, melody)
- Model comparison tables
- Memory optimization techniques
Key Sections:
- Architecture paradigms (AR, NAR, Diffusion, Hybrid)
- Neural audio codec overview
- Progressive training strategies
- Evaluation metrics (FVD, MOS, PESQ, FAD)
- Common pitfalls and solutions
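FVD and FAD, listed among the evaluation metrics above, are both Fréchet distances between Gaussian fits to embeddings of real and generated samples. The sketch below shows the scalar (one-feature) special case only, where the distance reduces to a squared difference of means plus a squared difference of standard deviations. The function name and the use of a single scalar feature are simplifying assumptions; real metrics use high-dimensional embeddings and full covariance matrices.

```python
import statistics


def frechet_distance_1d(real_feats, gen_feats):
    """Frechet distance between two 1-D Gaussians fitted to the inputs.

    Scalar case of ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)),
    which simplifies to (mu_r - mu_g)^2 + (sd_r - sd_g)^2.
    """
    mu_r = statistics.fmean(real_feats)
    mu_g = statistics.fmean(gen_feats)
    sd_r = statistics.pstdev(real_feats)
    sd_g = statistics.pstdev(gen_feats)
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2


real = [1.0, 2.0, 3.0]
shifted = [3.0, 4.0, 5.0]  # same spread, mean shifted by 2
print(frechet_distance_1d(real, real))     # 0.0
print(frechet_distance_1d(real, shifted))  # 4.0
```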
Lines: ~900 | Scope: DDPM foundation
Complete coverage of Denoising Diffusion Probabilistic Models:
- Theoretical background (forward/reverse processes)
- Mathematical formulation with detailed equations
- High-level intuition (noising-denoising analogy)
- Code walkthrough of key components
- Optimization tricks (timestep sampling, loss weighting, EMA)
- Experiments and results with comparison tables
- Common pitfalls (wrong noise scale, broadcasting, EMA usage)
- Hyperparameter guidelines
Standout Features:
- Line-by-line code walkthrough
- Noise schedule comparison experiments
- Timestep analysis and recommendations
- Sampling speed vs quality trade-offs
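The forward (noising) process and the ε-prediction objective covered by this document can be sketched in a few lines. The helper names `q_sample` and `epsilon_mse` are hypothetical, plain Python lists stand in for tensors, and the all-zero "prediction" is a placeholder for a real network output.

```python
import math
import random


def q_sample(x0, alpha_bar_t, noise):
    """Draw x_t from q(x_t | x_0) = N(sqrt(abar) * x0, (1 - abar) * I).

    Note the square roots: the noise is scaled by sqrt(1 - alpha_bar_t),
    not by (1 - alpha_bar_t) itself.
    """
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]


def epsilon_mse(pred_noise, noise):
    """Mean squared error between predicted and true noise."""
    return sum((p - n) ** 2 for p, n in zip(pred_noise, noise)) / len(noise)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x0 = [0.5, -0.25, 1.0]
noise = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = q_sample(x0, alpha_bar_t=0.5, noise=noise)

# Placeholder for a network call: a model would predict the noise from (x_t, t).
pred_noise = [0.0] * len(noise)
loss = epsilon_mse(pred_noise, noise)
```

At `alpha_bar_t = 1.0` the sample equals the clean input and at `alpha_bar_t = 0.0` it is pure noise, which is a quick sanity check for any schedule implementation.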
Lines: ~1000 | Scope: All GAN variants
Comprehensive GAN documentation covering:
- Base GAN, Conditional GAN, CycleGAN, WGAN
- Adversarial training dynamics
- Mathematical formulation (standard, Wasserstein, conditional)
- Implementation details (generator, discriminator architectures)
- Training loop with code examples
- Optimization tricks (label smoothing, spectral norm, TTUR)
- Mode collapse and training instability solutions
- GAN variant comparison table
Standout Features:
- Complete training loop implementation
- WGAN-GP gradient penalty code
- Comprehensive stability tricks
- Architecture impact analysis
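As a toy illustration of the WGAN-GP gradient penalty mentioned above, the sketch below evaluates the penalty for a scalar critic on a one-dimensional input. It assumes a central finite difference in place of automatic differentiation, and the function name and arguments are hypothetical; a framework implementation would take the gradient of the critic output with respect to the interpolated batch using autograd. The coefficient `lam=10.0` is the value commonly used in the WGAN-GP literature.

```python
def gradient_penalty(critic, real, fake, eps, lam=10.0, h=1e-5):
    """Penalty lam * (|dD/dx| - 1)^2 at a point between real and fake.

    critic: callable mapping a float to a float (toy stand-in for D).
    eps:    interpolation weight in [0, 1], normally sampled uniformly.
    h:      finite-difference step (assumption; autograd is used in practice).
    """
    x_hat = eps * real + (1.0 - eps) * fake
    grad = (critic(x_hat + h) - critic(x_hat - h)) / (2.0 * h)
    return lam * (abs(grad) - 1.0) ** 2


# A critic with slope 1 already satisfies the 1-Lipschitz target.
print(round(gradient_penalty(lambda x: x, real=1.0, fake=0.0, eps=0.5), 6))        # 0.0
# A critic with slope 3 is penalized by 10 * (3 - 1)^2.
print(round(gradient_penalty(lambda x: 3.0 * x, real=1.0, fake=0.0, eps=0.5), 6))  # 40.0
```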
Lines: ~1100 | Scope: Variational autoencoders
In-depth VAE documentation including:
- Probabilistic formulation and ELBO
- Reparameterization trick
- Beta-VAE for disentanglement
- Mathematical formulation with KL divergence
- Code walkthrough (encoder, decoder, loss)
- Optimization tricks (KL annealing, free bits, IWAE)
- Posterior collapse solutions
- Beta-VAE trade-off analysis
Standout Features:
- Complete loss computation explanation
- Reparameterization trick details
- Advanced variants (CVAE, Hierarchical, VQ-VAE)
- Disentanglement metrics
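The reparameterization trick, the KL term of the ELBO, and linear KL annealing can be sketched as follows. All three function names are hypothetical, lists stand in for tensors, and the encoder is assumed to output a mean and a log-variance per latent dimension with a standard normal prior.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), sigma = exp(0.5 * logvar)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def kl_weight(step, warmup_steps, beta_max=1.0):
    """Linear KL annealing: ramp the weight from 0 to beta_max over warmup_steps."""
    return beta_max * min(1.0, step / warmup_steps)


rng = random.Random(0)
mu, logvar = [1.0, 0.0], [0.0, 0.0]
z = reparameterize(mu, logvar, rng)
kl = kl_to_standard_normal(mu, logvar)
print(kl)  # 0.5

# A Beta-VAE style objective would combine the terms as:
#   loss = reconstruction_loss + kl_weight(step, warmup_steps, beta_max) * kl
```

Keeping the randomness in `eps` rather than in `z` is what lets gradients flow through `mu` and `logvar` in a framework with automatic differentiation.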
Lines: ~200 | Scope: Navigation guide
Quick reference document:
- Documentation structure overview
- Completion status for all models
- Implementation file paths
- Quick start guides by category
- Contributing guidelines
Every document follows a consistent 10-section structure:
- Overview & Motivation
- Theoretical Background
- Mathematical Formulation
- High-Level Intuition
- Implementation Details
- Code Walkthrough
- Optimization Tricks
- Experiments & Results
- Common Pitfalls
- References
- Complete training loops
- Sampling algorithms
- Loss computation
- Architecture implementations
- Optimization techniques
- Model architectures
- Performance metrics
- Training strategies
- Hyperparameter recommendations
- Formal equations and objectives
- Derivations where helpful
- Intuitive explanations
- Connection to theory
- Common pitfalls identified
- Symptoms described
- Solutions provided
- Code fixes included
Beginner:
- Start with main README.md overview
- Read category READMEs for chosen domain
- Study base implementations
- Follow quick start guides
Intermediate:
- Deep dive into specific model docs
- Study optimization tricks sections
- Review experiments and results
- Implement variants
Advanced:
- Read mathematical formulations
- Study advanced techniques
- Implement custom variants
- Contribute new models
Understanding Theory:
- Main README → Category README → Model-specific doc
- Focus on sections 1-4 (Overview through Intuition)
Implementation:
- Category README → Code Walkthrough sections
- Review Implementation Details and Code Structure
Optimization:
- Jump to Optimization Tricks sections
- Review Common Pitfalls
- Study Experiments & Results
Research:
- Read Mathematical Formulation sections
- Review References
- Study comparison tables
- ✅ Main Overview (README.md)
- ✅ Diffusion Category (diffusion/README.md)
- ✅ Base Diffusion Model (diffusion/base_diffusion.md)
- ✅ GAN Models (gans.md)
- ✅ VAE Models (vae.md)
- ✅ Audio/Video Category (audio_video/README.md)
- ✅ Implementation Guide (MODELS_IMPLEMENTED.md)
While individual docs are not yet created for every model, the comprehensive category READMEs cover:
Diffusion Models (in diffusion/README.md):
- Conditional Diffusion
- Stable Diffusion
- UNet Architecture
- DiT, MMDiT
- Consistency Models, LCM
- Flow Matching, Rectified Flow
- PixArt-alpha
Audio/Video Models (in audio_video/README.md):
- CogVideoX
- VideoPoet
- VALL-E
- Voicebox
- SoundStorm
- MusicGen
- NaturalSpeech 3
| File | Lines | Topics | Detail Level |
|---|---|---|---|
| README.md | ~500 | All paradigms | High-level overview |
| diffusion/README.md | ~800 | All diffusion | Comprehensive guide |
| diffusion/base_diffusion.md | ~900 | DDPM | Deep dive |
| gans.md | ~1000 | All GANs | Comprehensive |
| vae.md | ~1100 | VAE variants | Deep dive |
| audio_video/README.md | ~600 | Audio/video | Comprehensive guide |
| MODELS_IMPLEMENTED.md | ~200 | Navigation | Reference |
- Start here: Main README.md
- Choose domain: Pick category README
- Deep dive: Read specific model docs
- Practice: Implement from examples
- Category README: Understand approach
- Code walkthrough: Follow examples
- Implementation details: Configure model
- Optimization: Apply tricks
- Mathematical formulation: Study equations
- Experiments: Review benchmarks
- References: Read original papers
- Advanced sections: Explore variants
- Common pitfalls: Check known issues
- Troubleshooting: Apply solutions
- Hyperparameters: Verify settings
- Code examples: Compare implementation
All models have working implementations in Nexus/nexus/models/:
- base_diffusion.py
- conditional_diffusion.py
- stable_diffusion.py
- unet.py
- dit.py
- mmdit.py
- consistency_model.py
- flow_matching.py
- rectified_flow.py
- pixart_alpha.py
- base_gan.py
- conditional_gan.py
- cycle_gan.py
- wgan.py
- vae.py (with MLP and Conv variants)
- valle.py
- voicebox.py
- soundstorm.py
- musicgen.py
- naturalspeech3.py
- cogvideox.py
- videopoet.py
Individual model docs can be added following the template in base_diffusion.md:
- Conditional Diffusion
- Stable Diffusion
- UNet
- DiT, MMDiT
- Consistency Models
- Flow Matching variants
- Audio/video models
- Add more code examples
- Include training scripts
- Add visualization notebooks
- Create model comparison notebooks
- Add architecture diagrams
- Contribution guidelines in place
- Consistent documentation structure
- Easy to extend and improve
- Papers with Code: Implementations and benchmarks
- Lil'Log Blog: Excellent overviews
- Hugging Face Diffusion Course
- Annotated Diffusion
- Core architecture components: /nexus/components/
- Training infrastructure: /nexus/core/
- Example configs: Check implementation files
Documentation Stats:
- 7 comprehensive markdown files
- ~5000 total lines
- 100+ code examples
- 20+ comparison tables
- 50+ references to papers
Last Updated: 2026-02-06
For questions or contributions, refer to MODELS_IMPLEMENTED.md.