This directory contains comprehensive documentation on modern 3D computer vision techniques, focusing on neural representations for novel view synthesis, 3D reconstruction, and scene understanding.
Before neural methods, 3D scenes were represented using:
- Point Clouds: Discrete 3D points with optional attributes (color, normals)
- Meshes: Vertices connected by edges forming polygonal surfaces
- Voxel Grids: 3D discretization of space (memory-intensive)
- Multi-View Stereo (MVS): Reconstructing geometry from multiple images
Limitations:
- Memory inefficient (especially voxels)
- Discrete representations with limited resolution
- Difficult to optimize end-to-end
- Poor handling of view-dependent effects
Neural Radiance Fields (NeRF) revolutionized 3D vision by representing scenes as continuous functions learned by neural networks:
F_θ: (x, y, z, θ, φ) → (r, g, b, σ)

where (x, y, z) is a 3D position, (θ, φ) the viewing direction, (r, g, b) the emitted color, and σ the volume density.
Key Innovations:
- Continuous Representation: Infinite resolution through coordinate-based MLPs
- Differentiable Rendering: End-to-end optimization with volume rendering
- View-Dependent Effects: Modeling reflections, specularity, transparency
- Implicit Geometry: No explicit surface representation needed
Impact: Enabled photorealistic novel view synthesis from sparse images, sparking an explosion of research.
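The field F_θ above can be sketched as a tiny coordinate MLP. This is purely illustrative: the weights are random rather than trained, and the layer sizes are arbitrary, not those of any specific NeRF implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-layer MLP: 5D input (x, y, z, theta, phi) -> (r, g, b, sigma).
W1 = rng.normal(size=(5, 64)); b1 = np.zeros(64)
W2 = rng.normal(size=(64, 4)); b2 = np.zeros(4)

def field(xyz_dir):
    h = np.maximum(xyz_dir @ W1 + b1, 0.0)      # ReLU hidden layer
    out = h @ W2 + b2
    rgb = 1.0 / (1.0 + np.exp(-out[..., :3]))   # sigmoid -> colors in [0, 1]
    sigma = np.maximum(out[..., 3:], 0.0)       # ReLU -> non-negative density
    return rgb, sigma

rgb, sigma = field(np.array([0.1, 0.2, 0.3, 0.0, 1.0]))
```

A real NeRF additionally positionally encodes the inputs and feeds the viewing direction in late, so color can vary with view while density cannot.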
The evolution can be understood through three main axes:

Speed and efficiency:
- NeRF (2020): Hours of training, seconds per frame rendering
- Fast NeRF Variants (2021): Caching, factorization, octrees
- Instant-NGP (2022): Hash encoding, sub-minute training
- 3D Gaussian Splatting (2023): Real-time rendering (>30 FPS)

Rendering quality:
- Mip-NeRF (2021): Anti-aliasing through cone tracing
- NeRF++ (2020): Unbounded scene modeling
- Zip-NeRF (2023): Combined anti-aliasing and regularization
- SuGaR (2023): Surface-aligned Gaussians for better geometry

Capabilities:
- Early NeRF: Static scene representation
- GaussianEditor (2023): Direct 3D editing capabilities
- DreamGaussian (2023): Text-to-3D generation
- LRM (2023): Single-image 3D reconstruction
3D Gaussian Splatting represents a paradigm shift from implicit to explicit representations:
NeRF Approach (Implicit):
- Scene = MLP network weights
- Rendering = Thousands of network queries per ray
- Optimization = Gradient descent on network parameters
Gaussian Splatting Approach (Explicit):
- Scene = Collection of 3D Gaussians with parameters (position, covariance, color, opacity)
- Rendering = Rasterization-based splatting (GPU-friendly)
- Optimization = Direct gradient descent on Gaussian parameters
Why Gaussians Won:
- Real-Time Rendering: 100-1000x faster than NeRF
- Differentiable Rasterization: GPU-optimized forward/backward pass
- Explicit Representation: Easier editing and manipulation
- Quality: Comparable or better than NeRF for many scenes
- Memory Efficient: Adaptive density based on scene complexity
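The rasterization step composites depth-sorted Gaussians front-to-back with alpha blending: C = Σᵢ cᵢ αᵢ ∏_{j<i} (1 − αⱼ). A minimal NumPy sketch of that compositing rule for one pixel (illustrative only; the real method runs this in a custom tile-based CUDA rasterizer):

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted splats.

    colors: (N, 3) RGB per Gaussian, alphas: (N,) opacity after 2D projection.
    """
    # Transmittance before splat i: product of (1 - alpha_j) for all j < i.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance            # contribution of each splat
    return (weights[:, None] * colors).sum(axis=0), weights

# Two splats: a fully opaque red one in front hides the green one behind it.
color, w = composite(np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
                     np.array([1.0, 0.8]))
```

Because `weights` is a differentiable function of the Gaussian parameters, gradients flow back to position, covariance, color, and opacity during optimization.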
- NeRF - Original Neural Radiance Fields
  - Foundation of neural 3D representations
  - Volume rendering with MLPs
  - Positional encoding and hierarchical sampling
- Fast NeRF - Acceleration Techniques
  - Caching and factorization
  - Neural sparse voxel octrees
  - Efficient sampling strategies
- Mip-NeRF - Anti-Aliasing via Cone Tracing
  - Integrated positional encoding
  - Multi-scale representation
  - Superior rendering quality
- NeRF++ - Unbounded Scene Modeling
  - Inverted sphere parameterization
  - Foreground-background decomposition
  - 360-degree outdoor scenes
- Zip-NeRF - State-of-the-Art NeRF
  - Multi-resolution hash encoding
  - Anti-aliasing and regularization
  - Best quality-speed tradeoff
- 3D Gaussian Splatting - Real-Time Novel View Synthesis
  - Explicit 3D Gaussian primitives
  - Differentiable rasterization
  - Adaptive density control
- SuGaR - Surface-Aligned Gaussians
  - Regularization for surface alignment
  - Extracting explicit meshes
  - Better geometry reconstruction
- GaussianEditor - 3D Scene Editing
  - Semantic-aware editing
  - Interactive manipulation
  - Consistent multi-view editing
- DreamGaussian - Text/Image to 3D
  - Fast 3D generation from 2D priors
  - Combining diffusion and Gaussians
  - Mesh extraction and refinement
- LRM - Large Reconstruction Model
  - Single-image to 3D reconstruction
  - Transformer-based architecture
  - Generalizable across objects
- ProlificDreamer - High-Quality Text-to-3D
  - Variational Score Distillation
  - Superior geometry and texture
  - Multi-view consistent generation
The foundation of NeRF-based methods:
C(r) = ∫ T(t)σ(r(t))c(r(t), d) dt
where:
- C(r) is the rendered color along ray r
- T(t) = exp(-∫ σ(r(s)) ds) is the transmittance (accumulated transparency up to t)
- σ(r(t)) is the volume density at point r(t)
- c(r(t), d) is the view-dependent color for direction d
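In practice the integral is evaluated by quadrature over discrete ray samples: Ĉ = Σᵢ Tᵢ (1 − exp(−σᵢ δᵢ)) cᵢ with Tᵢ = exp(−Σ_{j<i} σⱼ δⱼ), where δᵢ is the spacing between samples. A minimal NumPy sketch of that quadrature (illustrative, not the renderer in this repo):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Quadrature form of the volume rendering integral.

    sigmas: (N,) densities at samples, colors: (N, 3) RGB, deltas: (N,) spacing.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                  # per-sample opacity
    # T_i: transmittance accumulated before sample i.
    trans = np.exp(-np.concatenate(([0.0], np.cumsum(sigmas * deltas)[:-1])))
    weights = trans * alphas                                 # T_i * alpha_i
    return (weights[:, None] * colors).sum(axis=0), weights

# A single very dense sample renders (almost exactly) its own color.
c, w = render_ray(np.array([50.0]), np.array([[0.2, 0.5, 0.9]]), np.array([0.5]))
```

Every operation here is differentiable in σᵢ and cᵢ, which is what makes end-to-end optimization from 2D images possible.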
Mapping low-dimensional coordinates to high-dimensional space:
γ(p) = [sin(2^0πp), cos(2^0πp), ..., sin(2^(L-1)πp), cos(2^(L-1)πp)]
Enables MLPs to learn high-frequency details.
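γ is applied to each input coordinate independently, producing 2L sinusoidal features per scalar. A NumPy sketch of the encoding as written above:

```python
import numpy as np

def positional_encoding(p, L=10):
    """gamma(p): map each scalar coordinate to 2L sinusoidal features."""
    freqs = 2.0 ** np.arange(L) * np.pi        # 2^k * pi for k = 0 .. L-1
    angles = p[..., None] * freqs              # shape (..., L)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

feat = positional_encoding(np.array([0.5, -0.25]), L=10)   # shape (2, 20)
```

NeRF typically uses L = 10 for positions and L = 4 for viewing directions, matching the `pos_encoding_dims` and `dir_encoding_dims` settings shown in the getting-started snippet below.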
Two-stage sampling strategy:
- Coarse Network: Uniform sampling to identify important regions
- Fine Network: Importance sampling focusing on high-density areas
Reduces computational cost while maintaining quality.
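The fine stage is commonly implemented as inverse-CDF sampling over the coarse network's weights, treated as a piecewise-constant PDF along the ray. A simplified NumPy sketch (not the exact NeRF implementation, which also interpolates a piecewise-linear CDF):

```python
import numpy as np

def sample_fine(bin_edges, weights, n_fine, rng):
    """Importance-sample fine points from a coarse weight histogram."""
    pdf = weights / weights.sum()
    cdf = np.concatenate(([0.0], np.cumsum(pdf)))
    u = rng.uniform(size=n_fine)                       # uniform draws in [0, 1)
    idx = np.searchsorted(cdf, u, side="right") - 1    # which coarse bin
    lo, hi = bin_edges[idx], bin_edges[idx + 1]
    t = (u - cdf[idx]) / pdf[idx]                      # position within the bin
    return lo + t * (hi - lo)

rng = np.random.default_rng(0)
edges = np.linspace(0.0, 1.0, 5)                 # 4 coarse bins along the ray
w = np.array([0.0, 0.0, 1.0, 0.0])               # coarse pass: all density in bin 3
fine = sample_fine(edges, w, 64, rng)            # all fine samples land in that bin
```

Because all the coarse weight sits in the third bin, every fine sample falls in [0.5, 0.75): compute is concentrated where the scene actually has density.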
Each 3D Gaussian is parameterized by:
- Position μ ∈ ℝ³: Center in 3D space
- Covariance Σ ∈ ℝ³ˣ³: 3D shape and orientation
- Color c ∈ ℝ³: RGB appearance (or spherical harmonics)
- Opacity α ∈ [0,1]: Transparency
Projected to 2D for efficient rasterization.
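To keep Σ a valid (positive semi-definite) covariance during optimization, 3D Gaussian Splatting factors it as Σ = R S Sᵀ Rᵀ, with R a rotation stored as a quaternion and S a diagonal per-axis scale. A NumPy sketch of that factorization (illustrative):

```python
import numpy as np

def covariance(quat, scale):
    """Build a Gaussian's covariance as Sigma = R S S^T R^T.

    quat: (w, x, y, z) rotation quaternion, scale: (3,) per-axis scales.
    """
    w, x, y, z = quat / np.linalg.norm(quat)       # normalize the quaternion
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

# Identity rotation: covariance is simply diag(scale^2).
sigma = covariance(np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.1, 0.2, 0.3]))
```

Optimizing quaternion and scale instead of the nine entries of Σ directly means gradient steps can never produce an invalid covariance.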
Both NeRF and Gaussian Splatting use differentiable rendering:
- NeRF: Differentiable volume rendering through numerical integration
- Gaussians: Differentiable splatting through custom CUDA kernels
Enables end-to-end optimization from 2D images.
| Method | Training Time | Rendering Speed | Quality | Memory | Editing | Use Case |
|---|---|---|---|---|---|---|
| NeRF | Hours | Slow (FPS: 0.1) | High | Low | Hard | Research baseline |
| Fast NeRF | 1-2 hours | Medium (FPS: 1-10) | High | Medium | Hard | Balanced quality/speed |
| Mip-NeRF | Hours | Slow | Very High | Low | Hard | Best quality NeRF |
| NeRF++ | Hours | Slow | High | Low | Hard | Unbounded scenes |
| Zip-NeRF | 1-2 hours | Medium | Very High | Medium | Hard | SOTA NeRF variant |
| Gaussian Splatting | Minutes | Real-time (FPS: 60+) | High | Medium-High | Easy | Production, real-time |
| SuGaR | 30-60 min | Real-time | High | Medium | Medium | When geometry matters |
| GaussianEditor | Minutes | Real-time | High | Medium | Very Easy | Interactive editing |
| DreamGaussian | Minutes | Real-time | Medium | Low | Medium | Quick 3D generation |
| LRM | Seconds | Real-time | Medium | High | Medium | Single-image 3D |
| ProlificDreamer | Hours | N/A | Very High | Medium | Medium | High-quality generation |
Choose NeRF-based methods when:
- Prioritizing rendering quality over speed
- Working with complex view-dependent effects
- Memory is constrained
- Research and experimentation

Choose Gaussian Splatting when:
- Real-time rendering is required
- Interactive editing is needed
- Training time must be minimized
- Deploying to production systems

Choose generative methods when:
- Creating 3D content from text/images
- Single-view 3D reconstruction
- No multi-view images are available
- Fast prototyping of 3D assets
Our implementation in nexus/models/cv/ provides:
```
nerf/
├── nerf.py            # Base NeRF implementation
├── nerf_plus_plus.py  # Unbounded scene extension
├── fast_nerf.py       # Acceleration techniques
├── mipnerf.py         # Anti-aliased rendering
├── renderer.py        # Volume rendering utilities
├── networks.py        # MLP architectures
└── hierarchical.py    # Hierarchical sampling
```
- NeRFNetwork: Base MLP with positional encoding
- NeRFPlusPlusNetwork: Foreground-background decomposition
- FastNeRFNetwork: Cached and factorized rendering
- MipNeRFNetwork: Integrated positional encoding
- NeRFRenderer: Volume rendering implementation
2020: NeRF, NeRF++
└─ Foundation of neural 3D representations
2021: Mip-NeRF, Fast NeRF variants
└─ Quality improvements and acceleration
2022: Instant-NGP, TensoRF
└─ Hybrid representations, dramatic speedups
2023: 3D Gaussian Splatting, Zip-NeRF
└─ Real-time rendering, SOTA quality
2023: DreamGaussian, LRM, ProlificDreamer
└─ Generative 3D from 2D priors
2024: GaussianEditor, SuGaR
└─ Editing and geometric improvements
Training stability:
- Problem: NeRF optimization can be unstable
- Solutions: Learning rate scheduling, weight regularization, coarse-to-fine training

View-consistency artifacts:
- Problem: Floaters, inconsistencies across views
- Solutions: Multi-view consistency losses, depth regularization, pruning

Rendering cost:
- Problem: NeRF rendering is computationally expensive
- Solutions: Neural acceleration, caching, Gaussian splatting, neural sparse voxels

Sparse-view inputs:
- Problem: Overfitting with sparse inputs
- Solutions: Regularization, depth priors, semantic guidance, generative priors

Dynamic scenes:
- Problem: Both NeRF and Gaussians assume static scenes
- Solutions: Deformation fields, 4D representations, temporal consistency
- Start with NeRF: Understand the foundation

  ```python
  from nexus.models.cv.nerf import NeRFNetwork

  config = {
      "pos_encoding_dims": 10,
      "dir_encoding_dims": 4,
      "hidden_dim": 256,
  }
  model = NeRFNetwork(config)
  ```

- Explore Gaussian Splatting: For production use cases
  - See gaussian_splatting.md for implementation details
  - Understand differentiable rasterization
  - Learn adaptive density control
- Try Generative Methods: For content creation
  - DreamGaussian for quick prototypes
  - LRM for single-image reconstruction
  - ProlificDreamer for high quality
- NeRF: Mildenhall et al., ECCV 2020
- Gaussian Splatting: Kerbl et al., SIGGRAPH 2023
- Mip-NeRF: Barron et al., ICCV 2021
- Official NeRF: https://github.com/bmild/nerf
- Official Gaussian Splatting: https://github.com/graphdeco-inria/gaussian-splatting
- Nerfstudio: https://docs.nerf.studio/ (unified framework)
- Synthetic: NeRF Synthetic, ShapeNet
- Real-World: Mip-NeRF 360, Tanks and Temples
- Unbounded: Mip-NeRF 360, Free datasets
When adding new methods:
- Follow the documentation template (see individual method docs)
- Include mathematical formulations
- Provide code walkthroughs with references to implementation
- Add comparisons with existing methods
- Include experimental results and ablations
Active areas of research:
- Dynamic 3D: Handling moving scenes and deformations
- Generalization: Single-forward-pass reconstruction
- Compression: Reducing memory footprint for deployment
- Physics Integration: Simulating physical interactions
- Relighting: Separating lighting and materials
- Large-Scale Scenes: City-level reconstruction
Explore individual method documentation for deep dives into each technique.