Summary
I would like to contribute a comprehensive deep learning framework for neural operator learning in reservoir simulation to PhysicsNemo. The framework is designed for broader subsurface flow applications and has so far been tested on a 2D + time dataset.
Features
Neural Operator Architectures
Completed Architectures:
- U-FNO
- Conv-FNO
- Conv-U-FNO
- Standard FNO
- Standalone UNet
Future Architectures (Planned):
- DeepONet, U-DeepONet, Fourier-DeepONet
- MIONet, Fourier-MIONet
- Additional DeepONet variants
Dimensionality Support:
- Completed: 2D+T (spatial + temporal)
- Partially completed: 3D+T (spatial + temporal)
Training Infrastructure
Distributed & Scalable Training:
- Full DDP support via PhysicsNemo's DistributedManager
- Multi-GPU training with automatic data sharding
- Mixed precision training (AMP) with GradScaler for memory efficiency where applicable
- CUDA graphs support for reduced kernel launch overhead
- cuDNN benchmarking and deterministic modes
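The mixed-precision part of this setup can be sketched with standard PyTorch AMP; this is an illustrative minimal training step, not the framework's actual code, and in the real implementation device and process-group setup would come from PhysicsNemo's DistributedManager.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

# Illustrative AMP training step (assumption: a simple stand-in model,
# not one of the actual FNO variants).
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = torch.nn.Linear(16, 16).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = GradScaler(enabled=use_amp)  # becomes a no-op on CPU

def train_step(x, y):
    optimizer.zero_grad(set_to_none=True)
    with autocast(enabled=use_amp):   # forward pass in reduced precision
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()     # loss scaling guards against underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

x = torch.randn(4, 16, device=device)
y = torch.randn(4, 16, device=device)
loss_value = train_step(x, y)
```

Disabling the scaler and autocast context when CUDA is unavailable keeps the same code path usable on CPU.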
Checkpoint & Resume:
- Automatic model checkpoint naming (best_model_{variable}_{architecture}.pth) prevents accidental model overwriting
- Full training state save/resume (model, optimizer, epoch, metrics)
- Integration with PhysicsNemo's checkpoint utilities
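The naming and save/resume pattern can be sketched as below; function names are illustrative, and the actual framework layers PhysicsNemo's checkpoint utilities on top of this idea.

```python
import os
import tempfile
import torch

def checkpoint_path(root, variable, architecture):
    # e.g. best_model_saturation_ufno.pth — one file per (variable, architecture)
    return os.path.join(root, f"best_model_{variable}_{architecture}.pth")

def save_state(path, model, optimizer, epoch, metrics):
    # full training state so a run can resume exactly where it stopped
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "epoch": epoch,
        "metrics": metrics,
    }, path)

def load_state(path, model, optimizer):
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"], state["metrics"]

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.Adam(model.parameters())
root = tempfile.mkdtemp()
path = checkpoint_path(root, "saturation", "ufno")
save_state(path, model, optimizer, epoch=7, metrics={"rel_l2": 0.05})
epoch, metrics = load_state(path, model, optimizer)
```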
Loss Functions & Physics
Multiple Loss Types:
- MSE (Mean Squared Error / L2)
- L1 (Mean Absolute Error)
- Relative L2 (scale-invariant)
- Simple Relative L2 (no masking/derivatives)
Unified Loss Framework:
- Physics-informed losses: Optional spatial derivative constraints (∂x, ∂z) using central finite differences
- Domain masking: Apply loss only on active reservoir regions (irregular geometries)
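A minimal sketch of these two ideas, assuming illustrative function names and a simple weighting scheme (the derivative terms are left unmasked here for brevity): a relative L2 loss restricted to active cells, plus optional central finite-difference penalties on spatial gradients.

```python
import torch

def masked_relative_l2(pred, target, mask=None, eps=1e-8):
    if mask is not None:  # apply loss only on active reservoir cells
        pred, target = pred * mask, target * mask
    num = torch.linalg.vector_norm(pred - target)
    den = torch.linalg.vector_norm(target) + eps
    return num / den

def central_diff(field, dim, spacing=1.0):
    # second-order central finite difference along `dim` (interior points only)
    lo = field.narrow(dim, 0, field.size(dim) - 2)
    hi = field.narrow(dim, 2, field.size(dim) - 2)
    return (hi - lo) / (2.0 * spacing)

def physics_informed_loss(pred, target, mask=None, deriv_weight=0.1):
    data_term = masked_relative_l2(pred, target, mask)
    # penalize mismatch of spatial derivatives (here ∂x along the last dim,
    # ∂z along the second-to-last)
    dx_term = masked_relative_l2(central_diff(pred, -1), central_diff(target, -1))
    dz_term = masked_relative_l2(central_diff(pred, -2), central_diff(target, -2))
    return data_term + deriv_weight * (dx_term + dz_term)
```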
Data Pipeline Optimization
High-Performance Data Loading:
- CUDA pinned memory for faster CPU→GPU transfers
- Persistent workers (kept alive between epochs)
- Auto-tuned num_workers (2× num_gpus)
- Custom collate function optimized for 3D+T spatiotemporal data
- Distributed sampling with proper data sharding
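These loader settings can be sketched with a standard PyTorch DataLoader; the dataset and shapes are stand-ins, and under DDP the real pipeline would additionally pass a DistributedSampler for data sharding.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 2 × num_gpus heuristic for worker count; zero on a CPU-only machine
num_gpus = torch.cuda.device_count()
num_workers = 2 * num_gpus

# stand-in dataset: 32 samples with (H, W, T, C) = (8, 8, 4, 1)
dataset = TensorDataset(torch.randn(32, 8, 8, 4, 1))

loader = DataLoader(
    dataset,
    batch_size=4,
    num_workers=num_workers,
    pin_memory=torch.cuda.is_available(),  # faster CPU→GPU transfers
    persistent_workers=num_workers > 0,    # keep workers alive between epochs
)

batch = next(iter(loader))[0]  # shape (B, H, W, T, C) = (4, 8, 8, 4, 1)
```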
Data Validation:
- Dynamic dimension checking (B, H, W, T, C format)
- Detailed error reporting with expected vs actual shapes
- Validation during training startup
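The dimension check amounts to something like the following sketch (function name and message format are illustrative): reject any tensor that is not 5-D in (B, H, W, T, C) layout, reporting expected versus actual shapes.

```python
import torch

def validate_bhwtc(batch, expected_hwtc):
    # batch size B is free; the remaining (H, W, T, C) dims must match
    expected = ("B", *expected_hwtc)
    actual = tuple(batch.shape)
    if batch.ndim != 5 or actual[1:] != tuple(expected_hwtc):
        raise ValueError(
            f"Expected (B, H, W, T, C) = {expected}, got {actual}"
        )

good = torch.zeros(4, 64, 64, 10, 2)
validate_bhwtc(good, (64, 64, 10, 2))  # passes silently
```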
Configuration System
Hydra-Based Configuration:
- Two-file system for clean separation:
model_config.yaml: Architecture and loss configuration
training_config.yaml: Training, data, optimizer settings
- Production-ready defaults
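A hypothetical illustration of the two-file split (all field names here are placeholders, not the framework's actual schema):

```yaml
# model_config.yaml — architecture and loss configuration
model:
  architecture: ufno
  modes: 12
  width: 32
loss:
  type: relative_l2
  use_derivatives: true

# training_config.yaml — training, data, and optimizer settings
training:
  epochs: 200
  batch_size: 8
optimizer:
  name: adam
  lr: 1.0e-3
```

Keeping architecture/loss choices apart from run-time settings lets a model definition be reused across experiments by swapping only the training file.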
Experiment Tracking
Multiple Logging Backends:
- MLFlow
- TensorBoard
- Automatic logging of hyperparameters, metrics, and model architecture
Evaluation & Metrics
Comprehensive Evaluation:
- Mean Plume Error (MPE) - domain-specific metric
- Mean Absolute Error (MAE)
- R² Score (coefficient of determination)
- Relative L2 Error (scale-invariant)
- Mean Relative Error (MRE)
Evaluation Features:
- Metrics averaged over entire test dataset
- Checkpoint-based evaluation with automatic model reconstruction
- Separate evaluation scripts for pressure and saturation
- Denormalization utilities for physical units
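The generic metrics and the denormalization step can be sketched in a few lines of NumPy; these are textbook definitions with illustrative names, not the framework's exact implementations (MPE is domain-specific and omitted here).

```python
import numpy as np

def mae(pred, target):
    # Mean Absolute Error
    return np.mean(np.abs(pred - target))

def relative_l2(pred, target, eps=1e-8):
    # scale-invariant: error norm divided by target norm
    return np.linalg.norm(pred - target) / (np.linalg.norm(target) + eps)

def r2_score(pred, target):
    # coefficient of determination
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def denormalize(x, mean, std):
    # map normalized model outputs back to physical units
    return x * std + mean

target = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.1, 1.9, 3.2, 3.8])
```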
Extensibility: The framework is designed to generalize to broader subsurface flow simulation problems
Location
examples/reservoir_simulation/DeepONet/
Implementation Statistics
- 17 source files
- ~4,600 lines of code
- All pre-commit hooks passing
- Tested on multi-GPU systems (up to 8 GPUs)
Ready for Review
The current implementation is ready for review. The code follows PhysicsNemo conventions and integrates seamlessly with existing utilities.
Future work will extend the framework with additional DeepONet variants, enhanced physics-informed loss terms, and support for 3D+T datasets with comprehensive testing.