
Add a configurable framework for neural operator learning based on the DeepONet and its variants #1255

@wdyab


Summary

I would like to contribute a deep learning framework for neural operator learning in reservoir simulation to PhysicsNemo. The framework is designed for broader subsurface flow applications, but so far it has been tested only on a 2D + time dataset.

Features

Neural Operator Architectures

Completed Architectures:

  • U-FNO
  • Conv-FNO
  • Conv-U-FNO
  • Standard FNO
  • Standalone UNet

Future Architectures (Planned):

  • DeepONet, U-DeepONet, Fourier-DeepONet
  • MIONet, Fourier-MIONet
  • Additional DeepONet variants

Dimensionality Support:

  • Completed: 2D+T (spatial + temporal)
  • Partially completed: 3D+T (spatial + temporal)

Training Infrastructure

Distributed & Scalable Training:

  • Full DDP support via PhysicsNemo's DistributedManager
  • Multi-GPU training with automatic data sharding
  • Mixed precision training (AMP) with GradScaler for memory efficiency where applicable
  • CUDA graphs support for reduced kernel launch overhead
  • cuDNN benchmarking and deterministic modes

Checkpoint & Resume:

  • Automatic model checkpoint naming: best_model_{variable}_{architecture}.pth
  • Prevents accidental model overwriting
  • Full training state save/resume (model, optimizer, epoch, metrics)
  • Integration with PhysicsNemo's checkpoint utilities
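
As a rough illustration of the naming scheme above, the following sketch builds the checkpoint path from the variable and architecture names and bundles the training state into one dictionary. The function names, the `save_dir` argument, and the dictionary keys are hypothetical and are not taken from the contributed code; only the `best_model_{variable}_{architecture}.pth` pattern comes from this issue.

```python
from pathlib import Path


def best_checkpoint_path(save_dir, variable, architecture):
    """Return save_dir/best_model_{variable}_{architecture}.pth.

    Encoding both names in the file name means that training a different
    variable or architecture writes to a different file.
    """
    return Path(save_dir) / f"best_model_{variable}_{architecture}.pth"


def build_training_state(model_state, optimizer_state, epoch, metrics):
    """Bundle everything needed to resume training (hypothetical key names)."""
    return {
        "model_state": model_state,
        "optimizer_state": optimizer_state,
        "epoch": epoch,
        "metrics": metrics,
    }


pressure_path = best_checkpoint_path("checkpoints", "pressure", "UFNO")
saturation_path = best_checkpoint_path("checkpoints", "saturation", "UFNO")
print(pressure_path.name)
print(saturation_path.name)
```

In a PyTorch setting, a dictionary like the one returned by `build_training_state` would typically be passed to `torch.save` together with the path.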

Loss Functions & Physics

Multiple Loss Types:

  • MSE (Mean Squared Error / L2)
  • L1 (Mean Absolute Error)
  • Relative L2 (scale-invariant)
  • Simple Relative L2 (no masking/derivatives)

Unified Loss Framework:

  • Physics-informed losses: Optional spatial derivative constraints (∂x, ∂z) using central finite differences
  • Domain masking: Apply loss only on active reservoir regions (irregular geometries)
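
To make the combination of masking, relative L2, and derivative terms concrete, here is a minimal NumPy sketch. It is not the contributed implementation: the function names, the `deriv_weight` parameter, the assumption that axes 1 and 2 of a (B, H, W, T, C) array are the two spatial directions, and the choice to weight the derivative terms by a single scalar are all assumptions made for illustration.

```python
import numpy as np


def central_diff(u, axis, spacing=1.0):
    """Central finite difference along `axis`, evaluated at interior points only."""
    n = u.shape[axis]
    ahead = np.take(u, np.arange(2, n), axis=axis)
    behind = np.take(u, np.arange(0, n - 2), axis=axis)
    return (ahead - behind) / (2.0 * spacing)


def relative_l2(pred, target, eps=1e-8):
    """Per-sample ||pred - target|| / ||target||, averaged over the batch."""
    b = pred.shape[0]
    err = np.linalg.norm((pred - target).reshape(b, -1), axis=1)
    ref = np.linalg.norm(target.reshape(b, -1), axis=1)
    return float(np.mean(err / (ref + eps)))


def unified_loss(pred, target, mask=None, deriv_weight=0.0):
    """Relative L2 on the (optionally masked) fields plus optional derivative terms.

    `mask` is 1 on active cells and 0 elsewhere, broadcastable to `pred`.
    Differencing the masked fields introduces jumps at the mask boundary;
    a real implementation may want to handle those cells separately.
    """
    if mask is not None:
        pred = pred * mask
        target = target * mask
    loss = relative_l2(pred, target)
    if deriv_weight > 0.0:
        for axis in (1, 2):  # assumed spatial axes of (B, H, W, T, C)
            loss += deriv_weight * relative_l2(
                central_diff(pred, axis), central_diff(target, axis)
            )
    return loss


rng = np.random.default_rng(0)
target = rng.random((2, 8, 8, 4, 1))
pred = target + 0.01 * rng.standard_normal(target.shape)
print(unified_loss(pred, target, deriv_weight=0.5))
```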

Data Pipeline Optimization

High-Performance Data Loading:

  • CUDA pinned memory for faster CPU→GPU transfers
  • Persistent workers (kept alive between epochs)
  • Auto-tuned num_workers (2× num_gpus)
  • Custom collate function optimized for 3D+T spatiotemporal data
  • Distributed sampling with proper data sharding
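
The loader settings listed above can be sketched as a small helper that derives keyword arguments for `torch.utils.data.DataLoader`. The helper name and its arguments are hypothetical; the keyword names (`batch_size`, `num_workers`, `pin_memory`, `persistent_workers`) are standard `DataLoader` parameters, and the 2× rule is the one stated in this issue.

```python
def loader_kwargs(num_gpus, batch_size):
    """Derive DataLoader keyword arguments from the number of GPUs.

    num_workers follows the 2 x num_gpus rule. persistent_workers is only
    valid when num_workers > 0, so it is disabled on CPU-only runs.
    """
    num_workers = 2 * num_gpus
    return {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "pin_memory": num_gpus > 0,  # pinned host memory only helps with a GPU
        "persistent_workers": num_workers > 0,
    }


print(loader_kwargs(num_gpus=4, batch_size=8))
```

The resulting dictionary would be unpacked into the loader, for example `DataLoader(dataset, sampler=sampler, **loader_kwargs(4, 8))`, where `dataset` and `sampler` are supplied by the caller.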

Data Validation:

  • Dynamic dimension checking (B, H, W, T, C format)
  • Detailed error reporting with expected vs actual shapes
  • Validation during training startup
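
A shape check of this kind might look like the sketch below. The function name and the convention of using `None` for a dimension that may vary (such as the batch size) are assumptions for illustration, not the contributed API.

```python
DIM_LABELS = ("B", "H", "W", "T", "C")


def check_shape(name, actual, expected):
    """Validate a (B, H, W, T, C) shape; None in `expected` matches any size."""
    actual = tuple(actual)
    if len(actual) != len(DIM_LABELS) or len(expected) != len(DIM_LABELS):
        raise ValueError(
            f"{name}: expected {len(DIM_LABELS)} dims {DIM_LABELS}, got shape {actual}"
        )
    problems = [
        f"{label}: expected {exp}, got {act}"
        for label, act, exp in zip(DIM_LABELS, actual, expected)
        if exp is not None and act != exp
    ]
    if problems:
        raise ValueError(
            f"{name}: shape mismatch ({'; '.join(problems)}); "
            f"expected {tuple(expected)}, actual {actual}"
        )


# Batch size is left dynamic; the remaining dimensions must match.
check_shape("inputs", (16, 64, 128, 24, 3), (None, 64, 128, 24, 3))
```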

Configuration System

Hydra-Based Configuration:

  • Two-file system for clean separation:
    • model_config.yaml: Architecture and loss configuration
    • training_config.yaml: Training, data, optimizer settings
  • Sensible defaults intended for production use

Experiment Tracking

Multiple Logging Backends:

  • MLFlow
  • TensorBoard
  • Automatic logging of hyperparameters, metrics, and model architecture

Evaluation & Metrics

Comprehensive Evaluation:

  • Mean Plume Error (MPE) - domain-specific metric
  • Mean Absolute Error (MAE)
  • R² Score (coefficient of determination)
  • Relative L2 Error (scale-invariant)
  • Mean Relative Error (MRE)
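
For reference, the generic metrics in this list can be written in a few lines of NumPy. This is a sketch using textbook definitions and hypothetical function names; the contributed code may define them differently (for example, per sample or with masking). Mean Plume Error is domain-specific and is not reproduced here.

```python
import numpy as np


def mae(pred, target):
    """Mean absolute error."""
    return float(np.mean(np.abs(pred - target)))


def r2_score(pred, target):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - np.mean(target)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def relative_l2_error(pred, target, eps=1e-8):
    """||pred - target||_2 / ||target||_2 over the whole array."""
    return float(np.linalg.norm(pred - target) / (np.linalg.norm(target) + eps))


def mean_relative_error(pred, target, eps=1e-8):
    """Mean of |pred - target| / |target|, with eps guarding against zeros."""
    return float(np.mean(np.abs(pred - target) / (np.abs(target) + eps)))


target = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.0, 2.0, 3.0, 5.0])
print(mae(pred, target), r2_score(pred, target))
```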

Evaluation Features:

  • Metrics averaged over entire test dataset
  • Checkpoint-based evaluation with automatic model reconstruction
  • Separate evaluation scripts for pressure and saturation
  • Denormalization utilities for physical units

Extensibility: The framework is designed to extend to broader subsurface flow simulation problems.

Location

examples/reservoir_simulation/DeepONet/

Implementation Statistics

  • 17 source files
  • ~4,600 lines of code
  • All pre-commit hooks passing
  • Tested on multi-GPU systems (up to 8 GPUs)

Ready for Review

The current implementation is ready for review. The code follows PhysicsNemo conventions and builds on existing utilities such as DistributedManager and the checkpoint utilities.

Future work will extend the framework with additional DeepONet variants, enhanced physics-informed loss terms, and support for 3D+T datasets with comprehensive testing.
