7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- Examples: Added DeepONet framework for CO2 sequestration modeling in
`examples/reservoir_simulation/DeepONet/`. Features multiple neural operator
architectures (U-FNO, Conv-FNO, Standalone UNet), physics-informed losses with
spatial derivatives and domain masking, distributed training with DDP and AMP,
comprehensive experiment tracking (MLflow, TensorBoard, W&B), and a flexible
Hydra-based configuration system. (#1255)

### Changed

### Deprecated
30 changes: 30 additions & 0 deletions examples/reservoir_simulation/DeepONet/.gitignore
@@ -0,0 +1,30 @@
# Temporary directories
logs/
outputs/
mlruns/
visualizations/
__pycache__/
checkpoints/

# NFS lock files
.nfs*

# Cluster-specific files
*.sbatch

# Temporary scripts
check_gpu_memory.sh
launch.log

# Python cache
*.pyc
*.pyo
*.pyd
.Python

# Jupyter
.ipynb_checkpoints/

# IDE
.vscode/
.idea/
145 changes: 145 additions & 0 deletions examples/reservoir_simulation/DeepONet/README.md
@@ -0,0 +1,145 @@
# U-FNO for CO2 Sequestration

Deep learning models for predicting pressure and saturation in CO2 sequestration reservoirs.

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Training

```bash
# Edit configuration
vim conf/training_config.yaml

# Run training (single GPU)
python train_fno3d.py

# Run training (multi-GPU with DDP)
torchrun --nproc_per_node=4 train_fno3d.py
```

### 3. Evaluation

```bash
# Evaluate pressure model
python evaluate_pressure.py --checkpoint checkpoints/best_model_pressure_*.pth

# Evaluate saturation model
python evaluate_saturation.py --checkpoint checkpoints/best_model_saturation_*.pth
```

## Available Models

| Model Type | Description | Best For |
|-----------|-------------|----------|
| **U-FNO** | Fourier + U-Net (hybrid) | Best accuracy, spatiotemporal PDEs |
| **Conv-FNO** | Fourier + 3D Convolutions | Balanced performance/speed |
| **Standalone UNet** | Pure spatial convolutions | Baseline comparisons |
| **Standard FNO** | Pure Fourier layers | Global patterns, fast training |

All U-FNO variants support both **custom** and **PhysicsNemo** UNet blocks; the standalone UNet model requires the PhysicsNemo implementation (see the configuration notes below).
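
The table above maps directly onto the layer-count knobs in `model_config.yaml`. As a purely illustrative sketch (this helper is hypothetical, not a repo function, and assumes U-Net layers take precedence when both counts are positive), the counts determine the variant like this:

```python
def classify_variant(num_unet_layers: int, num_conv_layers: int) -> str:
    """Name the architecture implied by the layer counts in model_config.yaml.

    Hypothetical mapping based on the table above: U-FNO sets
    num_unet_layers > 0, Conv-FNO sets num_conv_layers > 0, and a pure
    Fourier FNO sets both to 0. Precedence when both are > 0 is an
    assumption made here for illustration.
    """
    if num_unet_layers > 0:
        return "U-FNO"
    if num_conv_layers > 0:
        return "Conv-FNO"
    return "Standard FNO"

print(classify_variant(3, 0))  # U-FNO
print(classify_variant(0, 3))  # Conv-FNO
print(classify_variant(0, 0))  # Standard FNO
```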

## Documentation

### Core Guides

- Configuration system (model_config.yaml + training_config.yaml)
- Model architectures and parameters
- Data format requirements and validation
- Model evaluation system

### Architecture Guides

- Conv-FNO architecture details

## File Structure

```text
DeepONet/
├── conf/
│   ├── model_config.yaml      # Model architecture & loss
│   └── training_config.yaml   # Training, data, optimizer settings
├── train_fno3d.py             # Training script
├── evaluate_pressure.py       # Pressure evaluation
├── evaluate_saturation.py     # Saturation evaluation
├── ufno.py                    # U-FNO model architectures
├── unet3d.py                  # Custom UNet implementations
├── physicsnemo_unet.py        # PhysicsNemo UNet wrapper
├── data_validation.py         # Data validation utilities
├── dataset.py                 # Data loading
├── losses.py                  # Loss functions
├── metrics.py                 # Evaluation metrics
└── checkpoints/               # Trained models
```

## Configuration

### Two-File System

1. **`model_config.yaml`** - Model architecture and loss (rarely changed)
2. **`training_config.yaml`** - Training parameters (frequently tuned)

### Example: Train U-FNO

```yaml
# training_config.yaml
data:
  variable: pressure # or "saturation"

# model_config.yaml
arch:
  model_type: ufno
  ufno:
    num_fno_layers: 3
    num_unet_layers: 3
    num_conv_layers: 0
    unet_type: custom # or "physicsnemo"
```

### Example: Train Conv-FNO

```yaml
# model_config.yaml
arch:
  model_type: ufno
  ufno:
    num_fno_layers: 3
    num_unet_layers: 0 # Disable U-Net
    num_conv_layers: 3 # Enable Conv
```

### Example: Train Standalone UNet

```yaml
# model_config.yaml
arch:
  model_type: unet
  unet:
    unet_type: physicsnemo # NOTE: Only physicsnemo supported for standalone use
```

**Note:** Custom UNet3D is designed for U-FNO only (constant channel dimensions).
For standalone UNet, always use `unet_type: physicsnemo`.

## Model Checkpoints

Models are automatically named based on architecture:

- `best_model_pressure_ufno_custom.pth`
- `best_model_pressure_convfno.pth`
- `best_model_saturation_unet_physicsnemo.pth`
- etc.

No manual naming is needed, which prevents accidentally overwriting checkpoints from different configurations.

## Citation

If you use this code, please cite:

- PhysicsNemo: [NVIDIA PhysicsNemo](https://github.com/NVIDIA/physicsnemo)
- U-FNO: [Your paper/reference]
101 changes: 101 additions & 0 deletions examples/reservoir_simulation/DeepONet/conf/model_config.yaml
@@ -0,0 +1,101 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2025 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Model Architecture and Loss Configuration

arch:
  # Model type: "ufno" (Fourier + U-Net) or "unet" (pure U-Net, no Fourier layers)
  model_type: ufno # Options: "ufno", "unet"

  ufno:
    in_channels: 12        # Input channels (12 physical quantities)
    out_channels: 1        # Output channel (pressure dP OR saturation sg)
    width: 36              # Latent feature dimension (reduced for 3D)
    modes1: 10             # Number of Fourier modes in height direction
    modes2: 10             # Number of Fourier modes in width direction
    modes3: 10             # Number of Fourier modes in time direction
    num_fno_layers: 3      # Standard Fourier layers (no enhancement)
    num_unet_layers: 3     # U-Net enhanced layers (U-FNO: set > 0, Conv-FNO: set to 0)
    num_conv_layers: 0     # Conv enhanced layers (Conv-FNO: set > 0, U-FNO: set to 0)
    padding: 8             # Spatial padding
    conv_kernel_size: 3    # Conv layer kernel size (for Conv-FNO)
    unet_kernel_size: 3    # U-Net convolutional kernel size
    unet_dropout: 0.0      # U-Net dropout rate
    unet_type: physicsnemo # UNet type: "custom" (UNet3D) or "physicsnemo" (PhysicsNemo's UNet)

    # Activation function
    # Options: "relu", "gelu", "silu", "swish", "mish", "tanh", "leaky_relu", etc.
    activation_fn: relu

    # Lifting network configuration (input -> latent space)
    lifting_type: mlp # Type: "mlp" or "conv"
    lifting_layers: 1 # Number of layers
    lifting_width: 2  # Hidden width factor: hidden_width = width // lifting_width

    # Decoder network configuration (latent space -> output)
    decoder_type: mlp  # Type: "mlp" or "conv"
    decoder_layers: 1  # Number of hidden layers (1 = original U-FNO)
    decoder_width: 128 # Hidden layer size

  # Standalone U-Net configuration (used when model_type: "unet")
  # NOTE: Only PhysicsNemo UNet is supported for standalone use.
  # Custom UNet3D is designed for U-FNO only (where channels remain constant).
  unet:
    in_channels: 12        # Input channels
    out_channels: 1        # Output channels
    unet_type: physicsnemo # MUST be "physicsnemo" (custom UNet3D only works within U-FNO)

    # PhysicsNemo UNet parameters (only used when unet_type: "physicsnemo")
    # Defaults are set to match the U-FNO comparison for reproducibility
    physicsnemo:
      kernel_size: 3
      stride: 1
      model_depth: 3                     # Number of downsampling levels
      feature_map_channels: [36, 36, 36] # Channels at each level (length = model_depth * num_conv_blocks)
      num_conv_blocks: 1                 # Number of conv blocks per level
      conv_activation: relu
      conv_transpose_activation: relu
      padding: 1
      padding_mode: zeros
      pooling_type: MaxPool3d
      pool_size: 2
      normalization: batchnorm
      use_attn_gate: false
      gradient_checkpointing: false

    # Custom UNet3D parameters (only used when unet_type: "custom")
    custom:
      kernel_size: 3
      dropout_rate: 0.0

# Loss function configuration
loss:
  base_loss_type: relative_l2 # Base loss type - Options: 'mse', 'l1', 'relative_l2', 'simple_relative_l2'

  # Reduction method - Options: 'mean', 'sum', 'none'
  reduction: mean

  # Masking - Options: true, false
  # Apply loss only on active reservoir cells (irregular domain)
  use_mask: true

  # Physics-informed derivative loss - Options: true, false
  # Add spatial derivative term to loss
  use_derivative: true
  # Derivative weight - Range: 0.0 to 1.0
  derivative_weight: 0.5
  derivative_dim: dx

90 changes: 90 additions & 0 deletions examples/reservoir_simulation/DeepONet/conf/training_config.yaml
@@ -0,0 +1,90 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2025 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Training Configuration
# This file imports model_config.yaml and adds training-specific settings

defaults:
  - model_config
  - _self_

hydra:
  job:
    chdir: False
  run:
    dir: ./outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}

# Data configuration
data:
  data_path: /home/wdyab/physicsnemo/data_lustre
  variable: saturation # 'pressure' or 'saturation'
  normalize: false
Comment on lines +32 to +34 (Contributor):
- style: hardcoded user-specific data path. The path /home/wdyab/physicsnemo/data_lustre is specific to the author's system and won't work for other users. Consider using a relative path or adding a clear comment that users must update this.
- style: normalize: false may not be intended for production use. Verify this is intentional for the saturation variable or if it should be true by default.
  num_workers: null # Auto-set to 2 × num_gpus

# Training configuration
training:
  batch_size: 4
  epochs: 11
  initial_lr: 0.001
  checkpoint_freq: 10
  checkpoint_dir: ./checkpoints
  validate_freq: 10
  early_stopping: true
  patience: 20
  use_amp: false # Not beneficial for FNO models
  use_graphs: true

# Optimizer configuration
optimizer:
  type: adam
  weight_decay: 0.0001
  betas: [0.9, 0.999]
  eps: 1.0e-8

# Learning rate scheduler configuration
scheduler:
  type: step     # Scheduler type: 'step' or 'exponential'
  step_size: 4   # Step every N epochs (for StepLR)
  gamma: 0.85    # Multiplicative factor of learning rate decay
  min_lr: 1.0e-6 # Minimum learning rate

# Logging configuration
logging:
  console_freq: 10
  use_tensorboard: true
  tensorboard_dir: ./tensorboard
  use_mlflow: false
  experiment_name: ufno_co2_sequestration
  use_wandb: false
  wandb_project: physicsnemo_co2
  wandb_entity: null

# Random seed for reproducibility
seed: 42

# Compute configuration
compute:
  # Device
  device: cuda # 'cuda' or 'cpu'

  # Multi-GPU training
  num_gpus: 8       # Number of GPUs to use (null = auto-detect)
  distributed: true # Enable distributed training

  # Performance
  benchmark: false    # Enable cudnn benchmarking
  deterministic: true # Deterministic mode (slower but reproducible)
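
The `step` scheduler settings above have a simple closed form: decay by `gamma` every `step_size` epochs, floored at `min_lr`. A sketch of that schedule (the training script presumably uses `torch.optim.lr_scheduler.StepLR`; modeling the floor as a plain clamp is an assumption):

```python
def stepped_lr(epoch: int, initial_lr: float = 1e-3, step_size: int = 4,
               gamma: float = 0.85, min_lr: float = 1e-6) -> float:
    """Learning rate at a given epoch for the 'step' scheduler configured
    above: multiply by gamma every step_size epochs, clamped at min_lr.
    Closed-form sketch, not the training script's actual scheduler object."""
    return max(initial_lr * gamma ** (epoch // step_size), min_lr)


print(stepped_lr(0))   # 0.001
print(stepped_lr(4))   # ≈ 0.00085
print(stepped_lr(8))   # ≈ 0.0007225 (0.001 * 0.85**2)
```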
