7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- Examples: Added DeepONet framework for CO2 sequestration modeling in
`examples/reservoir_simulation/DeepONet/`. Features multiple neural operator
architectures (U-FNO, Conv-FNO, Standalone UNet), physics-informed losses with
spatial derivatives and domain masking, distributed training with DDP and AMP,
comprehensive experiment tracking (MLflow, TensorBoard, W&B), and a flexible
Hydra-based configuration system. (#1255)

### Changed

### Deprecated
30 changes: 30 additions & 0 deletions examples/reservoir_simulation/DeepONet/.gitignore
@@ -0,0 +1,30 @@
# Temporary directories
logs/
outputs/
mlruns/
visualizations/
__pycache__/
checkpoints/

# NFS lock files
.nfs*

# Cluster-specific files
*.sbatch

# Temporary scripts
check_gpu_memory.sh
launch.log

# Python cache
*.pyc
*.pyo
*.pyd
.Python

# Jupyter
.ipynb_checkpoints/

# IDE
.vscode/
.idea/
145 changes: 145 additions & 0 deletions examples/reservoir_simulation/DeepONet/README.md
@@ -0,0 +1,145 @@
# U-FNO for CO2 Sequestration

Deep learning models for predicting pressure and saturation in CO2 sequestration reservoirs.

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Training

```bash
# Edit configuration
vim conf/training_config.yaml

# Run training (single GPU)
python train_fno3d.py

# Run training (multi-GPU with DDP)
torchrun --nproc_per_node=4 train_fno3d.py
```

### 3. Evaluation

```bash
# Evaluate pressure model
python evaluate_pressure.py --checkpoint checkpoints/best_model_pressure_*.pth

# Evaluate saturation model
python evaluate_saturation.py --checkpoint checkpoints/best_model_saturation_*.pth
```

## Available Models

| Model Type | Description | Best For |
|-----------|-------------|----------|
| **U-FNO** | Fourier + U-Net (hybrid) | Best accuracy, spatiotemporal PDEs |
| **Conv-FNO** | Fourier + 3D Convolutions | Balanced performance/speed |
| **Standalone UNet** | Pure spatial convolutions | Baseline comparisons |
| **Standard FNO** | Pure Fourier layers | Global patterns, fast training |

All U-FNO variants support both **custom** and **PhysicsNemo** UNet blocks; the standalone UNet model requires the PhysicsNemo implementation (see the configuration notes below).
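
The table above maps directly onto the layer-count knobs in `model_config.yaml`. As a purely illustrative sketch (this helper is hypothetical, not a repo function, and assumes U-Net layers take precedence when both counts are positive), the counts determine the variant like this:

```python
def classify_variant(num_unet_layers: int, num_conv_layers: int) -> str:
    """Name the architecture implied by the layer counts in model_config.yaml.

    Hypothetical mapping based on the table above: U-FNO sets
    num_unet_layers > 0, Conv-FNO sets num_conv_layers > 0, and a pure
    Fourier FNO sets both to 0. Precedence when both are > 0 is an
    assumption made here for illustration.
    """
    if num_unet_layers > 0:
        return "U-FNO"
    if num_conv_layers > 0:
        return "Conv-FNO"
    return "Standard FNO"

print(classify_variant(3, 0))  # U-FNO
print(classify_variant(0, 3))  # Conv-FNO
print(classify_variant(0, 0))  # Standard FNO
```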

## Documentation

### Core Guides

- Configuration system (model_config.yaml + training_config.yaml)
- Model architectures and parameters
- Data format requirements and validation
- Model evaluation system

### Architecture Guides

- Conv-FNO architecture details

## File Structure

```text
DeepONet/
├── conf/
│   ├── model_config.yaml      # Model architecture & loss
│   └── training_config.yaml   # Training, data, optimizer settings
├── train_fno3d.py             # Training script
├── evaluate_pressure.py       # Pressure evaluation
├── evaluate_saturation.py     # Saturation evaluation
├── ufno.py                    # U-FNO model architectures
├── unet3d.py                  # Custom UNet implementations
├── physicsnemo_unet.py        # PhysicsNemo UNet wrapper
├── data_validation.py         # Data validation utilities
├── dataset.py                 # Data loading
├── losses.py                  # Loss functions
├── metrics.py                 # Evaluation metrics
└── checkpoints/               # Trained models
```

## Configuration

### Two-File System

1. **`model_config.yaml`** - Model architecture and loss (rarely changed)
2. **`training_config.yaml`** - Training parameters (frequently tuned)

### Example: Train U-FNO

```yaml
# training_config.yaml
data:
  variable: pressure # or "saturation"

# model_config.yaml
arch:
  model_type: ufno
  ufno:
    num_fno_layers: 3
    num_unet_layers: 3
    num_conv_layers: 0
    unet_type: custom # or "physicsnemo"
```

### Example: Train Conv-FNO

```yaml
# model_config.yaml
arch:
  model_type: ufno
  ufno:
    num_fno_layers: 3
    num_unet_layers: 0 # Disable U-Net
    num_conv_layers: 3 # Enable Conv
```

### Example: Train Standalone UNet

```yaml
# model_config.yaml
arch:
  model_type: unet
  unet:
    unet_type: physicsnemo # NOTE: Only physicsnemo supported for standalone use
```

**Note:** Custom UNet3D is designed for U-FNO only (constant channel dimensions).
For standalone UNet, always use `unet_type: physicsnemo`.

## Model Checkpoints

Models are automatically named based on architecture:

- `best_model_pressure_ufno_custom.pth`
- `best_model_pressure_convfno.pth`
- `best_model_saturation_unet_physicsnemo.pth`
- etc.

No manual naming is needed, which prevents accidentally overwriting checkpoints from different configurations.

## Citation

If you use this code, please cite:

- PhysicsNemo: [NVIDIA PhysicsNemo](https://github.com/NVIDIA/physicsnemo)
- U-FNO: [Your paper/reference]
101 changes: 101 additions & 0 deletions examples/reservoir_simulation/DeepONet/conf/model_config.yaml
@@ -0,0 +1,101 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2025 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Model Architecture and Loss Configuration

arch:
  # Model type: "ufno" (Fourier + U-Net) or "unet" (pure U-Net, no Fourier layers)
  model_type: ufno # Options: "ufno", "unet"

  ufno:
    in_channels: 12        # Input channels (12 physical quantities)
    out_channels: 1        # Output channel (pressure dP OR saturation sg)
    width: 36              # Latent feature dimension (reduced for 3D)
    modes1: 10             # Number of Fourier modes in height direction
    modes2: 10             # Number of Fourier modes in width direction
    modes3: 10             # Number of Fourier modes in time direction
    num_fno_layers: 3      # Standard Fourier layers (no enhancement)
    num_unet_layers: 3     # U-Net enhanced layers (U-FNO: set > 0, Conv-FNO: set to 0)
    num_conv_layers: 0     # Conv enhanced layers (Conv-FNO: set > 0, U-FNO: set to 0)
    padding: 8             # Spatial padding
    conv_kernel_size: 3    # Conv layer kernel size (for Conv-FNO)
    unet_kernel_size: 3    # U-Net convolutional kernel size
    unet_dropout: 0.0      # U-Net dropout rate
    unet_type: physicsnemo # UNet type: "custom" (UNet3D) or "physicsnemo" (PhysicsNemo's UNet)

    # Activation function
    # Options: "relu", "gelu", "silu", "swish", "mish", "tanh", "leaky_relu", etc.
    activation_fn: relu

    # Lifting network configuration (input -> latent space)
    lifting_type: mlp # Type: "mlp" or "conv"
    lifting_layers: 1 # Number of layers
    lifting_width: 2  # Hidden width factor: hidden_width = width // lifting_width

    # Decoder network configuration (latent space -> output)
    decoder_type: mlp  # Type: "mlp" or "conv"
    decoder_layers: 1  # Number of hidden layers (1 = original U-FNO)
    decoder_width: 128 # Hidden layer size

  # Standalone U-Net configuration (used when model_type: "unet")
  # NOTE: Only PhysicsNemo UNet is supported for standalone use.
  # Custom UNet3D is designed for U-FNO only (where channels remain constant).
  unet:
    in_channels: 12        # Input channels
    out_channels: 1        # Output channels
    unet_type: physicsnemo # MUST be "physicsnemo" (custom UNet3D only works within U-FNO)

    # PhysicsNemo UNet parameters (only used when unet_type: "physicsnemo")
    # Defaults are set to match the U-FNO comparison for reproducibility
    physicsnemo:
      kernel_size: 3
      stride: 1
      model_depth: 3                     # Number of downsampling levels
      feature_map_channels: [36, 36, 36] # Channels at each level (length = model_depth * num_conv_blocks)
      num_conv_blocks: 1                 # Number of conv blocks per level
      conv_activation: relu
      conv_transpose_activation: relu
      padding: 1
      padding_mode: zeros
      pooling_type: MaxPool3d
      pool_size: 2
      normalization: batchnorm
      use_attn_gate: false
      gradient_checkpointing: false

    # Custom UNet3D parameters (only used when unet_type: "custom")
    custom:
      kernel_size: 3
      dropout_rate: 0.0

# Loss function configuration
loss:
  base_loss_type: relative_l2 # Base loss type - Options: 'mse', 'l1', 'relative_l2', 'simple_relative_l2'

  # Reduction method - Options: 'mean', 'sum', 'none'
  reduction: mean

  # Masking - Options: true, false
  # Apply loss only on active reservoir cells (irregular domain)
  use_mask: true

  # Physics-informed derivative loss - Options: true, false
  # Add spatial derivative term to loss
  use_derivative: true
  # Derivative weight - Range: 0.0 to 1.0
  derivative_weight: 0.5
  derivative_dim: dx

90 changes: 90 additions & 0 deletions examples/reservoir_simulation/DeepONet/conf/training_config.yaml
@@ -0,0 +1,90 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2025 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Training Configuration
# This file imports model_config.yaml and adds training-specific settings

defaults:
  - model_config
  - _self_

hydra:
  job:
    chdir: False
  run:
    dir: ./outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}

# Data configuration
data:
  data_path: /home/wdyab/physicsnemo/data_lustre
  variable: saturation # 'pressure' or 'saturation'
  normalize: false
Comment on lines +32 to +34 (Contributor):
- style: hardcoded user-specific data path. The path /home/wdyab/physicsnemo/data_lustre is specific to the author's system and won't work for other users. Consider using a relative path or adding a clear comment that users must update this.
- style: normalize: false may not be intended for production use. Verify this is intentional for the saturation variable or if it should be true by default.
  num_workers: null # Auto-set to 2 × num_gpus

# Training configuration
training:
  batch_size: 4
  epochs: 11
  initial_lr: 0.001
  checkpoint_freq: 10
  checkpoint_dir: ./checkpoints
  validate_freq: 10
  early_stopping: true
  patience: 20
  use_amp: false # Not beneficial for FNO models
  use_graphs: true

# Optimizer configuration
optimizer:
  type: adam
  weight_decay: 0.0001
  betas: [0.9, 0.999]
  eps: 1.0e-8

# Learning rate scheduler configuration
scheduler:
  type: step     # Scheduler type: 'step' or 'exponential'
  step_size: 4   # Step every N epochs (for StepLR)
  gamma: 0.85    # Multiplicative factor of learning rate decay
  min_lr: 1.0e-6 # Minimum learning rate

# Logging configuration
logging:
  console_freq: 10
  use_tensorboard: true
  tensorboard_dir: ./tensorboard
  use_mlflow: false
  experiment_name: ufno_co2_sequestration
  use_wandb: false
  wandb_project: physicsnemo_co2
  wandb_entity: null

# Random seed for reproducibility
seed: 42

# Compute configuration
compute:
  # Device
  device: cuda # 'cuda' or 'cpu'

  # Multi-GPU training
  num_gpus: 8       # Number of GPUs to use (null = auto-detect)
  distributed: true # Enable distributed training

  # Performance
  benchmark: false    # Enable cudnn benchmarking
  deterministic: true # Deterministic mode (slower but reproducible)
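
The `step` scheduler settings above have a simple closed form: decay by `gamma` every `step_size` epochs, floored at `min_lr`. A sketch of that schedule (the training script presumably uses `torch.optim.lr_scheduler.StepLR`; modeling the floor as a plain clamp is an assumption):

```python
def stepped_lr(epoch: int, initial_lr: float = 1e-3, step_size: int = 4,
               gamma: float = 0.85, min_lr: float = 1e-6) -> float:
    """Learning rate at a given epoch for the 'step' scheduler configured
    above: multiply by gamma every step_size epochs, clamped at min_lr.
    Closed-form sketch, not the training script's actual scheduler object."""
    return max(initial_lr * gamma ** (epoch // step_size), min_lr)


print(stepped_lr(0))   # 0.001
print(stepped_lr(4))   # ≈ 0.00085
print(stepped_lr(8))   # ≈ 0.0007225 (0.001 * 0.85**2)
```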
