Add DeepONet framework for neural operator learning in reservoir simulation #1256

wdyab · 2025-11-26T18:44:23Z

PhysicsNeMo Pull Request

Description

This PR adds a deep learning framework for neural operator learning in reservoir simulation to PhysicsNeMo. It implements multiple architectures (U-FNO, Conv-FNO, UNet) with physics-informed losses, distributed training, and experiment tracking.

Closes #1255

Checklist

[x] I am familiar with the Contributing Guidelines.
[x] New or existing tests cover these changes.
[x] The documentation is up to date with these changes.
[x] The CHANGELOG.md is up to date with these changes.
[x] An issue (#1255) is linked to this pull request.

Dependencies

This contribution uses existing PhysicsNeMo dependencies and adds no new requirements beyond those already in the main repository. All dependencies are standard PyTorch ecosystem packages:

PyTorch (core framework)
Hydra (configuration management) - already in PhysicsNeMo
MLFlow (experiment tracking) - optional, commonly available
TensorBoard (visualization) - standard PyTorch tool

The example is self-contained in examples/reservoir_simulation/DeepONet/ and includes its own requirements.txt for reference.

Testing

Trained and validated on production datasets (500 realizations)
Multi-GPU training verified (DDP across 8 GPUs)
All architectures tested (pressure + saturation)
Pre-commit hooks passing
Compatible with PhysicsNeMo utilities (DistributedManager, checkpoint utils, logging)

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

This contribution adds a comprehensive deep learning framework for CO2 sequestration reservoir simulation, featuring:

- Multiple neural operator architectures (U-FNO, Conv-FNO, Standalone UNet)
- Flexible configuration system for model selection and hyperparameters
- Automatic model checkpoint naming to prevent overwriting
- Dynamic data validation utilities
- Comprehensive evaluation scripts with multiple metrics
- Apache 2.0 licensed, pre-commit compliant

The DeepONet framework enables efficient spatiotemporal prediction of pressure and saturation fields in CO2 sequestration scenarios.

Location: examples/reservoir_simulation/DeepONet/

Signed-off-by: wdyab <[email protected]>

Add entry for DeepONet framework contribution to version 1.4.0a0. References: NVIDIA#1255 Signed-off-by: wdyab <[email protected]>

greptile-apps · 2025-11-26T18:49:47Z

Greptile Overview

Greptile Summary

This PR adds a comprehensive neural operator learning framework for reservoir simulation, featuring U-FNO, Conv-FNO, and UNet architectures with distributed training support. The implementation is well-structured with ~4,600 lines across 17 files, integrating cleanly with PhysicsNeMo's existing utilities.

Key Changes

Neural architectures: U-FNO, Conv-FNO, Standard FNO, and Standalone UNet implementations in ufno.py, unet3d.py, and physicsnemo_unet.py
Training infrastructure: DDP-enabled training script with AMP, checkpointing, and MLFlow integration in train_fno3d.py
Physics-informed losses: Unified loss framework with masking and spatial derivative constraints in losses.py
Data pipeline: Optimized dataloaders with distributed sampling, normalization, and persistent workers in dataset.py
Configuration system: Hydra-based two-file config (model + training) for clean separation of concerns

Critical Issues Found

UNet4D uses non-existent PyTorch operations (Conv4d, ConvTranspose4d, BatchNorm4d) - will cause runtime errors if used
Incorrect grid spacing formula in derivative computation - adds terms instead of computing differences, affecting physics-informed loss accuracy

Configuration Issues

Hardcoded user-specific data path in training_config.yaml
normalize: false setting may need review for production use

Strengths

Excellent code organization and documentation
Proper PhysicsNeMo integration using DistributedManager, checkpoint utilities, and logging
Comprehensive metrics and evaluation scripts
Well-designed configuration with sensible defaults
Clean separation between U-FNO variants and standalone models

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| examples/reservoir_simulation/DeepONet/unet3d.py | 1/5 | Contains UNet4D class using non-existent PyTorch Conv4d/ConvTranspose4d/BatchNorm4d operations - will cause runtime errors |
| examples/reservoir_simulation/DeepONet/losses.py | 2/5 | Derivative computation has incorrect formula in grid spacing calculation (line 266) - adds divided terms instead of computing differences |
| examples/reservoir_simulation/DeepONet/ufno.py | 4/5 | Implements U-FNO, Conv-FNO, and standard FNO architectures with proper integration of PhysicsNeMo layers and flexible configuration |
| examples/reservoir_simulation/DeepONet/train_fno3d.py | 4/5 | Comprehensive training script with DDP support, AMP, checkpointing, and MLFlow integration - well-structured with proper distributed handling |
| examples/reservoir_simulation/DeepONet/conf/training_config.yaml | 3/5 | Training configuration with hardcoded data path and normalize=false - needs user customization before use |

18 files reviewed, 4 comments

greptile-apps · 2025-11-26T18:49:44Z

examples/reservoir_simulation/DeepONet/unet3d.py

+class UNet4D(nn.Module):
+    """4D U-Net for higher-dimensional spatiotemporal data (H × W × T × D)."""
+
+    def __init__(
+        self,
+        input_channels: int,
+        output_channels: int,
+        kernel_size: int = 3,
+        dropout_rate: float = 0.0,
+    ):
+        super().__init__()
+
+        self.input_channels = input_channels
+        self.output_channels = output_channels
+        self.kernel_size = kernel_size
+        self.dropout_rate = dropout_rate
+
+        # Encoder
+        self.conv1 = self._conv_block(
+            input_channels,
+            output_channels,
+            kernel_size=kernel_size,
+            stride=2,
+            dropout_rate=dropout_rate,
+        )
+        self.conv2 = self._conv_block(
+            input_channels,
+            output_channels,
+            kernel_size=kernel_size,
+            stride=2,
+            dropout_rate=dropout_rate,
+        )
+        self.conv2_1 = self._conv_block(
+            input_channels,
+            output_channels,
+            kernel_size=kernel_size,
+            stride=1,
+            dropout_rate=dropout_rate,
+        )
+        self.conv3 = self._conv_block(
+            input_channels,
+            output_channels,
+            kernel_size=kernel_size,
+            stride=2,
+            dropout_rate=dropout_rate,
+        )
+        self.conv3_1 = self._conv_block(
+            input_channels,
+            output_channels,
+            kernel_size=kernel_size,
+            stride=1,
+            dropout_rate=dropout_rate,
+        )
+
+        # Decoder
+        self.deconv2 = self._deconv_block(input_channels, output_channels)
+        self.deconv1 = self._deconv_block(input_channels * 2, output_channels)
+        self.deconv0 = self._deconv_block(input_channels * 2, output_channels)
+
+        # Output
+        self.output_layer = self._output_block(
+            input_channels * 2,
+            output_channels,
+            kernel_size=kernel_size,
+            stride=1,
+            dropout_rate=dropout_rate,
+        )
+
+    def _conv_block(
+        self,
+        in_channels: int,
+        out_channels: int,
+        kernel_size: int,
+        stride: int,
+        dropout_rate: float,
+    ) -> nn.Module:
+        """4D convolutional block."""
+        return nn.Sequential(
+            nn.Conv4d(
+                in_channels,
+                out_channels,
+                kernel_size=kernel_size,
+                stride=stride,
+                padding=(kernel_size - 1) // 2,
+                bias=False,
+            ),
+            nn.BatchNorm4d(out_channels),
+            nn.LeakyReLU(0.1, inplace=True),
+            nn.Dropout(dropout_rate) if dropout_rate > 0 else nn.Identity(),
+        )
+
+    def _deconv_block(self, in_channels: int, out_channels: int) -> nn.Module:
+        """4D transposed convolutional block."""
+        return nn.Sequential(
+            nn.ConvTranspose4d(
+                in_channels, out_channels, kernel_size=4, stride=2, padding=1
+            ),
+            nn.LeakyReLU(0.1, inplace=True),
+        )
+
+    def _output_block(
+        self,
+        in_channels: int,
+        out_channels: int,
+        kernel_size: int,
+        stride: int,
+        dropout_rate: float,
+    ) -> nn.Module:
+        """Output layer."""
+        return nn.Conv4d(
+            in_channels,
+            out_channels,
+            kernel_size=kernel_size,
+            stride=stride,
+            padding=(kernel_size - 1) // 2,
+        )
+
+    def forward(self, x: Tensor) -> Tensor:
+        """Forward pass. Input: (batch, channels, H, W, T, D)"""
+        # Validate dimensions (must be divisible by 8)
+        dims = x.shape[2:]
+        if any(d % 8 != 0 for d in dims):
+            raise ValueError(
+                f"Input dimensions {dims} must be divisible by 8. Got shape: {x.shape}"
+            )
+
+        # Encoder
+        out_conv1 = self.conv1(x)
+        out_conv2 = self.conv2_1(self.conv2(out_conv1))
+        out_conv3 = self.conv3_1(self.conv3(out_conv2))
+
+        # Decoder with skip connections
+        out_deconv2 = self.deconv2(out_conv3)
+        if out_deconv2.shape[2:] != out_conv2.shape[2:]:
+            out_deconv2 = F.interpolate(
+                out_deconv2, size=out_conv2.shape[2:], mode="nearest"
+            )
+        concat2 = torch.cat((out_conv2, out_deconv2), dim=1)
+
+        out_deconv1 = self.deconv1(concat2)
+        if out_deconv1.shape[2:] != out_conv1.shape[2:]:
+            out_deconv1 = F.interpolate(
+                out_deconv1, size=out_conv1.shape[2:], mode="nearest"
+            )
+        concat1 = torch.cat((out_conv1, out_deconv1), dim=1)
+
+        out_deconv0 = self.deconv0(concat1)
+        if out_deconv0.shape[2:] != x.shape[2:]:
+            out_deconv0 = F.interpolate(out_deconv0, size=x.shape[2:], mode="nearest")
+        concat0 = torch.cat((x, out_deconv0), dim=1)
+
+        out = self.output_layer(concat0)
+
+        return out
+
+    def count_params(self) -> int:
+        """Count total number of trainable parameters."""
+        return sum(p.numel() for p in self.parameters() if p.requires_grad)
+
+
+# Aliases for backward compatibility with U-FNO
+UNetModule2D = UNet2D
+UNetModule3D = UNet3D
+UNetModule4D = UNet4D


logic: PyTorch doesn't have nn.Conv4d, nn.ConvTranspose4d, or nn.BatchNorm4d operations. This entire UNet4D class will fail at runtime.

If 4D convolutions are needed, consider using a workaround like reshaping to 3D, or remove this class if it's not currently being used (the PR description mentions 2D+T support is completed, 3D+T is partial).
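A quick check makes the failure concrete, and one possible reading of the "reshape to 3D" workaround is to fold the fourth spatial axis into the batch axis so that `nn.Conv3d` can be applied. The sketch below is illustrative only: `FoldedConv4d` is a hypothetical name that is not part of the PR, and because it never mixes information along the folded axis it is not equivalent to a true 4D convolution.

```python
import torch
import torch.nn as nn

# None of these layer types exist in torch.nn, so building UNet4D's blocks
# would raise AttributeError as soon as nn.Conv4d is looked up.
missing = [
    name
    for name in ("Conv4d", "ConvTranspose4d", "BatchNorm4d")
    if not hasattr(nn, name)
]


class FoldedConv4d(nn.Module):
    """Hypothetical stand-in: Conv3d over (H, W, T), applied per D slice."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv3d(
            in_channels,
            out_channels,
            kernel_size,
            padding=(kernel_size - 1) // 2,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w, t, d = x.shape
        # (B, C, H, W, T, D) -> (B, D, C, H, W, T) -> (B*D, C, H, W, T)
        y = x.permute(0, 5, 1, 2, 3, 4).reshape(b * d, c, h, w, t)
        y = self.conv(y)
        # Undo the fold: (B*D, C', H, W, T) -> (B, C', H, W, T, D)
        return y.reshape(b, d, -1, h, w, t).permute(0, 2, 3, 4, 5, 1)


x = torch.randn(2, 3, 8, 8, 8, 4)
out = FoldedConv4d(3, 5)(x)
print(missing)
print(tuple(out.shape))
```

If coupling along the fourth axis matters, a second convolution over a different triple of axes could be alternated with this one; whether that is adequate for the reservoir data is a modelling decision this sketch does not settle.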

greptile-apps · 2025-11-26T18:49:45Z

examples/reservoir_simulation/DeepONet/losses.py

+        grid_x = inputs[0, 0, :, 0, -3]  # (W=200,) - extract grid_x channel values
+        grid_dx = grid_x[1:-1] + grid_x[:-2] / 2 + grid_x[2:] / 2  # (W-2=198,)


logic: incorrect grid spacing formula - adds terms instead of computing differences

The current formula grid_dx = grid_x[1:-1] + grid_x[:-2] / 2 + grid_x[2:] / 2 doesn't compute spacing correctly. For central finite differences, grid spacing should be the distance between adjacent points.

Should likely be:

Suggested change

-        grid_x = inputs[0, 0, :, 0, -3]  # (W=200,) - extract grid_x channel values
-        grid_dx = grid_x[1:-1] + grid_x[:-2] / 2 + grid_x[2:] / 2  # (W-2=198,)
+        grid_x = inputs[0, 0, :, 0, -3]  # (W=200,) - extract grid_x channel values
+        grid_dx = (grid_x[2:] - grid_x[:-2]) / 2  # (W-2=198,) - central difference spacing

This computes the average spacing for central finite differences: (x[i+1] - x[i-1]) / 2.
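The difference between the two formulas is easy to see on a small uniform grid. The following sketch uses a made-up five-point grid with spacing 0.25 (not data from the PR) and evaluates both expressions on it:

```python
import torch

# Uniform grid with spacing 0.25, so the true spacing at every interior point is 0.25.
grid_x = torch.arange(5, dtype=torch.float32) * 0.25  # [0.00, 0.25, 0.50, 0.75, 1.00]

# Formula from the PR: sums coordinates, so the result grows with position.
dx_original = grid_x[1:-1] + grid_x[:-2] / 2 + grid_x[2:] / 2

# Suggested formula: half the distance between the two neighbours of each interior point.
dx_suggested = (grid_x[2:] - grid_x[:-2]) / 2

print(dx_original.tolist())   # [0.5, 1.0, 1.5]
print(dx_suggested.tolist())  # [0.25, 0.25, 0.25]
```

The original expression returns twice the coordinate of each interior point on this grid, while the suggested one recovers the constant spacing.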

greptile-apps · 2025-11-26T18:49:46Z

examples/reservoir_simulation/DeepONet/conf/training_config.yaml

+  data_path: /home/wdyab/physicsnemo/data_lustre
+  variable: saturation  # 'pressure' or 'saturation'
+  normalize: false


style: hardcoded user-specific data path

The path /home/wdyab/physicsnemo/data_lustre is specific to the author's system and won't work for other users. Consider using a relative path or adding a clear comment that users must update this.
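One way to avoid a machine-specific default is to let an environment variable override the configured value and to fail early when neither points at a real directory. This is a minimal sketch under stated assumptions: the helper `resolve_data_path` and the variable name `RESERVOIR_DATA_PATH` are hypothetical and do not appear in the PR.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_data_path(
    configured: Optional[str], env_var: str = "RESERVOIR_DATA_PATH"
) -> Path:
    """Return the dataset directory, preferring the environment variable.

    Both the function name and the default variable name are illustrative.
    """
    raw = os.environ.get(env_var) or configured
    if not raw:
        raise ValueError(f"Set data.data_path in the config or export {env_var}.")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Data directory does not exist: {path}")
    return path
```

A training script could call this once at startup with the value read from `training_config.yaml`, so a missing dataset is reported before any model is built.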

greptile-apps · 2025-11-26T18:49:47Z

examples/reservoir_simulation/DeepONet/conf/training_config.yaml

+data:
+  data_path: /home/wdyab/physicsnemo/data_lustre
+  variable: saturation  # 'pressure' or 'saturation'
+  normalize: false


style: normalize: false may not be intended for production use. Verify that this is intentional for the saturation variable, or whether it should be true by default.
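Whether the setting matters depends on the variable: saturation is a fraction bounded in [0, 1], whereas pressure values usually span a much wider numeric range. As a rough illustration of what enabling normalization could involve, the sketch below applies z-score scaling to synthetic pressure-like data. `FieldNormalizer` is a hypothetical class, not the implementation in the PR's dataset.py, and the numbers are made up.

```python
import torch


class FieldNormalizer:
    """Hypothetical z-score normalizer; statistics come from training data only."""

    def __init__(self, train_fields: torch.Tensor, eps: float = 1e-8):
        self.mean = train_fields.mean()
        self.std = train_fields.std() + eps  # eps guards against a constant field

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / self.std

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return z * self.std + self.mean


torch.manual_seed(0)
pressure = 200.0 + 15.0 * torch.randn(4, 16, 16)  # synthetic values, arbitrary units
norm = FieldNormalizer(pressure)
z = norm.encode(pressure)       # roughly zero mean, unit standard deviation
restored = norm.decode(z)       # back in the original units for evaluation
```

Predictions would need to pass through `decode` before metrics are reported in physical units.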

wdyab added 2 commits November 26, 2025 09:17

Update CHANGELOG.md for DeepONet contribution

0a5ea23

