
Commit 3eea7c0

feat(domino): support domino for training and test
1 parent a313b5d commit 3eea7c0

7 files changed: +274 additions, −8 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -98,6 +98,7 @@ PaddleScience 是一个基于深度学习框架 PaddlePaddle 开发的科学计
 | Thermal simulation | [1D heat exchanger thermal simulation](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/examples/heat_exchanger) | Physics-driven | PI-DeepONet | Unsupervised learning | - | - |
 | Thermal simulation | [2D thermal simulation](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/examples/heat_pinn) | Physics-driven | PINN | Unsupervised learning | - | [Paper](https://arxiv.org/abs/1711.10561) |
 | Thermal simulation | [2D chip thermal simulation](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/examples/chip_heat) | Physics-driven | PI-DeepONet | Unsupervised learning | - | [Paper](https://doi.org/10.1063/5.0194245) |
+| External aerodynamics | [DoMINO](https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/examples/domino) | Data-driven | DoMINO | Supervised learning | [Data](https://caemldatasets.org/drivaerml/) | [Paper](https://arxiv.org/abs/2501.13350) |

 <br>
 <p align="center"><b>Materials Science (AI for Material)</b></p>

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -133,6 +133,7 @@
 | Thermal simulation | [1D heat exchanger thermal simulation](./zh/examples/heat_exchanger.md) | Physics-driven | PI-DeepONet | Unsupervised learning | - | - |
 | Thermal simulation | [2D thermal simulation](./zh/examples/heat_pinn.md) | Physics-driven | PINN | Unsupervised learning | - | [Paper](https://arxiv.org/abs/1711.10561) |
 | Thermal simulation | [2D chip thermal simulation](./zh/examples/chip_heat.md) | Physics-driven | PI-DeepONet | Unsupervised learning | - | [Paper](https://doi.org/10.1063/5.0194245) |
+| External aerodynamics | [DoMINO](./zh/examples/domino.md) | Data-driven | DoMINO | Supervised learning | [Data](https://caemldatasets.org/drivaerml/) | [Paper](https://arxiv.org/abs/2501.13350) |

 <br>
 <p align="center"><b>Materials Science (AI for Material)</b></p>

docs/zh/examples/domino.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# DoMINO

=== "Model training commands"

    ``` sh
    cd examples/domino

    # 1. Download the DrivAerML dataset using the provided download_aws_dataset.sh
    #    script, or from the Hugging Face repo (https://huggingface.co/datasets/neashton/drivaerml).
    sh download_aws_dataset.sh

    # 2. Specify the configuration settings in `examples/domino/conf/config.yaml`.

    # 3. Run process_data.py. This processes the VTP/VTU files and saves them as npy
    #    for faster processing in the DoMINO datapipe. Modify the data_processor key
    #    in the config file. Additionally, run cache_data.py to save the outputs of
    #    the DoMINO datapipe as .npy files. The DoMINO datapipe is set up to calculate
    #    the Signed Distance Field and nearest-neighbor interpolations on the fly
    #    during training; caching saves these as a preprocessing step and should be
    #    used when the STL surface meshes exceed 30 million cells. Divide the final
    #    processed dataset into 2 directories, for training and validation, and
    #    specify them in conf/config.yaml.
    python3 process_data.py

    # 4. Run training.
    python3 train.py
    ```

=== "Model evaluation commands"

    Not yet available.

=== "Model export commands"

    Not yet available.

=== "Model inference commands"

    ``` sh
    cd examples/domino
    python3 test.py
    ```

## 1. Background

External aerodynamics involves solving the Navier-Stokes equations at high Reynolds numbers, where traditional CFD methods are computationally expensive. Neural operators improve efficiency through end-to-end mappings, but they struggle to model multi-scale coupling and to keep long-horizon predictions stable. The Decomposable Multi-scale Iterative Neural Operator (DoMINO) introduces a decomposable multi-scale architecture that combines hierarchical feature decoupling, iterative residual correction, and parameter-independent encoding to significantly improve the accuracy and generalization of cross-scale flow modeling. Experiments show it runs 2-3 orders of magnitude faster than CFD and improves separated-flow prediction accuracy by roughly 40% over models such as FNO, offering an efficient solution for engineering problems such as aircraft design.

## 2. Model architecture

DoMINO (Decomposable Multi-scale Iterative Neural Operator) is a novel machine learning model architecture designed to address the challenges of surrogate modeling for large-scale engineering simulations. It is a point-cloud-based machine learning model that uses local geometric information to predict flow fields at discrete points.

The main components of the DoMINO model are:

- Global geometry representation:
    - The model takes the 3D surface mesh of the geometry as input.
    - A tightly fitting surface bounding box and a bounding box representing the computational domain are constructed around the geometry.
    - Features of the geometry point cloud (such as spatial coordinates) are projected onto an N-dimensional structured grid over the surface bounding box (with resolution $m×m×m×f$) via learnable point-convolution kernels.
    - The point-convolution kernels are implemented with a custom ball-query layer accelerated by NVIDIA Warp.
    - Geometry features are propagated into the computational-domain bounding box in two ways: 1) a separate set of multi-scale point-convolution kernels is learned to project the geometry information onto the computational-domain grid; 2) a CNN block containing convolution, pooling, and unpooling layers propagates the features $G_s$ on the surface bounding-box grid to the computational-domain bounding-box grid $G_c$. The CNN block is evaluated iteratively.
    - The $m×m×m×f$ features computed on the computational-domain grid represent a global encoding of the geometry point cloud. In addition, the signed distance field (SDF) and its gradient components are computed and appended to the learned features to provide extra information about the geometry's topology (a toy sketch of this step follows the list).
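
As a mental model only (not the repo's implementation), the following NumPy/SciPy sketch mimics the global encoding: per-point features are averaged into the cells of a structured bounding-box grid via a fixed-radius ball query, and an unsigned nearest-neighbor distance stands in for the SDF channel. The resolution `m`, feature width `f`, query radius, and the random point cloud are all illustrative assumptions; DoMINO learns this projection with Warp-accelerated point-convolution kernels instead.

``` py
import numpy as np
from scipy.spatial import cKDTree

m, f = 32, 8                       # grid resolution and feature width (assumed)
points = np.random.rand(10000, 3)  # stand-in for an STL surface point cloud
feats = np.random.rand(10000, f)   # stand-in for learned per-point features

# Structured grid over the bounding box of the geometry.
bb_min, bb_max = points.min(0), points.max(0)
axes = [np.linspace(bb_min[d], bb_max[d], m) for d in range(3)]
gx, gy, gz = np.meshgrid(*axes, indexing="ij")
grid_xyz = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)

# "Ball query": gather surface points within a radius of each grid node
# and average their features (DoMINO uses a learnable kernel here).
tree = cKDTree(points)
radius = np.linalg.norm(bb_max - bb_min) / m
grid_feats = np.zeros((m**3, f))
for i, nbrs in enumerate(tree.query_ball_point(grid_xyz, r=radius)):
    if nbrs:
        grid_feats[i] = feats[nbrs].mean(axis=0)

# SDF-like channel: distance to the nearest surface point (sign omitted).
dist, _ = tree.query(grid_xyz)
encoding = np.concatenate([grid_feats, dist[:, None]], axis=1)
print(encoding.reshape(m, m, m, f + 1).shape)  # (32, 32, 32, 9)
```
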
- Local geometry representation:
    - The local geometry representation depends on the physical location in the computational domain at which the solution field is evaluated.
    - Before the local geometry representation is computed, a batch of discrete points is sampled in the computational domain.
    - For each sampled point in the batch, a subregion of size $l×l×l$ is defined around it and a local geometry encoding is computed.
    - The local encoding is essentially a position-dependent subset of the global encoding, computed via point convolution (see the crop sketch after this list).
    - The extracted local features are further transformed by a fully connected neural network.
    - This local geometry representation is used by the aggregation network to evaluate the solution field at the sampled points.
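
Continuing the toy example, the local encoding can be pictured as a position-dependent crop of the global grid; the learned point convolution and the fully connected transform are omitted, and all shapes are illustrative:

``` py
import numpy as np

m, f, l = 32, 9, 5                       # grid size, channels, subregion size (assumed)
global_enc = np.random.rand(m, m, m, f)  # stand-in for the global encoding
bb_min, bb_max = np.zeros(3), np.ones(3)

def local_encoding(x):
    # Map the physical point to its grid index, then crop an l×l×l subregion.
    idx = ((x - bb_min) / (bb_max - bb_min) * (m - 1)).astype(int)
    lo = np.clip(idx - l // 2, 0, m - l)
    return global_enc[lo[0]:lo[0]+l, lo[1]:lo[1]+l, lo[2]:lo[2]+l]

sample = np.array([0.3, 0.7, 0.5])   # a sampled point in the computational domain
print(local_encoding(sample).shape)  # (5, 5, 5, 9)
```
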
- Aggregation network:
    - The local geometry representation carries learned features of the geometry and solution in the vicinity of the computational stencil around a sampled point and its neighbors.
    - Each point in the computational stencil is represented by its physical coordinates in the computational domain, the SDF at those coordinates, the normal vector from the domain centroid, and the surface normal (if the point lies on the surface).
    - These input features are passed through a fully connected neural network (the basis-function network) that computes a latent vector representing the features of each stencil point.
    - Each latent vector is concatenated with the local geometry encoding and passed through another set of fully connected layers to predict the solution vector at each stencil point.
    - The solution vectors are averaged with an inverse-distance weighting scheme to predict the final solution vector at the sampled point (illustrated after this list).
    - A separate instance of the aggregation network is used for each solution variable, while the global geometry encoding network is shared among them.
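
The final inverse-distance weighting step is simple enough to show exactly; the per-point solution predictions below are random stand-ins for the MLP outputs:

``` py
import numpy as np

center = np.array([0.3, 0.7, 0.5])               # the sampled point
stencil = center + 0.02 * np.random.randn(7, 3)  # its computational stencil
preds = np.random.rand(7, 4)                     # per-point solution vectors

d = np.linalg.norm(stencil - center, axis=1)
w = 1.0 / (d + 1e-8)                             # closer points count more
solution = (w[:, None] * preds).sum(axis=0) / w.sum()
print(solution)                                  # final solution at `center`
```
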
Through this decomposable, multi-scale, iterative approach, the DoMINO model can process large-scale simulation data efficiently, capture both long-range and short-range interactions, and provide a scalable, accurate, and generalizable surrogate model without sacrificing accuracy.

## 3. Complete code

``` py linenums="1" title="examples/domino/train.py"
--8<--
examples/domino/train.py
--8<--
```

``` py linenums="1" title="examples/domino/test.py"
--8<--
examples/domino/test.py
--8<--
```

## 4. Results

## 5. References

- [DoMINO: A Decomposable Multi-scale Iterative Neural Operator for Modeling Large Scale Engineering Simulations](https://arxiv.org/abs/2501.13350)

examples/domino/conf/config.yaml

Lines changed: 8 additions & 8 deletions
@@ -28,8 +28,8 @@ hydra: # Hydra config
   output_subdir: hydra # Default is .hydra which causes files not being uploaded in W&B.

 data: # Input directory for training and validation data
-  input_dir: /home/aistudio/modulus/examples/cfd/external_aerodynamics/domino/outputs/volume_data/
-  input_dir_val: /home/aistudio/modulus/examples/cfd/external_aerodynamics/domino/outputs/volume_data/
+  input_dir: outputs/volume_data/
+  input_dir_val: outputs/volume_data/
 bounding_box: # Bounding box dimensions for computational domain
   min: [-3.5, -2.25 , -0.32]
   max: [8.5 , 2.25 , 3.00]
@@ -103,7 +103,7 @@ train: # Training configurable parameters
   sampler:
     shuffle: true
     drop_last: false
-  checkpoint_dir: /lustre/rranade/modulus_dev/modulus_forked/modulus/examples/cfd/external_aerodynamics/domino/outputs/AWS_Dataset/3/models/
+  checkpoint_dir: outputs/AWS_Dataset/3/models/

 val: # Validation configurable parameters
   dataloader:
@@ -113,12 +113,12 @@ val: # Validation configurable parameters
     drop_last: false

 eval: # Testing configurable parameters
-  test_path: /home/aistudio/modulus/examples/cfd/external_aerodynamics/domino/drivaer_data_full_new
-  save_path: /home/aistudio/modulus/examples/cfd/external_aerodynamics/domino/outputs/mesh_predictions_surf_final1/
-  checkpoint_name: /home/aistudio/xiaoyewww/PaddleScience/examples/domino/outputs/AWS_Dataset/1/models/DoMINO.0.30.pdparams
+  test_path: drivaer_data_full
+  save_path: outputs/mesh_predictions_surf_final1/
+  checkpoint_name: outputs/AWS_Dataset/1/models/DoMINO.0.30.pdparams

 data_processor: # Data processor configurable parameters
   kind: drivaer_aws # must be either drivesim or drivaer_aws
-  output_dir: /lustre/rranade/modulus_dev/data/volume_data/
-  input_dir: /lustre/datasets/drivaer_aws/drivaer_data_full/
+  output_dir: data/volume_data/
+  input_dir: drivaer_aws/drivaer_data_full/
   num_processors: 12
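
Since all paths are now relative, a quick way to sanity-check a local setup is to load the YAML with OmegaConf, assuming commands are run from `examples/domino` (this snippet is a convenience check, not part of the example code):

``` py
from pathlib import Path
from omegaconf import OmegaConf

cfg = OmegaConf.load("conf/config.yaml")
for key in ("data.input_dir", "eval.test_path", "data_processor.output_dir"):
    p = Path(OmegaConf.select(cfg, key))
    print(key, "->", p.resolve(), "(exists)" if p.exists() else "(missing)")
```
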
examples/domino/download_aws_dataset.sh

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
#!/bin/bash

# This Bash script downloads the AWS DrivAer files from the Amazon S3 bucket to a local directory.
# Only the volume files (.vtu), STL files (.stl), and VTP files (.vtp) are downloaded.
# It uses a function, download_run_files, to check for the existence of three specific files (".vtu", ".stl", ".vtp") in a run directory.
# If a file doesn't exist, it's downloaded from the S3 bucket. If it does exist, the download is skipped.
# The script runs multiple downloads in parallel, both within a single run and across multiple runs.
# It also includes checks to prevent overloading the system by limiting the number of parallel downloads.

# Set the local directory to download the files
LOCAL_DIR="./drivaer_data_full" # <--- This is the directory where the files will be downloaded.

# Set the S3 bucket and prefix
S3_BUCKET="caemldatasets"
S3_PREFIX="drivaer/dataset"

# Create the local directory if it doesn't exist
mkdir -p "$LOCAL_DIR"

# Function to download files for a specific run
download_run_files() {
    local i=$1
    RUN_DIR="run_$i"
    RUN_LOCAL_DIR="$LOCAL_DIR/$RUN_DIR"

    # Create the run directory if it doesn't exist
    mkdir -p "$RUN_LOCAL_DIR"

    # Check if the .vtu file exists before downloading
    if [ ! -f "$RUN_LOCAL_DIR/volume_$i.vtu" ]; then
        aws s3 cp --no-sign-request "s3://$S3_BUCKET/$S3_PREFIX/$RUN_DIR/volume_$i.vtu" "$RUN_LOCAL_DIR/" &
    else
        echo "File volume_$i.vtu already exists, skipping download."
    fi

    # Check if the .stl file exists before downloading
    if [ ! -f "$RUN_LOCAL_DIR/drivaer_$i.stl" ]; then
        aws s3 cp --no-sign-request "s3://$S3_BUCKET/$S3_PREFIX/$RUN_DIR/drivaer_$i.stl" "$RUN_LOCAL_DIR/" &
    else
        echo "File drivaer_$i.stl already exists, skipping download."
    fi

    # Check if the .vtp file exists before downloading
    if [ ! -f "$RUN_LOCAL_DIR/boundary_$i.vtp" ]; then
        aws s3 cp --no-sign-request "s3://$S3_BUCKET/$S3_PREFIX/$RUN_DIR/boundary_$i.vtp" "$RUN_LOCAL_DIR/" &
    else
        echo "File boundary_$i.vtp already exists, skipping download."
    fi

    wait # Ensure all three files for this run are downloaded before moving to the next run
}

# Loop through the run folders and download the files
for i in $(seq 1 500); do
    download_run_files "$i" &

    # Limit the number of parallel jobs to avoid overloading the system
    if (( $(jobs -r | wc -l) >= 8 )); then
        wait -n # Wait for the next background job to finish before starting a new one
    fi
done

# Wait for all remaining background jobs to finish
wait

examples/domino/process_data.py

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2024 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
This code runs the data processing in parallel to load OpenFOAM files, process them,
and save them in the npy format for faster processing in the DoMINO datapipes. Several
parameters such as the number of processors and the input and output paths can be
configured in config.yaml under the data_processor tab.
"""

import multiprocessing
import os
import time

import hydra
import numpy as np
from omegaconf import DictConfig
from omegaconf import OmegaConf
from openfoam_datapipe import OpenFoamDataset
from physicsnemo.utils.domino.utils import *  # noqa: F403


def process_files(*args_list):
    ids = args_list[0]
    processor_id = args_list[1]
    fm_data = args_list[2]
    output_dir = args_list[3]
    for j in ids:
        fname = fm_data.filenames[j]
        if len(os.listdir(os.path.join(fm_data.data_path, fname))) == 0:
            print(f"Skipping {fname} - empty.")
            continue
        outname = os.path.join(output_dir, fname)
        print("Filename:%s on processor: %d" % (outname, processor_id))
        filename = f"{outname}.npy"
        if os.path.exists(filename):
            print(f"Skipping {filename} - already exists.")
            continue
        start_time = time.time()
        data_dict = fm_data[j]
        np.save(filename, data_dict)
        print("Time taken for %d = %f" % (j, time.time() - start_time))


@hydra.main(version_base="1.3", config_path="conf", config_name="config")
def main(cfg: DictConfig):
    print(f"Config summary:\n{OmegaConf.to_yaml(cfg, sort_keys=True)}")
    volume_variable_names = list(cfg.variables.volume.solution.keys())
    num_vol_vars = 0
    for j in volume_variable_names:
        if cfg.variables.volume.solution[j] == "vector":
            num_vol_vars += 3
        else:
            num_vol_vars += 1

    surface_variable_names = list(cfg.variables.surface.solution.keys())
    num_surf_vars = 0
    for j in surface_variable_names:
        if cfg.variables.surface.solution[j] == "vector":
            num_surf_vars += 3
        else:
            num_surf_vars += 1

    fm_data = OpenFoamDataset(
        cfg.data_processor.input_dir,
        kind=cfg.data_processor.kind,
        volume_variables=volume_variable_names,
        surface_variables=surface_variable_names,
        model_type=cfg.model.model_type,
    )
    output_dir = cfg.data_processor.output_dir
    create_directory(output_dir)  # noqa: F405
    n_processors = cfg.data_processor.num_processors

    num_files = len(fm_data)
    ids = np.arange(num_files)
    num_elements = int(num_files / n_processors) + 1
    process_list = []
    ctx = multiprocessing.get_context("spawn")
    # Split the file ids into one contiguous chunk per processor;
    # the last processor takes any remainder.
    for i in range(n_processors):
        if i != n_processors - 1:
            sf = ids[i * num_elements : i * num_elements + num_elements]
        else:
            sf = ids[i * num_elements :]
        process = ctx.Process(target=process_files, args=(sf, i, fm_data, output_dir))

        process.start()
        process_list.append(process)

    for process in process_list:
        process.join()


if __name__ == "__main__":
    main()

examples/domino/requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -1,4 +1,6 @@
 hydra-core
 importlib_metadata
 pyvista==0.34.2
+termcolor
+treelib
 warp-lang
