Paper Link: https://www.science.org/doi/full/10.1126/sciadv.adu2488
```bash
git clone https://github.com/OpenEarthLab/ORCA-DL.git
cd ORCA-DL
conda create -n orca python=3.9.17
conda activate orca
pip install -r requirements.txt
```
All the predictions, trained model weights, and data can be found at https://1drv.ms/f/c/49d761d10f0b201d/Emi9scIyaWBCrNTgRo6t12oBLnF2qGDRGj0M7-g0ekRM1A
**Note**: The data in the `train_data` and `valid_test_data` directories have been interpolated and normalized using the mean and standard deviation provided in the `stat` directory. They can therefore be fed directly into the model, primarily by concatenating them in the order of the variables, as sketched below. See `demo.ipynb`.
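For orientation, here is a minimal sketch of what "concatenating them in the order of the variables" could look like. The variable list and stacking axis are assumptions for illustration; `demo.ipynb` shows the actual input construction.

```python
import numpy as np

# Assumed variable ordering for illustration only; see demo.ipynb for
# the authoritative list and ordering used by ORCA-DL.
VARIABLES = ["thetao", "so"]

def build_input(data_dir, year, month):
    """Stack pre-normalized per-variable arrays along the channel axis."""
    fields = [np.load(f"{data_dir}/{var}/{year}_{month}.npy") for var in VARIABLES]
    # The arrays are already interpolated and normalized (see the note above),
    # so the model input is simply their concatenation.
    return np.concatenate(fields, axis=0)

x = build_input("train_data/BCC-CSM2-MR", 1850, 1)
```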
Please note that ORCA-DL initially uses GODAS data as its starting point. Initialization with other data is also feasible, but you must ensure that the data are interpolated to the correct longitude/latitude range and resolution, as in `example_data`. Here is an example of how to interpolate the original data using CDO (recommended):
```bash
wget https://downloads.psl.noaa.gov/Datasets/godas/sshg.1980.nc -O sshg-1980.nc  # download a 2D field from GODAS
cdo -b f64 remapbil,grid sshg-1980.nc sshg-1980-processed.nc                     # bilinear horizontal remapping

wget https://downloads.psl.noaa.gov/Datasets/godas/salt.1980.nc -O salt-1980.nc  # download a 3D field from GODAS
cdo -b f64 remapbil,grid salt-1980.nc tmp1.nc                                    # bilinear horizontal remapping
cdo intlevel,10,15,30,50,75,100,125,150,200,250,300,400,500,600,800,1000 tmp1.nc tmp2.nc  # vertical interpolation to 16 levels
cdo setzaxis,zaxis.txt tmp2.nc salt-1980-processed.nc                            # attach the target depth axis
rm tmp1.nc tmp2.nc
```
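The `grid` and `zaxis.txt` files referenced above ship with the repository. For reference, a CDO depth-axis description consistent with the 16 levels interpolated above would look roughly like the sketch below; this is an assumption about the format, not necessarily the exact contents of the repository's `zaxis.txt`.

```
zaxistype = depth_below_sea
size      = 16
levels    = 10 15 30 50 75 100 125 150 200 250 300 400 500 600 800 1000
```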
After the data interpolation is complete, you can refer to `demo.ipynb` to run ORCA-DL.
**Important**: You need to unify the units before using our statistics to normalize the data, as in the sketch below. See `demo.ipynb`.
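As an illustration of what "unify the units" means here: GODAS potential temperature is stored in Kelvin, whereas the statistics may assume degrees Celsius. The stat file names and the variable in this sketch are hypothetical; `demo.ipynb` documents the actual units and statistics.

```python
import numpy as np
import xarray as xr  # assumed available for reading the processed NetCDF

# Sketch under assumptions: file/variable names are hypothetical, and the
# Kelvin-to-Celsius conversion is one example of unit unification.
ds = xr.open_dataset("pottmp-1980-processed.nc")   # hypothetical file
thetao_degc = ds["pottmp"].values - 273.15         # unify units first (K -> degC)

mean = np.load("stat/thetao_mean.npy")             # hypothetical stat file names
std = np.load("stat/thetao_std.npy")
thetao_norm = (thetao_degc - mean) / std           # then normalize
```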
First, download and organize the training, validation (optional), and testing (optional) data as shown below.
```
YOUR_CMIP_DATA_DIR/        # for training
    BCC-CSM2-MR/
        so/
            1850_1.npy
            ...
        thetao/
            ...
        ...
    CAS-ESM2-0/
        ...
YOUR_SODA2_DATA_DIR/       # for validation
    so/
        1850_1.npy
        ...
    thetao/
        ...
    ...
YOUR_ORAS5_DATA_DIR/       # for validation, same layout as SODA2
YOUR_GODAS_DATA_DIR/       # for testing, same layout as SODA2
```
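The tree above suggests one file per variable per month, named `{year}_{month}.npy`. Assuming that convention holds, a small sketch for checking that a directory is laid out as expected (variable list and year range are placeholders):

```python
from pathlib import Path

def check_layout(root, variables=("so", "thetao"), years=range(1850, 1860)):
    """Return the expected {year}_{month}.npy files that are missing."""
    missing = []
    for var in variables:
        for year in years:
            for month in range(1, 13):
                f = Path(root) / var / f"{year}_{month}.npy"
                if not f.exists():
                    missing.append(f)
    return missing

missing = check_layout("YOUR_SODA2_DATA_DIR")
print(f"{len(missing)} expected files are missing")
```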
Then, replace the corresponding directory paths in `train.sh` and run `bash train.sh` in the command line.
After training, you can run `bash predict.sh` to make ensemble predictions using GODAS data. You can also refer to `demo.ipynb` for a more straightforward way to make predictions.
**Note**: We use Fully Sharded Data Parallel (FSDP) to accelerate training. With four NVIDIA A100 GPUs, training consumes approximately 36 GB of GPU memory per GPU; as the number of GPUs increases, the memory required per GPU decreases, and conversely, fewer GPUs result in higher per-GPU memory usage. Testing consumes approximately 12 GB of GPU memory on a single GPU. Training takes approximately 12 hours, while testing takes about 10 minutes (only saving `tos`).
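The per-GPU memory scaling follows from how FSDP shards parameters, gradients, and optimizer state across ranks. A minimal sketch of the standard PyTorch wrapping (the actual configuration lives in `train.sh` and the training code; `build_model` is a placeholder):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Sketch of the idea only: FSDP shards model state across ranks, which is
# why per-GPU memory shrinks as more GPUs are added.
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

model = build_model().cuda()  # placeholder for the ORCA-DL model constructor
model = FSDP(model)           # each rank now holds only a shard of the model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```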
- 2025-07-28: Model predictions (initialized with GODAS), training data, and training code are released.
- 2025-03-21: Data preprocessing processes are released.
- 2025-03-04: Model weights and demo code are released.
If you find this work useful, please cite our paper:
```bibtex
@article{guo2025data,
  title={Data-driven global ocean modeling for seasonal to decadal prediction},
  author={Guo, Zijie and Lyu, Pumeng and Ling, Fenghua and Bai, Lei and Luo, Jing-Jia and Boers, Niklas and Yamagata, Toshio and Izumo, Takeshi and Cravatte, Sophie and Capotondi, Antonietta and Ouyang, Wanli},
  journal={Science Advances},
  volume={11},
  number={33},
  pages={eadu2488},
  year={2025},
  publisher={American Association for the Advancement of Science}
}
```