Observation-driven correction of numerical weather prediction for marine winds

This repository contains the code and data for the paper "Observation-driven correction of numerical weather prediction for marine winds" submitted to JGR: Machine Learning and Computation.

The paper presents a transformer-based approach that reformulates marine wind forecasting as observation-informed correction of numerical weather prediction. Rather than forecasting winds directly, the model learns local correction patterns by assimilating the latest in-situ observations to adjust Global Forecast System (GFS) outputs. The architecture handles irregular and time-varying observation sets through masking and set-based attention mechanisms, conditions predictions on recent observation–forecast pairs via cross-attention, and employs cyclical time embeddings and coordinate-aware location representations to enable single-pass inference at arbitrary spatial coordinates.
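As an illustration of the cyclical time embeddings mentioned above, here is a minimal sketch, not the repository's implementation. The function name `cyclical_time_features` and the choice of hour-of-day and day-of-year as the two cycles are assumptions made for this example. Each cycle is mapped to a sine/cosine pair so that times on either side of midnight (or of the new year) stay close in feature space.

```python
import math
from datetime import datetime


def cyclical_time_features(ts: datetime) -> list[float]:
    """Encode a timestamp as sin/cos pairs (hypothetical helper, not from the repo)."""
    # Fractional hour of the day in [0, 24).
    hour = ts.hour + ts.minute / 60.0
    # Fractional day of the year, starting at 0 on 1 January.
    day = ts.timetuple().tm_yday - 1 + hour / 24.0

    hour_angle = 2.0 * math.pi * hour / 24.0
    year_angle = 2.0 * math.pi * day / 365.25  # assumed mean year length

    return [
        math.sin(hour_angle),
        math.cos(hour_angle),
        math.sin(year_angle),
        math.cos(year_angle),
    ]
```

With this encoding, 23:00 and 01:00 are neighbours on the unit circle, whereas a raw hour value would place them 22 units apart.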

The model is evaluated over the Atlantic Ocean using collocated observations from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS). It reduces the root-mean-square error (RMSE) of GFS 10-meter winds at all lead times up to 48 hours, with a 45% improvement at the 1-hour lead time and a 13% improvement at the 48-hour lead time. The tokenized architecture naturally accommodates heterogeneous observing platforms (ships, buoys, tide gauges, and coastal stations) and produces both site-specific predictions and basin-scale gridded products in a single forward pass.
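To make the improvement figures concrete, the sketch below shows one way a relative RMSE reduction can be computed. It is an illustration under assumptions: how the paper aggregates the u and v components is not stated here, so the vector-error form used in `vector_rmse` is a guess, and both function names are hypothetical.

```python
import math


def vector_rmse(pred, obs):
    """RMSE of the wind-vector error over paired (u, v) samples (assumed definition)."""
    squared = [
        (pu - ou) ** 2 + (pv - ov) ** 2
        for (pu, pv), (ou, ov) in zip(pred, obs)
    ]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(rmse_baseline, rmse_model):
    """Percentage reduction of the model RMSE relative to the baseline RMSE."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline
```

For example, a baseline RMSE of 2.0 m/s corrected down to 1.1 m/s corresponds to a 45% improvement; these two values are made up for the arithmetic and are not results from the paper.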

Citation

Please cite the following paper when using the code or data:

Peduto, M.; Yang, Q.; Giezendanner, J.; Tuia, D.; Wang, S. Observation-driven correction of numerical weather prediction for marine winds. Submitted to JGR: Machine Learning and Computation, 2025.

Model and data

Data

The data for training, testing and validation can be found on Zenodo. The files are already processed and ready to be used in the model.

For ICOADS, ERA5 and GFS, the following variables are available:

u and v components of the wind vector at 10 meters above ground
additional variables for ERA5 and GFS
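For readers less familiar with the u/v convention, the following sketch converts the two components into wind speed and meteorological direction. It uses the standard definitions (u positive eastward, v positive northward, direction given as the bearing the wind blows from); the helper name `wind_speed_direction` is hypothetical and not part of the repository.

```python
import math


def wind_speed_direction(u, v):
    """Return (speed, direction) from eastward (u) and northward (v) components.

    Direction is in degrees clockwise from north and indicates where the
    wind is coming FROM, following the meteorological convention.
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction
```

A wind with u = 5 and v = 0 blows toward the east, so it is reported as a westerly wind with a direction of 270 degrees.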

Code

The code is organised as follows (in offshore-wind-forecasting/):

launch_global_models.py is a launcher pointing at train_global_models.py (the parser arguments need to be provided)
train_global_models.py contains the main code loop with the arguments --lead_time (lead time hours), --type_data (global or subset), --global_position_embedding (global), --absolute_time_embedding (absolute)
inference_gridded.py contains the code for the gridded evaluation of the model
the folder Dataloader/ contains the data loaders for model training and for gridded inference
models/ contains the code for the model, the cross-attention, the activations, the location encoder, and the early stopping
common_functions.py contains utility functions for the pipeline
loading_files/ contains the pipeline to load the values from ERA5 and GFS
processing_files/ contains the pipeline to process the data from ICOADS, ERA5 and GFS into the files used for training the models
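The sketch below shows how the four arguments listed for train_global_models.py could be declared with argparse. Only the flag names and the values in parentheses come from the list above; the types, defaults and `choices` are assumptions made for illustration, and the actual parser in the repository may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in this README."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch of the training arguments"
    )
    # Lead time in hours; the integer type is an assumption.
    parser.add_argument("--lead_time", type=int, required=True)
    # The README lists 'global' or 'subset'; the default is an assumption.
    parser.add_argument("--type_data", choices=["global", "subset"], default="global")
    # The README lists 'global' and 'absolute' as values; other values are unknown.
    parser.add_argument("--global_position_embedding", default="global")
    parser.add_argument("--absolute_time_embedding", default="absolute")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Under these assumptions, a run for a 24-hour lead time on the data subset would pass `--lead_time 24 --type_data subset`.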

Code inputs

Once the processed training files are in the appropriate folders, the code only needs the appropriate arguments to be passed when executing the main scripts. The training files are organised as follows, where n is the requested lead time:

Data/
└── training_files/
    └── lead_time_n/
        ├── train.parquet
        ├── test.parquet
        └── validation.parquet
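The layout above can be turned into file paths with a small helper. This is a sketch under assumptions: the function name `split_paths` is hypothetical, and it assumes that n in lead_time_n is replaced by the lead time in hours (for example lead_time_24).

```python
from pathlib import Path

# The three split files named in the directory layout.
SPLITS = ("train", "test", "validation")


def split_paths(data_root, lead_time):
    """Map each split name to its parquet path for one lead time (hypothetical helper)."""
    folder = Path(data_root) / "training_files" / f"lead_time_{lead_time}"
    return {name: folder / f"{name}.parquet" for name in SPLITS}
```

Each returned path can then be read with a parquet reader such as `pandas.read_parquet`, provided a parquet engine is installed.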

