The purpose of this time-series-based AI model is to provide a simple and fast benchmark model. The inputs are a few time series of wind and air pressure, taken from ERA5 in the examples in this repository. The outputs are time series of tide, surge and their interaction at a number of locations. For now, the model is trained on the output of a numerical model. In principle, measurements can also be used for training.
The AI model in this folder contains three modules for forecasting: tides, surges and their interaction.
- Tides: takes only time and location name as input, and should be trained on a multi-year dataset of multiple time series.
- Surge: takes winds and pressure at a few points around the North Sea as input, and should be trained on several years of time series for a collection of stations.
- Tide-Surge Interaction: takes the output of the previous two modules as input and outputs time series for the non-linear interaction.
The outputs of the three modules are summed, which results in time series for the total water level as well as for the individual components. The architecture considers the dynamics to be in part local and in part generic, which is reflected in specific inputs per location and common layers. For example, to compute the tide level at the second location, a one-hot vector [0,1,0,...] is used, with a length equal to the number of locations. The other inputs are the Doodson phases at that date and time.
The model uses three components and no internal state to achieve reliable behavior for long lead times. The model is as easily fed with forecasted winds as with winds from a reanalysis for reconstruction of a historical event. Our understanding of the physics of the phenomena has been included in the architecture in multiple ways.
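As a minimal sketch of the inputs described above, the following Julia snippet builds a one-hot location vector and appends the Doodson phases. The helper names `one_hot`, `tide_input` and `total_waterlevel` are hypothetical and not taken from this repository, and encoding the phases as sine and cosine pairs is an assumption made for illustration only.

```julia
# Sketch only: these helpers are hypothetical, not functions from this repository.

# One-hot vector of length `n_locations` with a 1 at position `index`.
function one_hot(index::Integer, n_locations::Integer)
    1 <= index <= n_locations || throw(ArgumentError("index out of range"))
    v = zeros(Float32, n_locations)
    v[index] = 1f0
    return v
end

# Input vector for the tide module: the location one-hot followed by the
# sine and cosine of each Doodson phase (phases in radians).
# The sin/cos encoding is an assumption, not confirmed by the repository.
function tide_input(index::Integer, n_locations::Integer, phases::AbstractVector{<:Real})
    return vcat(one_hot(index, n_locations),
                Float32.(sin.(phases)),
                Float32.(cos.(phases)))
end

# Total water level as the element-wise sum of the three module outputs.
total_waterlevel(tide, surge, interaction) = tide .+ surge .+ interaction

# Second of four locations, with two example phases.
x = tide_input(2, 4, [0.0, pi / 2])
```

The vector `x` has four one-hot entries followed by two sines and two cosines. In a real run the phases would come from the tide routines and the one-hot length would equal the number of stations in the training set.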
The inputs for wind and air pressure are sampled at a few relevant locations. In the examples, ERA5 fields from the Copernicus Climate Data Store (CDS) are used. Tides require Doodson phases as input, but these are easily computed from the times. The outputs in the examples are from the DCSM-FM model. Previous values of wind and pressure are taken into account for the surge, and the interaction module also has a time window. The input data must therefore start a few days before the first output time, so that the first values can be computed. The required extra length is equal to the sum of both windows. It's safe to add a bit more, so you don't have to change anything after a small modification of the model.
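The rule for the extra lead-in data can be sketched as follows. The window lengths and the function name `required_start` are hypothetical; the actual windows depend on the settings of the trained surge and interaction modules.

```julia
using Dates

# Hypothetical window lengths; the real values depend on the trained model.
surge_window = Hour(48)
interaction_window = Hour(24)
margin = Hour(24)   # optional extra, so small model changes need no new data

# Earliest time the input data must cover to produce output from `t_first`:
# the first output time minus the sum of all windows minus the margin.
function required_start(t_first::DateTime, windows::Period...; margin::Period = Hour(0))
    return t_first - sum(windows) - margin
end

t_first = DateTime(2011, 1, 1)
t_data_start = required_start(t_first, surge_window, interaction_window; margin = margin)
```

With these example values the wind and pressure series would have to start four days before the first output time.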
We aim to have the main datasets available in zarr format in the cloud, so they can be easily accessed from the scripts. For now, the scripts download the data and save it locally. You'll need credentials for downloading data. For the S3 storage this amounts to setting up the .aws/credentials and .aws/config files.
Currently the three modules (tide, surge and interaction) are working, and there are a few successful models. However, the code is still messy. We're working on improvements to the code, but not everything is working again yet and many scripts still use old routines.
- download sea-level data: get_dcsm_series.jl (reads from the 1980-2023 DCSM run stored in the cloud)
- train tides: train_tides.jl
- convert ERA5 data to datasets for training: get_era_series.jl
- train surges: train_surges.jl
- train tide-surge interaction model: train_interaction.jl
- make analysis of a trained model for a new input dataset: run_analysis.jl
- tide_time.jl: Some tide routines converted from Hatyan, mainly to compute phases for the basic Doodson frequencies
- wind_stress.jl: Convert 10m winds to stresses
- netcdf_utils.jl: Write data in delft3d-fm his format nc files
- abstract_series.jl, timeseries.jl, series_netcdf.jl, series_zarr.jl, series_jld2.jl: Utilities for TimeSeries including their meta-data. There are basic functions to get the data, times, names, etc. and also for reading and writing to different formats. The TimeSeries type is based on AbstractTimeSeries, and implements an in-memory array based version. Reading routines are lazy, so only a subset of the data is loaded into memory when needed.
- test_minio_zarr_with_julia.ipynb: Test script for downloading a subset of the 1980-2023 DCSM run
- hatyan_core.py: Copy of basic tide routines from Hatyan2
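To illustrate what converting 10m winds to stresses involves, here is a minimal sketch using the standard bulk formula with a constant drag coefficient. This is an assumption for illustration: wind_stress.jl may well use a wind-speed dependent drag coefficient, and the function name `wind_stress_sketch` and the default values are hypothetical.

```julia
# Sketch only: constant air density and drag coefficient are assumptions;
# the routine in wind_stress.jl may differ.

# Convert 10 m wind components (u10, v10) in m/s to stress components in N/m^2
# with the bulk formula tau = rho * drag * |U10| * (u10, v10).
function wind_stress_sketch(u10::Real, v10::Real; rho = 1.225, drag = 1.5e-3)
    speed = hypot(u10, v10)          # wind speed |U10|
    return (rho * drag * speed * u10, rho * drag * speed * v10)
end

# A 10 m/s wind blowing in the positive x direction.
tau_x, tau_y = wind_stress_sketch(10.0, 0.0)
```

The stress grows with the square of the wind speed, which is one reason to feed stresses rather than raw winds to the surge module.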
The different models all need time-series and a configuration as inputs. Each model has different configuration options when studied in more detail.
- Configurations can use a TOML file, which maps to a data structure in memory. During development, scripts can override values. Production scripts should be fully configurable from the config file.
- For the model config we can make the time span for the computation optional. When it is not given, the model settings and the times of the dataset are used to determine the start and end time.
- Different configs should share elements where useful
- Long term goal could be more generic scripts like ai_hydro_train.jl and ai_hydro_predict.jl with the model settings etc. all in a config.
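The configuration ideas above could look roughly like this in Julia, using the TOML standard library. The section and key names (`model`, `time`, `start`, `stop`, `layers`) and the function `time_span` are hypothetical and only serve to show the pattern of an optional time span with a fallback to the dataset times.

```julia
using TOML
using Dates

# Hypothetical config layout; the key names are illustrative only.
config_text = """
[model]
name = "tides"
layers = 3

[time]
# start and stop are optional; leave them out to use the dataset times
start = "2011-01-01T00:00:00"
"""

# The TOML file maps to nested dictionaries in memory.
config = TOML.parse(config_text)

# During development a script can override values after loading.
config["model"]["layers"] = 4

# Use the configured time span, falling back to the dataset times
# for any entry that is missing.
function time_span(config::AbstractDict, dataset_times::AbstractVector{DateTime})
    t = get(config, "time", Dict{String,Any}())
    t_start = haskey(t, "start") ? DateTime(t["start"]) : first(dataset_times)
    t_stop = haskey(t, "stop") ? DateTime(t["stop"]) : last(dataset_times)
    return (t_start, t_stop)
end

dataset_times = collect(DateTime(2010, 12, 1):Hour(1):DateTime(2011, 12, 31))
t_start, t_stop = time_span(config, dataset_times)
```

Here the start time comes from the config, while the stop time falls back to the last time in the dataset. A production script would read the same structure with `TOML.parsefile` and skip the override.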
- convert DCSM to zarr and store in cloud
- basic routines for tides
- create a few training datasets for tides
- prototype for tide training
- export to netcdf his file
- rewrite train_tides.jl to use TimeSeries datasets
- check with cpu and gpu. Is gpu faster?
- rewrite get_dcsm_series.jl to use TimeSeries
- download ERA5 data # see DataCollector.jl repo
- convert to jld2 and compute stresses
- rewrite train_surge.jl to use TimeSeries
- test train_surge.jl with test-dataset
- ? convert ERA5 to zarr and store in cloud (also in DataCollector.jl)
- create AI model and train
- update model to use TimeSeries
- Unit tests
- TimeSeries type based on AbstractTimeSeries
- Selection of locations and times for TimeSeries
- Read and write time-series
- NetCDF
- Zarr
- JLD2
- move tide_time.jl to src and add a test and check train_tides
- move wind_stress.jl to src and add a test
- tide layers: 3
- channels per layer: 64
- regularization: 0.0001
- batch size: 1024
- stations: 314 (all)
- epochs: 20
- training period: 2008, 2009, 2010
- testing period: 2011
- mean RMSE train: 0.216
- mean RMSE test: 0.230
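For reference, a mean RMSE over stations can be computed as in the sketch below. Whether the values above were averaged in exactly this way is an assumption, as is the matrix layout with times along rows and stations along columns; the function names are hypothetical and the numbers in the example are made up.

```julia
using Statistics

# Root-mean-square error between a predicted and a reference series.
rmse(predicted, reference) = sqrt(mean(abs2, predicted .- reference))

# Mean over stations of the per-station RMSE. Both arguments are matrices
# of size (n_times, n_stations); this layout is an assumption.
function mean_station_rmse(predicted::AbstractMatrix, reference::AbstractMatrix)
    size(predicted) == size(reference) || throw(DimensionMismatch("sizes differ"))
    return mean(rmse(p, r) for (p, r) in zip(eachcol(predicted), eachcol(reference)))
end

# Two time steps, two stations, with made-up values.
reference = [0.0 1.0; 0.0 1.0]
predicted = [0.1 1.2; -0.1 0.8]
score = mean_station_rmse(predicted, reference)
```

In this toy example the per-station errors are 0.1 and 0.2, so the mean is about 0.15.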