Skip to content

Commit 2f3e948

Browse files
Merge pull request #1053 from CliMA/ar/calibration_doc
calibration docs
2 parents 5a797ef + 0049357 commit 2f3e948

File tree

2 files changed

+209
-0
lines changed

2 files changed

+209
-0
lines changed

docs/make.jl

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,7 @@ pages = Any[
5656
"Tutorials" => tutorials,
5757
"Standalone models" => standalone_models,
5858
"Diagnostics" => diagnostics,
59+
"Calibration" => "calibration.md",
5960
"Leaderboard" => "leaderboard/leaderboard.md",
6061
"Restarts" => "restarts.md",
6162
"Time type" => "timemanager.md",

docs/src/calibration.md

Lines changed: 208 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,208 @@
1+
# Calibration framework of the CliMA land model
2+
3+
## Introduction
4+
5+
In general, models attempt to reproduce real world observations.
6+
Calibration is the process of finding the parameter set that will best reproduce real world observation.
7+
8+
The key ingredients in calibration are:
9+
- the model we want to run;
10+
- the parameters we want to tune along with their prior distributions;
11+
- the observational data we want to reproduce and how that data is represented in the model;
12+
- the noise associated to such observational data.
13+
14+
The process of calibrating consists of optimizing how different parameters match the given observations within the given noise. In ClimaLand, we use `EnsembleKalmanProcesses.jl` (EKP) to perform automatic calibration.
15+
Before discussing the details of EKP, we introduce the following terminology to mirror the key ingredients introduced above:
16+
17+
- a `forward_model` that runs the model for a given parameter set,
18+
- an `observation` vector that contains the data or statistics of data we want the model to reproduce,
19+
- an `observation_map` which maps the equivalent of the observation vector, but from the model output,
20+
- `priors` which contains the parameters we want to calibrate (find the value that makes the model match the observation best), priors gives which parameters, but also their distribution
21+
- a covariance matrix, that defines the observational error and correlations
22+
23+
[EnsembleKalmanProcesses.jl](https://github.com/CliMA/EnsembleKalmanProcesses.jl) (EKP) is at the center of CliMA's calibration efforts. EKP implements a suite of Ensemble Kalman methods to find a (locally) optimal parameter set `U` for a model `G` to fit noisy `Γ` observational data `Y`. These methods are optimized for problems where the model `G` is computationally expensive and no analtyic derivatives are available, as in the case of weather forcasting, where Ensemble Kalman techniques have a long history of success.
24+
25+
Large calibration campaigns often require supercomputers and while direct use of EKP.jl is possible, CliMA's preferred approach is using [ClimaCalibrate.jl](https://github.com/CliMA/ClimaCalibrate.jl), a package optimized for running on compute clusters. `ClimaCalibrate` handles efficient job orchestration and abstracts the details of the underlying system, providing a simpler user experience. Consult the [ClimaCalibrate documentation](https://clima.github.io/ClimaCalibrate.jl/dev/) for further information.
26+
27+
## Calibrate a land model
28+
29+
In this tutorial, we will perform a calibration using `ClimaCalibrate`. `ClimaCalibrate` provides an interface to `EnsembleKalmanProcesses.jl` that is more optimized for use on supercomputers. The [tutorial to calibrate a single site latent heat flux](https://clima.github.io/ClimaLand.jl/stable/generated/calibration/minimal_working_example_obs/) shows how to perform a calibration using `EKP` directly.
30+
31+
The `calibrate` function is at the heart of performing a calibration with `ClimaCalibrate`:
32+
33+
```julia
34+
import ClimaCalibrate as CAL
35+
36+
CAL.calibrate(
37+
CAL.WorkerBackend,
38+
utki,
39+
n_iterations,
40+
prior,
41+
caldir,
42+
)
43+
```
44+
45+
where the utki object defines your EKP configurations, for example, the default is:
46+
47+
```julia
48+
observationseries,
49+
EKP.TransformUnscented(prior, impose_prior = true);
50+
verbose = true,
51+
rng,
52+
scheduler = EKP.DataMisfitController(terminate_at = 100),
53+
```
54+
55+
where `observationseries` is the "truth" you want to calibrate your model on. It can take many forms.
56+
For example, you may want to calibrate your land model latent heat flux (lhf), the
57+
observations could be monthly global average of lhf, or monthly average at 100 random locations on land, or the annual amplitude and phase...
58+
You will create `observationsseries` from some data (for example ERA5), as a vector.
59+
60+
note that `observationseries` object contains the covariance matrix of the noise, which informs the uncertainties in space and time of your targeted "truth". It can be set, for example, to the inter-annual variance of a variable, or to the average of the variable times a % (e.g., 5%), or to a flat noise (for example, 5 W m-2 for latent heat). This will inform the EKP algorithm that if the model is within the target +- noise at specific space and time, the goal is reached.
61+
62+
`TransformUnscented.Unscented` is a method in EKP, that requires `2 x number of parameters + 1` ensemble members (`ensembe_size`, the number of parameter set drawn for your prior distribution tested at each iteration). For more information, read the [EKP documentation for that method](https://clima.github.io/EnsembleKalmanProcesses.jl/dev/unscented_kalman_inversion/).
63+
64+
`verbose = true` is a setting that writes information about your calibration run to a log file.
65+
66+
`rng` is a set random seed.
67+
68+
`Scheduler` is a EKP setting for timestepping, please read [EKP schedulers documentations](https://clima.github.io/EnsembleKalmanProcesses.jl/dev/learning_rate_scheduler/).
69+
70+
`CAL.WorkerBackend` defines how to interact with the underlying compute system. For other possible backends (for example, `JuliaBackend`, `ClimaGPUBackend`, or `DerechoBackend`),
71+
see [this doc page](https://clima.github.io/ClimaCalibrate.jl/dev/backends/).
72+
73+
Each backend is optimized for specific use cases and computing resources. The backend system is implemented through Julia's multiple dispatch,
74+
so that code written for one computer can seamlessly be ported to a new/different environments.
75+
76+
`prior` is the distribution of the parameters you want to calibrate. For example, if you want to calibrate two parameters called `sc` and `pc`,
77+
you would define your priors like this, for example:
78+
```julia
79+
prior_sc = EKP.constrained_gaussian("sc", 5e-6, 5e-4, 0, Inf);
80+
prior_pc = EKP.constrained_gaussian("pc", -2e6, 1e6, -Inf, Inf);
81+
prior = EKP.combine_distributions([prior_sc, prior_pc]);
82+
```
83+
For more documentation about prior distribution, see [this EKP documentation page](https://clima.github.io/EnsembleKalmanProcesses.jl/dev/parameter_distributions/).
84+
85+
`n_iterations` is the number of times your priors distribution will be updated, at each iteration your model is run for the number of `ensemble_size`.
86+
So in total, your model will be run `ensemble_size` * `n_iterations`.
87+
88+
`caldir` is the path to your calibration output directory. For example `calibration_output`. Inside this folder, the parameter set of each iteration * member will
89+
be stored, as well as the output of your model simulations. For example, if you ran a calibration with 1 iteration and 2 members, caldir would be structured
90+
like this:
91+
92+
```
93+
.
94+
├── iteration_000
95+
│ ├── eki_file.jld2
96+
│ ├── G_ensemble.jld2
97+
│ ├── member_001
98+
│ │ ├── global_diagnostics
99+
│ │ │ ├── output_0000
100+
│ │ │ │ ├── lhf_1M_average.nc
101+
│ │ │ │ ├── lwu_1M_average.nc
102+
│ │ │ │ ├── shf_1M_average.nc
103+
│ │ │ │ └── swu_1M_average.nc
104+
│ │ │ └── output_active -> output_0000
105+
│ │ └── parameters.toml
106+
│ ├── member_002
107+
│ │ ├── global_diagnostics
108+
│ │ │ ├── output_0000
109+
│ │ │ │ ├── lhf_1M_average.nc
110+
│ │ │ │ ├── lwu_1M_average.nc
111+
│ │ │ │ ├── shf_1M_average.nc
112+
│ │ │ │ └── swu_1M_average.nc
113+
│ │ │ └── output_active -> output_0000
114+
│ │ └── parameters.toml
115+
├── iteration_001
116+
│ ├── eki_file.jld2
117+
│ ├── G_ensemble.jld2
118+
│ ├── member_001
119+
│ │ ├── global_diagnostics
120+
│ │ │ ├── output_0000
121+
│ │ │ │ ├── lhf_1M_average.nc
122+
│ │ │ │ ├── lwu_1M_average.nc
123+
│ │ │ │ ├── shf_1M_average.nc
124+
│ │ │ │ └── swu_1M_average.nc
125+
│ │ │ └── output_active -> output_0000
126+
│ │ └── parameters.toml
127+
│ ├── member_002
128+
│ │ ├── global_diagnostics
129+
│ │ │ ├── output_0000
130+
│ │ │ │ ├── lhf_1M_average.nc
131+
│ │ │ │ ├── lwu_1M_average.nc
132+
│ │ │ │ ├── shf_1M_average.nc
133+
│ │ │ │ └── swu_1M_average.nc
134+
│ │ │ └── output_active -> output_0000
135+
│ │ └── parameters.toml
136+
```
137+
each iteration contains folders for each member, inside which you can find the parameters value inside `parameters.toml`, and model outputs inside `global_diagnostics`.
138+
139+
Two additional functions need to be defined in order to run `CAL.calibrate`. `CAL.forward_model(iteration, member)` and `observation_map(iteration)`.
140+
The `CAL.forward_model(iteration, member)` needs to generate your model output for a specific iteration and member. The `observation_map(iteration)`
141+
needs to return your loss, a vector of the same format as `observations` but created with your model output (for example, monthly average of latent heat flux),
142+
for all members. To make this easier, it can be useful to implement a `process_member_data(root_path)` function that generates one member output from your
143+
model output path.
144+
145+
Once you have defined `CAL.forward_model`, `CAL.observation_map`, `caldir`, `noise`, `observations`, `n_iterations`, `ensemble_size`, and your backend, you can
146+
call `CAL.calibrate`!
147+
148+
## job script
149+
150+
A calibration job will likely take hours to complete, so you will probably have to submit a job with a job scheduler.
151+
Below is an example job .pbs script, slurm would be slightly different.
152+
153+
```
154+
#!/bin/bash
155+
#PBS -N derecho_calibration
156+
#PBS -o output.txt
157+
#PBS -e error.txt
158+
#PBS -l walltime=10:00:00
159+
#PBS -l select=1:ncpus=64:ngpus=4
160+
161+
## Account number for CliMA
162+
#PBS -A UCIT0011
163+
#PBS -q main
164+
165+
module load climacommon
166+
# Run your command
167+
export CLIMACOMMS_DEVICE="CUDA"
168+
export CLIMACOMMS_CONTEXT="SINGLETON"
169+
julia --project=.buildkite -e 'using Pkg; Pkg.instantiate(;verbose=true)'
170+
171+
julia --project=.buildkite/ experiments/calibration/global_land/calibrate_land.jl
172+
```
173+
174+
where `calibrate_land.jl` is a script that generates all the arguments needed and eventually calls `CAL.calibrate`.
175+
You would start the job with a command such as `qsub name_of_job_script`, and a few hours later, you would get a calibrated parameter set.
176+
177+
Note that with the default EKP configuration, UTKI, the number of ensemble is set by the number of parameters, as explained in the documentation above. The number of workers (if you use the worker backend) is automatically set to that numbers, so that all members are run in parallel for each iteration.
178+
179+
## Configure your calibration job
180+
181+
To run a calibration, you can modify the following objects in `calibrate_land.jl`:
182+
- include `forward_model.jl` or `forward_model_bucket.jl`, depending on which model you want to calibrate
183+
- `variable_list`: choose the variables you want to calibrate. For example, `["swu"]` or `["lhf", "shf"]`
184+
- `n_iterations`: how many iterations you want to run
185+
- `spinup_period`: the time length of your spinup
186+
- `start_date`: the start date of the calibration
187+
- `nelements`: to adjust the model resolution (n_horizontal elements, n_vertical elements)
188+
- `caldir`: you can change the name and path of your calibration output directory
189+
- `training_locations`: the default is all locations on land, but you can change this to a subset of land coordinates
190+
191+
You should also modify the files:
192+
- `priors.jl`, to give your parameters and their distribution
193+
- `forward_mode.jl` or `forward_model_bucket.jl`, to ensure it reads the parameters from `priors.jl` and uses them
194+
195+
other things to consider:
196+
- `observationseries_era5.jl`: currently the target data is era5. This could be changed to another dataset, or a perfect model target
197+
- `observationseries_era5.jl`: the noise is currently set to 5^2 (flat noise, in W m^-2), this should be changed to desired noise
198+
- you could change the EnsembleKalmanProcesses settings, set in `EKP.EnsembleKalmanProcesses()` in `calibrate_land.jl`
199+
- you can adjust the ClimaCalibrate backend, see the [documentation](https://clima.github.io/ClimaCalibrate.jl/dev/backends/)
200+
- Because the variable names are currently different in era5 file, you may have to add variables to map in observationseries_era5.jl (for example, "lhf" => "mslhf")
201+
202+
Also, note that currently:
203+
- the temporal resolution of observation_maps are seasonal (3 months average)
204+
- the length of calibration is 1 year (data used after spinup)
205+
- the entire globe is used
206+
These could also be changed in the code, but would currently requires significant changes.
207+
208+
Finally, note that the HPC job script and command will slightly differ between slurm (for example central or clima) and pbs (for example derecho).

0 commit comments

Comments
 (0)