This commit rewrites the calibration pipeline to use
ClimaCalibrate.ObservationRecipe and ClimaAnalysis for the data
preprocessing and for making the observations. Additionally, this
calibration pipeline is now closer to the calibration pipeline used for
ClimaCoupler.
- where `observationseries` is the "truth" you want to calibrate your model on. It can take many forms.
+ where `obs_series` is the "truth" you want to calibrate your model on. It can take many forms.
For example, if you want to calibrate your land model latent heat flux (lhf), the
observations could be the monthly global average of lhf, the monthly average at 100 random locations on land, or the annual amplitude and phase.
- You will create `observationsseries` from some data (for example ERA5), as a vector.
+ You will create `obs_series` from some data (for example ERA5), as a vector.

- note that `observationseries` object contains the covariance matrix of the noise, which informs the uncertainties in space and time of your targeted "truth". It can be set, for example, to the inter-annual variance of a variable, or to the average of the variable times a % (e.g., 5%), or to a flat noise (for example, 5 W m-2 for latent heat). This will inform the EKP algorithm that if the model is within the target +- noise at specific space and time, the goal is reached.
+ Note that `obs_series` object contains the covariance matrix of the noise, which informs the uncertainties in space and time of your targeted "truth". It can be set, for example, to the inter-annual variance of a variable, or to the average of the variable times a % (e.g., 5%), or to a flat noise (for example, 5 W m-2 for latent heat). This will inform the EKP algorithm that if the model is within the target +- noise at specific space and time, the goal is reached.
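
The three noise choices listed above can be sketched as diagonal covariance matrices. This is only an illustration: the helper names below are hypothetical and are not part of ClimaCalibrate or EnsembleKalmanProcesses, and the sketch assumes that the noise is uncorrelated between observations and that the diagonal holds variances (so a flat noise of 5 W m-2 becomes 5^2).

```julia
using LinearAlgebra
using Statistics

# Hypothetical helpers for illustration; not part of ClimaCalibrate or EKP.

# Flat noise: the same standard deviation `sigma` for all `n` observations.
flat_noise_cov(n, sigma) = Diagonal(fill(sigma^2, n))

# Relative noise: a fraction (e.g. 0.05 for 5%) of the mean of the observations.
relative_noise_cov(obs, fraction) =
    Diagonal(fill((fraction * mean(obs))^2, length(obs)))

# Inter-annual variance: `yearly` has one row per observed quantity and one
# column per year; the variance is taken across years.
interannual_noise_cov(yearly) = Diagonal(vec(var(yearly; dims = 2)))

obs = [80.0, 100.0, 120.0]                 # made-up latent heat fluxes, W m^-2
flat = flat_noise_cov(length(obs), 5.0)
rel = relative_noise_cov(obs, 0.05)
yearly = [1.0 3.0; 2.0 2.0]                # made-up values: 2 quantities, 2 years
inter = interannual_noise_cov(yearly)
```

Whichever matrix is chosen would then be attached to the observations that make up `obs_series`; how that is done depends on the ClimaCalibrate and EKP versions in use.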

- `TransformUnscented.Unscented` is a method in EKP, that requires `2 x number of parameters + 1` ensemble members (`ensembe_size`, the number of parameter set drawn for your prior distribution tested at each iteration). For more information, read the [EKP documentation for that method](https://clima.github.io/EnsembleKalmanProcesses.jl/dev/unscented_kalman_inversion/).
+ `TransformUnscented.Unscented` is a method in EKP, that requires `2 x number of parameters + 1` ensemble members (`ensemble_size`, the number of parameter set drawn for your prior distribution tested at each iteration). For more information, read the [EKP documentation for that method](https://clima.github.io/EnsembleKalmanProcesses.jl/dev/unscented_kalman_inversion/).
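
As a quick check of the ensemble size rule quoted above, here is a one-line sketch. The function name is hypothetical and exists only for this example; it is not an EKP function.

```julia
# Hypothetical helper: ensemble size implied by the
# `2 x number of parameters + 1` rule for the unscented method.
unscented_ensemble_size(n_parameters::Integer) = 2 * n_parameters + 1

n_members = unscented_ensemble_size(3)  # three calibrated parameters
```

With three parameters this gives seven ensemble members per iteration.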

`verbose = true` is a setting that writes information about your calibration run to a log file.

@@ -134,7 +136,7 @@ like this:
│ │ │ └── output_active -> output_0000
│ │ └── parameters.toml
```
- each iteration contains folders for each member, inside which you can find the parameters value inside `parameters.toml`, and model outputs inside `global_diagnostics`.
+ Each iteration contains folders for each member, inside which you can find the parameters value inside `parameters.toml`, and model outputs inside `global_diagnostics`.

Two additional functions need to be defined in order to run `CAL.calibrate`: `CAL.forward_model(iteration, member)` and `observation_map(iteration)`.
The `CAL.forward_model(iteration, member)` needs to generate your model output for a specific iteration and member. The `observation_map(iteration)`
julia --project=.buildkite -e 'using Pkg; Pkg.instantiate(;verbose=true)'
- julia --project=.buildkite/ experiments/calibration/calibrate_land.jl
+ julia --project=.buildkite/ experiments/calibration/run_calibration.jl
```

where `calibrate_land.jl` is a script that generates all the arguments needed and eventually calls `CAL.calibrate`.
@@ -199,33 +201,163 @@ You would start the job with a command such as `qsub name_of_job_script` for PBS

Note that with the default EKP configuration, UTKI, the number of ensemble members is set by the number of parameters, as explained in the documentation above. The number of workers (if you use the worker backend) is automatically set to that number, so that all members are run in parallel for each iteration.

- ## Configure your calibration job
-
- To run a calibration, you can modify the following objects in `calibrate_land.jl`:
- - include `forward_model.jl` or `forward_model_bucket.jl`, depending on which model you want to calibrate
- - `variable_list`: choose the variables you want to calibrate. For example, `["swu"]` or `["lhf", "shf"]`
- - `n_iterations`: how many iterations you want to run
- - `spinup_period`: the time length of your spinup
- - `start_date`: the start date of the calibration
- - `nelements`: to adjust the model resolution (n_horizontal elements, n_vertical elements)
- - `caldir`: you can change the name and path of your calibration output directory
- - `training_locations`: the default is all locations on land, but you can change this to a subset of land coordinates
-
- You should also modify the files:
- - `priors.jl`, to give your parameters and their distribution
- - `forward_mode.jl` or `forward_model_bucket.jl`, to ensure it reads the parameters from `priors.jl` and uses them
-
- other things to consider:
- - `observationseries_era5.jl`: currently the target data is era5. This could be changed to another dataset, or a perfect model target
- - `observationseries_era5.jl`: the noise is currently set to 5^2 (flat noise, in W m^-2), this should be changed to desired noise
- - you could change the EnsembleKalmanProcesses settings, set in `EKP.EnsembleKalmanProcesses()` in `calibrate_land.jl`
- - you can adjust the ClimaCalibrate backend, see the [documentation](https://clima.github.io/ClimaCalibrate.jl/dev/backends/)
- - Because the variable names are currently different in era5 file, you may have to add variables to map in observationseries_era5.jl (for example, "lhf" => "mslhf")
-
- Also, note that currently:
- - the temporal resolution of observation_maps are seasonal (3 months average)
- - the length of calibration is 1 year (data used after spinup)
- - the entire globe is used
- These could also be changed in the code, but would currently requires significant changes.
-
- Finally, note that the HPC job script and command will slightly differ between slurm (for example central or clima) and pbs (for example derecho).
+ ## Configure your land calibration
+
+ To configure the land calibration, you can set the optimization method
+ used by EnsembleKalmanProcesses.jl, the observations and how they are
+ preprocessed, and the simulation itself. For most settings, you can modify
+ the `CalibrateConfig` struct. See the example below.
+
+ ```julia
+ CalibrateConfig(;
+     short_names = ["swu"],
+     minibatch_size = 2,
+     n_iterations = 10,
+     sample_date_ranges = [("2007-12-1", "2008-9-1"),
+                           ("2008-12-1", "2009-9-1"),
+                           ("2009-12-1", "2010-9-1"),
+                           ("2010-12-1", "2011-9-1")],
+     extend = Dates.Month(3),
+     spinup = Dates.Month(3),
+     nelements = (101, 15),
+     output_dir = "experiments/calibration/land_model",
+     rng_seed = 42,
+ )
+ ```
+
+ With the configuration above, the calibration uses the `swu`
+ observation. The calibration will run for 10 iterations, and each iteration will
+ use a minibatch size of 2. The start and end dates of the simulation are
+ automatically determined by `sample_date_ranges`, `spinup`, and `extend`. The
+ spinup period is three months, and the simulation also runs for three months
+ past the end dates in `sample_date_ranges`.
+ Since the minibatch size is 2, the first iteration of the calibration will run
+ from 1 September 2007 to 1 December 2009 and the second iteration of the
+ calibration will run from 1 September 2009 to 1 December 2011. Afterward, the iterations
+ will repeat, so the third iteration will be the same as the first iteration, the
+ fourth iteration will be the same as the second iteration, and so on.
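
The date arithmetic described above can be reproduced with a short sketch. It is not the ClimaCalibrate implementation: the function name is hypothetical, iterations are assumed to be numbered from 1, minibatches are assumed to be taken in order from `sample_date_ranges`, and `Date` objects are used instead of the strings in the configuration.

```julia
using Dates

# Hypothetical helper, not part of ClimaCalibrate: returns the (start, end)
# dates of the simulation for a 1-based iteration number.
function simulation_window(iteration, sample_date_ranges, minibatch_size, spinup, extend)
    n_batches = div(length(sample_date_ranges), minibatch_size)
    batch = mod1(iteration, n_batches)            # cycle through the minibatches
    first_idx = (batch - 1) * minibatch_size + 1  # first date range in this minibatch
    last_idx = batch * minibatch_size             # last date range in this minibatch
    start_date = first(sample_date_ranges[first_idx]) - spinup
    end_date = last(sample_date_ranges[last_idx]) + extend
    return start_date, end_date
end

sample_date_ranges = [
    (Date(2007, 12, 1), Date(2008, 9, 1)),
    (Date(2008, 12, 1), Date(2009, 9, 1)),
    (Date(2009, 12, 1), Date(2010, 9, 1)),
    (Date(2010, 12, 1), Date(2011, 9, 1)),
]

for iteration in 1:4
    window = simulation_window(iteration, sample_date_ranges, 2, Month(3), Month(3))
    println("iteration ", iteration, ": ", window[1], " to ", window[2])
end
```

Under these assumptions, iterations 1 and 3 cover 1 September 2007 to 1 December 2009, and iterations 2 and 4 cover 1 September 2009 to 1 December 2011, matching the description above.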
+
+ The period chosen for `extend` should ensure that all the data is gathered. If you
+ are calibrating against seasonal averages, then `extend` should be 3 months and
+ if you are calibrating against monthly averages, then `extend` should be 1
+ month. For more information, see the sections [Data pipelines](@ref) and
+ [Simulation settings](@ref).
+
+ The number of horizontal and vertical elements to use for the simulation is
+ determined by `nelements`. The calibration is saved at `output_dir` and the
+ random number generator is seeded by `rng_seed`.
+
+ ### EKP settings
+
+ In `CalibrateConfig`, you can modify the number of iterations via `n_iterations`
+ and the size of the minibatch via `minibatch_size`. For reproducibility, you can
+ pass in an integer for `rng_seed`, which is used internally by
+ EnsembleKalmanProcesses.jl to seed the random number generator.
+
+ For the land calibration, the default optimization method is
+ `EnsembleKalmanProcesses.TransformUnscented` and the default scheduler is
+ `EKP.DataMisfitController`. To change the optimization method or the
+ scheduler, you need to go to the `experiments/calibration/run_calibration.jl`
+ file and modify it there. For more information about the different optimization