[1539][infra] Adds base config flag by mtar · Pull Request #1573 · ecmwf/WeatherGenerator

mtar · 2026-01-09T15:51:04Z

Description

This PR introduces the CLI option --base-config, which lets you provide your own base configuration.
See also !112
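
For readers who have not seen the diff, here is a minimal sketch of how such an option could be declared with argparse. Only the flag name --base-config comes from this PR; the function name build_parser, the help text, and the file name my_base.yml are assumptions made for illustration and may differ from what cli.py actually does.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with an illustrative --base-config option (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Illustrative training CLI")
    parser.add_argument(
        "--base-config",
        type=Path,  # argparse converts the raw string into a pathlib.Path
        default=None,  # None signals "fall back to the built-in default config"
        help="Path to a base configuration file; the default is used if omitted.",
    )
    return parser


# Without the flag, base_config stays None.
args_default = build_parser().parse_args([])
# With the flag, the value arrives as a Path ("my_base.yml" is a made-up name).
args_custom = build_parser().parse_args(["--base-config", "my_base.yml"])
```

Keeping the default at None lets the config loader decide what "no base given" means, instead of hard-coding the default path in the CLI layer.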

Issue Number

Closes #1539

Is this PR a draft? Mark it as draft.

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

clessig

Looks good and also worked when I tested it.

@grassesi: could you please also have a look?

src/weathergen/utils/cli.py

clessig · 2026-01-11T18:40:43Z

packages/common/src/weathergen/common/config.py

+    match base :
+        case Path():
+            _logger.info(f"Loading specified base config from file: {base}.")
+            c = OmegaConf.load(base)


Can we avoid single-letter variable names? (I am surprised the linter allows it here.) conf, cfg, or config all seem like appropriate names here.

I changed it, but one-letter variables also appear in other places in the file. Those will need to be addressed in another PR.

mtar · 2026-01-12T14:36:30Z

test_config.py is still missing a DUMMY_BASE_CONFIG. Any ideas for that config?

grassesi

Looks great, the unit tests can be improved a bit (but this should not stop this PR from being merged)

grassesi · 2026-01-12T19:35:53Z

tests/test_config.py

+
+
+def test_load_with_base_file(base_config_file, private_config_file):
+    sub_cf = OmegaConf.load(overwrite_file)


This should probably be base_config_file?

grassesi · 2026-01-12T19:38:16Z

tests/test_config.py

+DUMMY_BASE_CONF = {
+    # TODO add base configuration
+}


This can contain just one dummy key, "foo": "bar", so the test can check that the new base config is loaded instead of the default config.

clessig · 2026-01-12T20:44:26Z

Looks great, the unit tests can be improved a bit (but this should not stop this PR from being merged)

@mtar: if we can get the small changes suggested by @grassesi done by tomorrow afternoon, then let's do it.

clessig

Thanks for the quick implementation of this!

* set base config (ecmwf#1539)
* update help message
* longer variable name
* longer variable name
* rename config variable
* rename base_configs

---------

Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int>

(#1593) * WIP:removing zarr_store flag * fixing duplication error * need mode="a" to avoid overwriting * adding comments for mode = "a" * debug w/prints * renaming reader to avoid conflict * debugging * renaming zarrio writer/reader to avoid conflicts * lint-check fix * type-check fixes * lint fix * type-check errors * revert lead_time fix * tidying --------- Co-authored-by: Simon Grasse <161459968+grassesi@users.noreply.github.com> Co-authored-by: Tim Hunter <tim.hunter@ecmwf.int> * Removed unused mask_params return value (#1626) * remove unused config parameter (#1632) * Update eval_config.yml (#1636) Add some supports that are missing as comments. * Jk/develop/1639 fix shard val forward (#1642) * rm model_forward assignment in val * rm clutter from diffusion branch * reverse if order * Clessig/develop/fix finetuning 1640 (#1641) * Fix bug with diagnostic streams * Avoid that empty decoders are allocated * Sophiex/dev/synop nppatms finetuning configs (#1644) * Doing something wrong * Make fine-tuning work * Rename sensibly * Enable multiple student views for one target for JEPA (#1617) * Enable multiple student views for one target * Improved readability * Fix test for empty targets in decoder creation (#1646) * add regions to integration tests (#1648) * Memory pinning (#1615) * add pin mem to IOReaderData * add pin mem to sample & modelbatch class * add pin mem to stream data * add pin mem to training loop * run /scripts/actions.sh lint * run ./scripts/actions.sh unit-test * ignore check torch import in package * move pinning to MultiStreamDataSampler * add _pin_tensor & _pin_tensor_list helper func * ruff the code * move back pin mem. 
to train loop * Remove the ignore-import-error rule and revert to the state before the change * create protocol for pinnable obj * remove pin_mem from IOReaderData class * add pin_memory to Trainer.validate * remove pin_memory from loader_params * Rever export/export_inference.py to state before c3fc9a7 * change name * revise Pinnable class description * add memory_pinning in config, train & va loop * use getattr to avoid CICD warning * use setattr to avoid CICD warning * disable pylint for self.source_tokens_lens * Fixed issues with memory pinning due to rebasing and also adjusted config position of flag * Reverting unadvert changes --------- Co-authored-by: Javad Kasravi <j.kasravi@fz-juelich.de> Co-authored-by: Javad Kasravi <jkasravi@santis-ln002.cscs.ch> Co-authored-by: Javad kasravi <kasravi66@gmail.com> * Allows for writing normalized samples; fixed config to keep it well-structured (#1653) * Skipping missing scores in JSONreader (#1655) * split WeatherGenReader functionality to allow reading only JSON adding weathergen JSON reader to develop * informative error when metrics are not there * restore JSONreader after rebase * JSONreader mostly restored * MLFlow logging independent of JSON/zarr * linting, properly cheking fsteps, ens, samples in JSONreader * tiny change to restore the MergeReader * lint * enabling JSONreader to skip plots and missing scores gracefully * required reformatting * move skipping of metrics to the reader class * slighly more explicit formulations --------- Co-authored-by: Sebastian Buschow <sbuschow@santis-ln001.cscs.ch> Co-authored-by: Sebastian Buschow <sbuschow@santis-ln002.cscs.ch> Co-authored-by: iluise <72020169+iluise@users.noreply.github.com> Co-authored-by: Ilaria Luise <luise.ilaria@gmail.com> * Remove mini_epoch backward compatibility v2 --------- Co-authored-by: iluise <72020169+iluise@users.noreply.github.com> Co-authored-by: Moritz Hauschulz <60788263+moritzhauschulz@users.noreply.github.com> Co-authored-by: kctezcan 
<kctezcan@gmail.com>
Co-authored-by: Michael Tarnawa <18899420+mtar@users.noreply.github.com>
Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int>
Co-authored-by: Ilaria Luise <luise.ilaria@gmail.com>
Co-authored-by: Sophie Xhonneux <24638638+sophie-xhonneux@users.noreply.github.com>
Co-authored-by: Julian Kuehnert <Jubeku@users.noreply.github.com>
Co-authored-by: s6sebusc <49226935+s6sebusc@users.noreply.github.com>
Co-authored-by: Sebastian Buschow <sbuschow@santis-ln001.cscs.ch>
Co-authored-by: Sebastian Buschow <sbuschow@santis-ln002.cscs.ch>
Co-authored-by: Sorcha Owens <73587207+enssow@users.noreply.github.com>
Co-authored-by: Simon Grasse <161459968+grassesi@users.noreply.github.com>
Co-authored-by: Tim Hunter <tim.hunter@ecmwf.int>
Co-authored-by: Savvas Melidonis <79579567+SavvasMel@users.noreply.github.com>
Co-authored-by: Javad Kasravi <j.kasravi@fz-juelich.de>
Co-authored-by: Javad Kasravi <jkasravi@santis-ln002.cscs.ch>
Co-authored-by: Javad kasravi <kasravi66@gmail.com>

* Remove mini_epoch backward compatibility * Update eval_config.yml (ecmwf#1584) * Repeat flag on develop (ecmwf#1562) * Squashed commit of the following: commit 9336fe1 Author: moritzhauschulz <moritz.hauschulz@gmail.com> Date: Fri Dec 12 20:10:50 2025 +0100 requested changes commit dadde23 Author: moritzhauschulz <moritz.hauschulz@gmail.com> Date: Mon Dec 8 18:54:44 2025 +0100 remove 1 line commit c871f9c Author: moritzhauschulz <moritz.hauschulz@gmail.com> Date: Mon Dec 8 18:16:50 2025 +0100 remove unnecessary statement commit e3e46eb Author: moritzhauschulz <moritz.hauschulz@gmail.com> Date: Mon Dec 8 12:49:03 2025 +0100 lint commit 559add7 Author: moritzhauschulz <moritz.hauschulz@gmail.com> Date: Mon Dec 8 12:47:35 2025 +0100 rename flag and simplify cases commit f6e1c39 Author: moritzhauschulz <moritz.hauschulz@gmail.com> Date: Thu Dec 4 21:07:42 2025 +0100 reset config and lint commit 27cb0c8 Author: moritzhauschulz <moritz.hauschulz@gmail.com> Date: Thu Dec 4 20:57:14 2025 +0100 repeat flag commit bf17bfe Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Dec 4 16:53:51 2025 +0100 Updated config commit 7745e47 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Dec 4 16:35:19 2025 +0100 Switched to lists of model / target stratgies commit 12bae15 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Dec 4 15:01:07 2025 +0100 Fixes for diffusion commit 9065219 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Dec 4 13:33:42 2025 +0100 Changed that model takes sample as input commit 3f52a8d Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Dec 4 13:32:14 2025 +0100 Changed core functions to take sample as arg commit d36367a Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Dec 4 13:31:55 2025 +0100 Changed args to embedding commit b69b743 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Dec 4 13:30:41 2025 +0100 Cleaned up comments and return values a bit commit 
59510dd Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Dec 4 00:01:50 2025 +0100 Fixed problem with non_blocking=True commit 69b53a6 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Dec 4 00:00:42 2025 +0100 Removed old comments commit 51754fa Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Dec 4 00:00:20 2025 +0100 Fixed missing non_blocking=True in to_device() commit 2cd3971 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Dec 3 23:56:41 2025 +0100 Completed migration to new batch class by removing reference to old list of lists commit 402b8de Author: Julian Kuehnert <Jubeku@users.noreply.github.com> Date: Wed Dec 3 17:11:15 2025 +0100 1390 - Adapt forward pass of new batch object (ecmwf#1391) * Add to device to ModelBatch, etc & adapt model TODO adapt validate and inference TODO test forecasting and multiple stream because predict changed substantially * Rename view to sample and fix validate * Revert predict function and fix inference * Fix invalid access with mask * Linting * Fixed handling of target_idxs and other minor issues --------- Co-authored-by: sophiex <24638638+sophie-xhonneux@users.noreply.github.com> Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int> commit 9a1a6a9 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Dec 3 13:12:52 2025 +0100 Re-enabled multi-source training commit 3641e1f Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Dec 3 00:20:42 2025 +0100 Fix for integration test commit 9f5e49c Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Dec 3 00:20:25 2025 +0100 Fixed uv.lock commit 33d9d8d Merge: 23e0267 c8a2aad Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Dec 3 00:13:05 2025 +0100 Merge branch 'shmh40/dev/1270-idx-global-local' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local commit 23e0267 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Dec 3 00:11:48 2025 
+0100 Update commit c8a26d7 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Dec 3 00:11:37 2025 +0100 Commit commit 2599ec2 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Dec 3 00:10:13 2025 +0100 Restructured code so that mask generation and application is cleanly separated commit c8a2aad Author: Tim Hunter <tim.hunter@ecmwf.int> Date: Tue Dec 2 17:06:56 2025 +0100 commenting tests commit 2b2c977 Author: Tim Hunter <tim.hunter@ecmwf.int> Date: Tue Dec 2 17:03:41 2025 +0100 linter warnings commit dc736e5 Merge: 6fe8561 7ff6e0b Author: Tim Hunter <tim.hunter@ecmwf.int> Date: Tue Dec 2 16:48:24 2025 +0100 merge with dev commit 6fe8561 Merge: 15b46e9 f136d60 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 14:16:41 2025 +0100 Merge branch 'develop' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local commit 15b46e9 Author: Sebastian Hickman <seb.hickman@gmail.com> Date: Fri Nov 28 13:30:54 2025 +0100 fix indentation of else: assert False in _get_sample msds commit 4281aff Author: Sebastian Hickman <seb.hickman@gmail.com> Date: Fri Nov 28 12:40:24 2025 +0100 restore loader_num_workers to 8 commit 6ea07e7 Author: Seb Hickman <56727418+shmh40@users.noreply.github.com> Date: Fri Nov 28 11:34:41 2025 +0000 restore masking_strategy to random Had placeholder for testing, now back to "random" for masking strategy in the base level of default_config commit 1a37dd1 Author: Sebastian Hickman <seb.hickman@gmail.com> Date: Fri Nov 28 10:31:43 2025 +0100 remove unused mask generation in diffusion_forecast commit 657094a Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:59:39 2025 +0100 Fixed problem in engines introduced in recent commits merging develop. This fixes masking training commit d526dfc Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:37:02 2025 +0100 Restored masking as training mode. 
Not working due to NaN in prediction commit 6289959 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:36:38 2025 +0100 Removed duplicate lines due to mergeing commit bc8d23e Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:18:01 2025 +0100 More linting commit 47750a5 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:10:09 2025 +0100 Restoring masking as training_mode in default_config commit 0db8b62 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:09:41 2025 +0100 Linting commit e41a575 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:09:28 2025 +0100 Linting commit 03166a2 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:09:10 2025 +0100 Linting commit 652500a Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:08:53 2025 +0100 Linting commit d8998a9 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:08:38 2025 +0100 Linting commit 8ef3a4c Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:08:04 2025 +0100 Simplified and clarified handling of default target_aux_calcualtor commit 3e4de7a Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:07:51 2025 +0100 Linting commit 5f803e5 Merge: b47b0fa 0e2801b Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 08:03:02 2025 +0100 Merge branch 'develop' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local commit b47b0fa Merge: 9b702c5 26f7b5b Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 28 07:09:19 2025 +0100 Merge branch 'shmh40/dev/1270-idx-global-local' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local commit 26f7b5b Author: Sebastian Hickman <seb.hickman@gmail.com> Date: Thu Nov 27 15:33:22 2025 +0100 add diffusion forecast option for the data sampling, and with noise_level_rn in the metadata. 
The Trainer needs to be copied from Sophies branch, currently we only get so far commit 6d909d6 Author: Sebastian Hickman <seb.hickman@gmail.com> Date: Thu Nov 27 11:32:32 2025 +0100 add mask to SampleMetaData and add forecast_dt to Sample so it is accessible. Can specify the loss in the default config with student-teacher views commit e0d7346 Author: Sebastian Hickman <seb.hickman@gmail.com> Date: Wed Nov 26 14:31:52 2025 +0100 remove prints, pdb commit c27156c Author: Sebastian Hickman <seb.hickman@gmail.com> Date: Wed Nov 26 12:35:03 2025 +0100 add SampleMetaData integration and functionality, and update masker to use SampleMetadata. Pass through source_cell_lens and target_coords_idx to student_teacher_batch in iter, and hence pass through to trainer. source_cell_lens and target_coords_idx are now part of Sample, which is itself the components of ModelBatch. To tidy commit 4f8f62b Author: Sebastian Hickman <seb.hickman@gmail.com> Date: Tue Nov 25 18:56:56 2025 +0100 instructions for sophie commit fa24fc1 Author: Sebastian Hickman <seb.hickman@gmail.com> Date: Tue Nov 25 16:36:52 2025 +0100 very hacky first pass of full masking_strategy_config for the student and teacher views. Much to fix up commit b193a50 Author: Sebastian Hickman <seb.hickman@gmail.com> Date: Mon Nov 24 17:13:37 2025 +0100 updated configs so code runs. Note default config to be overhauled still commit af9a3c1 Merge: 2905cb0 b452bd2 Author: Sebastian Hickman <seb.hickman@gmail.com> Date: Mon Nov 24 16:37:55 2025 +0100 merge with develop, include trainer idx_inv_rt, merged default_config, rm tokenizer_forecast commit 2905cb0 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Sat Nov 22 13:59:37 2025 +0000 fix masking for NPP-ATMS by correctly selecting final timestep mask and aligning between source and target. 
working for num_input_steps = 1, broken for > 1, compute_offsets_scatter_embed not working commit b9a60f3 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Fri Nov 21 18:38:40 2025 +0000 tidy up, remove unused arguments, types commit ece1dd0 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Fri Nov 21 16:22:27 2025 +0000 move build_views_for_stream into masker commit 1a418bf Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Fri Nov 21 12:54:33 2025 +0000 add max_num_samples functionality to tokenizer_masking and pass through in multi_stream_data_sampler. coords_per_cell is a bit nasty commit 91c3d7a Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Fri Nov 21 12:53:31 2025 +0000 add max_num_targets to era5 commit 647e4b2 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Thu Nov 20 18:31:45 2025 +0000 multiple idxs for each teacher, need to confirm for not student case, and updated ModelBatch for this commit 1806ae5 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Thu Nov 20 16:28:30 2025 +0000 tidy up, remove unused build_stream_views in tokenizer_masking commit 9b702c5 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 20 14:34:34 2025 +0100 Re-enabling inversion of targert ordering. 
commit 87ad45f Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Thu Nov 20 13:10:34 2025 +0000 add teacher num_views parameter to config commit b34b6da Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Thu Nov 20 13:09:19 2025 +0000 collect num_source_samples and num_target_samples, add loop over teacher masks hence allowing multiple teacher views, and add source_target_idx to keep track of which student belongs to which teacher commit b2be982 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Thu Nov 20 13:07:47 2025 +0000 fix typo in ModelBatch commit d18cf86 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 20 08:26:40 2025 +0100 Added todo commit e8ccb8d Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 20 08:22:26 2025 +0100 Added required reflexivity between source and target samples to Batch commit 5d5e999 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 20 08:21:31 2025 +0100 Linting problems but removed unused ViewMetaData dependence commit 3bca490 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 20 08:21:13 2025 +0100 linting commit 6a96065 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 20 08:20:42 2025 +0100 Linting commit c1d32fb Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 20 08:20:21 2025 +0100 linting commit 1b1654c Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 22:32:05 2025 +0100 Added basic support for use of ModelBatch class to define rough structure and interface. commit 848880b Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 20:06:41 2025 +0100 Renaming and minor clean up. commit 6d685c0 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 19:57:46 2025 +0100 Moved _get_student_teacher_masks() so that masks are generated for all streams first. 
commit ed26c02 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 19:57:23 2025 +0100 Changes to have spoofing on a per data reader sample commit 9fe94f5 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 19:30:48 2025 +0100 Changes necessary for spoofing flag per IOReaderData commit 4613f7a Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 17:58:10 2025 +0100 Cleaned up parametrization commit 1235aab Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 17:47:40 2025 +0100 More refactoring. Code working again. commit 1e70f5c Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 17:09:20 2025 +0100 More refactoring and cleanup commit 46147d4 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 17:01:29 2025 +0100 More refactoring commit 81cf929 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 15:58:57 2025 +0100 Changes for better student teacher structure commit dfc03f2 Merge: a824bfc 31dc658 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 15:58:37 2025 +0100 Merge branch 'shmh40/dev/1270-idx-global-local' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local commit a824bfc Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 19 12:23:47 2025 +0100 Not working draft for restructuring commit 31dc658 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Wed Nov 19 11:04:29 2025 +0000 created function for _get_student_teacher_sample_data which returns the streams_data of the teacher and multiple streams_datas for the student views. 
commit 2536cec Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Tue Nov 18 17:40:26 2025 +0000 correct imports with new batch.py commit b3dfa2f Merge: 11ad4e6 c1580c4 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Tue Nov 18 17:36:15 2025 +0000 merge changes commit 11ad4e6 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Tue Nov 18 17:34:19 2025 +0000 basic if statement to yield the student and teacher views commit 36ea287 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Tue Nov 18 17:33:53 2025 +0000 slight restructure of ViewMetadata commit 66cf9cd Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Tue Nov 18 17:33:08 2025 +0000 added stream id to era5 config commit 3c26ddc Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Tue Nov 18 17:32:00 2025 +0000 updated default config training_config to allow student-teacher commit c1580c4 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Tue Nov 18 16:30:44 2025 +0100 Renaming commit 85fa139 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Tue Nov 18 16:28:46 2025 +0100 Comments commit dd6f85a Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Tue Nov 18 15:30:22 2025 +0100 Added mode and refactored get_sample_data into separate function. commit 668912d Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Tue Nov 18 13:47:40 2025 +0100 Partially enabled correct handling of multiple input steps. commit c3b5c3b Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Tue Nov 18 12:02:17 2025 +0100 Added basic support for multi-step sources. 
commit ab9eecc Merge: a934f97 c733280 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Tue Nov 18 10:00:37 2025 +0100 Merge branch 'shmh40/dev/1270-idx-global-local' of github.com:ecmwf/WeatherGenerator into shmh40/dev/1270-idx-global-local commit a934f97 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Tue Nov 18 09:58:19 2025 +0100 NOT WORKING: updating class to handle multiple input steps and improving overall structure commit c733280 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Mon Nov 17 18:32:40 2025 +0000 change view_metadata to dict in ModelInput commit 7d5c300 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Mon Nov 17 18:22:33 2025 +0000 draft of training_config in default_config commit 047b299 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Mon Nov 17 18:19:56 2025 +0000 draft changes to allow global local view generation in masker and tokenizer_masking. generate the mask, otherwise using batchify_source and batchify_target as before, with the capacity to remember what mask we have now when it comes to generating the targets. Update to inputs_metadata structure but not put in to practice commit 761e263 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Mon Nov 17 18:13:57 2025 +0000 update ViewMetadata spec commit 7f3c718 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Mon Nov 17 14:51:01 2025 +0100 Updating config to working version commit ae5a2e6 Author: Sebastian Hickman <seb.hickman@ecmwf.int> Date: Mon Nov 17 11:54:18 2025 +0000 added file with ModelBatch and SampleMetadata dataclasses commit debbb8f Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Mon Nov 17 12:28:07 2025 +0100 Changes to prepare_logging to apply index inversion commit 5d127bf Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Sun Nov 16 17:01:08 2025 +0100 Inversion of target output ordering to match input one in forcast mode. 
Unclear how to deal with it with MTM commit 8fa544d Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 14 20:43:57 2025 +0100 Removed unused parameters commit ce6c735 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 14 16:56:51 2025 +0100 Removing centroids options for embedding that was unused and should not be used. commit 0634105 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 14 09:59:13 2025 +0100 Enabled support for forecast. Cleaned up some bits and pieces. commit ec38123 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Fri Nov 14 08:27:21 2025 +0100 Fixed remaining problems that occured for NPP-ATMS and SYNOP. TODO: - Forecast still needs to be adapted - Some more cleanup of variable naming, return values etc commit db6f285 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 13 23:26:31 2025 +0100 Fixed linting commit 9229e48 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 13 23:19:21 2025 +0100 Minor cleanup commit a581405 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 13 23:17:29 2025 +0100 Working version for ERA5, NPP-ATMS. Problems with SYNOP with empty cell handling commit e4a9cc0 Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 13 18:58:28 2025 +0100 Masking target is working in principle but errors when feeding data to the model. commit 51f437f Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Thu Nov 13 07:04:23 2025 +0100 NOT WORKING: Finished src, target still to be done. commit 81bd6eb Author: Christian Lessig <christian.lessig@ecmwf.int> Date: Wed Nov 12 09:38:53 2025 +0100 NOT WORKING: initial draft for index-based masking. Implemented for random and healpix masking. Open issues with _coords_local, centroids and probably other things. 
* batch * adjusted to develop * one line * tiny fix * better messaging * incorporate requested changes * remove extra layer norms (ecmwf#1589) * Iluise/fix lead time (ecmwf#1571) * implement reader merge * working version of merge reader * linter * lint * fix lead time * update to develop * [1539][infra] Adds base config flag (ecmwf#1573) * set base config (ecmwf#1539) * update help message * longer variable name * longer variable name * rename config variable * rename base_configs --------- Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int> * Revised config and code quality improvements (ecmwf#1541) * Partially revised config; model is still missing but proper setup of training_config and validation_config * Changes necessary due to changed position of time keys and of run_id * Handling of multiple loss terms / target_aux_calculators and non-LossPhysical ones. * Changed position of run_id in config * Add function to extract batch size from mode_cfg * Changed position of run_id in config * Changes due to revised config. Also proper handling of target_aux_calculator and various other details cleaned up * Revised config structure, in particular for losses, and related changes * Add missing copyright and minor changed to to_device() * Moved sanity checking from trainer here. Also learning_rate sub_part of config is passed to LRScheduler, which leads to major simplifications * Minor cleanups * Changes due to changed structure of losses in config * Changes due to changed structure of losses in config * Minor changes due to changed position of run_id in config * Minor changes to accomodate new config, in particular target_aux_calculator config * Support batch_size > 1. Clean up of various smaller parts * Clean up and implementation for batch_size > 1. 
* Fix to sharding problem with FSDP2 * Removed scatter offset computation which now happens on the fly in the model * Changes for revised config, simplify overall where possible * Fix issues with source-target sample generation and matching. Work in progress * Linting * Linting * Linting * Linting * Type hint * Linting * Linting * Linting * Renamed loss keys for consistency * implement reader merge * Long list of fixes and improvements * Enabled support for minimal configs without rate * Fixed validation. validation_io still broken * Fixed linting * Fixed problem with target filtering for loss computation for SSL losses * working version of merge reader * linter * lint * fix lead time * Re-instantiated per loss-fct source/target correspondences. Introduced idx and correspondence fields to per sample meta-data which makes correct correspondence for loss computation much easier. * Fixed problem with undefined variable * Revised config * Fixed bug with forecasting * Added sanity check for config * Fix bug with duplicate targets * Linting * Fixed problem when losses is not specified in validation config * Fix DINOv2 * Removed temporary patches; fixed properly in 10b7a28 * Linting * Patched validation IO. Needs to be fixed properly. * Removed unused function * Improved variable naming * Improved encapsulation of functionality: total_batch_size * Fixed broken inference * Fixed problem with test where incorrect config was used * Fixed processing and handling of spoof flag in loss calculation * Fixed problem with pure masking where forecast_steps were 0. 
Removed duplicate function introduced through merge problem * Fixed bug when output_streams is specified explicitly * Corrected config param for number of samples * Fixed bug in handling of spoof weight * Improved clarity of logging statements * Improved logging msgs * Fix sinkhorn knopp * Fix sinkhorn in multi-GPU mode * Removed some old comments * Fixed inference overwrites * Fixing empty output when masking * Intermediate stage to re-enable integration test * Adjusted thresholds * Renaming * Removing old config files * Adding copyright * Revised default_config. This is a minimal example config for simple training towards forecasting * Changed multiprocessing param * Adapation for new position of multiprocessing param * Adding example config that combines an SSL and physical loss term * More cleanup * Restoring some default values * Restoring default for decoder_type * update to develop * Fixed problem where parameter was expected in old config place * Fixed linting * Simplified interface * Re-enabled forecast step and location weighting * Linting * Using new option to have validate_before_training as an int arg that allows to specify number of samples; Added copyright statement * Added option to have validate_before_training as int argument (specifyiung the number of samples). Fixed some minor subtle problems in validate() to fully distinguish validation and testing. 
* Refactored correspondence parsing * Sophiex/dev/teacher overrides (ecmwf#1557) * Add option to modify teacher TODO fix ema update * Fix EMA under teacher and student model differences * Attempt to revert newline * Raise error if teacher has weights not in student * Clessig/sophiex/dev/teacher overrides (ecmwf#1585) * Simplified error message * Added support for target_and_aux configs * Fix bug that validation EMA params are not used * Removing unused/superfluous function * Removed debug statement * Changed config so that target_aux params are specified as dict at the appropriate place --------- Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int> * Fixed missing default value * Bilinear decoder: adapt code for batchsize > 1 (ecmwf#1592) * Adapt code for batchsize > 1 * Fixed comment --------- Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int> * Changed defaults * Linting * Fixed linting issue * Reverting to ERA5-only as default * Fixed problem with train_continue --------- Co-authored-by: Ilaria Luise <luise.ilaria@gmail.com> Co-authored-by: iluise <72020169+iluise@users.noreply.github.com> Co-authored-by: Sophie Xhonneux <24638638+sophie-xhonneux@users.noreply.github.com> * [1601] Remove hardcoded optimizer variable eps (ecmwf#1602) * rm hardcoded optimizer variable eps * set default for eps in optimizer * WeatherGenerator JSON reader (ecmwf#1461) * split WeatherGenReader functionality to allow reading only JSON adding weathergen JSON reader to develop * informative error when metrics are not there * restore JSONreader after rebase * JSONreader mostly restored * MLFlow logging independent of JSON/zarr * linting, properly cheking fsteps, ens, samples in JSONreader * tiny change to restore the MergeReader * lint --------- Co-authored-by: Sebastian Buschow <sbuschow@santis-ln001.cscs.ch> Co-authored-by: Sebastian Buschow <sbuschow@santis-ln002.cscs.ch> Co-authored-by: iluise <72020169+iluise@users.noreply.github.com> Co-authored-by: Ilaria Luise 
<luise.ilaria@gmail.com> * Filter configs using enabled flag (ecmwf#1604) * Partially revised config; model is still missing but proper setup of training_config and validation_config * Changes necessary due to changed position of time keys and of run_id * Handling of multiple loss terms / target_aux_calculators and non-LossPhysical ones. * Changed position of run_id in config * Add function to extract batch size from mode_cfg * Changed position of run_id in config * Changes due to revised config. Also proper handling of target_aux_calculator and various other details cleaned up * Revised config structure, in particular for losses, and related changes * Add missing copyright and minor changed to to_device() * Moved sanity checking from trainer here. Also learning_rate sub_part of config is passed to LRScheduler, which leads to major simplifications * Minor cleanups * Changes due to changed structure of losses in config * Changes due to changed structure of losses in config * Minor changes due to changed position of run_id in config * Minor changes to accomodate new config, in particular target_aux_calculator config * Support batch_size > 1. Clean up of various smaller parts * Clean up and implementation for batch_size > 1. * Fix to sharding problem with FSDP2 * Removed scatter offset computation which now happens on the fly in the model * Changes for revised config, simplify overall where possible * Fix issues with source-target sample generation and matching. Work in progress * Linting * Linting * Linting * Linting * Type hint * Linting * Linting * Linting * Renamed loss keys for consistency * implement reader merge * Long list of fixes and improvements * Enabled support for minimal configs without rate * Fixed validation. validation_io still broken * Fixed linting * Fixed problem with target filtering for loss computation for SSL losses * working version of merge reader * linter * lint * fix lead time * Re-instantiated per loss-fct source/target correspondences. 
Introduced idx and correspondence fields to per sample meta-data which makes correct correspondence for loss computation much easier. * Fixed problem with undefined variable * Revised config * Fixed bug with forecasting * Added sanity check for config * Fix bug with duplicate targets * Linting * Fixed problem when losses is not specified in validation config * Fix DINOv2 * Removed temporary patches; fixed properly in 10b7a28 * Linting * Patched validation IO. Needs to be fixed properly. * Removed unused function * Improved variable naming * Improved encapsulation of functionality: total_batch_size * Fixed broken inference * Fixed problem with test where incorrect config was used * Fixed processing and handling of spoof flag in loss calculation * Fixed problem with pure masking where forecast_steps were 0. Removed duplicate function introduced through merge problem * Fixed bug when output_streams is specified explicitly * Corrected config param for number of samples * Fixed bug in handling of spoof weight * Improved clarity of logging statements * Improved logging msgs * Fix sinkhorn knopp * Fix sinkhorn in multi-GPU mode * Removed some old comments * Fixed inference overwrites * Fixing empty output when masking * Intermediate stage to re-enable integration test * Adjusted thresholds * Renaming * Removing old config files * Adding copyright * Revised default_config. 
This is a minimal example config for simple training towards forecasting * Changed multiprocessing param * Adapation for new position of multiprocessing param * Adding example config that combines an SSL and physical loss term * More cleanup * Restoring some default values * Restoring default for decoder_type * update to develop * Fixed problem where parameter was expected in old config place * Fixed linting * Simplified interface * Re-enabled forecast step and location weighting * Linting * Using new option to have validate_before_training as an int arg that allows to specify number of samples; Added copyright statement * Added option to have validate_before_training as int argument (specifyiung the number of samples). Fixed some minor subtle problems in validate() to fully distinguish validation and testing. * Refactored correspondence parsing * Sophiex/dev/teacher overrides (ecmwf#1557) * Add option to modify teacher TODO fix ema update * Fix EMA under teacher and student model differences * Attempt to revert newline * Raise error if teacher has weights not in student * Clessig/sophiex/dev/teacher overrides (ecmwf#1585) * Simplified error message * Added support for target_and_aux configs * Fix bug that validation EMA params are not used * Removing unused/superfluous function * Removed debug statement * Changed config so that target_aux params are specified as dict at the appropriate place --------- Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int> * Fixed missing default value * Bilinear decoder: adapt code for batchsize > 1 (ecmwf#1592) * Adapt code for batchsize > 1 * Fixed comment --------- Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int> * Changed defaults * Linting * Fixed linting issue * Reverting to ERA5-only as default * Fixed problem with train_continue * Adding filtering of config based on enabled/disabled --------- Co-authored-by: Ilaria Luise <luise.ilaria@gmail.com> Co-authored-by: iluise 
<72020169+iluise@users.noreply.github.com> Co-authored-by: Sophie Xhonneux <24638638+sophie-xhonneux@users.noreply.github.com> * Fixed bug with frequency parameter (ecmwf#1611) * Fix problem with str indices in source/target config (ecmwf#1619) * Move register & class tokens to be added earlier * Fix problem with str indices in source/target config * Fixed comment --------- Co-authored-by: Sophie Xhonneux <24638638+sophie-xhonneux@users.noreply.github.com> * Sorcha/dev/zarr3 compaction (ecmwf#1450) * update dependencies to zarr3/experimental anemoi (ecmwf#1253) * upper-bounding eccodes * zarr3 changes * linting * porblem with new evaluate dependencies (removed temporarily for testing common.io) * revert pyproject * first draft * commit to merge * commit to change branch * trying to remove metadata (too many zarr.json files) * zipstore * working (lot of debug prints to remove) * adding flag * WIP: adding flag * neaten up * wrapping zarruserwarning + linting * changes * fixing warnings * fixes * groups * change writer * switch group * reverting, issue is more complex than thought * post review changes * linting * fixing zarrio * linting * fixing create default arg * small change to fix export * linting * Simon/zarr3 compaction/refactoring (ecmwf#1553) * make zarrio subclasses * store string literals for output storage in enum. * debugging * small fix for export * removing stream_dict lines in run_evaluation * removing timeit * pyproject.toml removing change * adding comment * removing zarrio writer in trainer.py * Set output dataset metadata in creation of zarr group to avoid incremental metadata writes. 
(ecmwf#1593) * WIP:removing zarr_store flag * fixing duplication error * need mode="a" to avoid overwriting * adding comments for mode = "a" * debug w/prints * renaming reader to avoid conflict * debugging * renaming zarrio writer/reader to avoid conflicts * lint-check fix * type-check fixes * lint fix * type-check errors * revert lead_time fix * tidying --------- Co-authored-by: Simon Grasse <161459968+grassesi@users.noreply.github.com> Co-authored-by: Tim Hunter <tim.hunter@ecmwf.int> * Removed unused mask_params return value (ecmwf#1626) * remove unused config parameter (ecmwf#1632) * Update eval_config.yml (ecmwf#1636) Add some supports that are missing as comments. * Jk/develop/1639 fix shard val forward (ecmwf#1642) * rm model_forward assignment in val * rm clutter from diffusion branch * reverse if order * Clessig/develop/fix finetuning 1640 (ecmwf#1641) * Fix bug with diagnostic streams * Avoid that empty decoders are allocated * Sophiex/dev/synop nppatms finetuning configs (ecmwf#1644) * Doing something wrong * Make fine-tuning work * Rename sensibly * Enable multiple student views for one target for JEPA (ecmwf#1617) * Enable multiple student views for one target * Improved readability * Fix test for empty targets in decoder creation (ecmwf#1646) * add regions to integration tests (ecmwf#1648) * Memory pinning (ecmwf#1615) * add pin mem to IOReaderData * add pin mem to sample & modelbatch class * add pin mem to stream data * add pin mem to training loop * run /scripts/actions.sh lint * run ./scripts/actions.sh unit-test * ignore check torch import in package * move pinning to MultiStreamDataSampler * add _pin_tensor & _pin_tensor_list helper func * ruff the code * move back pin mem. 
to train loop * Remove the ignore-import-error rule and revert to the state before the change * create protocol for pinnable obj * remove pin_mem from IOReaderData class * add pin_memory to Trainer.validate * remove pin_memory from loader_params * Rever export/export_inference.py to state before c3fc9a7 * change name * revise Pinnable class description * add memory_pinning in config, train & va loop * use getattr to avoid CICD warning * use setattr to avoid CICD warning * disable pylint for self.source_tokens_lens * Fixed issues with memory pinning due to rebasing and also adjusted config position of flag * Reverting unadvert changes --------- Co-authored-by: Javad Kasravi <j.kasravi@fz-juelich.de> Co-authored-by: Javad Kasravi <jkasravi@santis-ln002.cscs.ch> Co-authored-by: Javad kasravi <kasravi66@gmail.com> * Allows for writing normalized samples; fixed config to keep it well-structured (ecmwf#1653) * Skipping missing scores in JSONreader (ecmwf#1655) * split WeatherGenReader functionality to allow reading only JSON adding weathergen JSON reader to develop * informative error when metrics are not there * restore JSONreader after rebase * JSONreader mostly restored * MLFlow logging independent of JSON/zarr * linting, properly cheking fsteps, ens, samples in JSONreader * tiny change to restore the MergeReader * lint * enabling JSONreader to skip plots and missing scores gracefully * required reformatting * move skipping of metrics to the reader class * slighly more explicit formulations --------- Co-authored-by: Sebastian Buschow <sbuschow@santis-ln001.cscs.ch> Co-authored-by: Sebastian Buschow <sbuschow@santis-ln002.cscs.ch> Co-authored-by: iluise <72020169+iluise@users.noreply.github.com> Co-authored-by: Ilaria Luise <luise.ilaria@gmail.com> * Remove mini_epoch backward compatibility v2 --------- Co-authored-by: iluise <72020169+iluise@users.noreply.github.com> Co-authored-by: Moritz Hauschulz <60788263+moritzhauschulz@users.noreply.github.com> Co-authored-by: 
kctezcan <kctezcan@gmail.com> Co-authored-by: Michael Tarnawa <18899420+mtar@users.noreply.github.com> Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int> Co-authored-by: Ilaria Luise <luise.ilaria@gmail.com> Co-authored-by: Sophie Xhonneux <24638638+sophie-xhonneux@users.noreply.github.com> Co-authored-by: Julian Kuehnert <Jubeku@users.noreply.github.com> Co-authored-by: s6sebusc <49226935+s6sebusc@users.noreply.github.com> Co-authored-by: Sebastian Buschow <sbuschow@santis-ln001.cscs.ch> Co-authored-by: Sebastian Buschow <sbuschow@santis-ln002.cscs.ch> Co-authored-by: Sorcha Owens <73587207+enssow@users.noreply.github.com> Co-authored-by: Simon Grasse <161459968+grassesi@users.noreply.github.com> Co-authored-by: Tim Hunter <tim.hunter@ecmwf.int> Co-authored-by: Savvas Melidonis <79579567+SavvasMel@users.noreply.github.com> Co-authored-by: Javad Kasravi <j.kasravi@fz-juelich.de> Co-authored-by: Javad Kasravi <jkasravi@santis-ln002.cscs.ch> Co-authored-by: Javad kasravi <kasravi66@gmail.com>

set base config (ecmwf#1539)

85d6d54

github-project-automation bot added this to WeatherGen-dev Jan 9, 2026

clessig reviewed Jan 11, 2026

mtar added 3 commits January 12, 2026 15:13

update help message

8f3721d

longer variable name

9512b78

longer variable name

5d23a10

grassesi reviewed Jan 12, 2026

mtar added 2 commits January 13, 2026 09:52

rename config variable

cb080da

rename base_configs

0753739

mtar requested a review from grassesi January 13, 2026 09:10

mtar marked this pull request as ready for review January 13, 2026 09:10

Merge branch 'develop' into 1539-set-base-config

030bc30

clessig approved these changes Jan 13, 2026

clessig removed the request for review from grassesi January 13, 2026 11:24

clessig merged commit 6c9365b into ecmwf:develop Jan 13, 2026
5 checks passed

github-project-automation bot moved this to Done in WeatherGen-dev Jan 13, 2026

tjhunter mentioned this pull request Feb 4, 2026: Update strategy for main/develop #1801 (open, 6 tasks)

clessig merged 7 commits into ecmwf:develop from mtar:1539-set-base-config

		def test_load_with_base_file(base_config_file, private_config_file):
		sub_cf = OmegaConf.load(overwrite_file)
