Integration of MEC workflow #110

Open
andreaspauling wants to merge 35 commits into main from MRB-534-Implement-rule-to-generate-namelist

@andreaspauling

Add the MEC workflow. The new parts are in green in the DAG: snakemake_dag.pdf

For each valid date a MEC case is set up and run. This includes:

  • creating the directory structure
  • adding the observations
  • organizing the model input including past runs depending on the config
  • rendering the MEC namelist
  • executing MEC for all dates with complete data for all leadtimes (excludes the first ones of the period)
  • storing the final feedback file in a separate place.

All MEC cases can be removed once the final feedback file is produced (removal not yet implemented).
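The date selection described above ("all dates with complete data for all leadtimes") can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `complete_valid_times`, the 6-hourly init times and the lead times of 0, 6 and 12 h are hypothetical, and the PR's actual selection logic is not shown in this conversation.

```python
from datetime import datetime, timedelta


def complete_valid_times(init_times, lead_times):
    """Return the sorted valid times that are covered at every lead time."""
    available = set(init_times)
    # Every valid time reachable from some init time at some lead time.
    candidates = {init + lead for init in init_times for lead in lead_times}
    # Keep a valid time only if each lead time maps back to an existing init time.
    return sorted(
        valid
        for valid in candidates
        if all(valid - lead in available for lead in lead_times)
    )


# Hypothetical period: five 6-hourly init times, lead times of 0, 6 and 12 h.
start = datetime(2025, 1, 1, 0)
init_times = [start + timedelta(hours=6 * i) for i in range(5)]
lead_times = [timedelta(hours=h) for h in (0, 6, 12)]

for valid in complete_valid_times(init_times, lead_times):
    print(valid.strftime("%Y%m%d%H%M"))
```

With these inputs the first two valid times of the period (00 and 06 UTC on the first day) drop out, because the 12 h lead time would need an init time from before the period starts.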

Remarks/Questions:

  • From the DAG I wondered if the green MEC part should also depend on execute_inference?
  • Running evalml with MEC still occasionally fails, with problems in the baseline or in prepare_mec_input (several causes are possible)
  • Topics already raised by Francesco:
    • put folder mec/ in data/mec in order not to mix up init and valid time (MEC is valid time oriented)
    • check the globbing options in the MEC namelist with DWD (they are not documented; as far as I know only FCR_TIME is supported, while wildcards such as * are not). The aim is to avoid copying data.

dnerini and others added 30 commits October 7, 2025 14:01
* Distinguish between primary runs ('candidates') and secondary runs

* Docstrings
* Adopt forecast intervals including the end point

* Fix parsing

* Experiments work

* Update config/forecasters.yaml

* Align init times to availability of COE

* run pre-commit

* Change README to COSMO-E availability

---------

Co-authored-by: Jonas Bhend <jonasbhend@users.noreply.github.com>
Co-authored-by: Jonas Bhend <jonas.bhend@meteoswiss.ch>
* draft changes

* rename workspace resources dir

* working for config/forecasters.yaml

* improve logging

* works for interpolators.yaml

* re-add get_leadtime function

* refactor run directives into script
* add region averages

* add regions to config

* Add regions to verification module, scripts, and rules

* add stratification to forecaster config and fix typo

* fix dict indexing

* fix append error

* read lon/lat from obs dataset

* Add inner verification domain

* Add missing dependency

* add plots by region

* Add regions to dashboard

* Fix dashboard

* Add region name and initializations to plot title (and remove header div)

* Add support for multiple regions

* Fix legend

@dnerini dnerini left a comment

looking very nice @andreaspauling! I have added a few initial thoughts from a quick look into your changes, but I plan to have a closer look soon!

rule prepare_mec_input:
    input:
        src_dir=OUT_ROOT / "data/runs/{run_id}/{init_time}/grib",
        inference_ok=OUT_ROOT / f"run_inference_all.{EXPERIMENT_HASH}.ok",
Member

Suggested change
- inference_ok=OUT_ROOT / f"run_inference_all.{EXPERIMENT_HASH}.ok",
+ inference_okfile=rules.execute_inference.output.okfile,

# prepare_mec_input: setup run dir, gather observations and model data in the run dir for the actual init time
rule prepare_mec_input:
    input:
        src_dir=OUT_ROOT / "data/runs/{run_id}/{init_time}/grib",
Member

No rule produces this path as an output, so it should at the very least trigger some warnings from snakemake. You could specify it as a parameter instead.

set -euo pipefail

# Run MEC inside sarus container
# Note: pull command currently needed only once to download the container
Member

the pull command could then be factored out into a separate rule that is run only once before launching all the parallel MEC jobs

RESULTS_DIR = OUT_ROOT / "results" / EXPERIMENT_NAME

# prefer one rule because snakemake complains about ambiguous rules (same output)
ruleorder: prepare_inference_forecaster > prepare_inference_interpolator
Member

need to have a closer look at this, I don't understand why this problem would appear with your changes

Author

Without it snakemake complains. Thanks for having a closer look at it.

expand(
    OUT_ROOT / "data/runs/{run_id}/fdbk_files/verSYNOP_{init_time}.nc",
    init_time=[t.strftime("%Y%m%d%H%M") for t in REFTIMES_MEC],
    run_id=collect_all_candidates(),
Member

Suggested change
- run_id=collect_all_candidates(),
+ run_id=CANDIDATES,
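To make explicit which feedback files the `expand(...)` call above requests, here is a plain-Python approximation of it. It assumes Snakemake's default behaviour of combining all wildcard values as a Cartesian product; the values of `OUT_ROOT`, `REFTIMES_MEC` and `CANDIDATES` below are hypothetical placeholders, and `feedback_targets` is an illustrative helper that is not part of the PR.

```python
from datetime import datetime
from itertools import product
from pathlib import Path

# Hypothetical stand-ins for the workflow's globals.
OUT_ROOT = Path("output")
REFTIMES_MEC = [datetime(2025, 1, 1, 0), datetime(2025, 1, 1, 6)]
CANDIDATES = ["run_a", "run_b"]


def feedback_targets(out_root, run_ids, reftimes):
    """List one feedback file per (init_time, run_id) combination."""
    # Same timestamp format as in the rule: YYYYmmddHHMM.
    init_times = [t.strftime("%Y%m%d%H%M") for t in reftimes]
    return [
        out_root / f"data/runs/{run_id}/fdbk_files/verSYNOP_{init_time}.nc"
        for init_time, run_id in product(init_times, run_ids)
    ]


for target in feedback_targets(OUT_ROOT, CANDIDATES, REFTIMES_MEC):
    print(target.as_posix())
```

Two reference times and two candidate runs give four targets, so the number of requested files grows with the product of both lists.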

            run_id=CANDIDATES,
        ),
    output:
        inference_ok=touch(OUT_ROOT / f"run_inference_all.{EXPERIMENT_HASH}.ok")
Member

why this?

Author

This is meant to ensure that all inference output is there before the MEC rules start

5 participants