Feature/to xarray options#66

Merged
colonesej merged 6 commits into develop from feature/to_xarray_options
Oct 3, 2025

@colonesej colonesej commented Aug 20, 2025

Description

  • Allow CLI commands to be run as Python files, which makes testing and debugging a bit easier. One can simply run
python -m pdb hat/tools/cli.py hat-extract-timeseries config.yml

or, with VS Code:

launch.json

{
    "configurations": [
        {
            "name": "hat-timeseries-extractor",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}/hat/cli.py",
            "console": "integratedTerminal",
            "args": [
                "hat-extract-timeseries",
                "${workspaceFolder}/config.yml",
            ],
            "stopOnEntry": false,
            "justMyCode": false,
        }
    ]
}
  • Support to_xarray_options in the grid options to control how the xarray dataset is created. Previously, larger datasets were not loaded lazily and ran out of memory quite easily for ensemble forecasts/reforecasts.

The configuration looks like:

config.yml

station:
  file: "outlets.csv"
  # filters: None
  name: "ObsID"
  coords: 
    x: "LisfloodX"
    y: "LisfloodY"

grid:
  coord_x: "longitude"
  coord_y: "latitude"
  to_xarray_options:
    profile: "mars"
    chunks:
      "longitude": -1
      "latitude": -1
      "time": 1
      "number": 1
  source: 
    file:
      path: "fc.*.grb"
  # source:
  #   mars:
  #     class: ce
  #     date: "20230101"
  #     expver: 1
  #     hdate: ""
  #     levtype: sfc
  #     model: lisflood
  #     number: 1/to/51/by/1
  #     origin: ecmf
  #     param: "240023"
  #     step: "6/12/18/"
  #     stream: efrf
  #     time: "00:00:00"
  #     type: pf
  #     target: "pf_efas5_${date}.grb"

output:
  file: "lisflood_${YMD}.nc"
  • Update the expected config structure for extract-timeseries to be entirely key-value based, since the previous one did not support the mars source type. The downside is having to know the argument names/call signature that earthkit-data uses for each source type, which can be annoying as they are not consistent, e.g. .from_source('json', filename) versus .from_source('file', path).
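
To illustrate the key-value structure described above, here is a minimal sketch of how the grid section of config.yml could be turned into earthkit-data calls. It is not the code from this PR: the helper names `split_source` and `load_grid` are hypothetical, and the sketch assumes that the single key under `source` is the source type and that its mapping is forwarded as keyword arguments to `earthkit.data.from_source`, with `to_xarray_options` forwarded to `to_xarray`.

```python
from typing import Any, Dict, Tuple


def split_source(source_cfg: Dict[str, Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Split {"file": {"path": "fc.*.grb"}} into ("file", {"path": "fc.*.grb"}).

    Hypothetical helper: assumes exactly one source type per config.
    """
    if len(source_cfg) != 1:
        raise ValueError("expected exactly one source type under 'source'")
    source_type, kwargs = next(iter(source_cfg.items()))
    return source_type, dict(kwargs or {})


def load_grid(grid_cfg: Dict[str, Any]):
    """Hypothetical loader (an assumption, not the implementation in this PR)."""
    # Imported lazily so that split_source can be used without earthkit-data.
    import earthkit.data as ekd

    source_type, kwargs = split_source(grid_cfg["source"])
    # The keys in the config must match the argument names earthkit-data
    # expects for this source type, e.g. "path" for the "file" source.
    data = ekd.from_source(source_type, **kwargs)
    # Options such as "profile" and "chunks" are passed through unchanged.
    return data.to_xarray(**grid_cfg.get("to_xarray_options", {}))


# The grid section of the config.yml above, written as a Python dict.
grid_cfg = {
    "coord_x": "longitude",
    "coord_y": "latitude",
    "to_xarray_options": {
        "profile": "mars",
        "chunks": {"longitude": -1, "latitude": -1, "time": 1, "number": 1},
    },
    "source": {"file": {"path": "fc.*.grb"}},
}

source_type, kwargs = split_source(grid_cfg["source"])
print(source_type, kwargs)
```

In dask-style chunk specifications, -1 means "keep the whole dimension in one chunk", so the chunks above would give one chunk per time step and ensemble member, each covering the full spatial grid. That is presumably what keeps ensemble forecasts from being loaded into memory all at once.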

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

@colonesej colonesej requested review from Oisin-M and corentincarton and removed request for corentincarton August 20, 2025 13:20
@colonesej
Collaborator Author

Just FYI, I manually created a pre-release module on the HPC named 0.8pre for Maliko, who needed the chunking option for processing ensemble forecast data.

@colonesej colonesej merged commit 209beb0 into develop Oct 3, 2025
4 checks passed
@colonesej colonesej deleted the feature/to_xarray_options branch October 3, 2025 14:46