Test speeding up backtest output saving to a single zarr #512

@Sukh-P

Description

Currently our backtest scripts output one .nc file for each forecast t0 in the backtest range, which can sometimes mean many thousands of files. I have found that opening all of these .nc files into a single xarray dataset with xr.open_mfdataset(f"{output_dir}/*.nc", parallel=True) can be very slow when there is a large number of files, even with parallel=True (in my case around 35,000 files). I had some success speeding this up using Python multiprocessing, following the advice here: https://stackoverflow.com/questions/65587633/ways-to-speed-up-open-mfdataset-in-xarray

This issue is to benchmark the different ways of doing this and see how much quicker we can make it.
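As a starting point for the benchmark, here is a rough sketch of the two approaches being compared: the existing open_mfdataset call, and a multiprocessing variant that opens each file in a worker process and concatenates the results before writing a single zarr. This is not the code from our backtest scripts. The function names, the "init_time_utc" concat dimension, the worker count and the chunk size are all assumptions for illustration, and the multiprocessing variant loads every file into memory, so it only suits outputs that fit in RAM.

```python
import glob
import time
from concurrent.futures import ProcessPoolExecutor


def load_one(path):
    """Open a single per-t0 NetCDF file and load it fully into memory."""
    import xarray as xr  # imported lazily so each worker process loads it itself

    with xr.open_dataset(path) as ds:
        return ds.load()


def open_with_mfdataset(paths):
    """Baseline: the approach currently used (parallel=True requires dask)."""
    import xarray as xr

    return xr.open_mfdataset(paths, parallel=True)


def open_with_processes(paths, concat_dim="init_time_utc", max_workers=8):
    """Open files in worker processes, then concatenate in the parent.

    concat_dim is a hypothetical name; use whichever dimension or
    coordinate identifies the forecast t0 in the real outputs.
    """
    import xarray as xr

    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # chunksize batches the paths to cut inter-process overhead
        datasets = list(pool.map(load_one, paths, chunksize=256))
    return xr.concat(datasets, dim=concat_dim)


def timed(label, func, *args, **kwargs):
    """Run func once, print the wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return result, elapsed


def run_benchmark(output_dir, zarr_path):
    """Time both approaches on output_dir, then save one zarr store."""
    paths = sorted(glob.glob(f"{output_dir}/*.nc"))
    timed("open_mfdataset", open_with_mfdataset, paths)
    ds, _ = timed("multiprocessing", open_with_processes, paths)
    # Writing zarr needs the zarr package; mode="w" overwrites an existing store
    timed("to_zarr", ds.to_zarr, zarr_path, mode="w")
```

To try it, call run_benchmark(output_dir, "backtest.zarr") from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that spawn worker processes. The zarr path here is only a placeholder.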

Labels

ocf-internal: An issue to be addressed internally by Open Climate Fix and not suitable for external contributors