Open
Labels
ocf-internal: An issue to be addressed internally by Open Climate Fix and not suitable for external contributors
Description
Currently our backtest scripts output one .nc file for each forecast t0 in the backtest range, which can mean many thousands of files. I have found that opening all of these .nc files into a single xarray dataset with xr.open_mfdataset(f"{output_dir}/*.nc", parallel=True) can be very slow when there are a large number of files (around 35,000 in my case), even with parallel=True. I had some success speeding this up using Python multiprocessing, following the advice here: https://stackoverflow.com/questions/65587633/ways-to-speed-up-open-mfdataset-in-xarray

This issue is to benchmark the different ways of doing this and see how much quicker we can make it.
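As a starting point for the benchmark, here is a minimal sketch of one multiprocessing variant timed against xr.open_mfdataset. It is not the exact code from the linked answer, just one way such a comparison could be set up. The directory name backtest_outputs, the concat dimension name init_time, the batch size and the worker count are all assumptions for illustration and would need to match the real backtest outputs. Each worker loads its batch fully into memory, so this only suits cases where the combined dataset fits in RAM.

```python
import glob
import time
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def load_batch(paths, concat_dim):
    """Worker: open one batch of files, load them into memory, concatenate."""
    import xarray as xr  # imported inside the worker process

    datasets = []
    for path in paths:
        # Load eagerly so the data can be pickled back to the parent process.
        with xr.open_dataset(path) as ds:
            datasets.append(ds.load())
    return xr.concat(datasets, dim=concat_dim)


def open_with_processes(paths, concat_dim, batch_size=500, n_workers=4):
    """Open many files by giving each worker process a batch of paths."""
    import xarray as xr

    batches = chunked(sorted(paths), batch_size)
    worker = partial(load_batch, concat_dim=concat_dim)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(worker, batches))
    return xr.concat(parts, dim=concat_dim)


def main():
    output_dir = "backtest_outputs"  # hypothetical path, adjust as needed
    concat_dim = "init_time"  # hypothetical dimension name, adjust as needed
    paths = sorted(glob.glob(f"{output_dir}/*.nc"))
    if not paths:
        print(f"No .nc files found in {output_dir}")
        return

    import xarray as xr

    # Baseline: open_mfdataset with parallel=True (requires dask).
    _, t_mf = timed(
        lambda: xr.open_mfdataset(
            paths, combine="nested", concat_dim=concat_dim, parallel=True
        ).load()
    )
    # Alternative: batches of files opened in separate processes.
    _, t_mp = timed(open_with_processes, paths, concat_dim)

    print(f"open_mfdataset:  {t_mf:.1f} s")
    print(f"multiprocessing: {t_mp:.1f} s")


if __name__ == "__main__":
    main()
```

Batching paths per worker, rather than submitting one task per file, keeps the number of inter-process transfers and concat calls small, which is likely to matter with tens of thousands of files. Batch size and worker count are worth including as variables in the benchmark.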