Description
Your name
Bob Yantosca
Your affiliation
Harvard + GCST
Provide a clear and concise overview of the new feature requested.
In several GCPy functions we use the xr.Dataset.load() function, which loads an entire xarray Dataset object into memory, e.g.:
    # Prepare diff-of-diffs datasets if needed
    if diff_of_diffs:
        refdata, devdata = refdata.load(), devdata.load()
        second_ref, second_dev = second_ref.load(), second_dev.load()

But according to the xarray documentation, it should not be necessary to call xr.Dataset.load(), as xarray can read data only when it is needed:
Dataset.load(**kwargs)
Trigger loading data into memory and return this dataset. Data will be computed and/or loaded from disk or a remote source.
Unlike .compute, the original dataset is modified and returned.
Normally, it should not be necessary to call this method in user code, because all xarray functions should either work on deferred data or load data automatically. However, this method can be necessary when working with many file objects on disk.
Thus we should remove the xr.Dataset.load() calls from GCPy to improve the performance of I/O.
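As a rough illustration of the idea (not GCPy code), here is a minimal sketch that compares two files without ever calling .load(). The file names, the variable name SpeciesConc_O3, and the helpers make_sample_file and mean_difference are all made up for this example; it assumes a NetCDF backend (netCDF4, h5netcdf, or scipy) is installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr


def make_sample_file(path, offset):
    """Write a small NetCDF file (hypothetical stand-in for a Ref/Dev file)."""
    data = np.arange(24, dtype="float64").reshape(2, 3, 4) + offset
    ds = xr.Dataset({"SpeciesConc_O3": (("time", "lat", "lon"), data)})
    ds.to_netcdf(path)


def mean_difference(ref_path, dev_path, varname):
    """Return the mean of (dev - ref) for one variable, without .load()."""
    # open_dataset reads lazily: values for a variable are only pulled
    # from disk when an operation actually needs them.
    with xr.open_dataset(ref_path) as ref, xr.open_dataset(dev_path) as dev:
        diff = dev[varname] - ref[varname]  # reads just this variable
        return float(diff.mean())


with tempfile.TemporaryDirectory() as tmpdir:
    ref_path = os.path.join(tmpdir, "ref.nc")
    dev_path = os.path.join(tmpdir, "dev.nc")
    make_sample_file(ref_path, 0.0)
    make_sample_file(dev_path, 1.5)
    result = mean_difference(ref_path, dev_path, "SpeciesConc_O3")
    print(result)
```

Only the variable that is actually used gets read into memory here, which is the behavior we would rely on after dropping the explicit .load() calls. Whether this helps in every GCPy routine would need to be checked case by case, since the xarray docs note that .load() can still be necessary when working with many file objects on disk.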
Will you be implementing this feature yourself?
Yes
Additional information
No response