Remove xr.Dataset.load() functions from xarray Dataset objects to improve performance #394

@yantosca

Your name

Bob Yantosca

Your affiliation

Harvard + GCST

Provide a clear and concise overview of the new feature requested.

In several GCPy functions we use the xr.Dataset.load() function, which loads an entire xarray Dataset object into memory, e.g.:

    # Prepare diff-of-diffs datasets if needed
    if diff_of_diffs:
        refdata, devdata = refdata.load(), devdata.load()
        second_ref, second_dev = second_ref.load(), second_dev.load()

However, according to the xarray documentation, calling xr.Dataset.load() should not normally be necessary, because xarray loads data lazily and reads values only when they are needed:

Dataset.load(**kwargs)
Trigger loading data into memory and return this dataset.

Data will be computed and/or loaded from disk or a remote source.

Unlike .compute, the original dataset is modified and returned.

Normally, it should not be necessary to call this method in user code, because all xarray functions should either work on deferred data or load data automatically. However, this method can be necessary when working with many file objects on disk.
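To illustrate the difference the documentation draws between .load() and .compute(), here is a minimal sketch. The dataset and the variable name SpeciesConc_O3 are hypothetical stand-ins for a GCPy Ref or Dev dataset, not taken from the GCPy code, and the data are already in memory, so this only shows the return-value semantics and that a reduction works without an explicit .load() call:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a Ref or Dev dataset (small, already in memory)
ds = xr.Dataset(
    {"SpeciesConc_O3": (("lat", "lon"), np.arange(6.0).reshape(2, 3))}
)

# .load() loads the data in place and returns the same Dataset object
loaded = ds.load()

# .compute() leaves the original untouched and returns a new Dataset
computed = ds.compute()

# Reductions and arithmetic can be applied directly, with no .load() first;
# xarray reads or computes the values it needs when they are requested
total = ds["SpeciesConc_O3"].sum()

print(loaded is ds, computed is ds, float(total))
```

With file-backed or dask-backed datasets the same calls apply, but the data would stay deferred until a result is actually requested.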

Thus we should remove the xr.Dataset.load() calls from GCPy to improve I/O performance.

Will you be implementing this feature yourself?

Yes

Additional information

No response
