
Remove xr.Dataset.load() commands to improve performance #395

Merged
yantosca merged 1 commit into dev from feature/remove-xr-dataset-load on Jan 15, 2026

@yantosca (Contributor)

Name and Institution (Required)

Name: Bob Yantosca
Institution: Harvard + GCST

Describe the update

This is the companion PR to #394. We have removed calls to the .load() method on xarray Dataset objects, because .load() reads all of the data in a Dataset into memory at once. Instead, we now rely on xarray to decide when data needs to be read from disk.

Expected changes

This is a no-diff update: benchmark output should be unchanged. We expect it to reduce the time it takes to generate benchmark plots and tables.
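The idea behind this change can be sketched with a small, self-contained example. This is not GCPy code: the file contents, the variable name SpeciesConc_O3, and the helpers make_sample_file and surface_mean are hypothetical. It assumes xarray, NumPy, and a netCDF backend (netCDF4 or SciPy) are installed. xr.open_dataset opens a file lazily, so selecting one level and averaging it should read only that slice from disk, whereas calling .load() first reads every variable into memory. Both paths should return the same value, which is why the update is expected to be no-diff.

```python
import os
import tempfile

import numpy as np
import xarray as xr


def make_sample_file(path):
    """Write a small, hypothetical benchmark-like dataset to a netCDF file."""
    rng = np.random.default_rng(0)
    ds = xr.Dataset(
        {"SpeciesConc_O3": (("lev", "lat", "lon"), rng.random((5, 4, 8)))},
        coords={
            "lev": np.arange(5.0),
            "lat": np.linspace(-60.0, 60.0, 4),
            "lon": np.arange(8) * 45.0,
        },
    )
    ds.to_netcdf(path)


def surface_mean(path, eager):
    """Return the mean of the lowest level, optionally calling .load() first."""
    with xr.open_dataset(path) as ds:
        if eager:
            # Old pattern: read every variable in the file into memory up front
            ds.load()
        # Lazy pattern: xarray reads only the values needed for this selection
        return float(ds["SpeciesConc_O3"].isel(lev=0).mean())


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.nc")
    make_sample_file(path)
    lazy_result = surface_mean(path, eager=False)
    eager_result = surface_mean(path, eager=True)

# Both code paths should give the same answer (a "no-diff" change)
print(np.isclose(lazy_result, eager_result))
```

The savings depend on the access pattern: lazy loading helps most when a script touches only part of each file. If every value is eventually used, the total amount of data read may be similar either way.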

Related GitHub Issue

@yantosca yantosca added this to the 1.7.0 milestone Jan 15, 2026
@yantosca yantosca requested a review from msulprizio January 15, 2026 17:02
@yantosca yantosca added the "topic: Structural Modifications" label (Related to GCPy structural modifications, as opposed to scientific updates) Jan 15, 2026
@msulprizio (Contributor) left a comment

These updates look good. My only comment is that the updates to benchmark_scrape_gchp_timers.py are not noted in the title or description of the PR, so I wasn't sure if they were meant to be included here.

```python
    return count[char_to_match]


def check_file_for_timing_info(text_file):
```
Contributor

Did you mean to include these changes in this PR? I don't see them documented in the description.

Contributor Author

@msulprizio: Now fixed in 6876f06

@yantosca yantosca force-pushed the feature/remove-xr-dataset-load branch from 317de8f to 6927da3 on January 15, 2026 21:26
append_grid_corners.py
file_regrid.py
plot/compare_single_level.py
plot/compare_zonal_mean.py
benchmark/modules/benchmark_funcs.py
benchmark/modules/benchmark_mass_cons_table.py
benchmark/modules/benchmark_models_vs_obs.py
examples/diagnostics/compare_diags.py
- Removed .load() commands for xarray Dataset objects.  We can let
  xarray decide when to load data into memory.  This should speed up
  data I/O significantly.

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
@yantosca yantosca force-pushed the feature/remove-xr-dataset-load branch from 6927da3 to 6876f06 on January 15, 2026 21:43
@yantosca yantosca requested a review from msulprizio January 15, 2026 21:44
@yantosca yantosca self-assigned this Jan 15, 2026
@yantosca yantosca added the "category: Bug Fix" label (Fixes a bug that was previously reported) Jan 15, 2026
@yantosca yantosca merged commit e337595 into dev Jan 15, 2026
20 checks passed
@yantosca yantosca deleted the feature/remove-xr-dataset-load branch January 15, 2026 22:26

Linked issue: Remove xr.Dataset.load() functions from xarray Dataset objects to improve performance
