Remove xr.Dataset.load() commands to improve performance #395
Merged
msulprizio
reviewed
Jan 15, 2026
Contributor
msulprizio
left a comment
These updates look good. My only comment is that the updates to benchmark_scrape_gchp_timers.py are not noted in the title or description of the PR so I wasn't sure if they were meant to be included here.
    return count[char_to_match]


def check_file_for_timing_info(text_file):
Contributor
Did you mean to include these changes in this PR? I don't see it documented in the description
317de8f to
6927da3
Compare
append_grid_corners.py
file_regrid.py
plot/compare_single_level.py
plot/compare_zonal_mean.py
benchmark/modules/benchmark_funcs.py
benchmark/modules/benchmark_mass_cons_table.py
benchmark/modules/benchmark_models_vs_obs.py
examples/diagnostics/compare_diags.py
- Removed .load() commands for xarray Dataset objects. We can let xarray
  decide when to load data into memory. This should speed up data I/O
  significantly.

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
6927da3 to
6876f06
Compare
msulprizio
approved these changes
Jan 15, 2026
Name and Institution (Required)
Name: Bob Yantosca
Institution: Harvard + GCST
Describe the update
This is the companion PR to #394. We have removed the .load() calls on xarray Dataset objects, because .load() reads all of the data in the Dataset into memory at once. Instead, we can rely on xarray to decide when data needs to be loaded.
Expected changes
This is a no-diff update. We expect that this will reduce the amount of time it takes to generate benchmark plots and tables.
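For reviewers who want to see the pattern in isolation, here is a minimal sketch of the before/after change. It is not code from this PR: the helper names, the file layout, and the variable name SpeciesConc_O3 are hypothetical, and it assumes xarray, numpy, and a netCDF backend (netCDF4, h5netcdf, or scipy) are installed. It only illustrates why the change should be no-diff: both versions return the same value, but the second lets xarray read only what the computation needs.

```python
import os
import tempfile

import numpy as np
import xarray as xr


def write_sample_file(path):
    """Write a small, made-up diagnostic file (hypothetical variable name)."""
    rng = np.random.default_rng(0)
    ds = xr.Dataset(
        {"SpeciesConc_O3": (("time", "lev", "lat", "lon"), rng.random((2, 3, 4, 5)))},
        coords={
            "time": [0.0, 1.0],
            "lev": [1.0, 2.0, 3.0],
            "lat": np.linspace(-90.0, 90.0, 4),
            "lon": np.linspace(-180.0, 180.0, 5, endpoint=False),
        },
    )
    ds.to_netcdf(path)


def mean_eager(path, varname):
    """Old pattern: force the whole Dataset into memory before using it."""
    with xr.open_dataset(path) as ds:
        ds = ds.load()  # reads every variable in the file
        return float(ds[varname].isel(time=0).mean().values)


def mean_lazy(path, varname):
    """New pattern: no .load(); xarray reads data when the result is computed."""
    with xr.open_dataset(path) as ds:
        return float(ds[varname].isel(time=0).mean().values)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        sample = os.path.join(tmpdir, "sample.nc")
        write_sample_file(sample)
        eager = mean_eager(sample, "SpeciesConc_O3")
        lazy = mean_lazy(sample, "SpeciesConc_O3")
        print("eager:", eager, "lazy:", lazy)
```

On a file this small the timing difference is negligible. Any speedup would be expected on large benchmark files where only a few variables or levels are actually used.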
Related Github Issue