Update jigsaw_to_netcdf() for efficiency and robustness
#632
Conversation
Force-pushed from 9717387 to e5347ff.
Testing

I verified that an Icos30km base mesh produced with the updated function is exactly identical to one produced with the old code. Here's the test program to compare files:

```python
import argparse

import numpy as np
import xarray as xr


def compare_datasets(path1, path2):
    ds1 = xr.open_dataset(path1)
    ds2 = xr.open_dataset(path2)

    vars1 = set(ds1.data_vars)
    vars2 = set(ds2.data_vars)

    failing_vars = []

    # Check for variables present in only one dataset
    only_in_ds1 = vars1 - vars2
    only_in_ds2 = vars2 - vars1
    if only_in_ds1:
        failing_vars.extend([(var, 'Only in dataset 1') for var in only_in_ds1])
    if only_in_ds2:
        failing_vars.extend([(var, 'Only in dataset 2') for var in only_in_ds2])

    # Compare variables present in both datasets
    for var in vars1 & vars2:
        da1 = ds1[var]
        da2 = ds2[var]

        # Check if dimensions match
        if da1.dims != da2.dims:
            failing_vars.append((var, 'Dimension mismatch'))
            continue

        # Check if shapes match
        if da1.shape != da2.shape:
            failing_vars.append((var, 'Shape mismatch'))
            continue

        # Check if data values match (using exact comparison)
        if not np.array_equal(da1.values, da2.values):
            failing_vars.append((var, 'Data mismatch'))
        else:
            print(f"Variable '{var}' matches successfully.")

    # Summary of results
    if failing_vars:
        print("Failing variables:")
        for var, reason in failing_vars:
            print(f"  - {var}: {reason}")
    else:
        print("All variables match perfectly.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Compare two NetCDF datasets using xarray.")
    parser.add_argument("dataset1", help="Path to the first dataset")
    parser.add_argument("dataset2", help="Path to the second dataset")
    args = parser.parse_args()

    compare_datasets(args.dataset1, args.dataset2)
```
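For reference, a hypothetical way to call the comparison directly from Python; the file names below are placeholders, not files from this PR:

```python
# Hypothetical usage with placeholder file names: compare a base mesh
# written by the old code to one written by the updated code.
compare_datasets('base_mesh_old.nc', 'base_mesh_new.nc')
```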
| """ | ||
| # Authors: Phillip J. Wolfram, Matthew Hoffman and Xylar Asay-Davis | ||
|
|
||
| grid = NetCDFFile(output_name, 'w', format='NETCDF3_CLASSIC') |
This was the line that most concerned me. We don't want to be restricted to JIGSAW meshes that are small enough for this old format.
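To sketch the direction this change takes (an illustration under assumptions, not the code in this PR): the mesh variables can be assembled as numpy arrays in an `xarray.Dataset` and written with `mpas_tools.io.write_netcdf()`, which is not tied to the `NETCDF3_CLASSIC` format. The helper and variable names below are hypothetical.

```python
# Minimal sketch, not the PR's actual code: write cell coordinates with
# xarray + write_netcdf() instead of a hard-coded NETCDF3_CLASSIC file.
import numpy as np
import xarray as xr

from mpas_tools.io import write_netcdf


def write_cell_coords(x_cell, y_cell, z_cell, output_name):
    # x_cell, y_cell, z_cell: 1D numpy arrays of Cartesian coordinates,
    # one entry per cell (hypothetical helper, for illustration only)
    ds = xr.Dataset()
    ds['xCell'] = ('nCells', x_cell)
    ds['yCell'] = ('nCells', y_cell)
    ds['zCell'] = ('nCells', z_cell)
    write_netcdf(ds, output_name)
```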
@matthewhoffman and @trhille, I would like your review on this as it will affect your workflows. I'm pretty confident in the work, but we should test it on a more complicated mesh. If you tell me which mesh(es) you'd like to have this tested on, I can run a before-and-after comparison in Compass to make sure the resulting meshes are identical.
@xylar, I would like to have this tested on the Greenland, AIS, Thwaites, and Humboldt meshes. Happy to help with any of those tests.
Okay, thanks @trhille. I'll reach out if I run into trouble.
@trhille, I was able to generate all 3 meshes on Perlmutter using the "old" method. I haven't yet set things up to test the "new" method, but I just wanted to let you know that it doesn't look like there will be a problem. I should have this tested either this evening or tomorrow.
Force-pushed from e5347ff to 2a75050.
For the Antarctic mesh, differences are small (difference plots omitted here), and similarly for Greenland. But I would have expected them to be exactly identical, so it seems I have some work to do. For Humboldt, they were identical as expected.
This merge:

* switches to xarray instead of netCDF4
* does not hard-code the `NETCDF3_CLASSIC` format but instead uses `write_netcdf()` with the default format. This will be important for meshes with large numbers of vertices.
* uses numpy arrays directly rather than looping over vertices

Some formatting has been cleaned up for better consistency with other parts of `mpas_tools`.
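As a rough illustration of the third bullet (not code from this PR), the speedup comes from replacing per-vertex Python loops with whole-array numpy operations. The conversion below, from lon/lat on a unit sphere to Cartesian coordinates, is just a representative example of the pattern.

```python
# Illustrative only (not the PR's code): replace a per-vertex Python loop
# with whole-array numpy operations when converting lon/lat (radians) on a
# unit sphere to Cartesian coordinates.
import numpy as np


def lonlat_to_xyz_loop(lon, lat):
    # old style: one vertex at a time
    x = np.zeros(len(lon))
    y = np.zeros(len(lon))
    z = np.zeros(len(lon))
    for i in range(len(lon)):
        x[i] = np.cos(lat[i]) * np.cos(lon[i])
        y[i] = np.cos(lat[i]) * np.sin(lon[i])
        z[i] = np.sin(lat[i])
    return x, y, z


def lonlat_to_xyz_vectorized(lon, lat):
    # new style: operate on the full arrays at once
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return x, y, z
```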
Force-pushed from 2a75050 to ffb136a.

Force-pushed from ffb136a to 232146a.
@matthewhoffman and @trhille, I don't seem to be able to get this to produce bit-for-bit identical results. It seems like vectorizing things in the way that I did inherently leads to order-of-operations changes. I do think the results are identical to machine precision. As an example, I looked at the vertices in the Antarctic mesh with the biggest change between the old and new methods, along with a second old-vs.-new comparison of the largest changes (the pasted values are omitted here). If these differences are big enough to bother you, I think there are alternative strategies I could use that would involve vectorizing calls to the old [...]
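A generic illustration of why results drift at the bit level (not taken from the PR): floating-point addition is not associative, so reordering operations, which vectorization tends to do, changes the last bits while leaving the result identical to machine precision.

```python
# Generic example (not from the PR): floating-point arithmetic is not
# associative, so a different order of operations changes the last bits.
import numpy as np

a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one order of operations
right = a + (b + c)  # another order of operations

print(left == right)            # False
print(abs(left - right))        # on the order of machine epsilon (~1e-16)
print(np.isclose(left, right))  # True: equal to machine precision
```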
@xylar thanks for looking into this. I don't think these changes are big enough to matter. It would of course be nice to have them BFB with the previous version, but I'm struggling to think of a reason why this would actually be a problem. Even if we need to reproduce a mesh that has the same vertex locations (for example, we've been talking about making an extended version of the 1–10km GrIS mesh that keeps existing cell locations unchanged and just adds more cells for overlap with the ocean mesh), changes of this magnitude would still allow us to interpolate the optimized friction field onto the new mesh without issues.
You could always replace the vertex locations that correspond to cells that are unchanged with the values from the previous mesh in this scenario if it's important. But I think it will be hard to get a mesh with identical cell locations out of JIGSAW anyway.
Okay, I consider this to be tested and vetted then. Anything else you both want to look over before approving?
(It won't make its way into Compass for a bit, since I'm making these changes with an eye toward Polaris instead.)
trhille left a comment:
Approving based on visual inspection and @xylar's testing.
Thanks @trhille and @matthewhoffman!!
This merge:

* switches to xarray instead of netCDF4
* does not hard-code the `NETCDF3_CLASSIC` format but instead uses `write_netcdf()` with the default format. This will be important for meshes with large numbers of vertices.
* uses numpy arrays directly rather than looping over vertices

Some formatting has been cleaned up for better consistency with other parts of `mpas_tools`.

To enable these changes, `circumcenter()` has also been updated to work with numpy arrays.
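For context on that last point, a vectorized circumcenter calculation can operate on arrays of triangles at once. The sketch below is an illustration only, covers just the planar case, and is not the actual `circumcenter()` implementation in `mpas_tools` (which also has to support meshes on a sphere).

```python
# Illustrative sketch only: circumcenters of many planar triangles computed
# with array operations instead of a per-triangle loop.
import numpy as np


def planar_circumcenters(ax, ay, bx, by, cx, cy):
    # ax, ay, ..., cy are 1D arrays holding the vertices of each triangle
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2 = ax**2 + ay**2
    b2 = bx**2 + by**2
    c2 = cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy
```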