Skip to content

Conversation

@xylar
Copy link
Collaborator

@xylar xylar commented May 27, 2025

This merge:

  • switches to xarray instead of netcdf4
  • it does not hard-code NETCDF3_CLASSIC format but instead uses write_netcdf() with the default format. This will be important for meshes with large numbers of vertices.
  • it uses numpy arrays directly rather than looping over vertices

Some formatting has been cleaned up for better consistency with
other parts of mpas_tools.

To enable these changes, circumcenter() has also been updated to work with numpy arrays.

@xylar xylar self-assigned this May 27, 2025
@xylar xylar force-pushed the clean-up-jigsaw-conversion branch from 9717387 to e5347ff Compare May 27, 2025 15:19
@xylar
Copy link
Collaborator Author

xylar commented May 27, 2025

Testing

I verified that an Icos30km base mesh produced with the updated function is exactly identical to one produced with the old code. Here's the test program to compare files:

import xarray as xr
import numpy as np
import argparse


def compare_datasets(path1, path2):
    ds1 = xr.open_dataset(path1)
    ds2 = xr.open_dataset(path2)

    vars1 = set(ds1.data_vars)
    vars2 = set(ds2.data_vars)

    failing_vars = []

    # Check for variables present in only one dataset
    only_in_ds1 = vars1 - vars2
    only_in_ds2 = vars2 - vars1

    if only_in_ds1:
        failing_vars.extend([(var, 'Only in dataset 1') for var in only_in_ds1])
    if only_in_ds2:
        failing_vars.extend([(var, 'Only in dataset 2') for var in only_in_ds2])

    # Compare variables present in both datasets
    for var in vars1 & vars2:
        da1 = ds1[var]
        da2 = ds2[var]

        # Check if dimensions match
        if da1.dims != da2.dims:
            failing_vars.append((var, 'Dimension mismatch'))
            continue

        # Check if data values match (using exact comparison)
        if not np.array_equal(da1.shape, da2.shape):
            failing_vars.append((var, 'Shape mismatch'))
            continue

        if not np.array_equal(da1.values, da2.values):
            failing_vars.append((var, 'Data mismatch'))
        else:
            print(f"Variable '{var}' matches successfully.")

    # Summary of results
    if failing_vars:
        print("Failing variables:")
        for var, reason in failing_vars:
            print(f"  - {var}: {reason}")
    else:
        print("All variables match perfectly.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Compare two NetCDF datasets using xarray.")
    parser.add_argument("dataset1", help="Path to the first dataset")
    parser.add_argument("dataset2", help="Path to the second dataset")
    args = parser.parse_args()

    compare_datasets(args.dataset1, args.dataset2)

@xylar xylar marked this pull request as ready for review May 27, 2025 17:26
"""
# Authors: Phillip J. Wolfram, Matthew Hoffman and Xylar Asay-Davis

grid = NetCDFFile(output_name, 'w', format='NETCDF3_CLASSIC')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the line that mostly concerned time. We don't want to be restricted to JIGSAW meshes that are small enough for this old format.

@xylar
Copy link
Collaborator Author

xylar commented May 28, 2025

@matthewhoffman and @trhille, I would like your review on this as it will affect your workflows. I'm pretty confident in the work but we should test it on a more complicated mesh. If you tell me which mesh(es) you'd like to have this tested on, I can run a before-and-after comparison in Compass to make sure the resulting meshes are identical.

@trhille
Copy link
Collaborator

trhille commented May 28, 2025

@xylar, I would like to have this tested on the Greenland, AIS, Thwaites, and Humboldt meshes. Happy to help with any of those tests.

@xylar
Copy link
Collaborator Author

xylar commented May 28, 2025

Okay, thanks @trhille. I'll reach out if I run into trouble.

@xylar
Copy link
Collaborator Author

xylar commented May 29, 2025

@trhille, I was able to generate all 3 meshes on Perlmutter using the "old" method. I haven't yet set things up to test the "new" method but just wanted to let you know that it looks like no problem. I should have this tested either this evening or tomorrow.

@xylar xylar force-pushed the clean-up-jigsaw-conversion branch from e5347ff to 2a75050 Compare May 29, 2025 16:06
@xylar
Copy link
Collaborator Author

xylar commented May 29, 2025

For the Antarctic mesh, differences are small:

  - xVertex: Data mismatch (max abs diff: 3.0081719160079956e-07)
  - yVertex: Data mismatch (max abs diff: 3.197201294824481e-07)

Similarly for Greenland:

  - yVertex: Data mismatch (max abs diff: 5.331821739673615e-07)
  - xVertex: Data mismatch (max abs diff: 5.366746336221695e-07)

But I would have expected them to be exactly identical. It seems I have some work to do.

For Humboldt, they were identical as expected.

xylar added 2 commits May 29, 2025 14:20
This merge:
* switches to xarray instead of netcdf4
* it does not hard-code `NetCDF3-Classic` format but instead uses
  `write_netcdf()` with the default format.  This will be important
  for meshes with large numbers of vertices.
* it uses numpy arrays directly rather than looping over vertices

Some formatting has been cleaned up for better consistency with
other parts of `mpas_tools`.
@xylar xylar force-pushed the clean-up-jigsaw-conversion branch from 2a75050 to ffb136a Compare May 29, 2025 20:45
@xylar xylar force-pushed the clean-up-jigsaw-conversion branch from ffb136a to 232146a Compare May 30, 2025 07:46
@xylar
Copy link
Collaborator Author

xylar commented May 30, 2025

@matthewhoffman and @trhille, I don't seem to be able to get this to produce bit-for-bit identical results. It seems like vectorizing things in the way that I did just inherently leads to order-of-operations changes.

I do think the results are identical to machine precision. As an example, here are the vertices in the Antarctic mesh with the biggest change in xVertex:

Variable 'xVertex' differs. Max absolute difference: 3.0081719160079956e-07, at index: 466915

old method:

(2149013.33940856, 1443042.57600619, 0.0)

new method:

(2149013.33940826, 1443042.57600626, 0.0)

And similarly the biggest change in yVertex:

Variable 'yVertex' differs. Max absolute difference: 3.197201294824481e-07, at index: 794792

old method:

(2573113.05578246, -97387.13758518, 0.0)

new method:

(2573113.05578246, -97387.13758486, 0.0)

If these differences are big enough to bother you, I think there are alternative strategies I could use that would involve vectorizing calls to the old circumcenter() function. This approach would be slower and more complicated than the just calling circumcenter() once with vectors, the approach I am using here. But maybe that doesn't matter since this isn't a major bottleneck of our process at least for the size of mesh we currently have. I'm mostly trying to anticipate problems down the road as I make use of this code in Polaris.

@trhille
Copy link
Collaborator

trhille commented May 30, 2025

@xylar thanks for looking into this. I don't think these changes are big enough to matter. It would of course be nice to have them BFB with the previous version, but I'm struggling to think of a reason why this would actually be a problem. Even if we need to reproduce a mesh that has the same vertex locations — for example, we've been talking about making an extended version of the 1–10km GrIS mesh that keeps existing cell locations unchanged and just adds more cells for overlap with the ocean mesh — changes of this magnitude would still allow us to interpolate the optimized friction field onto the new mesh without issues.

@xylar
Copy link
Collaborator Author

xylar commented May 30, 2025

You could always replace the vertex locations that correspond to cells that are unchanged with the values from the previous mesh in this scenario if it's important. But I think it will be hard to get a mesh with identical cell locations out of JIGSAW anyway.

@matthewhoffman
Copy link
Member

@xylar , I agree with everything that @trhille said. I have no concerns about these small differences breaking backwards reproducibility.

@xylar
Copy link
Collaborator Author

xylar commented May 30, 2025

Okay, I consider this to be tested and vetted then. Anything else you both want to look over before approving?

@xylar
Copy link
Collaborator Author

xylar commented May 30, 2025

(It won't make its way into Compass for a bit, since I'm making these changes with an eye toward Polaris instead.)

Copy link
Collaborator

@trhille trhille left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving based on visual inspection and @xylar's testing.

@xylar
Copy link
Collaborator Author

xylar commented Jun 3, 2025

Thanks @trhille and @matthewhoffman!!

@xylar xylar merged commit d2bbd6f into MPAS-Dev:master Jun 3, 2025
6 checks passed
@xylar xylar deleted the clean-up-jigsaw-conversion branch June 3, 2025 12:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants