Update jigsaw_to_netcdf() for efficiency and robustness
#632
Conversation
Force-pushed from 9717387 to e5347ff.
Testing

I verified that an Icos30km base mesh produced with the updated function is exactly identical to one produced with the old code. Here's the test program to compare files:

```python
import argparse

import numpy as np
import xarray as xr


def compare_datasets(path1, path2):
    ds1 = xr.open_dataset(path1)
    ds2 = xr.open_dataset(path2)

    vars1 = set(ds1.data_vars)
    vars2 = set(ds2.data_vars)

    failing_vars = []

    # Check for variables present in only one dataset
    only_in_ds1 = vars1 - vars2
    only_in_ds2 = vars2 - vars1
    if only_in_ds1:
        failing_vars.extend([(var, 'Only in dataset 1') for var in only_in_ds1])
    if only_in_ds2:
        failing_vars.extend([(var, 'Only in dataset 2') for var in only_in_ds2])

    # Compare variables present in both datasets
    for var in vars1 & vars2:
        da1 = ds1[var]
        da2 = ds2[var]

        # Check if dimensions match
        if da1.dims != da2.dims:
            failing_vars.append((var, 'Dimension mismatch'))
            continue

        # Check if shapes match
        if da1.shape != da2.shape:
            failing_vars.append((var, 'Shape mismatch'))
            continue

        # Check if data values match (using exact comparison)
        if not np.array_equal(da1.values, da2.values):
            failing_vars.append((var, 'Data mismatch'))
        else:
            print(f"Variable '{var}' matches successfully.")

    # Summary of results
    if failing_vars:
        print("Failing variables:")
        for var, reason in failing_vars:
            print(f"  - {var}: {reason}")
    else:
        print("All variables match perfectly.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Compare two NetCDF datasets using xarray.")
    parser.add_argument("dataset1", help="Path to the first dataset")
    parser.add_argument("dataset2", help="Path to the second dataset")
    args = parser.parse_args()

    compare_datasets(args.dataset1, args.dataset2)
```
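For reference, a hypothetical way to call the comparison directly from Python; the file names below are placeholders, not files from this PR:

```python
# Hypothetical usage with placeholder file names: compare a base mesh
# written by the old code to one written by the updated code.
compare_datasets('base_mesh_old.nc', 'base_mesh_new.nc')
```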
| """ | ||
| # Authors: Phillip J. Wolfram, Matthew Hoffman and Xylar Asay-Davis | ||
|
|
||
| grid = NetCDFFile(output_name, 'w', format='NETCDF3_CLASSIC') |
This was the line that most concerned me. We don't want to be restricted to JIGSAW meshes that are small enough for this old format.
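To sketch the direction this change takes (an illustration under assumptions, not the code in this PR): the mesh variables can be assembled as numpy arrays in an `xarray.Dataset` and written with `mpas_tools.io.write_netcdf()`, which is not tied to the `NETCDF3_CLASSIC` format. The helper and variable names below are hypothetical.

```python
# Minimal sketch, not the PR's actual code: write cell coordinates with
# xarray + write_netcdf() instead of a hard-coded NETCDF3_CLASSIC file.
import numpy as np
import xarray as xr

from mpas_tools.io import write_netcdf


def write_cell_coords(x_cell, y_cell, z_cell, output_name):
    # x_cell, y_cell, z_cell: 1D numpy arrays of Cartesian coordinates,
    # one entry per cell (hypothetical helper, for illustration only)
    ds = xr.Dataset()
    ds['xCell'] = ('nCells', x_cell)
    ds['yCell'] = ('nCells', y_cell)
    ds['zCell'] = ('nCells', z_cell)
    write_netcdf(ds, output_name)
```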
@matthewhoffman and @trhille, I would like your review on this as it will affect your workflows. I'm pretty confident in the work, but we should test it on a more complicated mesh. If you tell me which mesh(es) you'd like to have this tested on, I can run a before-and-after comparison in Compass to make sure the resulting meshes are identical.
@xylar, I would like to have this tested on the Greenland, AIS, Thwaites, and Humboldt meshes. Happy to help with any of those tests.
Okay, thanks @trhille. I'll reach out if I run into trouble.
@trhille, I was able to generate all 3 meshes on Perlmutter using the "old" method. I haven't yet set things up to test the "new" method, but I just wanted to let you know that it doesn't look like there will be a problem. I should have this tested either this evening or tomorrow.
Force-pushed from e5347ff to 2a75050.
For the Antarctic mesh, differences are small (difference plots omitted here), and similarly for Greenland. But I would have expected them to be exactly identical, so it seems I have some work to do. For Humboldt, they were identical as expected.
This merge:

* switches to xarray instead of netCDF4
* does not hard-code the `NETCDF3_CLASSIC` format but instead uses `write_netcdf()` with the default format. This will be important for meshes with large numbers of vertices.
* uses numpy arrays directly rather than looping over vertices

Some formatting has been cleaned up for better consistency with other parts of `mpas_tools`.
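As a rough illustration of the third bullet (not code from this PR), the speedup comes from replacing per-vertex Python loops with whole-array numpy operations. The conversion below, from lon/lat on a unit sphere to Cartesian coordinates, is just a representative example of the pattern.

```python
# Illustrative only (not the PR's code): replace a per-vertex Python loop
# with whole-array numpy operations when converting lon/lat (radians) on a
# unit sphere to Cartesian coordinates.
import numpy as np


def lonlat_to_xyz_loop(lon, lat):
    # old style: one vertex at a time
    x = np.zeros(len(lon))
    y = np.zeros(len(lon))
    z = np.zeros(len(lon))
    for i in range(len(lon)):
        x[i] = np.cos(lat[i]) * np.cos(lon[i])
        y[i] = np.cos(lat[i]) * np.sin(lon[i])
        z[i] = np.sin(lat[i])
    return x, y, z


def lonlat_to_xyz_vectorized(lon, lat):
    # new style: operate on the full arrays at once
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return x, y, z
```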
Force-pushed from 2a75050 to ffb136a.

Force-pushed from ffb136a to 232146a.
@matthewhoffman and @trhille, I don't seem to be able to get this to produce bit-for-bit identical results. It seems like vectorizing things in the way that I did inherently leads to order-of-operations changes. I do think the results are identical to machine precision. As an example, I looked at the vertices in the Antarctic mesh with the biggest change between the old and new methods, along with a second old-vs.-new comparison of the largest changes (the pasted values are omitted here). If these differences are big enough to bother you, I think there are alternative strategies I could use that would involve vectorizing calls to the old [...]
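A generic illustration of why results drift at the bit level (not taken from the PR): floating-point addition is not associative, so reordering operations, which vectorization tends to do, changes the last bits while leaving the result identical to machine precision.

```python
# Generic example (not from the PR): floating-point arithmetic is not
# associative, so a different order of operations changes the last bits.
import numpy as np

a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one order of operations
right = a + (b + c)  # another order of operations

print(left == right)            # False
print(abs(left - right))        # on the order of machine epsilon (~1e-16)
print(np.isclose(left, right))  # True: equal to machine precision
```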
@xylar thanks for looking into this. I don't think these changes are big enough to matter. It would of course be nice to have them BFB with the previous version, but I'm struggling to think of a reason why this would actually be a problem. Even if we need to reproduce a mesh that has the same vertex locations (for example, we've been talking about making an extended version of the 1–10km GrIS mesh that keeps existing cell locations unchanged and just adds more cells for overlap with the ocean mesh), changes of this magnitude would still allow us to interpolate the optimized friction field onto the new mesh without issues.
You could always replace the vertex locations that correspond to cells that are unchanged with the values from the previous mesh in this scenario if it's important. But I think it will be hard to get a mesh with identical cell locations out of JIGSAW anyway.
Okay, I consider this to be tested and vetted then. Anything else you both want to look over before approving?
(It won't make its way into Compass for a bit, since I'm making these changes with an eye toward Polaris instead.)
trhille left a comment:
Approving based on visual inspection and @xylar's testing.
Thanks @trhille and @matthewhoffman!!
This merge:

* switches to xarray instead of netCDF4
* does not hard-code the `NETCDF3_CLASSIC` format but instead uses `write_netcdf()` with the default format. This will be important for meshes with large numbers of vertices.
* uses numpy arrays directly rather than looping over vertices

Some formatting has been cleaned up for better consistency with other parts of `mpas_tools`.

To enable these changes, `circumcenter()` has also been updated to work with numpy arrays.
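For context on that last point, a vectorized circumcenter calculation can operate on arrays of triangles at once. The sketch below is an illustration only, covers just the planar case, and is not the actual `circumcenter()` implementation in `mpas_tools` (which also has to support meshes on a sphere).

```python
# Illustrative sketch only: circumcenters of many planar triangles computed
# with array operations instead of a per-triangle loop.
import numpy as np


def planar_circumcenters(ax, ay, bx, by, cx, cy):
    # ax, ay, ..., cy are 1D arrays holding the vertices of each triangle
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2 = ax**2 + ay**2
    b2 = bx**2 + by**2
    c2 = cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy
```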