
Conversation

@VeckoTheGecko
Contributor

This PR supersedes #2447, implementing the following additional functionality:

  • Refactor the FieldSet.from_nemo() functionality out into a convert.py module with a nemo_to_sgrid() function, and add corresponding tests (the intended workflow is sketched after this list).
  • Make the input parameters of the nemo_to_sgrid() function align more closely with the input datasets.
  • Update FieldSet.from_sgrid_conventions() to infer the mesh from the metadata included in the dataset.
  • Add a get_value_by_id() function to the Grid*DMetadata classes, making it easy to refer to parts of the SGRID metadata model.
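In rough terms, the intended workflow looks like the sketch below. The file names are placeholders and the call signatures follow the usage examples further down this thread, so treat it as an illustration rather than final API documentation.

import xarray as xr
import parcels

# Open the raw NEMO output and the separate coordinates file (placeholder file names).
ds_fields = xr.open_mfdataset(
    "ORCA*.nc", data_vars="minimal", coords="minimal", compat="override"
)
ds_coords = xr.open_dataset("coordinates.nc", decode_times=False)

# Convert the NEMO fields into a single SGRID-conventions dataset...
ds_sgrid = parcels.convert.nemo_to_sgrid(
    U=ds_fields["uo"], V=ds_fields["vo"], coords=ds_coords
)
# ...and build the FieldSet from it, with the mesh inferred from the SGRID metadata.
fieldset = parcels.FieldSet.from_sgrid_conventions(ds_sgrid)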

Future work:

  • Improve logging and metadata conversion in convert.py. It's important that we refactor this module so that it's clearer, and add logging so that changes to the input dataset are communicated to the user. To support this, we need:
  • Better testing against different NEMO datasets
    • Reach out to the NEMO community asking for ncdump output so that we can better see the types of NEMO models out there

I'll make an issue to track this future work.

erikvansebille and others added 28 commits January 9, 2026 16:24
And update of tutorial_nemo_curvilinear
And updating tutorial
And expanding comments on loading in the dataset
To use same ordering as in XLinear
This requires smaller selection_dict for the isel, so hopefully faster code
And also separating offset calculation into its own helper function
It's not clear why this is here, nor why removing it causes a test failure. To be investigated another time.

EDIT: This was introduced in Parcels-code@c311fba, though we're investigating if this can be implemented another way since there should be no particular difference with NEMO.
Gradually reducing the dependency on the `mesh` param
Updates the API for conversion to be more closely aligned with the input data. Also handles the U and V fields separately - correctly assigning the dimension naming before merging into a single dataset.
Avoid assuming there's a U and V field. Maybe this should be refactored later...
@VeckoTheGecko force-pushed the fieldset_from_nemo-nick branch from 6d12f21 to 0b9c787 on January 9, 2026 15:25
@VeckoTheGecko
Contributor Author

I'll do a full self-review next week. Feel free to take a look in the meantime, @erikvansebille, if you'd like.

@VeckoTheGecko
Contributor Author

Also still a few failing tests that I need to sort out...

Member

@erikvansebille left a comment

Looks nice already. Some first comments below

Member

This and other tutorials still need to be updated to use the new convert workflow? Do you want me to do that? It could be a good test case for how user-friendly the convert functions are.

Contributor Author

@VeckoTheGecko Jan 12, 2026

Yes, they would need to be updated.

Do you want me to do that? It could be a good test case for how user-friendly the convert functions are.

That would be great!

Member

I've tried to change the netCDF ingestion in the Nemo3D tutorial, but I get the error below.

Can you check what's going on, @VeckoTheGecko?

data_folder = parcels.download_example_dataset("NemoNorthSeaORCA025-N006_data")
ds_fields = xr.open_mfdataset(
    data_folder.glob("ORCA*.nc"),
    data_vars="minimal",
    coords="minimal",
    compat="override",
)
ds_coords = xr.open_dataset(data_folder / "coordinates.nc", decode_times=False)
ds_fset = parcels.convert.nemo_to_sgrid(U=ds_fields["uo"], V=ds_fields["vo"], W=ds_fields["wo"], coords=ds_coords)
fieldset = parcels.FieldSet.from_sgrid_conventions(ds_fset)
---------------------------------------------------------------------------
CoordinateValidationError                 Traceback (most recent call last)
Cell In[7], line 9
      2 ds_fields = xr.open_mfdataset(
      3     data_folder.glob("ORCA*.nc"),
      4     data_vars="minimal",
      5     coords="minimal",
      6     compat="override",
      7 )
      8 ds_coords = xr.open_dataset(data_folder / "coordinates.nc", decode_times=False)
----> 9 ds_fset = parcels.convert.nemo_to_sgrid(U=ds_fields["uo"], V=ds_fields["vo"], coords=ds_coords)
     10 fieldset = parcels.FieldSet.from_sgrid_conventions(ds_fset)

File ~/Codes/ParcelsCode/src/parcels/convert.py:224, in nemo_to_sgrid(coords, **fields)
    222 ds = _discover_U_and_V(ds, _NEMO_CF_STANDARD_NAME_FALLBACKS)
    223 ds = _maybe_create_depth_dim(ds)
--> 224 ds = _maybe_bring_UV_depths_to_depth(ds)
    225 ds = _drop_unused_dimensions_and_coords(ds, _NEMO_DIMENSION_COORD_NAMES)
    226 ds = _assign_dims_as_coords(ds, _NEMO_DIMENSION_COORD_NAMES)

File ~/Codes/ParcelsCode/src/parcels/convert.py:58, in _maybe_bring_UV_depths_to_depth(ds)
     56 def _maybe_bring_UV_depths_to_depth(ds):
     57     if "U" in ds.variables and "depthu" in ds.U.coords and "depth" in ds.coords:
---> 58         ds["U"] = ds["U"].assign_coords(depthu=ds["depth"].values).rename({"depthu": "depth"})
     59     if "V" in ds.variables and "depthv" in ds.V.coords and "depth" in ds.coords:
     60         ds["V"] = ds["V"].assign_coords(depthv=ds["depth"].values).rename({"depthv": "depth"})

File ~/Codes/ParcelsCode/.pixi/envs/docs/lib/python3.14/site-packages/xarray/core/common.py:664, in DataWithCoords.assign_coords(self, coords, **coords_kwargs)
    661 else:
    662     results = self._calc_assign_results(coords_combined)
--> 664 data.coords.update(results)
    665 return data

File ~/Codes/ParcelsCode/.pixi/envs/docs/lib/python3.14/site-packages/xarray/core/coordinates.py:636, in Coordinates.update(self, other)
    630 # special case for PandasMultiIndex: updating only its dimension coordinate
    631 # is still allowed but depreciated.
    632 # It is the only case where we need to actually drop coordinates here (multi-index levels)
    633 # TODO: remove when removing PandasMultiIndex's dimension coordinate.
    634 self._drop_coords(self._names - coords_to_align._names)
--> 636 self._update_coords(coords, indexes)

File ~/Codes/ParcelsCode/.pixi/envs/docs/lib/python3.14/site-packages/xarray/core/coordinates.py:1104, in DataArrayCoordinates._update_coords(self, coords, indexes)
   1101 def _update_coords(
   1102     self, coords: dict[Hashable, Variable], indexes: dict[Hashable, Index]
   1103 ) -> None:
-> 1104     validate_dataarray_coords(
   1105         self._data.shape, Coordinates._construct_direct(coords, indexes), self.dims
   1106     )
   1108     self._data._coords = coords
   1109     self._data._indexes = indexes

File ~/Codes/ParcelsCode/.pixi/envs/docs/lib/python3.14/site-packages/xarray/core/coordinates.py:1328, in validate_dataarray_coords(shape, coords, dim)
   1326 for d, s in v.sizes.items():
   1327     if d in sizes and s != sizes[d]:
-> 1328         raise CoordinateValidationError(
   1329             f"conflicting sizes for dimension {d!r}: "
   1330             f"length {sizes[d]} on the data but length {s} on "
   1331             f"coordinate {k!r}"
   1332         )

CoordinateValidationError: conflicting sizes for dimension 'depthu': length 75 on the data but length 1 on coordinate 'depthu'
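For reference, the underlying failure is plain xarray behaviour: assigning a coordinate whose length does not match its dimension raises this validation error. A minimal sketch independent of Parcels (on older xarray versions the exception is a plain ValueError):

import numpy as np
import xarray as xr

# A variable with a 75-level "depthu" dimension, mirroring the U field above.
u = xr.DataArray(np.zeros((75, 4)), dims=("depthu", "x"))

# Assigning a length-1 coordinate to the length-75 "depthu" dimension fails, which is
# what happens in _maybe_bring_UV_depths_to_depth when ds["depth"] has only one level.
u.assign_coords(depthu=np.array([0.0]))  # conflicting sizes for dimension 'depthu'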

Comment on lines +373 to +375
vector_interp_method=CGrid_Velocity,
# ^Seems to work with AGrid as well? (at least, tests aren't failing -
# either logic needs to be added to choose interpolator, or this interpolator should be renamed)
Member

No, this is part of the changes to unit converters in #2455, #2459 and #2461.

In #2461, I propose we attach a new XLinear_Velocity interpolator for A-Grids, for consistency/transparency, but that can be done in a later PR (when the others have merged too).

fields["W"] = Field("W", ds["W"], grid, XLinear)
fields["UVW"] = VectorField("UVW", fields["U"], fields["V"], fields["W"])
fields["UVW"] = VectorField(
"UVW", fields["U"], fields["V"], fields["W"], vector_interp_method=CGrid_Velocity
Member

Also here, it would depend on the grid topology whether we use XLinear_Velocity or CGrid_Velocity
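A purely hypothetical sketch of what that dispatch could look like (XLinear_Velocity exists only as a proposal in #2461, and is_c_grid is a made-up placeholder for whatever topology check we settle on):

# Hypothetical: choose the vector interpolator from the grid topology instead of hard-coding it.
vector_interp = CGrid_Velocity if is_c_grid else XLinear_Velocity
fields["UVW"] = VectorField(
    "UVW", fields["U"], fields["V"], fields["W"], vector_interp_method=vector_interp
)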

Comment on lines +388 to +392
if mesh == "spherical":
if "U" in fieldset.fields:
fieldset.U.units = GeographicPolar()
if "V" in fieldset.fields:
fieldset.V.units = Geographic()
Member

This would all go if we implement #2459. We need to be careful when fixing merge conflicts...

pytest.param(
    AdvectionRK4_3D,
    marks=pytest.mark.xfail(
        reason="from_nemo had 'fieldset.V.units = GeographicPolar()', I'm not sure _why_ this code is needed to get this to pass. To be further investigated.",
Member

Should be fixed in the changes that come with #2455, #2459 and #2461

VeckoTheGecko and others added 5 commits January 12, 2026 10:19
Co-authored-by: Erik van Sebille <[email protected]>
Avoiding a FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
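(That change amounts to reading dimension lengths from Dataset.sizes instead of Dataset.dims; "depth" below is just an illustrative dimension name.)

# Dataset.dims will eventually return only the set of dimension names;
# Dataset.sizes keeps the mapping from dimension name to length.
n_depth = ds.sizes["depth"]  # instead of ds.dims["depth"]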
Member

I'm also getting problems with the curvilinear grid. The lon and lat default to range(x) and range(y), for some reason.

This leads to the plot below, where the right panel does not have lon and lat on the x- and y-axes

[Image: plot of the resulting fieldset; the right panel lacks lon and lat on its axes]

Any idea what's going on here, @VeckoTheGecko ?

data_folder = parcels.download_example_dataset("NemoCurvilinear_data")
ds_fields = xr.open_mfdataset(
    data_folder.glob("*_purely_zonal-ORCA025_grid_*.nc4"),
    data_vars="minimal",
    coords="minimal",
    compat="override",
)

ds_coords = xr.open_dataset(data_folder / "mesh_mask.nc4", decode_times=False)

ds_fset = parcels.convert.nemo_to_sgrid(U=ds_fields["U"], V=ds_fields["V"], coords=ds_coords)
display(ds_fset)

fieldset = parcels.FieldSet.from_sgrid_conventions(ds_fset)

print(fieldset.U.grid._ds)
<xarray.Dataset> Size: 35MB
Dimensions:   (depth: 1, time: 1, y_center: 1021, x: 1442, y: 1021,
               x_center: 1442)
Coordinates:
  * depth     (depth) int64 8B 0
  * time      (time) float64 8B 3.0
  * y_center  (y_center) int64 8kB 0 1 2 3 4 5 ... 1015 1016 1017 1018 1019 1020
  * x         (x) int64 12kB 0 1 2 3 4 5 6 ... 1436 1437 1438 1439 1440 1441
  * y         (y) int64 8kB 0 1 2 3 4 5 6 ... 1014 1015 1016 1017 1018 1019 1020
  * x_center  (x_center) int64 12kB 0 1 2 3 4 5 ... 1437 1438 1439 1440 1441
    gphif     (y, x) float64 12MB -76.98 -76.98 -76.98 ... 50.0 49.99 49.99
    glamf     (y, x) float64 12MB 72.88 73.12 73.38 73.62 ... 73.0 73.0 73.0
Data variables:
    U         (depth, time, y_center, x) float32 6MB dask.array<chunksize=(1, 1, 511, 721), meta=np.ndarray>
    V         (depth, time, y, x_center) float32 6MB dask.array<chunksize=(1, 1, 511, 721), meta=np.ndarray>
    grid      int64 8B 0
    lon       (x) int64 12kB 0 1 2 3 4 5 6 ... 1436 1437 1438 1439 1440 1441
    lat       (y) int64 8kB 0 1 2 3 4 5 6 ... 1014 1015 1016 1017 1018 1019 1020

Contributor Author

@VeckoTheGecko Jan 12, 2026

The following patch works:

commit a985573e9d6874e11354e3af14c98ae186b4231e
Author: Vecko <[email protected]>
Date:   Mon Jan 12 17:04:56 2026 +0100

    Fix glamf and gphif

diff --git a/docs/user_guide/examples/tutorial_nemo_curvilinear.ipynb b/docs/user_guide/examples/tutorial_nemo_curvilinear.ipynb
index 3defa1f8..24422013 100644
--- a/docs/user_guide/examples/tutorial_nemo_curvilinear.ipynb
+++ b/docs/user_guide/examples/tutorial_nemo_curvilinear.ipynb
@@ -71,10 +71,12 @@
     "    compat=\"override\",\n",
     ")\n",
     "\n",
-    "# Drop some auxiliary dimensions - otherwise Parcels will complain\n",
-    "ds_fields = ds_fields.isel(time=0, z_a=0, z=0, drop=True)\n",
+    "ds_coords = xr.open_dataset(data_folder / \"mesh_mask.nc4\", decode_times=False)\n",
+    "ds_fset = parcels.convert.nemo_to_sgrid(\n",
+    "    U=ds_fields[\"U\"], V=ds_fields[\"V\"], coords=ds_coords\n",
+    ")\n",
     "\n",
-    "fieldset = parcels.FieldSet.from_nemo(ds_fields)"
+    "fieldset = parcels.FieldSet.from_sgrid_conventions(ds_fset)"
    ]
   },
   {
@@ -272,7 +274,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "default",
+   "display_name": "test-notebooks-latest",
    "language": "python",
    "name": "python3"
   },
@@ -286,7 +288,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.14.2"
+   "version": "3.13.9"
   }
  },
  "nbformat": 4,
diff --git a/src/parcels/convert.py b/src/parcels/convert.py
index 637f5270..c3818362 100644
--- a/src/parcels/convert.py
+++ b/src/parcels/convert.py
@@ -244,13 +244,17 @@ def nemo_to_sgrid(*, coords: xr.Dataset, **fields: dict[str, xr.Dataset | xr.Dat
         raise ValueError(
             "Dataset already has a 'grid' variable (according to cf_roles). Didn't expect there to be grid metadata on copernicusmarine datasets - please open an issue with more information about your dataset."
         )
+
+    # Update to use lon and lat internally
+
+    ds = ds.rename({"gphif": "lat", "glamf": "lon"})  # TODO: Logging message about rename
     ds["grid"] = xr.DataArray(
         0,
         attrs=sgrid.Grid2DMetadata(
             cf_role="grid_topology",
             topology_dimension=2,
             node_dimensions=("x", "y"),
-            node_coordinates=("glamf", "gphif"),
+            node_coordinates=("lon", "lat"),
             face_dimensions=(
                 sgrid.DimDimPadding("x_center", "x", sgrid.Padding.LOW),
                 sgrid.DimDimPadding("y_center", "y", sgrid.Padding.LOW),
@@ -260,6 +264,6 @@ def nemo_to_sgrid(*, coords: xr.Dataset, **fields: dict[str, xr.Dataset | xr.Dat
     )
 
     # NEMO models are always in degrees
-    ds["glamf"].attrs["units"] = "degrees"
-    ds["gphif"].attrs["units"] = "degrees"
+    ds["lon"].attrs["units"] = "degrees"
+    ds["lat"].attrs["units"] = "degrees"
     return ds
diff --git a/tests/test_convert.py b/tests/test_convert.py
index d7777d49..11a7029d 100644
--- a/tests/test_convert.py
+++ b/tests/test_convert.py
@@ -18,7 +18,7 @@ def test_nemo_to_sgrid():
         "topology_dimension": 2,
         "node_dimensions": "x y",
         "face_dimensions": "x_center:x (padding:low) y_center:y (padding:low)",
-        "node_coordinates": "glamf gphif",
+        "node_coordinates": "lon lat",
         "vertical_dimensions": "z_center:depth (padding:high)",
     }
 

Though I wonder if we should relax the constraint that there are coordinates named "lon" and "lat". Thinking about our conversation, @erikvansebille, I don't think we have to do this renaming now that we have SGRID metadata; in fact, keeping the same variable names as in the original dataset would be nice in case users want to debug.
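For illustration, dropping the rename would mean consumers resolve the coordinate names from the SGRID grid variable rather than assuming "lon" and "lat". A rough sketch with the attribute parsing written out by hand (the lookup could equally go through the new get_value_by_id() helper):

# The SGRID "grid" variable records its node coordinates as a space-separated string,
# e.g. "glamf gphif" for NEMO, so nothing downstream needs hard-coded "lon"/"lat" names.
lon_name, lat_name = ds_fset["grid"].attrs["node_coordinates"].split()
lon = ds_fset[lon_name]
lat = ds_fset[lat_name]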
