
Feature/interp improvements #169

Open
bbakernoaa wants to merge 71 commits into noaa-oar-arl:develop from bbakernoaa:feature/interp_improvements

Conversation

@bbakernoaa
Member

@bbakernoaa bbakernoaa commented Jul 18, 2025

This PR is a major refactor of the MONET codebase.

It introduces many changes while aiming to maintain backward compatibility:

  • Creates a better structure for the accessors.
    • Splits the accessors into a base accessor from which every other accessor inherits
    • Uses the COARDS format for both rectilinear and curvilinear grids
  • More Dask-friendly regridding for the xesmf and pyresample functions
  • Added a new .remap function that provides both nearest-neighbor and bilinear remapping
  • Removed the pyresample and xesmf regridding backends in favor of monet-regrid and xregrid
  • Updated stats to be xarray/Dask-friendly and added many more statistics; stats are now organized by type in the util/stats folder
  • Moved away from the legacy setup.py and setup.cfg
  • Updated map plotting functions
  • New facet grid plotting functions
  • New compare function for comparing two xarray.DataArray objects
    • It can compute simple differences or apply any function in stats
  • Updated docs
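The compare behaviour in the last bullets (a plain difference, or any statistic applied to the pair) can be illustrated with a small dispatcher. This is a hypothetical sketch: `compare`, `STATS`, `mae`, and `rmse` are illustrative names, not MONET's actual API, and plain NumPy arrays stand in for `xarray.DataArray` objects.

```python
import numpy as np

# Hypothetical sketch; not MONET's compare implementation.


def mae(model, obs):
    """Mean absolute error, ignoring NaNs."""
    return float(np.nanmean(np.abs(np.asarray(model) - np.asarray(obs))))


def rmse(model, obs):
    """Root-mean-square error, ignoring NaNs."""
    diff = np.asarray(model) - np.asarray(obs)
    return float(np.sqrt(np.nanmean(diff**2)))


# Hypothetical registry standing in for the functions under util/stats
STATS = {"mae": mae, "rmse": rmse}


def compare(model, obs, stat=None):
    """Return model - obs, or apply a named (or callable) statistic to the pair."""
    if stat is None:
        return np.asarray(model) - np.asarray(obs)
    func = STATS[stat] if isinstance(stat, str) else stat
    return func(model, obs)
```

Accepting either a registered name or any callable keeps the difference path and the statistics path behind one entry point.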

bbakernoaa and others added 9 commits April 10, 2025 00:05
- Updated `monet/util/__init__.py` to include a new module `coards_tools` and improved documentation.
- Introduced `coards_tools.py` with functions for handling COARDS and CF convention data, including compliance checks, data extraction, and conversion functions.
- Enhanced `combinetool.py` to improve data combination methods, including better handling of naming conflicts and added support for unstructured grid outputs.
- Updated `interp_util.py` to streamline interpolation utilities and improve code efficiency by using `meshgrid` for coordinate definitions.
…o pyproject.toml

- Updated build-system requirements in pyproject.toml to use setuptools>=61.0 and wheel.
- Added project metadata including name, version, description, authors, maintainers, license, keywords, and classifiers.
- Specified project dependencies directly in pyproject.toml.
- Configured setuptools options for package inclusion and data files.
- Removed legacy setup.cfg and setup.py files as part of the transition to PEP 517 compliant build system.
- Enhance `__init__.py` to ensure accessors are registered on import.
- Add `safe_import` method to `BaseAccessor` for clearer error handling during module imports.
- Update `MONETAccessor` to include new remapping methods with parallel processing options and improved flexibility in handling datasets.
- Introduce plotting methods for DataFrames in `MONETAccessorPandas` to visualize points and lines on maps.
- Refactor `rename_for_monet` method to streamline renaming latitude and longitude columns in DataFrames.
- Implement `quick_facet_time_map` method in `MONETAccessorDataset` for faceted map plotting.
- Added input validation for latitude and longitude in `latlon_xarray_to_CoordinateDefinition`, `lonlat_to_swathdefinition`, and `nearest_point_swathdefinition` functions.
- Introduced new functions for creating AreaDefinition from latitude/longitude arrays, datasets, and UGRID-compliant datasets.
- Implemented parallel processing capabilities in `resample_xesmf` and added support for multiple resampling methods.
- Updated `resample` function to allow for different resampling methods and improved handling of variable naming conflicts.
- Added new internal functions for nearest neighbor and bilinear resampling using pyresample's new resampling classes.
- Implemented Fractions Skill Score (FSS) for spatial fields to assess model performance against observed data.
- Added Extreme Dependency Score (EDS) for evaluating rare event detection.
- Introduced Continuous Ranked Probability Score (CRPS) for ensemble forecasts.
- Developed Spread-Error Relationship function to analyze ensemble spread and error.
- Created Structure-Amplitude-Location (SAL) score for spatial verification of model outputs.
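For reference, the common neighbourhood formulation of the Fractions Skill Score is FSS = 1 - MSE(Pf, Po) / (mean(Pf²) + mean(Po²)), where Pf and Po are the fractions of threshold-exceeding cells in an n × n window. The NumPy sketch below is an illustration under stated assumptions (odd window size, zero padding at the edges); the function names are hypothetical and this is not the code added under util/stats.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def neighborhood_fraction(binary, n):
    """Fraction of event cells in an n x n window centred on each grid cell.

    Assumption: the field is zero-padded, so cells near the boundary see
    fewer events (one of several possible edge conventions).
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("window size n must be a positive odd integer")
    pad = n // 2
    padded = np.pad(binary.astype(float), pad, mode="constant", constant_values=0.0)
    windows = sliding_window_view(padded, (n, n))  # shape (ny, nx, n, n)
    return windows.mean(axis=(-2, -1))


def fractions_skill_score(forecast, observed, threshold, n):
    """Hypothetical helper: FSS = 1 - MSE(Pf, Po) / (mean(Pf**2) + mean(Po**2))."""
    pf = neighborhood_fraction(np.asarray(forecast) >= threshold, n)
    po = neighborhood_fraction(np.asarray(observed) >= threshold, n)
    mse = np.mean((pf - po) ** 2)
    reference = np.mean(pf**2) + np.mean(po**2)
    if reference == 0:
        return np.nan  # no events in either field: FSS is undefined
    return 1.0 - mse / reference
```

With n = 1 the score compares the binary fields cell by cell; as n grows, displaced features start to count as partial hits.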

Add utility functions for statistical analysis

- Added matchedcompressed function to return compressed values from two masked arrays with matched masks.
- Implemented matchmasks function to ensure two arrays have the same mask for paired statistical calculations.
- Introduced circular bias functions (circlebias_m and circlebias) for wind direction differences, accounting for circularity.
- Introduced new test files for base accessors, coards tools, comparison functionality, and statistical metrics.
- Implemented tests for various methods in BaseAccessor to ensure correct handling of latitude and longitude coordinates.
- Added tests for coards_tools to validate compliance checks and conversions between COARDS and MONET formats.
- Developed comparison tests for DataArrays, including Dask-backed versions, to verify statistical calculations like RMSE, MAE, and custom statistics.
- Enhanced plotting tests to ensure compatibility with Cartopy and validate map generation functions.
- Created extensive statistical tests covering metrics such as Mean Bias, Normalized Mean Bias, RMSE, and various contingency metrics.
- Added a new module to expose all statistical functions for easier access and organization.
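The masked-array helpers and the circular wind-direction bias described above can be sketched in a few lines of NumPy. The function names are borrowed from the commit message, but the bodies are assumptions for illustration, not the code in this PR; in particular, wrapping the difference into [-180, 180) degrees is one common convention.

```python
import numpy as np

# Illustrative sketch; names follow the commit message, bodies are assumptions.


def matchmasks(a, b):
    """Give two arrays the union of their masks (and NaNs) so statistics are paired."""
    a = np.ma.masked_invalid(a)
    b = np.ma.masked_invalid(b)
    mask = np.ma.mask_or(np.ma.getmaskarray(a), np.ma.getmaskarray(b))
    return np.ma.array(a.data, mask=mask), np.ma.array(b.data, mask=mask)


def matchedcompressed(a, b):
    """Return only the values valid in both arrays, as plain 1-D arrays."""
    ma, mb = matchmasks(a, b)
    return ma.compressed(), mb.compressed()


def circlebias(model, obs):
    """Wind-direction difference model - obs, wrapped into [-180, 180) degrees."""
    diff = np.asarray(model, dtype=float) - np.asarray(obs, dtype=float)
    return (diff + 180.0) % 360.0 - 180.0
```

A model direction of 350° against an observed 10° then gives a bias of -20° rather than 340°.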
@bbakernoaa
Member Author

@rschwant I've been working on this. Please take a look. I still need to address the conflicts, but this should maintain backward compatibility.

{{git.username}} and others added 18 commits July 18, 2025 10:59
- Updated BaseAccessor to check for ESMF backend availability, supporting both esmpy and ESMF imports.
- Modified remap methods in MONETAccessor and MONETAccessorDataset to include xESMF as a resampling option, allowing for Dask-backed arrays.
- Improved remapping logic to handle Dask arrays and ensure correct shape matching with target grids.
- Enhanced coordinate assignment in remap methods to avoid conflicts and ensure proper dimensions.
- Added debug statements for better traceability during remapping processes.
- Updated combine_da_to_df_xesmf_strat to ensure DataArray compatibility.
- Improved lonlat_to_xesmf function to handle scalar values and create meshgrids as needed.
- Introduced new utility functions for resampling and grid definition handling.
- Added new test cases to validate the updated remapping functionality and ensure compatibility with Dask-backed arrays.
bbakernoaa and others added 2 commits January 28, 2026 09:38
* Switch interpolation to xregrid and pytspack

- Replaced monet-regrid with xregrid for horizontal remapping.
- Replaced python-stratify with pytspack for vertical interpolation.
- Added support for tension parameter in vertical interpolation.
- Updated pyproject.toml and environment-dev.yml dependencies.
- Updated xarray and pandas accessors to use new backends.
- Handled Dask chunking requirements for pytspack.
- Updated documentation and tests to reflect the changes.
- Switched to ruff exclusively for linting/formatting with 132 line length.
- Fixed missing numpy imports and cleaned up old config files.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

* Switch interpolation to xregrid and pytspack

- Replaced monet-regrid with xregrid for horizontal remapping.
- Replaced python-stratify with pytspack for vertical interpolation.
- Added support for tension parameter in vertical interpolation.
- Updated pyproject.toml and environment-dev.yml dependencies.
- Updated xarray and pandas accessors to use new backends.
- Handled Dask chunking requirements for pytspack.
- Updated documentation and tests to reflect the changes.
- Switched to ruff exclusively for linting/formatting with 132 line length.
- Fixed missing numpy imports and cleaned up old config files.
- Fixed Giorgi and EPA region tests to handle nan/None consistently across platforms.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

* Switch interpolation to xregrid and pytspack

- Replaced monet-regrid with xregrid for horizontal remapping.
- Replaced python-stratify with pytspack for vertical interpolation.
- Added support for tension parameter in vertical interpolation.
- Updated pyproject.toml and environment-dev.yml dependencies.
- Updated xarray and pandas accessors to use new backends.
- Handled Dask chunking requirements for pytspack.
- Preserved DataArray names in vertical interpolation.
- Updated documentation and tests to reflect the changes.
- Switched to ruff exclusively for linting/formatting with 132 line length.
- Fixed UP031 ruff errors (percent format to f-strings).
- Fixed Giorgi and EPA region tests to handle nan/None consistently.
- Cleaned up environment-specific binaries and old config files.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

* Switch interpolation to xregrid and pytspack

- Replaced monet-regrid with xregrid for horizontal remapping.
- Replaced python-stratify with pytspack for vertical interpolation.
- Added support for tension parameter in vertical interpolation.
- Updated pyproject.toml, environment-dev.yml, and docs/environment-docs.yml.
- Updated xarray and pandas accessors to use new backends.
- Handled Dask chunking requirements for pytspack.
- Preserved DataArray names in vertical interpolation.
- Updated documentation and tests to reflect the changes.
- Switched to ruff exclusively for linting/formatting with 132 line length.
- Fixed UP031 ruff errors (percent format to f-strings).
- Fixed Giorgi and EPA region tests to handle nan/None consistently.
- Cleaned up environment-specific binaries and old config files.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
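The "Dask chunking requirements" point usually comes down to one rule: when a column-wise kernel runs through `xarray.apply_ufunc`, the vertical dimension must sit in a single chunk. The sketch below shows that wiring. pytspack's own API is not shown in this PR, so `np.interp` (linear, no tension parameter) stands in for the tension-spline kernel; `interp_vertical`, `_interp_column`, and the `level_out` dimension are hypothetical names.

```python
import numpy as np
import xarray as xr


def _interp_column(values, source_levels, target_levels):
    # Stand-in kernel (linear np.interp, NOT pytspack's tension spline).
    # np.interp needs increasing x, so flip columns whose levels decrease.
    if source_levels[0] > source_levels[-1]:
        source_levels, values = source_levels[::-1], values[::-1]
    return np.interp(target_levels, source_levels, values, left=np.nan, right=np.nan)


def interp_vertical(da, levels, target, dim="z"):
    """Hypothetical helper: interpolate `da` along `dim` onto the 1-D values in `target`."""
    target = np.asarray(target, dtype=float)
    # The core dimension must be a single chunk for apply_ufunc with Dask inputs
    if da.chunks is not None:
        da = da.chunk({dim: -1})
    if levels.chunks is not None:
        levels = levels.chunk({dim: -1})
    out = xr.apply_ufunc(
        _interp_column,
        da,
        levels,
        input_core_dims=[[dim], [dim]],
        output_core_dims=[["level_out"]],
        kwargs={"target_levels": target},
        vectorize=True,  # loop the 1-D kernel over all non-core dimensions
        dask="parallelized",
        output_dtypes=[float],
        dask_gufunc_kwargs={"output_sizes": {"level_out": target.size}},
    )
    return out.assign_coords(level_out=target)
```

Because the same call handles NumPy-backed and Dask-backed inputs, eager and lazy results can be compared directly in tests.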
@bbakernoaa
Member Author

bbakernoaa commented Jan 28, 2026

@zmoon @noaa-oar-arl/aq-arl @amcz

This is a big update to create a more unified and maintainable system. It breaks up the large monet_accessors.py into a core accessor plus separate Dataset, DataArray, and pandas accessors. It also switches vertical interpolation from simple linear interpolation with python-stratify to pytspack. Additionally, it replaces xesmf and pyresample with xregrid, with monet-regrid as a fallback option.

@bbakernoaa bbakernoaa requested review from a team, amcz, drnimbusrain and ytangnoaa January 28, 2026 15:36
@bbakernoaa bbakernoaa self-assigned this Jan 28, 2026
@bbakernoaa bbakernoaa added the requirements, dependencies (Pull requests that update a dependency file), in-develop (Addressed/fixed/resolved in `develop` branch), and github_actions (Pull requests that update GitHub Actions code) labels Jan 28, 2026
@bbakernoaa
Member Author

@rschwant

bbakernoaa and others added 8 commits February 7, 2026 14:00
- Update `monet/accessors/base.py` to check for `esmpy` and `monet_regrid` availability.
- Update `monet/util/resample.py` to implement silent fallback to `monet-regrid` if `esmpy` (required for `xregrid`) is missing.
- Update `Dataset`, `DataArray`, and `Pandas` accessors to support the dual path.
- Add `tests/test_resample_dual.py` to verify the dual path with both Eager and Lazy data.
- Refactor `isinstance` calls in `base.py` to use Python 3.10+ syntax as suggested by ruff.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
* feat: introduce unified `monet.pair` interface for model and obs

This change introduces a single, robust interface for pairing model data (xarray) with observational data (xarray, pandas, or dask).

Key improvements:
- Unified `pair()` function in `monet.util.combinetool`.
- Consistent `.monet.pair()` accessor method across DataArray, Dataset, and DataFrame.
- Full support for Dask-backed objects (xarray and DataFrames), maintaining laziness wherever possible.
- Improved Dask support in `BaseAccessor` by removing lazy-breaking `.values` and immediate computations.
- Implementation of `interp_time` for both xarray and DataFrame branches.
- Automatic preservation of `siteid` coordinates during pairing.
- Proper tracking of data provenance via the `history` attribute.
- Backward compatibility wrappers for existing `combine_` functions.

Adheres to the Aero Protocol for Pangeo-ready scientific pipelines.

* feat: introduce unified `monet.pair` interface and fix Arrow-backed string issues

- Introduced unified `pair()` function and `.monet.pair()` accessors.
- Improved Dask support and laziness.
- Fixed `IndexError` caused by Arrow-backed strings during `expand_dims` in `MONETAccessorPandas._df_to_da`.
- Added `test_pair_gridded_to_gridded` to verify model-to-model and satellite pairing support.
- Adhered to Aero Protocol by ensuring no immediate computations of data variables and proper history tracking.

* feat: unified pairing interface and improved Dask support

- Introduced `monet.pair` as a unified interface for model and observation pairing.
- Integrated `pair()` method into DataArray, Dataset, and DataFrame accessors.
- Added support for gridded-to-gridded (model-to-model/satellite) pairing.
- Improved Dask support by preserving laziness and avoiding immediate computations of data variables.
- Fixed `IndexError` in `MONETAccessorPandas._df_to_da` caused by Arrow-backed strings during `expand_dims`.
- Adhered to the Aero Protocol for maintainable, high-performance scientific pipelines.
- Added comprehensive tests covering various input combinations and laziness.

* feat: unified pairing interface with comprehensive fix for Arrow strings

- Introduced `monet.pair` and integrated it into accessors.
- Added support for model-to-model and satellite pairing.
- Improved Dask support and laziness.
- More aggressively converted Arrow-backed strings to object dtype to avoid `TypeError` and `IndexError` in CI environments where Arrow arrays have limited support in Xarray/Dask operations.
- Added missing `numpy` import in `combinetool.py`.
- Verified with a full test suite.

* feat: unified pairing interface with robust Arrow string handling

- Introduced `monet.pair` and integrated it into accessors.
- Added support for model-to-model and satellite pairing.
- Improved Dask support and laziness.
- Aggressively convert Arrow-backed strings to object dtype in both pandas/dask DataFrames and xarray Datasets before conversion or merging. This fixes CI failures related to Dask's `normalize_chunks` not supporting Arrow strings.
- Added missing `numpy` import in `combinetool.py`.
- Verified with a full test suite.

---------

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
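Converting string-typed columns to object dtype before handing a frame to xarray or Dask, as these commits describe, could look like the minimal pandas sketch below. The helper name is hypothetical, and it only checks for `pd.StringDtype` (which covers both the "python" and "pyarrow" storage backends); columns typed with `pd.ArrowDtype` would need an additional check that this sketch does not attempt.

```python
import pandas as pd


def strings_to_object(df):
    """Hypothetical helper: copy of `df` with string-dtype columns cast to object.

    Only pd.StringDtype is handled; pd.ArrowDtype columns are left untouched.
    """
    out = df.copy()
    for name, dtype in out.dtypes.items():
        if isinstance(dtype, pd.StringDtype):
            out[name] = out[name].astype(object)
    # The index can be string-typed too (for example, site IDs)
    if isinstance(out.index.dtype, pd.StringDtype):
        out.index = out.index.astype(object)
    return out
```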
…rt (#45)

* Modernize pairing/interpolation for Aero Protocol and UGRID support

- Modernized pairing and interpolation pipelines to adhere to the Aero Protocol.
- Implemented automatic UGRID detection in `BaseAccessor._dataset_to_monet`.
- Removed deprecated and eager `remap_nearest_unstructured` method.
- Added NumPy-style docstrings and strict type hinting to core functions.
- Enabled automatic provenance tracking via `attrs['history']`.
- Ensured backend agnosticism (NumPy/Dask) in `resample` and `pair`.
- Added comprehensive unit tests in `tests/test_aero_protocol.py` verifying UGRID pairing and Dask consistency.
- Provided a "Two-Track" visualization example in `monet/plots/visualization_example.py`.

* Modernize pairing/interpolation for Aero Protocol and UGRID support (Final Fixes)

- Modernized pairing and interpolation pipelines to adhere to the Aero Protocol.
- Implemented automatic UGRID detection in `BaseAccessor._dataset_to_monet`.
- Removed deprecated and eager `remap_nearest_unstructured` method from code and docs.
- Added NumPy-style docstrings and strict type hinting to core functions.
- Enabled automatic provenance tracking via `attrs['history']`.
- Ensured backend agnosticism (NumPy/Dask) in `resample` and `pair`.
- Fixed ambiguous boolean evaluation for dask-backed DataArrays in accessor checks.
- Added comprehensive unit tests in `tests/test_aero_protocol.py` verifying UGRID pairing and Dask consistency.
- Updated documentation (User Guide, Developer Guide, Tutorial, API) to reflect modern remapping and UGRID support.
- Provided a "Two-Track" visualization example in `monet/plots/visualization_example.py`.

* Modernize pairing/interpolation for Aero Protocol and UGRID support (Final CI Fixes)

- Modernized pairing and interpolation pipelines to adhere to the Aero Protocol.
- Implemented automatic UGRID detection in `BaseAccessor._dataset_to_monet`.
- Removed deprecated and eager `remap_nearest_unstructured` method from code and docs.
- Added NumPy-style docstrings and strict type hinting to core functions.
- Enabled automatic provenance tracking via `attrs['history']`.
- Ensured backend agnosticism (NumPy/Dask) in `resample` and `pair`.
- Fixed ambiguous boolean evaluation for dask-backed DataArrays by explicitly checking `.coords` or `.variables`.
- Made `xregrid.Regridder` mockable in tests by moving imports to module level in `resample.py`.
- Updated `test_pair_aero_protocol` to correctly handle Dask DataFrames.
- Updated documentation (User Guide, Developer Guide, Tutorial, API) to reflect modern remapping and UGRID support.
- Provided a "Two-Track" visualization example in `monet/plots/visualization_example.py`.

* Modernize pairing and interpolation pipelines according to Aero Protocol

Architected scientific pipelines that balance speed, maintainability, provenance, and visualization.

Key Changes:
- **UGRID & CF Support:** Integrated automatic UGRID detection and unstructured grid support into the core standardization pipeline (_dataset_to_monet).
- **Backend Agnostic Logic:** Refactored resample.py and combinetool.py (pairing) to support both NumPy and Dask backends without forced computations.
- **Strict Maintainability:** Added PEP 484 type hints and NumPy-style docstrings to all modernized functions.
- **Data Provenance:** Implemented automatic tracking of data lineage via the history attribute.
- **Two-Track Visualization:** Added a hybrid visualization example for both publication-quality (Matplotlib/Cartopy) and interactive (HvPlot) plots.
- **Robust Validation:** Created a new test suite tests/test_aero_protocol.py that verifies logic consistency across eager and lazy execution.
- **API Cleanup:** Deprecated and removed eager, non-compliant methods like remap_nearest_unstructured in favor of the unified, protocol-compliant remap and pair interfaces.
- **CI Fix:** Resolved Dask/Pandas string dtype mismatch in pairing tests and fixed import sorting issues.

Follows the Aero 🍃⚡ Protocol rules for Pangeo-scale Earth Science data engineering.

---------

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
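Automatic UGRID detection can rely on the UGRID convention that the mesh topology variable carries `cf_role = "mesh_topology"`. The sketch below (the function name is hypothetical, not `_dataset_to_monet` itself) also shows the idea behind the "ambiguous boolean evaluation" fix: it inspects metadata on `ds.variables` instead of evaluating a possibly Dask-backed array in a boolean context.

```python
import xarray as xr


def is_ugrid(ds):
    """Hypothetical helper: True if any variable declares the UGRID mesh-topology role."""
    # Only attrs are read here, so no data (eager or lazy) is loaded or computed
    return any(var.attrs.get("cf_role") == "mesh_topology" for var in ds.variables.values())
```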
* docs: migrate from Sphinx to MkDocs + mkdocstrings

- Replaced Sphinx with MkDocs and Material theme.
- Configured NOAA-inspired color scheme (#00467f, #0085ca).
- Set up automated API documentation via mkdocstrings.
- Converted all RST files to Markdown.
- Added mkdocs-gallery support and a visualization example.
- Updated documentation content for xregrid and Aero Protocol.
- Removed all legacy Sphinx-related files (conf.py, Makefile, .rst).
- Updated environment-docs.yml and pyproject.toml with new doc dependencies.

* docs: fix CI and RTD configuration for MkDocs migration

- Moved mkdocs and its plugins to the pip section in environment-docs.yml to ensure they are found by micromamba/conda.
- Updated .github/workflows/ci.yml to replace sphinx-build with mkdocs build.
- Updated .readthedocs.yaml to use the mkdocs builder instead of sphinx.
- Cleaned up navigation and styles for the Material theme.

* docs: update target branch to develop in config and guide

- Updated edit_uri in mkdocs.yml to point to the develop branch.
- Updated the developer guide to instruct contributors to open PRs to the develop branch.

---------

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
Implement `calc_fv3_pressure` and `calc_fv3_height` in `monet/util/vertical.py`.
These functions are backend-agnostic (supporting both NumPy and Dask)
and adhere to the Aero Protocol for scientific data engineering.

- `calc_fv3_pressure`: Calculates pressure using the hybrid formula P = ak + bk * ps.
- `calc_fv3_height`: Calculates geopotential height via hypsometric integration from the surface.
- Full "Zero-Trust" test coverage in `tests/test_vertical.py`.
- Automated provenance tracking via the `history` attribute.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
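The two formulas in this commit can be checked on a single column. The sketch assumes FV3-style ordering (interfaces from model top to surface, surface last), one virtual temperature per layer, and standard constants; the function names are hypothetical and this is not the PR's `calc_fv3_pressure`/`calc_fv3_height`. Each layer thickness follows the hypsometric equation dz = (Rd·Tv/g)·ln(p_bottom/p_top), summed upward from the surface.

```python
import numpy as np

RD = 287.04  # dry-air gas constant, J kg-1 K-1 (assumed value)
GRAV = 9.80665  # standard gravity, m s-2


def calc_interface_pressure(ak, bk, ps):
    """Hypothetical helper: hybrid interface pressures P = ak + bk * ps for one column."""
    return np.asarray(ak, dtype=float) + np.asarray(bk, dtype=float) * float(ps)


def calc_interface_height(p_interface, t_virtual, z_surface=0.0):
    """Hypothetical helper: interface heights by hypsometric integration from the surface.

    Assumes interfaces are ordered top-to-bottom (surface last) and that
    t_virtual holds one value per layer between adjacent interfaces.
    """
    p = np.asarray(p_interface, dtype=float)
    tv = np.asarray(t_virtual, dtype=float)
    # Layer thickness: dz = (Rd * Tv / g) * ln(p_bottom / p_top)
    dz = RD * tv / GRAV * np.log(p[1:] / p[:-1])
    # Accumulate upward from the surface, then restore top-to-bottom order
    above_surface = np.cumsum(dz[::-1])[::-1]
    return z_surface + np.append(above_surface, 0.0)
```

In an isothermal toy column (not real FV3 coefficients) with ps = 100000 Pa, the 50000 Pa interface sits at (Rd·T/g)·ln 2 above the surface, which the sketch reproduces.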
* feat: enhance accessors with convention-aware detection and purge legacy deps

- Enhanced BaseAccessor with property-based lat/lon detection (CF/COARDS/UGRID).
- Refactored Dataset and DataArray accessors to avoid forced renaming.
- Purged xesmf and pyresample dependencies.
- Added Aero Protocol compliant unit tests for Eager/Lazy consistency.
- Improved Pandas accessor flexibility and index preservation.
- Ensured provenance tracking via 'history' attributes.
- Fixed several naming and type hint issues across accessors.

* fix: restore legacy regridding methods and fix stratify dimension issues

- Restored `remap_xesmf` and other deprecated regridding methods as wrappers for `remap` to maintain backward compatibility with existing tests.
- Fixed `ValueError` in `stratify` interpolation by converting `levels` to raw arrays before passing to `pytspack`, preventing `xarray.apply_ufunc` dimension conflicts.
- Updated `tests/test_accessors_aero.py` to skip `global_land_mask` dependent tests if the package is not installed.
- Ensured all imports and type hints are consistent across accessors.
- Cleaned up `interp_util.py` and restored generic aliases.

* feat: final polish of accessors for convention-awareness and provenance

- Updated visualization wrappers (`quick_map`, etc.) to be convention-aware, removing redundant forced renaming.
- Enhanced `cartopy_utils.py` to automatically detect latitude/longitude coordinate names if not provided.
- Added `standardize` method to `BaseAccessor` as a non-destructive alternative to `structure_for_monet`.
- Improved `DataArray.monet.compare` with provenance and Dask-safe logic.
- Ensured all history entries are strictly consistent across all accessor methods.
- Fixed missing imports for type hints in accessor modules.
- Added more unit tests for `standardize` and `compare`.
- Cleaned up unreachable code and duplications in `cartopy_utils.py`.

* feat: ultimate polish of accessors for convention-aware processing

- Refactored `monet.pair` and `combinetool.py` to be fully convention-aware, removing forced renames and supporting custom latitude/longitude names in both model (xarray) and observation (pandas/dask) data.
- Enhanced `MONETAccessorDataset` and `MONETAccessor` to remove all remaining `pyresample` and `xesmf` references, including removing the `window` method.
- Updated `BaseAccessor` with a `standardize` method to allow non-destructive metadata enhancement.
- Improved coordinate detection in `cartopy_utils.py` for all quick plot methods, enabling seamless visualization of data with non-standard coordinate names.
- Fixed dimension conflicts in `stratify` interpolation when passing DataArray levels.
- Ensured absolute provenance tracking across all accessor methods.
- Restored deprecated methods as wrappers for backward compatibility.
- Added comprehensive unit tests for new functionality following the Aero Protocol.

* Refactor accessors for convention-awareness and Aero Protocol

* Fix legacy expansion in combine_da_to_da and improve lonlat_to_dataset

---------

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
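Convention-aware latitude/longitude detection can be sketched as a lookup over variable metadata: first `standard_name`, then CF/COARDS units such as `degrees_north` and `degrees_east`, then a few common names. The helper below is hypothetical and operates on a plain `{name: attrs}` mapping, so it makes no assumptions about `BaseAccessor`'s internals.

```python
# Accepted CF unit spellings, compared in lower case
LAT_UNITS = {"degrees_north", "degree_north", "degree_n", "degrees_n", "degreen", "degreesn"}
LON_UNITS = {"degrees_east", "degree_east", "degree_e", "degrees_e", "degreee", "degreese"}


def find_latlon(attrs_by_name):
    """Hypothetical helper: guess (lat_name, lon_name) from {variable_name: attrs}."""

    def pick(standard_name, units, fallbacks):
        for name, attrs in attrs_by_name.items():
            if attrs.get("standard_name") == standard_name:
                return name
        for name, attrs in attrs_by_name.items():
            if str(attrs.get("units", "")).lower() in units:
                return name
        for name in fallbacks:
            if name in attrs_by_name:
                return name
        return None

    lat = pick("latitude", LAT_UNITS, ("latitude", "lat"))
    lon = pick("longitude", LON_UNITS, ("longitude", "lon"))
    return lat, lon
```

With an xarray object, the mapping can be built as `{k: v.attrs for k, v in ds.variables.items()}`, which leaves the original names untouched instead of forcing a rename.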
* Refactor land/ocean masking to be backend-agnostic and preserve provenance

- Consolidated `is_land` and `is_ocean` into `BaseAccessor`.
- Used `xarray.apply_ufunc` with `dask='parallelized'` for lazy support.
- Ensured consistent return of Xarray objects (no lazy breakers).
- Added automated `history` attribute updates for scientific hygiene.
- Cleaned up redundant implementations and imports.
- Added advanced Dask-laziness verification to `tests/test_accessors_aero.py`.

* Fix CI failure by adding missing dependency and skip decorator

- Added `global-land-mask` to `pyproject.toml` and `environment-dev.yml`.
- Added `@pytest.mark.skipif` to `test_is_land_ocean_advanced_lazy`.
- Restored original tests in `tests/test_accessors_aero.py` and appended new ones.
- Improved import hygiene in `monet/accessors/base.py`.

* Fix CI failure by moving global-land-mask to pip dependencies

- Moved `global-land-mask` from conda to pip section in `environment-dev.yml`.
- Ensured `pyproject.toml` also includes the dependency.
- Preserved all tests in `tests/test_accessors_aero.py` and added Dask verification.
- Improved logic in `BaseAccessor` to be backend-agnostic and track provenance.

* Major refactoring of MONET accessors for Aero Protocol compliance

- Consolidated common Xarray accessor methods (`is_land`, `is_ocean`, `remap`, `interp_*`, `tidy`, `wrap_longitudes`, `cftime_to_datetime64`) into `BaseAccessor`.
- Implemented backend-agnostic logic using `xarray.apply_ufunc(..., dask='parallelized')` to support both Eager (NumPy) and Lazy (Dask) backends.
- Added automated provenance tracking via `history` attributes.
- Improved `BaseAccessor` to support Pandas DataFrames for geographic properties and land/ocean masking.
- Removed redundant method implementations from `MONETAccessor` and `MONETAccessorDataset`.
- Added missing `global-land-mask` dependency to `pyproject.toml` and `environment-dev.yml` (fixed CI installation).
- Added comprehensive unit tests in `tests/test_accessors_aero.py` verifying protocol compliance and identical Eager/Lazy results.

* Fix xregrid DataArray compatibility and land mask installation

- Implemented robust DataArray-to-Dataset conversion in `resample` for `xregrid` compatibility.
- Updated `tests/test_accessors_aero.py` with standard dimensions and attributes.
- Ensured `global-land-mask` is correctly specified for installation.
- Improved coordinate detection in `BaseAccessor` for both Xarray and Pandas.

* Fix Ruff linting error and further improve accessor robustness

- Fixed unused import of `global_land_mask` in tests by adding `# noqa: F401`.
- Cleaned up try-except blocks for dependency detection in tests.
- Re-verified all accessor refactorings and tests.

---------

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
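The recurring "history attribute" provenance updates can be captured in one small helper that returns a new attrs dictionary rather than mutating the input. This is a sketch under assumptions: the helper name, the timestamp format, and the newest-last ordering are illustrative choices, not necessarily what the accessors do.

```python
from datetime import datetime, timezone


def append_history(attrs, message, now=None):
    """Hypothetical helper: copy of `attrs` with a timestamped line appended to 'history'.

    Assumes `now` (if given) is in UTC; entries are kept newest-last in this sketch.
    """
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = f"{stamp}: {message}"
    new_attrs = dict(attrs)  # never mutate the caller's attrs
    previous = new_attrs.get("history")
    new_attrs["history"] = f"{previous}\n{entry}" if previous else entry
    return new_attrs
```

With xarray, `da.assign_attrs(append_history(da.attrs, "applied land mask"))` returns a new object carrying the updated history.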
* feat: replace global-land-mask with internal enhanced mask system

This commit replaces the external `global-land-mask` dependency with a robust,
internal mask system in `monet.util.mask`.

Key changes:
- Removed `global-land-mask` from dependencies.
- Added `geopandas`, `rasterio`, and `shapely` as optional `regions` dependencies.
- Implemented `monet.util.mask` containing:
    - `RegionDefinitions`: Downloaders and definitions for Giorgi, IPCC AR6, EPA regions, etc.
    - `MaskBuilder`: Utility to build raster masks from polygons.
    - `EarthMask`: Reader supporting Xarray/Dask via `apply_ufunc`.
- Updated `BaseAccessor` to use the new system for `is_land` and `is_ocean`.
- Added `get_region` accessor method for easy regional masking.
- Added `add_mask` utility to `monet.util.tools`.
- Added `monet/util/preprocess_masks.py` for pre-building masks.
- Updated and added tests to verify the new implementation following Aero Protocol.

The new system is backend-agnostic, supports Dask, and maintains data provenance.

* feat: finalize enhanced mask system with dependency updates and improved test logic

- Added `requests` to `pyproject.toml` dependencies.
- Improved test skip logic in `tests/test_accessors_aero.py` to check for pre-built masks if GIS dependencies are missing.
- Verified preprocessing for Giorgi, EPA Admin, Land, and Timezones.
- Clarified status of IPCC AR6 and EPA Eco Regions masks (pending valid URLs).

---------

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
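Reading a pre-built global raster mask at arbitrary points reduces to index arithmetic on a regular latitude/longitude grid. The sketch assumes a layout (row 0 at -90° latitude, column 0 at -180° longitude, uniform spacing covering the globe) and a hypothetical function name; it is not the `EarthMask` implementation, and the toy 1° mask in the usage stands in for the 0.05° `.npz` masks mentioned in a later commit.

```python
import numpy as np


def lookup_mask(mask, lat, lon):
    """Hypothetical helper: sample a global lat/lon raster at the cell containing each point.

    Assumes row 0 starts at -90 deg latitude and column 0 at -180 deg longitude,
    with uniform spacing covering the whole globe.
    """
    mask = np.asarray(mask)
    nlat, nlon = mask.shape
    dlat, dlon = 180.0 / nlat, 360.0 / nlon
    lat = np.asarray(lat, dtype=float)
    lon = (np.asarray(lon, dtype=float) + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    i = np.clip(np.floor((lat + 90.0) / dlat).astype(int), 0, nlat - 1)
    j = np.clip(np.floor((lon + 180.0) / dlon).astype(int), 0, nlon - 1)
    return mask[i, j]
```

Because the function is element-wise on NumPy arrays, it could be wrapped with `xarray.apply_ufunc(..., dask="parallelized")` to keep masking lazy on Dask-backed coordinates.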
bbakernoaa and others added 7 commits February 18, 2026 10:43
…FutureWarnings (#51)

- Added automatic drop_duplicates('time') in _pair_xarray for both observation and model datasets.
- Changed xr.merge join method to 'left' in _pair_xarray to ensure alignment with observation coordinates and suppress FutureWarnings.
- Added a unit test in tests/test_pair.py to verify handling of duplicate times.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
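The two changes in this commit, dropping duplicate time stamps and merging with `join="left"`, can be illustrated with plain xarray. The helper name and variable names are hypothetical and this is not `_pair_xarray` itself; it only shows why a left join keeps the observation time axis.

```python
import pandas as pd
import xarray as xr


def dedup_and_merge(obs, model):
    """Hypothetical helper: drop repeated time stamps, then merge on the obs time axis."""
    obs = obs.drop_duplicates("time")  # keeps the first occurrence by default
    model = model.drop_duplicates("time")
    # join="left" aligns to the first object (obs); model-only times are dropped
    return xr.merge([obs, model], join="left")
```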
This commit resolves a common issue where pairing a model with
observations of different time sizes would fail with an AlignmentError
during the spatial remapping step.

Key changes:
- Removed erroneous dask-based source/target swapping logic in
  BaseAccessor.remap to ensure consistent API behavior.
- Updated _pair_xarray to handle time-mismatches and trajectories
  more robustly by either using a single time slice for fixed grids
  or pre-interpolating in time for moving platforms.
- Adjusted unit test assertions in tests/test_accessors.py to match
  the consistent remap API.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
* Fix AlignmentError when pairing trajectory data

Align model time to observation time before spatial remapping for moving platforms (trajectories) to avoid dimension mismatch errors in xregrid.
- Use `reindex(method='nearest')` when `interp_time=False`.
- Existing `interp` logic is preserved when `interp_time=True`.
- Added regression test `tests/test_pair_trajectory.py`.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

* Fix AlignmentError when pairing trajectory data and fix linter issues

Align model time to observation time before spatial remapping for moving platforms (trajectories) to avoid dimension mismatch errors in xregrid.
- Use `reindex(method='nearest')` when `interp_time=False`.
- Existing `interp` logic is preserved when `interp_time=True`.
- Added regression test `tests/test_pair_trajectory.py`.
- Fixed ruff linter issues in `tests/test_pair_trajectory.py`.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

* Fix AlignmentError when pairing trajectory data

Align model time to observation time before spatial remapping for moving platforms (trajectories) to avoid dimension mismatch errors in xregrid.
- Use `reindex(method='nearest')` when `interp_time=False`.
- Using `obs.time.values` in reindex to avoid MultiIndex naming conflicts (fixes CI `RuntimeError`).
- Existing `interp` logic is preserved when `interp_time=True`.
- Added regression test `tests/test_pair_trajectory.py`.
- Fixed ruff linter issues in `tests/test_pair_trajectory.py`.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

* Fix pairing errors for trajectory and fixed-grid cases

- Fix AlignmentError when pairing trajectory data: align model time to observation time (using `reindex` or `interp`) before spatial remapping.
- Fix RuntimeError for trajectory pairing: use `obs.time.values` in `reindex` to avoid MultiIndex naming conflicts.
- Fix ValueError for fixed-grid pairing: drop `time` scalar coordinate from target grid to avoid conflict with model's time dimension in output.
- Added regression test `tests/test_pair_trajectory.py` and resolved linter issues.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

* Fix pairing errors for trajectory and fixed-grid cases

- Fix AlignmentError when pairing trajectory data: align model time to observation time (using `reindex` or `interp`) before spatial remapping.
- Fix RuntimeError for trajectory pairing: use `obs.time.values` in `reindex` to avoid MultiIndex naming conflicts.
- Fix ValueError for fixed-grid pairing: drop `time` scalar coordinate from target grid to avoid conflict with model's time dimension in output.
- Fix MergeError when merging results: use `compat='override'` to handle slight coordinate mismatches (e.g., floating-point precision).
- Added regression test `tests/test_pair_trajectory.py` and resolved linter issues.

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>
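
A small illustration of why `compat='override'` avoids the MergeError, assuming the paired objects share a non-index `latitude` coordinate that differs only by floating-point noise. The variable names and values are made up for the example.

```python
import numpy as np
import xarray as xr

time = np.arange(3)
lat = np.array([10.0, 10.5, 11.0])

obs = xr.DataArray(
    [1.0, 2.0, 3.0], dims="time", name="obs",
    coords={"time": time, "latitude": ("time", lat)},
)
model = xr.DataArray(
    [1.1, 2.1, 3.1], dims="time", name="model",
    coords={"time": time, "latitude": ("time", lat + 1e-12)},  # floating-point noise
)

# With compat="no_conflicts" the tiny difference counts as a conflict.
try:
    xr.merge([obs, model], compat="no_conflicts")
    strict_ok = True
except xr.MergeError:
    strict_ok = False

# compat="override" skips the comparison and keeps the first object's copy.
paired = xr.merge([obs, model], compat="override")
print(strict_ok, list(paired.data_vars))  # False ['obs', 'model']
```

The trade-off is that `override` silently trusts the first object, so real coordinate disagreements are no longer caught at merge time.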

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

* Add integration test for AERONET pairing with CF and UGRID grids

This test verifies the integration between monetio (develop branch) and
monet (using the new xregrid branch) for pairing AERONET observations
with both standard gridded (CF) and unstructured (UGRID) model data.

The test uses actual monetio and xregrid libraries and follows the
standard monet.pair workflow.

* Fix CI failure by adding monetio dependency and improving test robustness

- Added `monetio` (develop branch) to `pyproject.toml` and `environment-dev.yml`.
- Updated `tests/test_aeronet_pairing.py` to use `pytest.importorskip` for better handling of missing dependencies.
- Confirmed that `monetio` was the missing piece causing the `ModuleNotFoundError` in CI.

* Move monetio to optional dependencies and improve test robustness

- Moved `monetio` from main dependencies to `optional-dependencies.obs` in `pyproject.toml`.
- Added `pytest.importorskip` for `xregrid` and `esmpy` in `tests/test_aeronet_pairing.py` to ensure it skips gracefully if they are missing.
- Replied to PR feedback.

* Improve masking system with repository-cached masks and add integration test

- Updated `monet/util/mask.py` to support masks stored within the package data directory.
- Added pre-computed masks for `giorgi`, `epa_admin`, `land`, and `timezones` at 0.05 resolution to `monet/util/data/masks/`.
- Updated `pyproject.toml` to include `.npz` mask files in the package distribution.
- Enhanced `tests/test_aeronet_pairing.py` to verify that masking works on paired datasets without requiring optional build dependencies.
- Verified that `monet.pair` and masking integration works for both CF and UGRID grids.
- Addressed PR feedback by making `monetio` an optional dependency.
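
A sketch of how a cached `.npz` mask could be looked up at observation sites. The layout (a boolean `mask` array plus `lat0`, `lon0` and `res` keys in one file) and the `lookup` helper are assumptions for illustration; the real files in `monet/util/data/masks/` may be organised differently. The 0.05 grid spacing is taken to be in degrees.

```python
import io

import numpy as np

RES = 0.05  # grid spacing, assumed to be degrees

# Tiny regional mask: True over the western half of a 1-degree box.
lat0, lon0 = 30.0, -100.0
nlat, nlon = 20, 20
lon_centers = lon0 + RES * (np.arange(nlon) + 0.5)
mask = np.broadcast_to(lon_centers < -99.5, (nlat, nlon)).copy()

# Cache it as .npz (in memory here; a file in the package data dir in practice).
buf = io.BytesIO()
np.savez_compressed(buf, mask=mask, lat0=lat0, lon0=lon0, res=RES)
buf.seek(0)


def lookup(cached, lats, lons):
    """Hypothetical helper: mask value of the grid cell containing each point."""
    grid = cached["mask"]
    res = float(cached["res"])
    i = np.floor((np.asarray(lats) - float(cached["lat0"])) / res).astype(int)
    j = np.floor((np.asarray(lons) - float(cached["lon0"])) / res).astype(int)
    # Clamp indices; note this also maps out-of-domain points onto edge cells.
    i = np.clip(i, 0, grid.shape[0] - 1)
    j = np.clip(j, 0, grid.shape[1] - 1)
    return grid[i, j]


with np.load(buf) as cached:
    result = lookup(cached, [30.5, 30.5], [-99.8, -99.2])
print(result)  # [ True False]
```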

---------

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

* Enhance UGRID/CF support and expand documentation

- Centralized coordinate naming heuristics in `BaseAccessor`.
- Improved convention-aware coordinate detection (CF attributes, units, UGRID node/face/edge coordinates).
- Refactored `monet.util.tools` regional functions to be convention-aware using MONET accessors.
- Added comprehensive documentation for utility tools, vertical interpolation, and comparison methods.
- Updated `mkdocs.yml` to include new documentation sections.
- Added unit tests for enhanced convention detection.
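
A simplified sketch of convention-aware coordinate detection, working on a plain mapping of variable name to attributes so it stays library-agnostic. The precedence (CF `standard_name`, then CF units, then name heuristics) and the name lists are assumptions, not the actual `BaseAccessor` logic. The CF `axis` attribute is deliberately not used, because projected `x`/`y` coordinates also carry it.

```python
from typing import Mapping, Optional

LAT_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE"}
LAT_NAMES = ("lat", "latitude", "xlat", "nav_lat")  # hypothetical fallback list
LON_NAMES = ("lon", "longitude", "xlong", "nav_lon")  # hypothetical fallback list


def find_coord(attrs_by_name: Mapping[str, Mapping[str, str]], axis: str) -> Optional[str]:
    """Return the variable that looks like latitude (axis="Y") or longitude (axis="X")."""
    is_lat = axis == "Y"
    std_name = "latitude" if is_lat else "longitude"
    units = LAT_UNITS if is_lat else LON_UNITS
    names = LAT_NAMES if is_lat else LON_NAMES
    # 1) CF standard_name (also set on UGRID node/face/edge coordinates).
    for name, attrs in attrs_by_name.items():
        if attrs.get("standard_name") == std_name:
            return name
    # 2) CF units.
    for name, attrs in attrs_by_name.items():
        if attrs.get("units") in units:
            return name
    # 3) Name heuristics as a last resort (case-insensitive).
    for name in attrs_by_name:
        if name.lower() in names:
            return name
    return None


print(find_coord({"XLAT": {}, "XLONG": {}}, "X"))  # XLONG
```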

* Further enhance UGRID/CF support and expand documentation

- Fixed CI failure by formatting tests with ruff.
- Centralized coordinate naming heuristics in `BaseAccessor`.
- Improved convention-aware coordinate detection (CF attributes, units, UGRID node/face/edge coordinates).
- Refactored `monet.util.tools` regional functions to be convention-aware using MONET accessors.
- Added comprehensive documentation for utility tools, vertical interpolation, comparison methods, and visualization.
- Updated `mkdocs.yml` to include new documentation sections.
- Added unit tests for enhanced convention detection.

---------

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

* Refactor monet/met_funcs.py for Aero Protocol compliance

- Replaced `np.asarray()` and explicit NumPy math with a backend-agnostic
  dispatch mechanism using `xarray.apply_ufunc` with `dask='parallelized'`.
- Added an `_apply_aero` helper for consistent dispatching and provenance tracking.
- Updated `ds.attrs['history']` with timestamped transformation records.
- Ensured NumPy-style docstrings and strict type hints throughout.
- Added `tests/test_met_funcs_aero.py` to verify Eager vs Lazy consistency.
- Updated `tests/conftest.py` with mocks for heavy dependencies to enable
  testing in simplified environments.
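
A minimal sketch of the dispatch-plus-provenance pattern described above. `apply_tracked` is a hypothetical stand-in, not the actual `_apply_aero` signature, and the history line format is an assumption.

```python
from datetime import datetime, timezone

import numpy as np
import xarray as xr


def apply_tracked(func, *args: xr.DataArray, note: str) -> xr.DataArray:
    """Hypothetical helper: run ``func`` element-wise through xarray so NumPy-
    and Dask-backed inputs share one code path, then log the step in history."""
    result = xr.apply_ufunc(
        func,
        *args,
        dask="parallelized",  # Dask-backed inputs stay lazy; NumPy inputs run eagerly
        output_dtypes=[np.float64],
        keep_attrs=False,
    )
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    previous = args[0].attrs.get("history", "")
    entry = f"{stamp}: {note}"
    result.attrs["history"] = f"{previous}\n{entry}" if previous else entry
    return result


u = xr.DataArray(np.array([3.0, 0.0]), dims="x", attrs={"history": "loaded"})
v = xr.DataArray(np.array([4.0, 2.0]), dims="x")
speed = apply_tracked(np.hypot, u, v, note="wind speed = hypot(u, v)")
print(speed.values)  # [5. 2.]
```

Because the same call works on chunked inputs, an Eager vs Lazy test only needs to compare the result for `u, v` with the computed result for their `.chunk()` versions.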

* Fix CI failure by using conditional mocking in tests/conftest.py

- Updated `tests/conftest.py` to mock dependencies only when they are missing
  from the environment. This prevents overwriting real packages in CI.
- Added `__path__` and `__version__` to mocked packages to avoid collection
  errors in tests that expect them.
- Expanded the list of mocked `cartopy` submodules to support imports in
  `monet.plots.cartopy_utils`.
- Fixed a ruff linting error in `conftest.py` by using `importlib.util.find_spec`.
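
A stdlib-only sketch of conditional mocking with `importlib.util.find_spec`. `mock_if_missing` is a hypothetical helper, not the actual `conftest.py` code; it handles top-level packages plus an explicit list of submodules.

```python
import importlib.util
import sys
import types


def mock_if_missing(name: str, submodules: tuple = (), version: str = "0.0.0") -> bool:
    """Install an empty stub for top-level package ``name`` only if it is not
    importable. Returns True when a stub was installed."""
    if name in sys.modules:
        return False  # already imported (real module or an earlier stub)
    if importlib.util.find_spec(name) is not None:
        return False  # real package installed: never shadow it
    stub = types.ModuleType(name)
    stub.__path__ = []  # mark the stub as a package
    stub.__version__ = version  # code that inspects __version__ won't crash
    sys.modules[name] = stub
    # Register stub submodules so ``import name.sub`` resolves from sys.modules.
    for sub in submodules:
        child = types.ModuleType(f"{name}.{sub}")
        sys.modules[f"{name}.{sub}"] = child
        setattr(stub, sub, child)
    return True
```

In a `conftest.py` this might be called as `mock_if_missing("cartopy", submodules=("crs", "feature"))` before any test module imports the plotting code.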

* Remove test_mock.py to fix CI linting failure

Deleted the temporary test_mock.py file that was accidentally included in
the previous submission and caused a ruff format check failure in CI.

* Refactor monet.util for Aero Protocol and de-duplicate utilities

- Moved `nearest` and `calc_13_category_usda_soil_type` from `util/__init__.py`
  to `util/tools.py`.
- Refactored `wsdir2uv`, `get_relhum`, and `calc_13_category_usda_soil_type`
  to be backend-agnostic using `xr.apply_ufunc` with `dask='parallelized'`.
- Implemented provenance tracking via `history` attribute updates.
- De-duplicated `monet/util/__init__.py` by replacing legacy definitions
  with imports from `.tools`.
- Ensured NumPy-style docstrings and strict type hints.
- Added `tests/test_tools_aero.py` to verify Eager vs Lazy consistency.
- Maintained backward compatibility for all public utility functions.
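
A sketch of a backend-agnostic wind conversion with two outputs. It assumes the usual meteorological convention (`wdir` is the direction the wind blows from, in degrees clockwise from north); check `monet.util.tools.wsdir2uv` for its exact sign and unit handling. `wsdir2uv_sketch` is a hypothetical name.

```python
import numpy as np
import xarray as xr


def _wsdir2uv_numpy(ws, wdir):
    """Speed/direction to (u, v), meteorological "blowing from" convention."""
    rad = np.deg2rad(wdir)
    return -ws * np.sin(rad), -ws * np.cos(rad)


def wsdir2uv_sketch(ws, wdir):
    """Element-wise over xarray inputs; two outputs need two core-dim lists."""
    return xr.apply_ufunc(
        _wsdir2uv_numpy,
        ws,
        wdir,
        output_core_dims=[[], []],
        dask="parallelized",
        output_dtypes=[np.float64, np.float64],
    )


ws = xr.DataArray([10.0, 10.0], dims="site")
wdir = xr.DataArray([90.0, 180.0], dims="site")  # from the east, from the south
u, v = wsdir2uv_sketch(ws, wdir)
```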

---------

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

* feat: add robust coordinate and UGRID detection following Aero Protocol

- Created `monet/util/conventions.py` to centralize and robustify coordinate and grid detection for CF, COARDS, and UGRID/UXarray conventions, borrowing from `xregrid` internal logic.
- Refactored `BaseAccessor` and `monet/util/coards_tools.py` to use the new conventions utility, ensuring backend-agnostic and lazy coordinate identification.
- Updated `_coards_to_netcdf` and `_dataarray_coards_to_netcdf` to use `xr.broadcast` for meshgrid generation, maintaining Dask laziness for 2D coordinates.
- Implemented UGRID-specific metadata and coordinate extraction helpers.
- Added comprehensive validation tests in `tests/test_conventions_aero.py` covering various grid types and both Eager (NumPy) and Lazy (Dask) backends.
- Ensured all transformation functions update provenance in `ds.attrs['history']`.
- Fixed several linting issues and potential TypeErrors related to eager checks.
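
A simplified sketch of grid-type detection and UGRID coordinate retrieval. It relies on the UGRID convention that the mesh topology variable carries `cf_role = "mesh_topology"` and a `node_coordinates` attribute; the function names, the fallback to a variable called `lat`, and the toy datasets are assumptions rather than the `monet/util/conventions.py` implementation.

```python
import numpy as np
import xarray as xr


def detect_grid_type(ds: xr.Dataset) -> str:
    """Classify a dataset as 'ugrid', 'curvilinear' or 'rectilinear'."""
    for var in ds.variables.values():
        if var.attrs.get("cf_role") == "mesh_topology":
            return "ugrid"
    lat_name = next(
        (n for n, v in ds.variables.items() if v.attrs.get("standard_name") == "latitude"),
        "lat",  # hypothetical fallback name
    )
    return "curvilinear" if ds[lat_name].ndim == 2 else "rectilinear"


def ugrid_node_coords(ds: xr.Dataset) -> tuple:
    """Node coordinate names listed on the mesh topology variable."""
    for var in ds.variables.values():
        if var.attrs.get("cf_role") == "mesh_topology":
            return tuple(var.attrs["node_coordinates"].split())
    raise ValueError("no UGRID mesh topology variable found")


mesh = xr.Dataset(
    {"mesh": ((), 0, {"cf_role": "mesh_topology", "topology_dimension": 2,
                      "node_coordinates": "mesh_node_x mesh_node_y"})},
    coords={
        "mesh_node_x": ("nMesh_node", np.array([0.0, 1.0, 0.0]), {"standard_name": "longitude"}),
        "mesh_node_y": ("nMesh_node", np.array([0.0, 0.0, 1.0]), {"standard_name": "latitude"}),
    },
)
rect = xr.Dataset(coords={"lat": [10.0, 20.0], "lon": [0.0, 1.0]})
curv = xr.Dataset(coords={"lat": (("y", "x"), np.zeros((2, 2)))})
print(detect_grid_type(mesh), detect_grid_type(rect), detect_grid_type(curv))
```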

* feat: robust coordinate and UGRID detection following Aero Protocol (Fixed)

- Created `monet/util/conventions.py` to centralize coordinate and grid detection using `cf_xarray` and robust heuristics.
- Refactored `BaseAccessor` and `monet/util/coards_tools.py` to use the new conventions utility.
- Fixed regressions in UGRID detection and coordinate identification by improving `find_coords` and `detect_grid_type`.
- Replaced `numpy.meshgrid` and `.values` calls with lazy `xarray.broadcast` and `xarray.concat` for 2D coordinate and bounds generation, ensuring Dask laziness.
- Corrected potential `TypeError` in `is_curvilinear_grid` and `monet_to_coards` by replacing `.reduce(np.abs)` with `abs(...).max()`.
- Implemented UGRID-specific metadata extraction and coordinate retrieval helpers.
- Added comprehensive validation tests in `tests/test_conventions_aero.py`.
- Ensured all pre-commit checks pass and transformation routines update data provenance.
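
A small comparison of `np.meshgrid` with `xr.broadcast` for building 2D coordinates, plus a curvilinearity check in the `abs(...).max()` style. The coordinate values are toy data.

```python
import numpy as np
import xarray as xr

lat = xr.DataArray(np.array([10.0, 20.0, 30.0]), dims="lat")
lon = xr.DataArray(np.array([100.0, 110.0]), dims="lon")

# Labelled alternative to np.meshgrid with no .values call, so Dask-backed
# 1-D coordinates would stay lazy.
lat2d, lon2d = xr.broadcast(lat, lon)
print(lat2d.dims, lat2d.shape)  # ('lat', 'lon') (3, 2)

# Same numbers as the eager NumPy version (meshgrid takes x first, then y).
lon_np, lat_np = np.meshgrid(lon.values, lat.values)
assert np.array_equal(lat2d.values, lat_np) and np.array_equal(lon2d.values, lon_np)

# On a rectilinear grid, latitude does not change along the lon axis.
varies_along_lon = bool(abs(lat2d.diff("lon")).max() > 0)
print(varies_along_lon)  # False
```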

---------

Co-authored-by: bbakernoaa <22104759+bbakernoaa@users.noreply.github.com>

Labels

- dependencies: Pull requests that update a dependency file
- enhancement
- github_actions: Pull requests that update GitHub Actions code
- in-develop: Addressed/fixed/resolved in `develop` branch
- maintenance
- requirements
