
NGWPC NWM PI-3 Formulation Selection Delivery 2026-02-25#27

Open
cmaynard-ngwpc wants to merge 688 commits into NOAA-OWP:master from NGWPC:ngwpc-3.1.2.3.0

@cmaynard-ngwpc

This pull request delivers significant robustness, correctness, and scalability improvements to ngen-forcing, with a strong focus on MPI-safe error handling, retry logic, forcing data reliability, and containerized execution.

It expands forcing support across multiple domains (including oCONUS), improves regridding and weight-file workflows, strengthens I/O and race-condition handling, and integrates the EWTS logging framework with MPI-aware diagnostics.

The PR also modernizes path handling, improves CI/CD coverage, refactors large execution paths for clarity and timing analysis, and resolves several long-standing edge cases in GFS, AORC, NBM, and NWM forcing pipelines.


Additions

Forcing & Domain Support

  • Added NBM ANA supplemental precipitation support for Puerto Rico.
  • Added historical forcing data loaders.
  • Added SFINCS and SCHISM domain handling, including:
    • Correct file expectations
    • Hawaii domain updates
    • Fixes for SCHISM multi-day simulations
  • Added oCONUS parameterization for NWM domains.
  • Added ZARR as a supported file type.
  • Added Model_tpxo10_atlas support.
  • Added automatic fallback logic for NWM data downloads using NCEI and NODD AWS sources.
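
The source-selection rule behind the NCEI/NODD fallback can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the source labels `"ncei"` and `"nodd_aws"`, the January 1, 2019 start date, and the choice to count "two months" in whole calendar months are all assumptions. The date split itself comes from the commit notes, which send older dates to NCEI and the most recent two months to the NODD AWS server.

```python
from datetime import date


def months_before(day: date, months: int) -> date:
    """Return the first day of the calendar month `months` before `day`."""
    index = day.year * 12 + (day.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def pick_sources(target: date, today: date) -> list[str]:
    """Order the download sources to try for `target` (labels are hypothetical)."""
    if target < date(2019, 1, 1):
        # Assumption: neither server is described as holding data this old.
        return []
    cutoff = months_before(today, 2)
    if target < cutoff:
        # Older dates: try the NCEI archive first, then fall back to NODD AWS.
        return ["ncei", "nodd_aws"]
    # Recent dates: NCEI may not have them yet, so try NODD AWS first.
    return ["nodd_aws", "ncei"]
```

A caller would walk the returned list in order and stop at the first source that yields the file.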

Reliability, Error Handling & MPI

  • Introduced a fully MPI-aware retry decorator:
    • Supports abort vs retry semantics
    • Avoids MPI deadlocks
    • Handles partial-rank failures safely
  • Expanded FileNotFoundError handling across all temporary-file cleanup paths.
  • Added full MPI barriers and reduced error aggregation logic.
  • Added program-status checks at key execution points.
  • Added hash- and random-ID-based uniqueness for output and weight files.
  • Added cache management and garbage collection improvements.
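
To illustrate the idea behind an MPI-aware retry decorator, here is a minimal sketch under stated assumptions; it is not the decorator delivered in this PR. The names `mpi_retry`, `SerialComm`, and the `fatal` parameter are hypothetical. The only communicator feature it relies on is an `allreduce` that sums integers across ranks, which is the default behavior of mpi4py's `Comm.allreduce`. The key point is that every rank catches its own exception and then joins the same collective call, so a rank that failed never leaves its peers blocked.

```python
import functools
import time


class SerialComm:
    """Single-rank stand-in for an MPI communicator (illustration only)."""

    def allreduce(self, value):
        return value


def mpi_retry(comm, attempts=3, delay=0.0, fatal=()):
    """Retry a collectively called function until all ranks succeed."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                result, error = None, None
                try:
                    result = func(*args, **kwargs)
                except Exception as exc:
                    error = exc
                # Every rank reaches these collectives, including failed ones,
                # so no rank waits on a peer that has already raised.
                n_fatal = comm.allreduce(int(isinstance(error, fatal)))
                n_failed = comm.allreduce(int(error is not None))
                if n_failed == 0:
                    return result
                if n_fatal or attempt == attempts:
                    # All ranks raise together: abort semantics.
                    raise RuntimeError(
                        f"{func.__name__} failed on {n_failed} rank(s)"
                    ) from error
                time.sleep(delay)  # all ranks retry together

        return wrapper

    return decorator
```

With mpi4py, `MPI.COMM_WORLD` could be passed in place of `SerialComm()`, provided the decorated function is called by every rank of the communicator.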

Logging, Debugging & Observability

  • Integrated the EWTS package (including new performance log level).
  • Added timing blocks and refactored execution into discrete timed phases.
  • Added support for:
    • MPI debug environment variables
    • Debug-wait environment flags
  • Improved log formatting, verbosity control, and message consistency.
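
A timed phase can be sketched with a small context manager. This example uses only the Python standard library and does not reproduce the EWTS API; the `PERFORMANCE` level number and the `timed_phase` name are assumptions made for illustration.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical numeric level between DEBUG (10) and INFO (20).
PERFORMANCE = 15
logging.addLevelName(PERFORMANCE, "PERFORMANCE")


@contextmanager
def timed_phase(name, timings, logger=None):
    """Accumulate the wall-clock duration of a named phase into `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the time even if the phase raised.
        elapsed = time.perf_counter() - start
        timings[name] = timings.get(name, 0.0) + elapsed
        if logger is not None:
            logger.log(PERFORMANCE, "%s took %.3f s", name, elapsed)
```

Wrapping each stage in `with timed_phase("regrid", timings):` leaves a per-phase total in the `timings` dictionary that can be reported at the end of a run.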

Changes

Data Processing & Regridding

  • Refactored regridding workflows:

    • Retry-decorated ESMF calls
    • Safer execution ordering
    • Isolated regrid object execution
  • Refactored weight-file lifecycle:

    • Unique naming per realization
    • Safer temp-file writes
    • Improved loading, writing, and caching logic
  • Improved AORC, GFS, and NBM processing efficiency.

  • Extended time slices to handle forecast boundary edge cases.
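
One way to combine unique naming with safer temp-file writes is sketched below. The helper names, the file-name layout, and the choice of SHA-256 are assumptions for illustration, not the scheme used in this PR. The write goes to a temporary file in the destination directory and is then moved into place with `os.replace`, so readers never see a partially written file.

```python
import hashlib
import os
import tempfile
import uuid
from pathlib import Path


def weight_file_name(source_grid: str, target_grid: str, realization: str) -> str:
    """Build a name from a content hash plus a random ID (hypothetical layout)."""
    key = "|".join((source_grid, target_grid, realization)).encode()
    digest = hashlib.sha256(key).hexdigest()[:12]
    return f"weights_{digest}_{uuid.uuid4().hex[:8]}.nc"


def atomic_write(path, data: bytes) -> None:
    """Write `data` to a temp file beside `path`, then rename it into place."""
    path = Path(path)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Clean up the temp file; tolerate it already being gone.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```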

Containerization & Path Handling

  • Converted all critical paths to absolute path handling for container safety.

  • Fixed configuration, domain, and model file resolution inside containers.

  • Updated Dockerfiles:

    • Added missing dependencies (e.g., wget, netcdf4)
    • Added domain-specific images (SFINCS, Delft dashboard)
  • Removed binary and data files from container images.
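
The move away from script-relative paths can be illustrated with a small resolver. This is a sketch only: `resolve_path` and its `base_dir` argument are hypothetical names, and the base directory (for example, the directory holding the configuration file or a mounted data root) is an assumption about how a caller would use it.

```python
from pathlib import Path


def resolve_path(raw: str, base_dir) -> Path:
    """Return an absolute path, anchoring relative inputs at `base_dir`."""
    path = Path(raw).expanduser()
    if not path.is_absolute():
        # Anchor at an explicit base rather than the executing script's folder.
        path = Path(base_dir) / path
    return path.resolve()
```

Because the base is passed in explicitly, the same configuration resolves correctly whether the code runs from a source checkout or inside a container.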

Performance & Scalability

  • Improved MPI data loading strategy.

  • Reduced redundant cache file opens and I/O operations.

  • Improved handling of:

    • NFS cleanup
    • EBUSY errors
    • Race conditions during downloads and writes
  • Kept full simulation time ranges in memory where appropriate to reduce reloads.
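
Cleanup that tolerates EBUSY errors and files already removed by another process might look like the following. It is a minimal sketch, assuming a simple fixed-delay retry; the function name and defaults are hypothetical and not taken from this PR.

```python
import errno
import os
import time


def remove_quietly(path, attempts=5, delay=0.1):
    """Remove `path`, retrying on EBUSY; return False if it was already gone."""
    for attempt in range(attempts):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            # Another rank or process won the race and deleted the file.
            return False
        except OSError as exc:
            # Retry only on EBUSY, and give up after the last attempt.
            if exc.errno != errno.EBUSY or attempt == attempts - 1:
                raise
            time.sleep(delay)
    return False
```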

Code Quality & Refactoring

  • Large-scale refactors for:

    • Retry utilities
    • Error handlers
    • Regridding and execution orchestration
  • Improved type hints, formatting, naming consistency, and docstrings.

  • Split large methods into clearer properties and helpers.

  • Removed deprecated, unused, or commented-out logic.

CI/CD & Tooling

  • Updated CI/CD workflows, including ngwpc-candidate and ngwpc-release branches.
  • Synced scripts with latest data-processing expectations.
  • Improved unit-test and package installation behavior.

Removals

  • Removed deprecated diagnostics and stray debug code.
  • Removed lumped forcing from the build.
  • Removed obsolete abort paths in favor of MPI-safe error handling.
  • Removed assumptions about script-relative paths.
  • Removed unused properties, imports, and legacy warnings.

mkarim-rtx and others added 30 commits October 14, 2025 16:08
* WIP for github cicd migration

* updated cicd.yml

* updated cicd.yml

* updated cicd.yml

* updated cicd.yml

* updated cicd file

* updated ngencoastal dockerfile

* updated cicd file

* updated cicd file

* updated cicd file

* updated cicd file

* wip

* 1) Updated Dockerfile.ngencoastal to be able to install the compiled boost libraries. 2) Added configuration files for the SCHISM calibration and forecast use cases.

* 1) Updated Dockerfile.ngencoastal to be able to install the compiled boost libraries. 2) Added configuration files for the SCHISM calibration and forecast use cases.

* 1) Changed the anaconda installation to miniforge. 2) Reduced the image size to 13.1 GB. 3) Updated repository to github.

* Removed anaconda default channels.

* 1) Updated the FVCOM_download script to be compatible with the recent updates (removed the regulargrid and forecast files) on the Amazon AWS server. 2) Added the forcing download script.

* Updated the calibration master run script to use the sfincs configuration format for configuration files.

---------

Co-authored-by: Miguel.Pena <miguel.pena@rtx.com>
Co-authored-by: Miguel Pena <miguelp1986@gmail.com>
Co-authored-by: Parallel Works app-run user <Zhengtao.Cui@mgmt-zhengtaocui-ngenhydrooezcuisnapshotcoastaloe-00100.optimizationuseast1-5.pw.local>
…dated the configuration to include schism configurations.
… to be consistent with the setting on the integration cluster. Updated coastal/SFINCS/requirements.txt.
kyle-larkin and others added 30 commits February 2, 2026 21:25
Merge development into ngwpc-candidate for release 3.1.2.3.0-rc1
…index.html, has been updated since the script was developed. Now it has data only back to 2024. Changed the script to use the https://www.ncei.noaa.gov/ server, which has data back to 2019 but lacks the most recent forecast data, such as the current month. The script now downloads data from https://www.ncei.noaa.gov/ for dates from 2019 up to two months before the current date. For dates from two months ago to the present, the data is downloaded from the NODD AWS Cloud server.
… the model files are relative to the executed script. This prevents the containerized code from finding these files because the container doesn't have those files relative to the executed script inside the container. Changed the Python main script to not use the path relative to the executed script.
…t specified. This is not required for schism because there is no such sfincs.nc file for schism. Skip this step when schism is selected.
adds nbm ana supplemental precip functionality for Puerto Rico. fixes…
Update dataprocessor for SCHISM to skip checking the EPSG code in sfincs.nc.
changing order of supplemental precip for pr_ana
Merge ngwpc-candidate into ngwpc-release for release 3.1.2.3.0