Codecov Report

❌ Your patch status has failed because the patch coverage (76.19%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
- Coverage   76.66%   76.65%   -0.02%
==========================================
  Files          29       29
  Lines         827      848      +21
  Branches       21       21
==========================================
+ Hits          634      650      +16
- Misses        191      196       +5
  Partials        2        2
```
Flags with carried forward coverage won't be shown.
Force-pushed from e48de07 to bce3af6.
Force-pushed from d048d5a to 9de6c2d.
Pull request overview
This PR adds handling for NaN values in Empire meteorological data to prevent processing failures when encountering "NAN" string values in the dataset. It introduces a new utility function clean_up_dtypes_and_nas() to replace NaN values and convert columns to numeric types, along with a test case using real-world data from October 2025 that contains NaN values.
Changes:
- Added clean_up_dtypes_and_nas() function to handle NaN string replacement and dtype conversion
- Integrated NaN cleanup into the monthly_ds asset pipeline
- Added test case test_monthly_asset_with_nans() with test data files containing NaN values
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pipeline/s3_timeseries/pipeline.py | Added clean_up_dtypes_and_nas() function and integrated it into the monthly data processing pipeline |
| pipeline/s3_timeseries/tests/test_empire_met.py | Added new test test_monthly_asset_with_nans() to verify NaN handling |
| pipeline/s3_timeseries/tests/test_data/empire_met/2025-10-12.csv | Test data file for October 12, 2025 |
| pipeline/s3_timeseries/tests/test_data/empire_met/2025-10-13.csv | Test data file for October 13, 2025 containing "NAN" values |
| pipeline/s3_timeseries/tests/test_data/empire_met/test_empire_met_monthly_asset_with_nans.nc | NetCDF snapshot file for test validation |
```python
) -> pd.DataFrame:
    """Clean up data types and NA values in a dataframe"""
    if na_values is not None:
        df = df.replace(na_values, pd.NA).dropna()
```
The dropna() call will remove entire rows if any column contains NA values after replacement. This could lead to unexpected data loss if only specific columns had NAN values. Consider using dropna() with the 'subset' parameter to target specific columns, or use dropna(how='all') to only drop rows where all values are NA. Alternatively, you could use df.replace() with inplace=False and handle NA values column by column.
Suggested change:

```diff
- df = df.replace(na_values, pd.NA).dropna()
+ df = df.replace(na_values, pd.NA).dropna(how="all")
```
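To illustrate the difference the reviewer is pointing at, here is a small sketch with hypothetical data (the column names are illustrative, not taken from the PR's test files):

```python
import pandas as pd

df = pd.DataFrame({
    "air_temperature": ["12.1", "NAN", "11.8"],
    "wind_speed": ["4.2", "5.0", "NAN"],
})
df = df.replace("NAN", pd.NA)

# dropna() removes any row with at least one NA: only row 0 survives here.
strict = df.dropna()

# dropna(how="all") only drops rows where every value is NA: all rows survive.
lenient = df.dropna(how="all")

# dropna(subset=[...]) targets specific columns: rows 0 and 2 survive.
by_column = df.dropna(subset=["air_temperature"])
```

The trade-off: the strict form can silently discard an entire observation because one sensor reported NAN, while the subset form lets you decide which columns are required.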
```python
for c in df.columns:
    try:
        df[c] = pd.to_numeric(df[c])
```
The pd.to_numeric() conversion is attempted on all columns indiscriminately, including columns that should remain as strings (like 'time'/'datetime' before conversion, or 'station'). While the try-except will prevent errors, this could lead to unintended conversions. Consider checking column types first or having a list of columns to convert, or using pd.to_numeric with errors='coerce' to handle non-numeric values more gracefully.
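A hedged sketch of the more targeted conversion the comment suggests, assuming hypothetical column names (time, station, air_temperature are illustrative, not the PR's actual schema):

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["2025-10-12T00:00", "2025-10-12T01:00"],
    "station": ["empire", "empire"],
    "air_temperature": ["12.1", "NAN"],
})

# Skip columns that should stay as text/time rather than trying them all.
skip = {"time", "station"}
for c in df.columns:
    if c in skip:
        continue
    # errors="coerce" turns unparseable values (like "NAN") into NaN
    # instead of raising, so one bad value cannot abort the conversion.
    df[c] = pd.to_numeric(df[c], errors="coerce")
```

Note that errors="coerce" changes the failure mode: bad values become NaN in an otherwise numeric column, so any row-dropping has to happen after the conversion rather than before.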
Force-pushed from 9de6c2d to 1f583ac.
There are some NaNs getting passed along with Empire; this makes sure they are dropped so that processing can continue. xref gulfofmaine/NERACOOS_ERDDAP_K8S#1185
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There are some NaNs in the 2025-10-12 Empire met data. This makes sure the NaNs are dropped out, so that processing can continue.
xref https://github.com/gulfofmaine/NERACOOS_ERDDAP_K8S/pull/1185
This is part 2 of 3 in a stack made with GitButler: