Skip to content

Test Empire with NANs#121

Merged
abkfenris merged 2 commits intomainfrom
empire-nans
Mar 8, 2026
Merged

Test Empire with NANs#121
abkfenris merged 2 commits intomainfrom
empire-nans

Conversation

@abkfenris
Copy link
Member

@abkfenris abkfenris commented Jan 24, 2026

There are some NaNs in the 2025-10-12 Empire met data. This makes sure the NaNs are dropped out, so that processing can continue.

xref https://github.com/gulfofmaine/NERACOOS_ERDDAP_K8S/pull/1185


This is part 2 of 3 in a stack made with GitButler:

@codecov
Copy link

codecov bot commented Jan 24, 2026

Codecov Report

❌ Patch coverage is 76.19048% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.65%. Comparing base (bd3fe04) to head (e9bdc14).
⚠️ Report is 13 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
pipeline/s3_timeseries/pipeline.py 76.19% 5 Missing ⚠️

❌ Your patch status has failed because the patch coverage (76.19%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
- Coverage   76.66%   76.65%   -0.02%     
==========================================
  Files          29       29              
  Lines         827      848      +21     
  Branches       21       21              
==========================================
+ Hits          634      650      +16     
- Misses        191      196       +5     
  Partials        2        2              
Flag Coverage Δ
common 53.57% <ø> (ø)
s3_timeseries 63.85% <76.19%> (+0.42%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@abkfenris abkfenris marked this pull request as ready for review January 24, 2026 22:28
@abkfenris abkfenris added this to the Alpha milestone Jan 24, 2026
@abkfenris abkfenris force-pushed the initial_s3_pipeline branch from e48de07 to bce3af6 Compare January 25, 2026 01:24
@abkfenris abkfenris requested a review from Copilot January 25, 2026 17:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds handling for NaN values in Empire meteorological data to prevent processing failures when encountering "NAN" string values in the dataset. It introduces a new utility function clean_up_dtypes_and_nas() to replace NaN values and convert columns to numeric types, along with a test case using real-world data from October 2025 that contains NaN values.

Changes:

  • Added clean_up_dtypes_and_nas() function to handle NaN string replacement and dtype conversion
  • Integrated NaN cleanup into the monthly_ds asset pipeline
  • Added test case test_monthly_asset_with_nans() with test data files containing NaN values

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
pipeline/s3_timeseries/pipeline.py Added clean_up_dtypes_and_nas() function and integrated it into the monthly data processing pipeline
pipeline/s3_timeseries/tests/test_empire_met.py Added new test test_monthly_asset_with_nans() to verify NaN handling
pipeline/s3_timeseries/tests/test_data/empire_met/2025-10-12.csv Test data file for October 12, 2025
pipeline/s3_timeseries/tests/test_data/empire_met/2025-10-13.csv Test data file for October 13, 2025 containing "NAN" values
pipeline/s3_timeseries/tests/test_data/empire_met/test_empire_met_monthly_asset_with_nans.nc NetCDF snapshot file for test validation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

) -> pd.DataFrame:
"""Clean up data types and NA values in a dataframe"""
if na_values is not None:
df = df.replace(na_values, pd.NA).dropna()
Copy link

Copilot AI Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dropna() call will remove entire rows if any column contains NA values after replacement. This could lead to unexpected data loss if only specific columns had NAN values. Consider using dropna() with the 'subset' parameter to target specific columns, or use dropna(how='all') to only drop rows where all values are NA. Alternatively, you could use df.replace() with inplace=False and handle NA values column by column.

Suggested change
df = df.replace(na_values, pd.NA).dropna()
df = df.replace(na_values, pd.NA).dropna(how="all")

Copilot uses AI. Check for mistakes.
Comment on lines +374 to +376
for c in df.columns:
try:
df[c] = pd.to_numeric(df[c])
Copy link

Copilot AI Jan 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pd.to_numeric() conversion is attempted on all columns indiscriminately, including columns that should remain as strings (like 'time'/'datetime' before conversion, or 'station'). While the try-except will prevent errors, this could lead to unintended conversions. Consider checking column types first or having a list of columns to convert, or using pd.to_numeric with errors='coerce' to handle non-numeric values more gracefully.

Copilot uses AI. Check for mistakes.
Base automatically changed from initial_s3_pipeline to main March 4, 2026 19:54
There are some NANs getting passed along with Empire, this makes sure they are dropped out, so that processing can continue.

xref gulfofmaine/NERACOOS_ERDDAP_K8S#1185
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@abkfenris abkfenris merged commit 88ee71e into main Mar 8, 2026
11 of 14 checks passed
@abkfenris abkfenris deleted the empire-nans branch March 8, 2026 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants