Codecov Report

❌ Your patch status has failed because the patch coverage (76.19%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
- Coverage   76.66%   76.65%   -0.02%
==========================================
  Files          29       29
  Lines         827      848      +21
  Branches       21       21
==========================================
+ Hits          634      650      +16
- Misses        191      196       +5
  Partials        2        2
```
Flags with carried forward coverage won't be shown.
Force-pushed from e48de07 to bce3af6.
Force-pushed from d048d5a to 9de6c2d.
Pull request overview
This PR adds handling for NaN values in Empire meteorological data to prevent processing failures when encountering "NAN" string values in the dataset. It introduces a new utility function clean_up_dtypes_and_nas() to replace NaN values and convert columns to numeric types, along with a test case using real-world data from October 2025 that contains NaN values.
Changes:
- Added clean_up_dtypes_and_nas() function to handle NaN string replacement and dtype conversion
- Integrated NaN cleanup into the monthly_ds asset pipeline
- Added test case test_monthly_asset_with_nans() with test data files containing NaN values
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pipeline/s3_timeseries/pipeline.py | Added clean_up_dtypes_and_nas() function and integrated it into the monthly data processing pipeline |
| pipeline/s3_timeseries/tests/test_empire_met.py | Added new test test_monthly_asset_with_nans() to verify NaN handling |
| pipeline/s3_timeseries/tests/test_data/empire_met/2025-10-12.csv | Test data file for October 12, 2025 |
| pipeline/s3_timeseries/tests/test_data/empire_met/2025-10-13.csv | Test data file for October 13, 2025 containing "NAN" values |
| pipeline/s3_timeseries/tests/test_data/empire_met/test_empire_met_monthly_asset_with_nans.nc | NetCDF snapshot file for test validation |
```python
) -> pd.DataFrame:
    """Clean up data types and NA values in a dataframe"""
    if na_values is not None:
        df = df.replace(na_values, pd.NA).dropna()
```
The dropna() call will remove entire rows if any column contains NA values after replacement. This could lead to unexpected data loss if only specific columns had NAN values. Consider using dropna() with the 'subset' parameter to target specific columns, or use dropna(how='all') to only drop rows where all values are NA. Alternatively, you could use df.replace() with inplace=False and handle NA values column by column.
Suggested change:

```diff
- df = df.replace(na_values, pd.NA).dropna()
+ df = df.replace(na_values, pd.NA).dropna(how="all")
```
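To illustrate the difference the reviewer is pointing at, here is a small sketch with hypothetical data (the column names are illustrative, not taken from the PR's test files):

```python
import pandas as pd

df = pd.DataFrame({
    "air_temperature": ["12.1", "NAN", "11.8"],
    "wind_speed": ["4.2", "5.0", "NAN"],
})
df = df.replace("NAN", pd.NA)

# dropna() removes any row with at least one NA: only row 0 survives here.
strict = df.dropna()

# dropna(how="all") only drops rows where every value is NA: all rows survive.
lenient = df.dropna(how="all")

# dropna(subset=[...]) targets specific columns: rows 0 and 2 survive.
by_column = df.dropna(subset=["air_temperature"])
```

The trade-off: the strict form can silently discard an entire observation because one sensor reported NAN, while the subset form lets you decide which columns are required.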
```python
for c in df.columns:
    try:
        df[c] = pd.to_numeric(df[c])
```
The pd.to_numeric() conversion is attempted on all columns indiscriminately, including columns that should remain as strings (like 'time'/'datetime' before conversion, or 'station'). While the try-except will prevent errors, this could lead to unintended conversions. Consider checking column types first or having a list of columns to convert, or using pd.to_numeric with errors='coerce' to handle non-numeric values more gracefully.
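A hedged sketch of the more targeted conversion the comment suggests, assuming hypothetical column names (time, station, air_temperature are illustrative, not the PR's actual schema):

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["2025-10-12T00:00", "2025-10-12T01:00"],
    "station": ["empire", "empire"],
    "air_temperature": ["12.1", "NAN"],
})

# Skip columns that should stay as text/time rather than trying them all.
skip = {"time", "station"}
for c in df.columns:
    if c in skip:
        continue
    # errors="coerce" turns unparseable values (like "NAN") into NaN
    # instead of raising, so one bad value cannot abort the conversion.
    df[c] = pd.to_numeric(df[c], errors="coerce")
```

Note that errors="coerce" changes the failure mode: bad values become NaN in an otherwise numeric column, so any row-dropping has to happen after the conversion rather than before.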
Force-pushed from 9de6c2d to 1f583ac.
There are some NaNs getting passed along with Empire; this makes sure they are dropped so that processing can continue. xref gulfofmaine/NERACOOS_ERDDAP_K8S#1185
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There are some NaNs in the 2025-10-12 Empire met data. This makes sure the NaNs are dropped out, so that processing can continue.
xref https://github.com/gulfofmaine/NERACOOS_ERDDAP_K8S/pull/1185
This is part 2 of 3 in a stack made with GitButler: