Commit 243ef04

jatorre and claude committed
Add IQR-based outlier detection for float statistics
When nodata is not explicitly set (common with EE (Earth Engine) exports), use IQR-based outlier detection to filter extreme fill values that would otherwise corrupt the band statistics.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
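As a rough illustration of the rule: the diff in this commit keeps float values inside the fences [Q1 - 10·IQR, Q3 + 10·IQR]. This is a Tukey-style test with a multiplier of 10 rather than the conventional 1.5. The sketch below is a minimal, hypothetical helper (`iqr_fences` and the synthetic band are not part of raquet). It assumes a float32 band of values in [0, 1] that also contains a few `-9999.0` fill pixels:

```python
import numpy as np


def iqr_fences(values: np.ndarray, k: float = 10.0) -> tuple[float, float]:
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR) computed over finite values only."""
    finite = values[np.isfinite(values)]
    q1 = np.percentile(finite, 25)
    q3 = np.percentile(finite, 75)
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)


# Hypothetical band: 1,000 evenly spaced values in [0, 1] plus 5 fill pixels
band = np.concatenate([np.linspace(0.0, 1.0, 1000), np.full(5, -9999.0)]).astype(np.float32)
lower, upper = iqr_fences(band)

# Keep only pixels inside the fences; the fill pixels fall far below `lower`
kept = band[(band >= lower) & (band <= upper)]
print(lower, upper, kept.size)
```

Here the fences come out at roughly -4.8 and 5.8, so all 1,000 legitimate values survive and the five fill pixels are dropped. With the conventional k = 1.5, the fences would sit much closer to the bulk of the data. The commit's choice of 10 only removes values that are extreme relative to the interquartile spread.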
1 parent 72392b1 commit 243ef04

1 file changed (+25, -6 lines)

raquet/raster2raquet.py

Lines changed: 25 additions & 6 deletions
```diff
@@ -527,13 +527,32 @@ def read_statistics_numpy(
 ) -> RasterStats | None:
     """Calculate statistics for array of numeric values and optional nodata value"""
     total_pixels = values.size
-    if nodata is not None:
-        bad_values_mask = (values == nodata) | numpy.isnan(values)
-        masked_values = numpy.ma.masked_array(values, bad_values_mask)
-        value_count = int(masked_values.count())
+
+    # Start with NaN filtering for float types
+    if values.dtype in (numpy.float16, numpy.float32, numpy.float64):
+        bad_values_mask = ~numpy.isfinite(values)
     else:
-        masked_values = values
-        value_count = values.size
+        bad_values_mask = numpy.zeros(values.shape, dtype=bool)
+
+    # Add explicit nodata masking
+    if nodata is not None:
+        bad_values_mask = bad_values_mask | (values == nodata)
+
+    # For float data without explicit nodata, use IQR-based outlier detection
+    # to filter extreme fill values (common in EE exports)
+    if nodata is None and values.dtype in (numpy.float16, numpy.float32, numpy.float64):
+        finite_values = values[numpy.isfinite(values)]
+        if len(finite_values) > 100:
+            q1 = numpy.percentile(finite_values, 25)
+            q3 = numpy.percentile(finite_values, 75)
+            iqr = q3 - q1
+            # Use 10*IQR for very permissive outlier detection (catches extreme fill values)
+            lower_bound = q1 - 10 * iqr
+            upper_bound = q3 + 10 * iqr
+            bad_values_mask = bad_values_mask | (values < lower_bound) | (values > upper_bound)
+
+    masked_values = numpy.ma.masked_array(values, bad_values_mask)
+    value_count = int(masked_values.count())
 
     if value_count == 0:
         return None
```
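The new masking logic runs in three steps: non-finite floats, then explicit nodata, then IQR fences. To see their combined effect outside raquet, those steps can be lifted into a standalone helper. This is a sketch that mirrors the diff rather than the project's code. The name `build_bad_values_mask` and its `k` / `min_count` parameters are hypothetical. `numpy.issubdtype(..., numpy.floating)` is swapped in for the explicit `(float16, float32, float64)` tuple, and it also accepts other float types such as `longdouble`.

```python
import numpy as np


def build_bad_values_mask(values: np.ndarray, nodata=None, k: float = 10.0, min_count: int = 100) -> np.ndarray:
    """Return a boolean mask of pixels to exclude, mirroring the commit's three steps."""
    # Assumption: issubdtype(..., floating) replaces the explicit (float16, float32, float64) tuple
    is_float = np.issubdtype(values.dtype, np.floating)

    # Step 1: NaN and +/-inf are always excluded for float bands
    if is_float:
        bad = ~np.isfinite(values)
    else:
        bad = np.zeros(values.shape, dtype=bool)

    # Step 2: an explicit nodata value is masked when given
    if nodata is not None:
        bad |= values == nodata
    # Step 3: without nodata, IQR fences catch extreme fill values in float bands
    elif is_float:
        finite = values[np.isfinite(values)]
        if finite.size > min_count:
            q1, q3 = np.percentile(finite, [25, 75])
            iqr = q3 - q1
            bad |= (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return bad


# Hypothetical float32 band with no nodata set: 3 NaN pixels and 2 extreme fill values
band = np.linspace(0.0, 1.0, 1000, dtype=np.float32)
band[:3] = np.nan
band[-2:] = 3.0e38
masked = np.ma.masked_array(band, build_bad_values_mask(band))
print(int(masked.count()))  # 995
```

One property of this rule deserves attention. When the 25th and 75th percentiles coincide (for example, a band where most pixels are 0), the IQR is 0. In that case the fences collapse to that single value, so every other pixel is masked as an outlier. The `len(finite_values) > 100` guard in the diff does not cover this case.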
