Fix: WHERE clause with zero matching rows incorrectly fails for hasMi… by sohama4 · Pull Request #677 · awslabs/deequ

sohama4 · 2026-03-20T15:00:30Z

…n/hasMax/isComplete/satisfies

When a WHERE clause filters out all rows, several analyzers produce metrics that make the aggregate check fail and also yield incorrect row-level results (false instead of true/null for filtered rows).

Root cause per analyzer:

- Minimum/Maximum: min(null)/max(null) returns null, so fromAggregationResult returns None; the metric loses fullColumn and the row-level result falls back to lit(false)
- Completeness: sum and count both return 0, so the state is (0, 0), the metric value is NaN (0/0), and the assertion fails
- Compliance: sum(criterion) returns null, so fromAggregationResult returns None; same failure mode as Minimum/Maximum
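To make the three root causes concrete without a Spark session, here is a plain-Scala sketch. The object EmptyFilterRootCause, the helpers sqlMin and sqlSum, and the class CompletenessLikeState are hypothetical stand-ins, not deequ's actual analyzer or state classes. None models the SQL NULL that min/max/sum return over zero rows, and a completeness ratio of 0 matches over 0 rows is 0.0 / 0, which is NaN.

```scala
// Hypothetical, Spark-free model of the "WHERE clause matched zero rows" case.
// None stands in for the SQL NULL that min/max/sum produce over an empty input.
object EmptyFilterRootCause {

  // Minimum/Maximum: aggregating zero rows yields NULL, so no state is produced.
  def sqlMin(values: Seq[Double]): Option[Double] =
    if (values.isEmpty) None else Some(values.min)

  // Compliance: sum(criterion) over zero rows is also NULL rather than 0.
  def sqlSum(values: Seq[Long]): Option[Long] =
    if (values.isEmpty) None else Some(values.sum)

  // Completeness: both counters stay at 0, and 0.0 / 0 evaluates to NaN.
  final case class CompletenessLikeState(numMatches: Long, count: Long) {
    def metricValue: Double = numMatches.toDouble / count
  }
}
```

The point of the model is that the "no data" signal shows up in two different shapes (a missing state vs. a NaN value), which is why each analyzer needed its own fix.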

Fix (per analyzer):

- Minimum/Maximum: override computeMetricFrom to preserve fullColumn (criterion) when the state is None and a WHERE clause is present
- Completeness: override computeMetricFrom to detect count = 0 with a WHERE clause and return a Failure metric with fullColumn (rowLevelResults)
- Compliance: override computeMetricFrom to preserve fullColumn (rowLevelResults) when the state is None and a WHERE clause is present
- AnalysisBasedConstraint: treat EmptyStateException as Success when the analyzer has a filter condition, since there are no matching rows that could violate the constraint
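The fix pattern can be sketched in plain Scala as below. SketchMetric, SketchEmptyStateException, and EmptyFilterFixSketch are simplified, hypothetical stand-ins for deequ's DoubleMetric, EmptyStateException, and analyzer/constraint code; fullColumn is modeled as a column name (String) instead of a Spark Column. It shows the two halves of the change: keep the row-level column when the state is empty under a WHERE clause, and let the constraint pass on an empty state only when a filter is present.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical, simplified stand-ins for deequ's metric and exception types.
// fullColumn holds the row-level column name instead of a Spark Column.
final case class SketchMetric(value: Try[Double], fullColumn: Option[String] = None)
final class SketchEmptyStateException(msg: String) extends RuntimeException(msg)

object EmptyFilterFixSketch {

  // Mirrors the idea of metricFromEmpty: a Failure metric explaining there was no data.
  def metricFromEmpty(name: String, column: String): SketchMetric =
    SketchMetric(Failure(new SketchEmptyStateException(
      s"Empty state for analyzer $name on column $column")))

  // Minimum/Maximum/Compliance: no state + WHERE clause => keep the row-level column.
  def computeMetricFrom(state: Option[Double], where: Option[String],
                        name: String, column: String, criterion: String): SketchMetric =
    state match {
      case None if where.isDefined =>
        metricFromEmpty(name, column).copy(fullColumn = Some(criterion))
      case None    => metricFromEmpty(name, column)
      case Some(v) => SketchMetric(Success(v), Some(criterion))
    }

  // Constraint side: an empty state is not a violation when a filter removed every row.
  def constraintPasses(metric: SketchMetric, where: Option[String],
                       assertion: Double => Boolean): Boolean =
    metric.value match {
      case Success(v)                                                => assertion(v)
      case Failure(_: SketchEmptyStateException) if where.isDefined => true
      case Failure(_)                                                => false
    }
}
```

Without a WHERE clause the empty-state metric still fails the constraint, so unfiltered empty datasets keep their previous behavior in this model.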

Note: 21 filterable analyzers exist but only the 4 used by ColumnValues rules are fixed here. A generic fix would require each analyzer to expose its row-level column and empty-state detection, which is a larger refactor.
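One hypothetical shape for the larger generic refactor is a trait in which each analyzer only declares its row-level column and how to recognize an empty state, while the WHERE-clause handling is written once. Nothing in this PR defines such a trait; EmptyAwareAnalyzer, GenericMetric, and CompletenessLike are illustrative names only.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical metric type: value plus the (name of the) row-level column.
final case class GenericMetric(value: Try[Double], fullColumn: Option[String])

// Hypothetical interface: each analyzer exposes its row-level column and empty-state check.
trait EmptyAwareAnalyzer[S] {
  def where: Option[String]
  def rowLevelColumn: String
  def isEmptyState(state: Option[S]): Boolean
  def defaultMetric(state: Option[S]): GenericMetric

  // Shared logic, written once instead of overriding computeMetricFrom per analyzer.
  final def computeMetricFrom(state: Option[S]): GenericMetric =
    if (where.isDefined && isEmptyState(state))
      GenericMetric(Failure(new NoSuchElementException("No rows matched the WHERE clause")),
        Some(rowLevelColumn))
    else defaultMetric(state)
}

// Example: a completeness-like analyzer whose state is (numMatches, count).
final case class CompletenessLike(column: String, where: Option[String])
    extends EmptyAwareAnalyzer[(Long, Long)] {
  def rowLevelColumn: String = s"${column}_is_complete"

  def isEmptyState(state: Option[(Long, Long)]): Boolean =
    state.forall { case (_, count) => count == 0L }

  // Unchanged behavior without a filter: 0 / 0 still produces NaN.
  def defaultMetric(state: Option[(Long, Long)]): GenericMetric = state match {
    case Some((matches, count)) =>
      GenericMetric(Success(matches.toDouble / count), Some(rowLevelColumn))
    case None =>
      GenericMetric(Failure(new NoSuchElementException("Empty state")), None)
  }
}
```

With this shape, fixing one of the remaining 17 analyzers would mean implementing two small members rather than copying the override.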


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


shriyavanvari · 2026-03-25T15:45:44Z

src/main/scala/com/amazon/deequ/analyzers/Maximum.scala

+  // When WHERE clause filters all rows, max(null) returns null, so fromAggregationResult
+  // returns None. We still need the fullColumn (criterion) for correct row-level results.
+  // Note: criterion is the raw augmented column ["FilteredData", null]. The FilteredRowOutcome
+  // treatment (TRUE vs NULL) is applied downstream by the assertion UDF in


Not completely relevant to this logic, but the UDF will have to be replaced: FGAC (fine-grained access control) does not support UDFs, and this will cause issues in Glue.

I barely followed this, but I take it you're talking about an upstream UDF that exists independently of this change, right?

The code comment says the treatment (TRUE vs NULL) is applied downstream by the assertion UDF. We would have to replace that for Glue, because UDFs don't work when FGAC is enabled. It doesn't have to happen in this PR, but we need to remove any such UDF.
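The per-row TRUE vs NULL treatment mentioned in the code comment is a pure conditional on each row, which can be modeled in plain Scala as below. FilteredRowTreatmentSketch and its TrueOutcome/NullOutcome cases are hypothetical stand-ins modeled loosely on the FilteredRowOutcome setting, not deequ's code; None stands for SQL NULL.

```scala
// Hypothetical, Spark-free model of how filtered rows are reported at row level.
object FilteredRowTreatmentSketch {
  sealed trait FilteredRowOutcome
  case object TrueOutcome extends FilteredRowOutcome
  case object NullOutcome extends FilteredRowOutcome

  // Rows matching the WHERE clause report the assertion result;
  // rows filtered out report TRUE or NULL (None) depending on the configured outcome.
  def rowResult(matchesWhere: Boolean, assertionHolds: Boolean,
                outcome: FilteredRowOutcome): Option[Boolean] =
    if (matchesWhere) Some(assertionHolds)
    else outcome match {
      case TrueOutcome => Some(true)
      case NullOutcome => None
    }
}
```

Because the decision depends only on values already present in the row, it could in principle be expressed with Spark's built-in when(...).otherwise(...) column functions instead of a UDF, which is the kind of replacement the FGAC concern calls for; whether deequ's assertion path can be restructured that way is not settled in this thread.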

shriyavanvari · 2026-03-26T02:31:42Z

src/main/scala/com/amazon/deequ/analyzers/Maximum.scala

+  override def computeMetricFrom(state: Option[MaxState]): DoubleMetric = {
+    state match {
+      case None if where.isDefined =>
+        metricFromEmpty(this, "Maximum", column).copy(fullColumn = Some(criterion))


Can we create a helper for this? We've done this multiple times.

Fixed in new commit.

shriyavanvari · 2026-03-26T02:46:26Z

src/main/scala/com/amazon/deequ/analyzers/Minimum.scala

+    state match {
+      case None if where.isDefined =>
+        metricFromEmpty(this, "Minimum", column).copy(fullColumn = Some(criterion))
+      case _ => super.computeMetricFrom(state)


Just wanted to highlight something; maybe we should document it in the PR or in a code comment:
Any code that relies on successMetricsAsDataFrame or successMetricsAsJson to populate metric dashboards, history tables, or anomaly detection will start seeing missing data points for these analyzers in the "all rows filtered" case.

This is probably okay! I don't think there's a meaningful value to report when zero rows match, but I think it's worth documenting as a known behavior change, especially for consumers that distinguish between "metric not computed", "metric = 0", and "metric = NaN".

Also in DQETL we will not report Minimum and Maximum statistics for a column when all rows are filtered - which is probably expected? But documentation will help.

Added a code comment. I'm not sure how we typically add documentation, so let me know if you'd prefer something else.

Yes, a code comment here is fine. A note in the PR description would also help, so consumers can track down what changed when they see different data.
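For consumers that need to tell "metric not computed" apart from "metric = 0" and "metric = NaN", a minimal sketch of the classification over a Try[Double] metric value is shown below. MetricConsumerSketch and its Observation cases are hypothetical consumer-side names; the only assumption about deequ is that a metric's value is a Try[Double], as the Success/Failure wording in this PR suggests.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical consumer-side classification of a metric value.
object MetricConsumerSketch {
  sealed trait Observation
  case object NotComputed extends Observation            // e.g. all rows filtered: Failure metric
  case object NotANumber extends Observation             // e.g. the old Completeness Success(NaN)
  final case class Value(v: Double) extends Observation  // a real data point, including 0.0

  def classify(metricValue: Try[Double]): Observation = metricValue match {
    case Failure(_)            => NotComputed
    case Success(v) if v.isNaN => NotANumber
    case Success(v)            => Value(v)
  }
}
```

After this change, the "all rows filtered" case for the four analyzers lands in NotComputed, which for dashboards fed from success-only metric exports simply appears as a missing data point.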

…ness behavior change

- Add metricFromEmptyWithColumn() to the Analyzers object to deduplicate the metricFromEmpty(...).copy(fullColumn = Some(...)) pattern across Maximum, Minimum, Completeness, and Compliance
- Document that Completeness now returns Failure instead of Success(NaN) when a WHERE clause filters all rows, consistent with the other analyzers and avoiding NaN for consumers of success metrics
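The helper named in the second commit can be sketched as follows. Only the names metricFromEmpty and metricFromEmptyWithColumn come from the PR; the parameter lists, HelperMetric, and AnalyzersHelperSketch are assumptions made for illustration, with fullColumn again modeled as a column name rather than a Spark Column.

```scala
import scala.util.{Failure, Try}

// Hypothetical, simplified metric type; not deequ's DoubleMetric.
final case class HelperMetric(name: String, instance: String, value: Try[Double],
                              fullColumn: Option[String] = None)

object AnalyzersHelperSketch {
  // Stand-in for the existing metricFromEmpty: a Failure metric with no row-level column.
  def metricFromEmpty(name: String, instance: String): HelperMetric =
    HelperMetric(name, instance,
      Failure(new IllegalStateException(s"Empty state for $name on $instance")))

  // Assumed shape of the new helper: one call instead of
  // metricFromEmpty(...).copy(fullColumn = Some(...)) repeated in four analyzers.
  def metricFromEmptyWithColumn(name: String, instance: String,
                                fullColumn: String): HelperMetric =
    metricFromEmpty(name, instance).copy(fullColumn = Some(fullColumn))
}
```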

shriyavanvari reviewed Mar 26, 2026

shriyavanvari approved these changes Mar 27, 2026

sohama4 merged commit ae670ce into master Mar 30, 2026 (3 checks passed)

sohama4 merged 2 commits into master from sa_bug_fix
