
Fix AnalyzerOptions in DQDL rule converters #688

Open
sudsali wants to merge 1 commit into awslabs:master from sudsali:fix-analyzer-options



@sudsali commented Mar 26, 2026

Issue #, if available:

Description of changes:
This PR fixes how AnalyzerOptions are passed to Deequ Check methods in the DQDL rule converters, ensuring correct NULL handling when WHERE clauses are present.

Why was this necessary:
Without this fix, DQDL rule converters pass no AnalyzerOptions to Deequ Check methods, so Deequ falls back to its default, NullBehavior.Ignore. This causes:

  • rules with a WHERE clause to compute incorrect metrics (NULLs in the filtered rows are silently ignored instead of being treated as empty strings)
  • ColumnLength to ignore NULLs instead of treating them as length 0
  • ColumnValues EQUALS to ignore NULLs instead of failing the assertion
  • Entropy to incorrectly filter rows when a WHERE clause is present (Entropy is a global metric)
  • EMPTY/WHITESPACES_ONLY keywords to produce SQL without NULL guards, causing incorrect compliance ratios when NULLs are present
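To make the NULL-handling differences above concrete, here is a minimal, self-contained Scala model. It is an illustrative sketch, not Deequ code: `NullMode` and `LengthModel.minLength` are hypothetical names, and the three modes only mirror the intent of Deequ's `NullBehavior.Ignore`, `EmptyString` and `Fail` values, not their actual implementation.

```scala
// Hypothetical, simplified model of NULL handling for a length metric.
// These are local stand-ins, not Deequ's NullBehavior or analyzers.
sealed trait NullMode
object NullMode {
  case object Ignore extends NullMode      // drop NULLs before computing
  case object EmptyString extends NullMode // treat NULL as "" (length 0)
  case object Fail extends NullMode        // any NULL means the metric cannot pass
}

object LengthModel {
  // Minimum string length of a column, where None models a NULL cell.
  // Returns None when no passing value exists: either no values remain,
  // or a NULL was seen under NullMode.Fail.
  def minLength(values: Seq[Option[String]], mode: NullMode): Option[Int] = {
    val lengths: Option[Seq[Int]] = mode match {
      case NullMode.Ignore      => Some(values.flatten.map(_.length))
      case NullMode.EmptyString => Some(values.map(_.getOrElse("").length))
      case NullMode.Fail =>
        if (values.exists(_.isEmpty)) None
        else Some(values.flatten.map(_.length))
    }
    lengths.filter(_.nonEmpty).map(_.min)
  }
}
```

For a column holding "abc", NULL and "de", `Ignore` reports a minimum length of 2, `EmptyString` reports 0 because the NULL counts as an empty string, and `Fail` yields no passing value. The last two correspond to the behaviors this PR wants for ColumnLength and for ColumnValues EQUALS, respectively.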

Changes:

  • DQDLRuleConverter.scala: Add DEFAULT_ANALYZER_OPTION constant and analyzerOptionsForWhereClause() helper
  • CompletenessRule, IsCompleteRule, UniquenessRule, IsUniqueRule, UniqueValueRatioRule: Pass AnalyzerOptions when WHERE clause present
  • IsPrimaryKeyRule: Pass AnalyzerOptions and fix a multi-column bug where completeness constraints were silently dropped
  • ColumnLengthRule: Pass AnalyzerOptions unconditionally (NULLs treated as length 0)
  • ColumnValuesRule: Pass AnalyzerOptions for WHERE clause; use NullBehavior.Fail for EQUALS min/max; add NULL guards for EMPTY and WHITESPACES_ONLY keywords in generated SQL
  • EntropyRule: Remove WHERE clause filtering (global metric should not be filtered)
  • ColumnValuesRuleSpec: Update 5 existing tests for new constraint representations
  • AnalyzerOptionParitySpec (new): 11 integration tests covering all changes
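The per-rule option choices in the list above can be summarized as a small decision function. This is a hypothetical sketch under stated assumptions: `RuleKind`, `AnalyzerOpts`, `Plan` and `OptionPolicy` are stand-ins rather than Deequ or DQDL types, the contents of the default option are assumed (NULL treated as an empty string, matching the behavior described above), and the helper's signature is a guess because the PR description does not show it.

```scala
// Hypothetical model of the option-selection logic described in the PR.
// None of these types are Deequ or DQDL classes.
sealed trait RuleKind
object RuleKind {
  case object Completeness extends RuleKind // stands in for the WHERE-only rules
  case object ColumnLength extends RuleKind
  case object Entropy extends RuleKind
}

// Stand-in for Deequ's AnalyzerOptions; the single field is an assumption.
final case class AnalyzerOpts(treatNullAsEmptyString: Boolean)

// What a converter would hand to the check: an optional filter and optional options.
final case class Plan(filter: Option[String], options: Option[AnalyzerOpts])

object OptionPolicy {
  // Assumed value of DEFAULT_ANALYZER_OPTION for this sketch.
  val DefaultOption: AnalyzerOpts = AnalyzerOpts(treatNullAsEmptyString = true)

  // Assumed shape of analyzerOptionsForWhereClause(): options only when a WHERE clause exists.
  def analyzerOptionsForWhereClause(where: Option[String]): Option[AnalyzerOpts] =
    where.map(_ => DefaultOption)

  def plan(kind: RuleKind, where: Option[String]): Plan = kind match {
    // Completeness, uniqueness and similar rules: options only with a WHERE clause.
    case RuleKind.Completeness => Plan(where, analyzerOptionsForWhereClause(where))
    // ColumnLength: options passed unconditionally so NULLs count as length 0.
    case RuleKind.ColumnLength => Plan(where, Some(DefaultOption))
    // Entropy is a global metric: the WHERE clause is dropped (no options in this sketch).
    case RuleKind.Entropy => Plan(None, None)
  }
}
```

Returning `None` for the WHERE-only rules when no filter is present presumably leaves their unfiltered behavior unchanged, which fits the PR limiting those rules to the WHERE-clause case.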

All 1020 tests pass.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
