Fix AnalyzerOptions in DQDL rule converters #688
Open
sudsali wants to merge 1 commit into awslabs:master from
… and NULL behavior
Issue #, if available:
Description of changes:
This PR fixes how AnalyzerOptions are passed to Deequ Check methods in the DQDL rule converters, ensuring correct NULL handling when WHERE clauses are present.
Why was this necessary:
Without this fix, DQDL rule converters pass no `AnalyzerOptions` to Deequ Check methods, defaulting to `NullBehavior.Ignore`. This causes:
- `WHERE` clause rules to compute incorrect metrics (filtered NULLs silently ignored instead of treated as empty strings)
- Entropy to be filtered when a `WHERE` clause is present (Entropy is a global metric)
- `EMPTY`/`WHITESPACES_ONLY` keywords to produce SQL without NULL guards, causing incorrect compliance ratios when NULLs are present

Changes:
- `DQDLRuleConverter.scala`: Add `DEFAULT_ANALYZER_OPTION` constant and `analyzerOptionsForWhereClause()` helper
- `CompletenessRule`, `IsCompleteRule`, `UniquenessRule`, `IsUniqueRule`, `UniqueValueRatioRule`: Pass AnalyzerOptions when WHERE clause present
- `IsPrimaryKeyRule`: Pass AnalyzerOptions + fix multi-column bug where completeness constraints were silently dropped
- `ColumnLengthRule`: Pass AnalyzerOptions unconditionally (NULLs treated as length 0)
- `ColumnValuesRule`: Pass AnalyzerOptions for WHERE clause; use `NullBehavior.Fail` for EQUALS min/max; add NULL guards for `EMPTY` and `WHITESPACES_ONLY` keywords in generated SQL
- `EntropyRule`: Remove WHERE clause filtering (global metric should not be filtered)
- `ColumnValuesRuleSpec`: Update 5 existing tests for new constraint representations
- `AnalyzerOptionParitySpec` (new): 11 integration tests covering all changes

All 1020 tests pass.
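As a rough illustration of why the NULL mode matters for a length-style metric (the `ColumnLengthRule` case, where NULLs are treated as length 0), here is a dependency-free toy model. It is not Deequ's implementation: `NullMode`, `Ignore`, `EmptyString`, and `minLength` are hypothetical names that only mimic the idea of skipping NULL rows versus counting them as empty strings.

```scala
// Toy model only, not Deequ code. All names here are hypothetical.
object NullHandlingSketch {
  sealed trait NullMode
  case object Ignore extends NullMode      // NULL rows are skipped entirely
  case object EmptyString extends NullMode // NULL rows are counted as ""

  // Column values are modeled as Option[String]; None stands for SQL NULL.
  def minLength(values: Seq[Option[String]], mode: NullMode): Option[Int] = {
    val lengths = mode match {
      case Ignore      => values.flatten.map(_.length)
      case EmptyString => values.map(_.getOrElse("").length)
    }
    // An empty input (or all NULLs under Ignore) has no minimum.
    if (lengths.isEmpty) None else Some(lengths.min)
  }

  def main(args: Array[String]): Unit = {
    val column = Seq(Some("abc"), None, Some("de"))
    println(minLength(column, Ignore))      // Some(2): the NULL row is invisible
    println(minLength(column, EmptyString)) // Some(0): the NULL row has length 0
  }
}
```

In this model, a rule such as "minimum length >= 2" passes when NULLs are ignored and fails when they count as empty strings, which is the kind of divergence the change list above is meant to remove.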
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.