Skip to content

[BugFix] Fix COALESCE(null, 42) returning wrong type string instead of int (#5175)#5315

Closed
qianheng-aws wants to merge 1 commit intoopensearch-project:mainfrom
qianheng-aws:coalesce-type-5175-baseline-v4
Closed

[BugFix] Fix COALESCE(null, 42) returning wrong type string instead of int (#5175)#5315
qianheng-aws wants to merge 1 commit intoopensearch-project:mainfrom
qianheng-aws:coalesce-type-5175-baseline-v4

Conversation

@qianheng-aws
Copy link
Copy Markdown
Collaborator

Description

Fix COALESCE returning wrong type string instead of int when called with null and integer literals.

Root Cause: In PPL, null is parsed as a QualifiedName (field name), not a Literal. When COALESCE encounters an unresolvable field name like null, it falls back to replaceWithNullLiteralInCoalesce() which incorrectly created a null literal typed as VARCHAR. This caused leastRestrictive([VARCHAR, INTEGER]) to return VARCHAR, making COALESCE(null, 42) return string type.

Fix (three parts):

  1. Primary (QualifiedNameResolver.java): Changed the null literal type from SqlTypeName.VARCHAR to SqlTypeName.NULL (Calcite's bottom type, compatible with any type). This makes leastRestrictive([NULL, INTEGER]) correctly return INTEGER.
  2. Defense-in-depth (EnhancedCoalesceFunction.java): Filter out NULL-typed operands in the return type inference before calling leastRestrictive, with fallback to VARCHAR when all operands are NULL.
  3. Scope fix (CalciteRexNodeVisitor.java): Scope the inCoalesceFunction context flag to direct COALESCE arguments only, preventing unresolved field names in nested function calls (e.g., nomv rewriting to coalesce(mvjoin(array_compact(field), ...))) from being silently replaced with null literals.

Related Issues

Resolves #5175

Check List

  • New functionality includes testing
  • Javadoc added for new public API
  • Commits signed per DCO (-s)
  • spotlessCheck passed
  • Unit tests passed
  • Integration tests passed
  • Grammar changes: all three .g4 files synced (N/A - no grammar changes)

…f int (opensearch-project#5175)

Root cause: In PPL, `null` is parsed as a QualifiedName (field name), not a
literal. When COALESCE encounters an unresolvable field, the fallback method
replaceWithNullLiteralInCoalesce() created a null literal typed as VARCHAR
instead of NULL. This caused leastRestrictive([VARCHAR, INTEGER]) to return
VARCHAR, producing string output for what should be an integer result.

Fix (three parts):
1. QualifiedNameResolver: Use SqlTypeName.NULL instead of SqlTypeName.VARCHAR
   for null literals in COALESCE, so leastRestrictive([NULL, INTEGER]) returns
   INTEGER correctly.
2. EnhancedCoalesceFunction: Filter out NULL-typed operands before calling
   leastRestrictive in return type inference, with VARCHAR fallback when all
   operands are NULL.
3. CalciteRexNodeVisitor: Scope the inCoalesceFunction flag to direct COALESCE
   arguments only, preventing unresolved field names in nested functions from
   being silently replaced with null literals.

Signed-off-by: Heng Qian <qianheng@amazon.com>
@qianheng-aws
Copy link
Copy Markdown
Collaborator Author

Decision Log

Root Cause: QualifiedNameResolver.replaceWithNullLiteralInCoalesce() typed null literals as SqlTypeName.VARCHAR instead of SqlTypeName.NULL. When COALESCE(null, 42) was processed, null (parsed as a QualifiedName) was resolved to null:VARCHAR, causing leastRestrictive([VARCHAR, INTEGER]) to return VARCHAR — hence the incorrect string output type.

Approach: Three-part fix:

  1. Use SqlTypeName.NULL (Calcite's bottom type) for unresolved field null literals. NULL is compatible with any type, so leastRestrictive([NULL, INTEGER]) correctly returns INTEGER.
  2. Add defense-in-depth filtering in EnhancedCoalesceFunction.getReturnTypeInference() — filter out NULL-typed operands before calling leastRestrictive.
  3. Scope inCoalesceFunction flag in CalciteRexNodeVisitor.visitFunction() — save/restore the flag around nested function calls so it only applies to direct COALESCE arguments.

Alternatives Rejected:

  • Only fixing QualifiedNameResolver: Would have caused a regression in CalcitePPLNoMvTest.testNoMvNonExistentField because nomv rewrites to coalesce(mvjoin(array_compact(field), ...)) and the leaked inCoalesceFunction flag would silently replace unresolved fields inside array_compact with null:NULL (accepted as bottom type) instead of throwing an error.

Pitfalls:

  • The inCoalesceFunction flag was a flat boolean that leaked into nested function calls. With VARCHAR, the leak was mostly harmless because array_compact(null:VARCHAR) would error on type mismatch. With NULL (bottom type), array_compact(null:NULL) could silently succeed — hence the need for the scoping fix.
  • Existing unit tests referenced null:VARCHAR in logical plan expectations — these needed updating to null:NULL.

Things to Watch: The inCoalesceFunction flag design could benefit from a stack-based approach if more nuanced nesting is needed in the future.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Fix null literal type and COALESCE context scoping in core

Relevant files:

  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLEnhancedCoalesceTest.java

Sub-PR theme: Defense-in-depth type filtering in EnhancedCoalesceFunction with tests

Relevant files:

  • core/src/main/java/org/opensearch/sql/expression/function/udf/condition/EnhancedCoalesceFunction.java
  • core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactoryTest.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLEnhancedCoalesceIT.java
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/5175.yml

⚡ Recommended focus areas for review

Context Restore

The wasInCoalesce flag is saved before the isCoalesce check, but the flag is unconditionally set to false for non-COALESCE functions even if wasInCoalesce was true. This means if a COALESCE argument is itself a non-COALESCE function call (e.g., coalesce(some_func(null), 42)), the inCoalesceFunction flag will be set to false during processing of some_func, and then restored to true after. While the restore logic is correct, the intermediate false assignment for non-COALESCE functions could cause the null inside some_func(null) to NOT be replaced with a null literal, which may be the intended behavior per the PR description. However, this edge case should be explicitly tested to confirm the behavior is correct.

} else {
  // Disable the flag for nested non-COALESCE functions so that unresolved field
  // names inside e.g. array_compact(...) are not silently replaced with null.
  context.setInCoalesceFunction(false);
}
Nullability Loss

The original leastRestrictive call returned the type as-is (which may already include nullability information from Calcite). The new code wraps the result with createTypeWithNullability(commonType, true), always forcing nullable. While this is generally correct for COALESCE semantics, it may override a non-nullable result type that Calcite would have inferred if all non-null operands are guaranteed non-null. This could affect downstream type checking or optimization. Verify that forcing nullable is always correct here.

var commonType = opBinding.getTypeFactory().leastRestrictive(nonNullTypes);
if (commonType != null) {
  // Result should be nullable since COALESCE may return null
  return opBinding.getTypeFactory().createTypeWithNullability(commonType, true);
}
NULL Type Compatibility

Using SqlTypeName.NULL as the type for the null literal relies on Calcite's leastRestrictive correctly handling NULL as a bottom type. The new unit tests in OpenSearchTypeFactoryTest validate this for OpenSearchTypeFactory, but it should be confirmed that makeNullLiteral with SqlTypeName.NULL type is valid in all Calcite versions used and doesn't cause issues in SQL generation or translation to Spark SQL.

private static Optional<RexNode> replaceWithNullLiteralInCoalesce(CalcitePlanContext context) {
  log.debug("replaceWithNullLiteralInCoalesce() called");
  if (context.isInCoalesceFunction()) {
    return Optional.of(
        context.rexBuilder.makeNullLiteral(
            context.rexBuilder.getTypeFactory().createSqlType(SqlTypeName.NULL)));
  }
  return Optional.empty();
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Return NULL type when all operands are null-typed

When all operands are NULL-typed, the fallback returns a nullable VARCHAR. However,
it would be more semantically correct to return a nullable NULL type, since the
result of COALESCE(null, null) is truly null with no concrete type. Returning
VARCHAR may cause unexpected implicit type coercions downstream when the result is
used in further expressions.

core/src/main/java/org/opensearch/sql/expression/function/udf/condition/EnhancedCoalesceFunction.java [104-110]

-// If all operands are NULL-typed, fall back to VARCHAR
+// If all operands are NULL-typed, return nullable NULL type
 if (nonNullTypes.isEmpty()) {
   return opBinding
       .getTypeFactory()
       .createTypeWithNullability(
-          opBinding.getTypeFactory().createSqlType(SqlTypeName.VARCHAR), true);
+          opBinding.getTypeFactory().createSqlType(SqlTypeName.NULL), true);
 }
Suggestion importance[1-10]: 5

__

Why: Returning SqlTypeName.NULL instead of VARCHAR when all operands are null-typed is semantically more accurate for COALESCE(null, null). However, this could potentially cause issues downstream if Calcite or other components don't handle NULL type well as a return type, and the PR's existing test for testCoalesceWithAllNonExistentFields expects VARCHAR fallback behavior implicitly.

Low
Avoid unnecessary flag reset for non-nested cases

When a non-COALESCE function is encountered, the flag is unconditionally set to
false, even if the outer context was already inside a COALESCE. This means that if
COALESCE contains a nested non-COALESCE function call (e.g.,
coalesce(some_func(nonexistent_field), 42)), the unresolved field inside some_func
will not be replaced with null as intended. The else branch should only set the flag
to false if it was previously true (i.e., only suppress the flag for nested
non-COALESCE calls within a COALESCE context), but the restore in finally already
handles this correctly. However, the else branch overrides wasInCoalesce effectively
making the restore a no-op for the non-COALESCE case — the logic is correct only
because wasInCoalesce is captured before the override. This is subtle and correct,
but the comment is misleading; consider clarifying that the else branch
intentionally overrides the inherited context for the duration of this non-COALESCE
function's argument processing.

core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java [502-509]

 boolean wasInCoalesce = context.isInCoalesceFunction();
 if (isCoalesce) {
   context.setInCoalesceFunction(true);
-} else {
-  // Disable the flag for nested non-COALESCE functions so that unresolved field
-  // names inside e.g. array_compact(...) are not silently replaced with null.
+} else if (wasInCoalesce) {
+  // Temporarily disable the flag for nested non-COALESCE functions so that unresolved
+  // field names inside e.g. array_compact(...) are not silently replaced with null.
   context.setInCoalesceFunction(false);
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion proposes only resetting the flag when wasInCoalesce is true, but the current code is functionally correct because wasInCoalesce is captured before the override and restored in finally. The else branch setting to false when wasInCoalesce is already false is a no-op in terms of final state. The suggestion adds a minor clarity improvement but doesn't fix a real bug.

Low

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] COALESCE(null, 42) returns wrong type string instead of int due to leastRestrictive type inference

1 participant