Conversation

@szehon-ho (Member) commented Oct 2, 2025

What changes were proposed in this pull request?

Fix the analysis of default value expressions so that the CREATE TABLE column names are not used as resolution candidates.

Why are the changes needed?

The following query:

CREATE TABLE t (current_timestamp TIMESTAMP DEFAULT current_timestamp)

fails with an exception:

[INVALID_DEFAULT_VALUE.NOT_CONSTANT] Failed to execute CREATE TABLE command because the destination column or variable `current_timestamp` has a DEFAULT value CURRENT_TIMESTAMP, which is not a constant expression whose equivalent value is known at query planning time. SQLSTATE: 42623;

This regression was introduced in #50631: there, the CreateTable child's ResolvedIdentifier started to have output, namely the CREATE TABLE columns. The analyzer therefore resolves the default value expression against the other columns, causing the regression. Previously the CreateTable output was empty, so the resolver failed to resolve the name against the columns and fell back to literal functions.
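The resolution order described above can be sketched with a small model (hypothetical Python, not Spark source; `resolve` and `LITERAL_FUNCTIONS` are invented for illustration):

```python
# Toy model of the analyzer's name resolution order: names are resolved
# against the child plan's output columns first, and only fall back to
# literal functions (CURRENT_TIMESTAMP, CURRENT_DATE, ...) on a miss.

LITERAL_FUNCTIONS = {"current_timestamp", "current_date", "current_user"}

def resolve(name_parts, output_columns):
    name = ".".join(name_parts).lower()
    if name in output_columns:          # columns shadow literal functions
        return ("column", name)
    if name in LITERAL_FUNCTIONS:       # fallback path
        return ("literal_function", name)
    return ("unresolved", name)

# Before #50631 the CreateTable child exposed no output, so the default
# value resolved as a literal function:
print(resolve(["current_timestamp"], output_columns=set()))

# After #50631 the CREATE TABLE columns appear as output, so the same
# name resolves to the non-constant column, triggering the error:
print(resolve(["current_timestamp"], output_columns={"current_timestamp"}))
```

This is only a sketch of the ordering, but it shows why adding output to the CreateTable child changed which branch the name takes.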

Does this PR introduce any user-facing change?

Yes; it fixes a regression introduced in Spark 4.0.

How was this patch tested?

Added a new unit test in DataSourceV2DataFrameSuite.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Oct 2, 2025
u.copy(child = newChild)
}

case d @ DefaultValueExpression(u: UnresolvedAttribute, _, _) =>
@szehon-ho (Member Author) commented Oct 2, 2025:

Note: before this fix, the default value expression would fall into the UnresolvedAttribute case above. It would then think the default value referred to the conflicting column name and fail.

case u @ UnresolvedAttribute(nameParts) =>
  val result = withPosition(u) {
    resolveColumnByName(nameParts)
      .orElse(LiteralFunctionResolution.resolve(nameParts))
      .map {
        // We trim unnecessary alias here. Note that, we cannot trim the alias at top-level,
        // as we should resolve `UnresolvedAttribute` to a named expression. The caller side
        // can trim the top-level alias if it's safe to do so. Since we will call
        // CleanupAliases later in Analyzer, trim non top-level unnecessary alias is safe.
        case Alias(child, _) if !isTopLevel => child
        case other => other
      }
      .getOrElse(u)
  }
  logDebug(s"Resolving $u to $result")
  result

@szehon-ho szehon-ho force-pushed the default_value_conflict branch from b0350a5 to da190b3 Compare October 2, 2025 06:19
case d @ DefaultValueExpression(u: UnresolvedAttribute, _, _) =>
  d.copy(child = LiteralFunctionResolution.resolve(u.nameParts)
    .map {
      case Alias(child, _) if !isTopLevel => child
szehon-ho (Member Author):

I just copied this from the other code; we don't need it, right?

@szehon-ho (Member Author):

The test failure may not be related:

[error] org.apache.spark.sql.kafka010.KafkaMicroBatchV1SourceWithConsumerSuite, rerunning to verify

@szehon-ho szehon-ho changed the title [SPARK-53786][SQL] Default value should not conflict with special column name [SPARK-53786][SQL] Default value with special column name should not conflict with real column Oct 2, 2025
u.copy(child = newChild)
}

case d @ DefaultValueExpression(c: Expression, _, _) =>
Contributor:

I assume this works because we resolve expressions top down, hence we see the DefaultValueExpression before we see the unresolved attribute?

}
}

private def resolveLiteralColumns(e: Expression) = {
Contributor:

I am a little bit worried that this doesn't fix the root problem: that the ResolvedIdentifier output in CREATE is actually used as a candidate for resolving default values. Let me think.

Contributor:

Actually, maybe it does solve the root problem. Am I right that we will not recurse into the DefaultValueExpression child, so we effectively ensure that default value resolution doesn't have access to the attributes?

szehon-ho (Member Author):

Yes, this is matched before the children.
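The "matched before the children" point can be illustrated with a toy tree walk (hypothetical Python, not Spark code; all names are invented for illustration):

```python
# Toy pre-order rule application: because the DefaultValueExpression case
# fires on the parent node first, its child UnresolvedAttribute is routed
# straight to literal-function resolution and never sees the column list.

LITERAL_FUNCTIONS = {"current_timestamp", "current_date"}

def resolve_literal_function(name):
    if name in LITERAL_FUNCTIONS:
        return ("literal_function", name)
    return ("unresolved", name)

def resolve_top_down(node, columns):
    kind = node[0]
    if kind == "default_value":          # parent matched before its child
        _, (_, name) = node
        return ("default_value", resolve_literal_function(name))
    if kind == "attr":                   # ordinary attribute resolution
        _, name = node
        if name in columns:
            return ("column", name)
        return resolve_literal_function(name)
    return node

tree = ("default_value", ("attr", "current_timestamp"))
# Even with a conflicting column present, the default resolves to the function:
print(resolve_top_down(tree, columns={"current_timestamp"}))
```

A bare ("attr", "current_timestamp") node passed to the same walk would still resolve to the column, which is why matching the parent first matters.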

}
}

test("test default value special column name conflicting with real column name") {
Contributor:

Do we have tests where a default value references other data columns? It is illegal; I just want to make sure we throw a good error message in this case:

col1 INT,
col2 INT DEFAULT col1

szehon-ho (Member Author):

Yes, do you mean the test "test default value should not refer to real column", later in the file?

@aokolnychyi (Contributor):

@szehon-ho, could you also update the description to describe the role of ResolvedIdentifier in CREATE?

@szehon-ho (Member Author) commented Oct 9, 2025:

@aokolnychyi thanks for looking; addressed the comments and edited the PR description.

@gengliangwang (Member):

Here is a detailed RCA:
When parsing the default value expression, the expression in the reproduction produces an UnresolvedAttribute:

  private def getDefaultExpression(
      exprCtx: ExpressionContext,
      place: String): DefaultValueExpression = {
    // Make sure it can be converted to Catalyst expressions.
    val expr = expression(exprCtx)
    if (expr.containsPattern(PARAMETER)) {
      throw QueryParsingErrors.parameterMarkerNotAllowed(place, expr.origin)
    }
    DefaultValueExpression(expr, getOriginalText(exprCtx))
  }
  override def visitCurrentLike(ctx: CurrentLikeContext): Expression = withOrigin(ctx) {
    ...
      UnresolvedAttribute.quoted(ctx.name.getText)
  }

However, in ColumnResolutionHelper.innerResolve, Spark tries to resolve the UnresolvedAttribute as a literal function only if it fails to find the column in the child node's output:

        case u @ UnresolvedAttribute(nameParts) =>
          val result = withPosition(u) {
            resolveColumnByName(nameParts)
              .orElse(LiteralFunctionResolution.resolve(nameParts)) 

After commit fc1cb78, the child node of CreateTable has output and Spark can find the column in it, so the name is resolved as a column instead of a function.
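The resolveColumnByName(...).orElse(LiteralFunctionResolution.resolve(...)) chain quoted above behaves like an ordered fallback. A minimal Python analogue (invented names, not Spark code):

```python
# Models Scala's Option.orElse: the second resolver is consulted only
# when the first one returns nothing.

def resolve_column_by_name(name, columns):
    return ("column", name) if name in columns else None

def literal_function_resolution(name):
    known = {"current_timestamp", "current_date", "current_user"}
    return ("literal_function", name) if name in known else None

def inner_resolve(name, columns):
    # `or` short-circuits on the first non-None value, like orElse on Option
    return resolve_column_by_name(name, columns) or literal_function_resolution(name)

# With the post-fc1cb78 output present, the column wins over the function:
print(inner_resolve("current_timestamp", columns={"current_timestamp"}))
```

Under this model, the fix amounts to never offering the column resolver to a name that sits inside a DefaultValueExpression.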
