-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-53786][SQL] Default value with special column name should not conflict with real column #52504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
u.copy(child = newChild) | ||
} | ||
|
||
case d @ DefaultValueExpression(u: UnresolvedAttribute, _, _) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: before this fix, Default value expression would fall to UnresolvedAttribute above. It would then think the default value refers to the conflicting column name and fail.
case u @ UnresolvedAttribute(nameParts) =>
val result = withPosition(u) {
resolveColumnByName(nameParts)
.orElse(LiteralFunctionResolution.resolve(nameParts))
.map {
// We trim unnecessary alias here. Note that, we cannot trim the alias at top-level,
// as we should resolve `UnresolvedAttribute` to a named expression. The caller side
// can trim the top-level alias if it's safe to do so. Since we will call
// CleanupAliases later in Analyzer, trim non top-level unnecessary alias is safe.
case Alias(child, _) if !isTopLevel => child
case other => other
}
.getOrElse(u)
}
logDebug(s"Resolving $u to $result")
result
b0350a5
to
da190b3
Compare
case d @ DefaultValueExpression(u: UnresolvedAttribute, _, _) => | ||
d.copy(child = LiteralFunctionResolution.resolve(u.nameParts) | ||
.map { | ||
case Alias(child, _) if !isTopLevel => child |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copied this from the other code, we dont need it right?
failure may not be related: [error] org.apache.spark.sql.kafka010.KafkaMicroBatchV1SourceWithConsumerSuite, rerunning to verify |
u.copy(child = newChild) | ||
} | ||
|
||
case d @ DefaultValueExpression(c: Expression, _, _) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this works because we resolve expressions top to bottom, hence we see DefaultValueExpression before we see the unresolved attribute?
} | ||
} | ||
|
||
private def resolveLiteralColumns(e: Expression) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little bit worried that this doesn't fix the root problem that ResolvedIdentifier output in CREATE actually is used as candidates for resolving default values. Let me think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, maybe it does solve the root problem. Am I right we will not recurse into DefaultValueExpression child so we effectively ensure that default value resolution doesn't have access to attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is matched before children
} | ||
} | ||
|
||
test("test default value special column name conflicting with real column name") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have tests when a default value references other data columns? It is illegal, just to want to make sure we throw a good error message in this case.
col1 INT,
col2 INT DEFAULT col1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes , do you mean like "test default value should not refer to real column", later in the file?
sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
Outdated
Show resolved
Hide resolved
@szehon-ho, could you also update the description to describe the role |
@aokolnychyi thanks for looking , addressed comments, and edited the pr description. |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSuite.scala
Outdated
Show resolved
Hide resolved
Here is a detailed RCA:
However, in ColumnResolutionHelper.innerResolve , Spark will try to resolve UnresolvedAttribute as Literal Function if it fail to find the column from the child node:
After commit fc1cb78, the child node of CreateTable has the output and Spark can find the column from it, thus it is resolved as a column instead of function. |
What changes were proposed in this pull request?
Fix the analysis of default value expression to not include column names
Why are the changes needed?
The following query:
CREATE TABLE t (current_timestamp DEFAULT current_timestamp)
fails with an exception:
This is introduced in : #50631, there CreateTable child's ResolvedIdentifier starts to have output, which are the CREATE TABLE columns. Thus the analyzer will resolve the default value against the other columns, causing the regression. Previously the CreateTable output is empty, so the resolver will fail to resolve against the columns and fallback to literal functions.
Does this PR introduce any user-facing change?
Should fix a regression of Spark 4.0.
How was this patch tested?
Add new unit test in DataSourceV2DataFrameSuite
Was this patch authored or co-authored using generative AI tooling?
No