SNOW-2019483: fix select SQL in dynamic table #3259

Merged
sfc-gh-yuwang merged 28 commits into main from SNOW-2019483
May 30, 2025

Conversation

Collaborator

@sfc-gh-yuwang sfc-gh-yuwang commented Apr 14, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-2019483

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    When a dynamic table is created from a table function, the SELECT on that table function must list explicit column names instead of '*'. Please refer to the Jira issue for details.
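To illustrate the difference described above, here is a minimal sketch in plain Python that builds both forms of the SELECT. The helper `select_from_table_function`, the table function name `my_udtf`, and the column names are hypothetical and are not part of Snowpark; the sketch only shows the shape of the SQL, not how the PR produces it.

```python
from typing import Optional, Sequence


def select_from_table_function(call: str, columns: Optional[Sequence[str]] = None) -> str:
    """Build a SELECT over a table function call.

    `call` and `columns` are illustrative inputs; nothing here talks to Snowflake.
    """
    # Without explicit columns we fall back to '*', the form this PR avoids
    # when the SELECT feeds a dynamic table.
    projection = ", ".join(columns) if columns else "*"
    return f"SELECT {projection} FROM TABLE({call})"


star_sql = select_from_table_function("my_udtf(1)")
explicit_sql = select_from_table_function("my_udtf(1)", ['"A"', '"B"'])
print(star_sql)
print(explicit_sql)
```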

sfc-gh-snowflakedb-snyk-sa commented Apr 14, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)

@sfc-gh-yuwang sfc-gh-yuwang requested a review from sfc-gh-jdu May 16, 2025 22:44
@sfc-gh-yuwang sfc-gh-yuwang marked this pull request as ready for review May 16, 2025 22:46
@sfc-gh-yuwang sfc-gh-yuwang requested review from a team as code owners May 16, 2025 22:46
Comment on lines +1333 to +1336
plan_2_resolve = None
for node in plan.children_plan_nodes:
    plan_2_resolve = self.find_table_function_in_sql_tree(node)
if plan_2_resolve:
Contributor

There appears to be a logic issue in this loop. The current implementation overwrites plan_2_resolve in each iteration, which means only the result from the last child node will be preserved. If a table function is found in any child node except the last one, that result will be lost.

Consider modifying the loop to return immediately when a match is found:

for node in plan.children_plan_nodes:
    plan_2_resolve = self.find_table_function_in_sql_tree(node)
    if plan_2_resolve:
        return plan_2_resolve

This ensures that the first matching table function found in any child node will be properly returned and processed.

Suggested change
- plan_2_resolve = None
- for node in plan.children_plan_nodes:
-     plan_2_resolve = self.find_table_function_in_sql_tree(node)
- if plan_2_resolve:
+ plan_2_resolve = None
+ for node in plan.children_plan_nodes:
+     plan_2_resolve = self.find_table_function_in_sql_tree(node)
+     if plan_2_resolve:
+         break
+ if plan_2_resolve:

Spotted by Diamond


Comment on lines +1360 to +1362
child = copy.deepcopy(child)
child = self.find_table_function_in_sql_tree(child)

Contributor

The find_table_function_in_sql_tree method can return None, but the code doesn't check for this before trying to access child.queries. Add a None check so that the resolved child is used only when it is not None:

    child = copy.deepcopy(child)
    resolved_child = self.find_table_function_in_sql_tree(child)
    if resolved_child is not None:
        child = resolved_child

Spotted by Diamond (based on CI logs)


Contributor

@sfc-gh-jrose sfc-gh-jrose left a comment

Can you fill out the PR description? The changes in this PR don't seem trivial to understand.

Comment on lines +1336 to +1338
plan_2_resolve = (
    self.find_table_function_in_sql_tree(node) or plan_2_resolve  # type: ignore
)
Contributor

Is it possible for there to be multiple table function calls in a single DataFrame operation?
From the logic, the newer one will overwrite the existing one.

Collaborator Author

I have tested that nested table function calls are not possible. This piece of code only needs to know whether a change happened in a deeper layer, in order to decide whether to re-resolve the current plan, so overwriting is OK.
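The overwrite behaviour discussed in this thread can be seen in isolation with a small sketch. The `resolve` function and the string node names are hypothetical stand-ins for `find_table_function_in_sql_tree` and plan nodes; the only point is how `found = resolve(node) or found` accumulates results compared with returning at the first hit.

```python
from typing import List, Optional


def resolve(node: str) -> Optional[str]:
    """Hypothetical stand-in: returns a marker for 'table function' nodes, else None."""
    return f"resolved:{node}" if node.startswith("tf") else None


def last_match(nodes: List[str]) -> Optional[str]:
    found = None
    for node in nodes:
        # A later None does not erase an earlier match, but a later match
        # does replace an earlier one.
        found = resolve(node) or found
    return found


def first_match(nodes: List[str]) -> Optional[str]:
    for node in nodes:
        found = resolve(node)
        if found:
            return found  # stop at the first hit
    return None


nodes = ["tf_a", "plain", "tf_b", "plain"]
print(last_match(nodes))
print(first_match(nodes))
```

If the only question is whether anything changed in a deeper layer, as the author describes, both variants answer it equally; they differ only in which match is kept.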

    plan.snowflake_plan.source_plan.right_cols == ["*"]
    and len(plan.snowflake_plan.children_plan_nodes) == 1
):
    child_plan = plan.snowflake_plan.children_plan_nodes[0]
Contributor

We're doing an in-place update here; is this on purpose? Usually when we want to modify a plan, we deepcopy it and update the copy.

Collaborator Author

The deepcopy happens outside this function; I'll figure out a way to move it inside.

Comment on lines +1325 to +1327
plan.snowflake_plan.quoted_identifiers[
    len(child_plan.quoted_identifiers) :
]
Contributor

I guess this is to flatten "*" to all identifiers.
I have a dumb question: what does plan.snowflake_plan.quoted_identifiers contain, and why are we only selecting the elements after len(child_plan.quoted_identifiers)?
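One plausible reading of the slice, stated as an assumption and not confirmed anywhere in this thread: if the join plan's identifier list consists of the child's columns followed by the columns that the table function adds, then slicing from `len(child_plan.quoted_identifiers)` leaves only the table function's columns. The lists below are made up for illustration.

```python
# Hypothetical identifier lists; the ordering (child columns first, then the
# table function's output columns) is an assumption, not confirmed in the PR.
child_identifiers = ['"ID"', '"NAME"']
join_identifiers = ['"ID"', '"NAME"', '"TOKEN"', '"POSITION"']

# Everything past the child's columns is what the table function contributed.
table_function_columns = join_identifiers[len(child_identifiers):]
print(table_function_columns)
```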

Comment on lines +1336 to +1338
plan_2_resolve = (
    self.find_table_function_in_sql_tree(node) or plan_2_resolve  # type: ignore
)
Contributor

Try using BFS. In the past, we have found that using DFS on Snowpark plans can trigger maximum-recursion-depth-exceeded errors.
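A minimal sketch of this suggestion, assuming a toy `Node` class with a `children` list (hypothetical, not Snowpark's plan types): an explicit queue replaces recursion, so the depth of the plan tree is no longer bounded by Python's recursion limit.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy plan node; `is_table_function` is an illustrative flag."""
    name: str
    is_table_function: bool = False
    children: List["Node"] = field(default_factory=list)


def find_table_function_bfs(root: Node) -> Optional[Node]:
    """Return the first table-function node in breadth-first order, or None."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.is_table_function:
            return node
        queue.extend(node.children)
    return None


# Build a chain far deeper than CPython's default recursion limit (1000).
leaf = Node("udtf", is_table_function=True)
root = leaf
for i in range(5000):
    root = Node(f"select_{i}", children=[root])

found = find_table_function_bfs(root)
print(found.name if found else None)
```

A recursive traversal of the same 5000-level chain would raise `RecursionError` under the default limit, which is the failure mode the comment warns about.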

Comment on lines +1317 to +1321
if (
    plan.snowflake_plan.source_plan.right_cols == ["*"]
    and len(plan.snowflake_plan.children_plan_nodes) == 1
):
    child_plan = plan.snowflake_plan.children_plan_nodes[0]
Contributor

Please add comments here to explain what we are trying to extract from the TableFunctionJoin node.

Comment on lines +1334 to +1336
plan_2_resolve = None
for node in plan.children_plan_nodes:
    plan_2_resolve = (
Contributor

this part also needs some explanation. what are we trying to achieve here?

    source_plan: Optional[LogicalPlan],
    iceberg_config: Optional[dict] = None,
) -> SnowflakePlan:
    child_find_table_function = copy.deepcopy(child)
Collaborator

maybe we should move this deepcopy into find_table_function_in_sql_tree

source_plan,
)

def find_table_function_in_sql_tree(self, plan: SnowflakePlan):
Collaborator

add docstring or comment for this function

Collaborator

@sfc-gh-jdu sfc-gh-jdu left a comment

add a comment explaining it


def find_table_function_in_sql_tree(self, plan: SnowflakePlan):
"""This function is meant to find any table function call from a create dynamic table plan and
replace '*' with explicit identifier in the select of table function.
Collaborator

Suggested change
- replace '*' with explicit identifier in the select of table function.
+ replace '*' with explicit column identifiers in the select of table function.

)

def find_table_function_in_sql_tree(self, plan: SnowflakePlan):
"""This function is meant to find any table function call from a create dynamic table plan and
Collaborator

Also add a comment explaining why we need to do this. It is actually only needed for UDTFs, but we are not able to differentiate between UDTFs and other table functions, so we have to do it for all table functions.

Comment on lines +1370 to +1371
deepcopied_plan.snowflake_plan.source_plan # type: ignore
if isinstance(deepcopied_plan, Selectable)
Contributor

Is it possible for a Selectable to have a snowflake_plan whose source_plan is None? Can we make sure that is not the case here?

Collaborator Author

I think that case should not happen here. @sfc-gh-jdu, can you help me confirm?

Collaborator

I'm not sure it's possible. But to be safe, can we exit the recursion if it's None?

queue.append(node)

# the bug only happen when create dynamic table on top of a table function
# this is meant to decide whether the plan is select from a tale function
Contributor

There's a typo in the comment: tale function should be table function

Suggested change
- # this is meant to decide whether the plan is select from a tale function
+ # this is meant to decide whether the plan is select from a table function

Spotted by Diamond


Contributor

@sfc-gh-aalam sfc-gh-aalam left a comment

Pinged offline: there are still a couple of issues with this implementation.

Comment on lines +1345 to +1346
for node in plan_of_child.children_plan_nodes:
    queue.append(node)
Contributor

Can we also make sure that the same node is not added to the queue twice? One example where this can happen is a diamond join.
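One way to address this, sketched with a hypothetical `Node` class (not Snowpark's plan types): record the identity of every node already enqueued, so that a node shared by two parents is visited only once.

```python
from collections import deque
from typing import List, Optional


class Node:
    """Toy plan node used only for this illustration."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.children = children or []


def bfs_order(root: Node) -> List[str]:
    """Visit every reachable node exactly once, breadth first."""
    seen = {id(root)}  # identity-based, so equal-but-distinct nodes stay separate
    queue = deque([root])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node.name)
        for child in node.children:
            if id(child) not in seen:
                seen.add(id(child))
                queue.append(child)
    return order


# Diamond: both branches of the join read from the same shared node.
shared = Node("shared")
left = Node("left", [shared])
right = Node("right", [shared])
join = Node("join", [left, right])

print(bfs_order(join))
```

Tracking `id()` values is safe here only because every node stays alive for the duration of the traversal.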

@sfc-gh-yuwang sfc-gh-yuwang merged commit ef4fe53 into main May 30, 2025
37 of 39 checks passed
@sfc-gh-yuwang sfc-gh-yuwang deleted the SNOW-2019483 branch May 30, 2025 16:52
@github-actions github-actions bot locked and limited conversation to collaborators May 30, 2025

6 participants