SNOW-2019483: fix select SQL in dynamic table#3259
Conversation
🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) ✅ license/snyk check is complete. No issues have been found. (View Details) |
5df8537 to
d15ea60
Compare
| plan_2_resolve = None | ||
| for node in plan.children_plan_nodes: | ||
| plan_2_resolve = self.find_table_function_in_sql_tree(node) | ||
| if plan_2_resolve: |
There was a problem hiding this comment.
There appears to be a logic issue in this loop. The current implementation overwrites plan_2_resolve in each iteration, which means only the result from the last child node will be preserved. If a table function is found in any child node except the last one, that result will be lost.
Consider modifying the loop to return immediately when a match is found:
for node in plan.children_plan_nodes:
plan_2_resolve = self.find_table_function_in_sql_tree(node)
if plan_2_resolve:
return plan_2_resolveThis ensures that the first matching table function found in any child node will be properly returned and processed.
| plan_2_resolve = None | |
| for node in plan.children_plan_nodes: | |
| plan_2_resolve = self.find_table_function_in_sql_tree(node) | |
| if plan_2_resolve: | |
| plan_2_resolve = None | |
| for node in plan.children_plan_nodes: | |
| plan_2_resolve = self.find_table_function_in_sql_tree(node) | |
| if plan_2_resolve: | |
| break | |
| if plan_2_resolve: | |
Spotted by Diamond
Is this helpful? React 👍 or 👎 to let us know.
| child = copy.deepcopy(child) | ||
| child = self.find_table_function_in_sql_tree(child) | ||
|
|
There was a problem hiding this comment.
The find_table_function_in_sql_tree method can return None, but the code doesn't check for this before trying to access child.queries. Add a null check to use the resolved child only if it's not None: 'child = copy.deepcopy(child); resolved_child = self.find_table_function_in_sql_tree(child); if resolved_child is not None: child = resolved_child'
Spotted by Diamond (based on CI logs)
Is this helpful? React 👍 or 👎 to let us know.
sfc-gh-jrose
left a comment
There was a problem hiding this comment.
Can you fill out the PR description? The changes in this PR don't seem trivial to understand.
| plan_2_resolve = ( | ||
| self.find_table_function_in_sql_tree(node) or plan_2_resolve # type: ignore | ||
| ) |
There was a problem hiding this comment.
is it possible there are multiple table function call in one single DF operation?
from the logic the newer one will overwrite the existing one
There was a problem hiding this comment.
I have tested that you cannot do nested table function call, this piece of code is just meant to know if there are changes happened in deeper layer to decide whether re-resolve current plan, so overwrite is ok.
| plan.snowflake_plan.source_plan.right_cols == ["*"] | ||
| and len(plan.snowflake_plan.children_plan_nodes) == 1 | ||
| ): | ||
| child_plan = plan.snowflake_plan.children_plan_nodes[0] |
There was a problem hiding this comment.
we're doing in-place update, is this on purpose? usually when we want to modify a plan, we deepcopy and update the new one
There was a problem hiding this comment.
the deepcopy code is outside this function, I'll figure out a way to move deep copy inside
| plan.snowflake_plan.quoted_identifiers[ | ||
| len(child_plan.quoted_identifiers) : | ||
| ] |
There was a problem hiding this comment.
I guess this is to flatten "*" to all identifiers.
I have a dumb question, what does plan.snowflake_plan.quoted_identifiers contain and why we are only selecting elements after len(child_plan.quoted_identifiers)
| plan_2_resolve = ( | ||
| self.find_table_function_in_sql_tree(node) or plan_2_resolve # type: ignore | ||
| ) |
There was a problem hiding this comment.
Try using BFS. In the past, we have discovered that using DFS for snowpark plans can generate MaxRecursionDepthExceeded errors
| if ( | ||
| plan.snowflake_plan.source_plan.right_cols == ["*"] | ||
| and len(plan.snowflake_plan.children_plan_nodes) == 1 | ||
| ): | ||
| child_plan = plan.snowflake_plan.children_plan_nodes[0] |
There was a problem hiding this comment.
please add comments here to explain what we are trying to extract from TableFunctionJoin node
| plan_2_resolve = None | ||
| for node in plan.children_plan_nodes: | ||
| plan_2_resolve = ( |
There was a problem hiding this comment.
this part also needs some explanation. what are we trying to achieve here?
| source_plan: Optional[LogicalPlan], | ||
| iceberg_config: Optional[dict] = None, | ||
| ) -> SnowflakePlan: | ||
| child_find_table_function = copy.deepcopy(child) |
There was a problem hiding this comment.
maybe we should move this deepcopy into find_table_function_in_sql_tree
| source_plan, | ||
| ) | ||
|
|
||
| def find_table_function_in_sql_tree(self, plan: SnowflakePlan): |
There was a problem hiding this comment.
add docstring or comment for this function
sfc-gh-jdu
left a comment
There was a problem hiding this comment.
add a comment explaining it
|
|
||
| def find_table_function_in_sql_tree(self, plan: SnowflakePlan): | ||
| """This function is meant to find any table function call from a create dynamic table plan and | ||
| replace '*' with explicit identifier in the select of table function. |
There was a problem hiding this comment.
| replace '*' with explicit identifier in the select of table function. | |
| replace '*' with explicit column identifiers in the select of table function. |
| ) | ||
|
|
||
| def find_table_function_in_sql_tree(self, plan: SnowflakePlan): | ||
| """This function is meant to find any table function call from a create dynamic table plan and |
There was a problem hiding this comment.
also add a comment that why we need to do this, and actually it's for udtf, but we are not able to differentiate between udtf and other table functions, so we have to do it for all table functions.
| deepcopied_plan.snowflake_plan.source_plan # type: ignore | ||
| if isinstance(deepcopied_plan, Selectable) |
There was a problem hiding this comment.
is is possible for a Selebtable to have a snowflake_plan where source_plan is None. Can we make sure that is not the case here?
There was a problem hiding this comment.
I think the case shall not happen here, @sfc-gh-jdu can you help me confirm?
There was a problem hiding this comment.
I'm not sure it's possible. But to be safe, can we exit the recursion if it's None?
| queue.append(node) | ||
|
|
||
| # the bug only happen when create dynamic table on top of a table function | ||
| # this is meant to decide whether the plan is select from a tale function |
There was a problem hiding this comment.
There's a typo in the comment: tale function should be table function
| # this is meant to decide whether the plan is select from a tale function | |
| # this is meant to decide whether the plan is select from a table function |
Spotted by Diamond
Is this helpful? React 👍 or 👎 to let us know.
sfc-gh-aalam
left a comment
There was a problem hiding this comment.
pinged offline there are still a couple of issues with this implementation
| for node in plan_of_child.children_plan_nodes: | ||
| queue.append(node) |
There was a problem hiding this comment.
can we also make sure that the same node is not added twice in the queue? an example where this can happen is in a dimond join case.
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-2019483
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
When creating a dynamic table, if it is created from a table function, the select on that table function must specify the column name instead of a '*', please refer to the Jira for details