src/snowflake/snowpark/_internal/server_connection.py (3 changes: 2 additions & 1 deletion)

    @@ -57,6 +57,7 @@
         create_rlock,
         create_thread_local,
         escape_quotes,
    +    is_ast_enabled,
         get_application_name,
         get_version,
         is_in_stored_procedure,

    @@ -430,7 +431,7 @@
     def execute_and_notify_query_listener(
         self, query: str, **kwargs: Any
     ) -> SnowflakeCursor:
         notify_kwargs = {}
    -    if DATAFRAME_AST_PARAMETER in kwargs:
    +    if DATAFRAME_AST_PARAMETER in kwargs and is_ast_enabled():
Contributor:
Why do we need is_ast_enabled() here? It feels a bit odd to have a check at this level. It is much simpler to either pass the parameter or not: if the parameter is absent, it won't be used. Callers are responsible for providing meaningful input to internal APIs.

Collaborator (Author):
cc: @sfc-gh-azhan

@sfc-gh-lspiegelberg and I discussed what to do when a dataframe AST (fragment or otherwise) is provided but ast_enabled=false. Under the current design we don't expect this situation to occur except as the result of a bug in our code, most likely around top-level APIs improperly defaulting the _emit_ast parameter. We discussed a couple of options, depending on whether we treat the precondition violation as a hard failure:

  1. Add an assertion (i.e., assert not (DATAFRAME_AST_PARAMETER in kwargs) or is_ast_enabled()) that raises a hard failure when the precondition is violated. The benefit is an immediate failure that propagates back to the original pyspark client (@sfc-gh-azhan, can you confirm this is the case? It would need to propagate back through several layers). Presumably customers would notify us, or we would monitor for the failure. While we're in PrPr it is better to fail hard so we don't miss the underlying issue, avoiding the risk of a silent failure.

  2. Add a check that only requests the dataframe AST if ast_enabled=true. This effectively forces the server connection to execute as SQL and avoids treating the violation as a hard failure. The benefit is that we don't fail upstream callers (the pyspark client) and the workload can make progress uninterrupted. We would need to augment this with a way to report the precondition violation so we know to fix the bug. @sfc-gh-azhan, do you know if there is a good way to do this, i.e., a Snowflake non-fatal incident log from Python code? And do we know how to monitor these reports and create bugs from them?

Collaborator:
  1. An assertion may block the query, which can be a problem.
  2. Option 2 is much better, and I don't see a better way than this.

If the client disables AST, it should never send AST.

Collaborator:
I like this bug fix. We don't need the other @publicapi annotation changes, right?

Collaborator (Author):
Yes, the @publicapi changes can be cleaned up (avoid using it for internal methods).

I think it would be less disruptive to avoid turning this into a hard failure unnecessarily when there is a very safe alternative action here (execute the SQL query). Would anyone else like to chime in? @sfc-gh-oplaton @sfc-gh-azwiegincew

             notify_kwargs["dataframeAst"] = kwargs[DATAFRAME_AST_PARAMETER]

         try:
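The two options debated in the thread above can be sketched in simplified form. This is a hypothetical illustration only: is_ast_enabled, DATAFRAME_AST_PARAMETER, and report_precondition_violation here are stand-ins, not the actual Snowpark internals.

```python
# Hypothetical sketch of the two options discussed above; the names are
# simplified stand-ins for the real Snowpark internals.
import logging
from typing import Any, Dict

DATAFRAME_AST_PARAMETER = "_dataframe_ast"
_ast_enabled = False


def is_ast_enabled() -> bool:
    return _ast_enabled


def report_precondition_violation(msg: str) -> None:
    # Stand-in for a non-fatal incident log / telemetry call.
    logging.getLogger(__name__).warning("precondition violated: %s", msg)


def build_notify_kwargs_option1(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Option 1: treat the precondition violation as a hard failure.
    assert DATAFRAME_AST_PARAMETER not in kwargs or is_ast_enabled(), (
        "dataframe AST supplied while AST emission is disabled"
    )
    notify_kwargs: Dict[str, Any] = {}
    if DATAFRAME_AST_PARAMETER in kwargs:
        notify_kwargs["dataframeAst"] = kwargs[DATAFRAME_AST_PARAMETER]
    return notify_kwargs


def build_notify_kwargs_option2(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Option 2: drop the AST, fall back to plain SQL, and report the bug.
    notify_kwargs: Dict[str, Any] = {}
    if DATAFRAME_AST_PARAMETER in kwargs:
        if is_ast_enabled():
            notify_kwargs["dataframeAst"] = kwargs[DATAFRAME_AST_PARAMETER]
        else:
            report_precondition_violation("AST provided with ast_enabled=false")
    return notify_kwargs
```

With AST disabled, option 1 raises immediately, while option 2 silently drops the AST so the query can still execute as SQL.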
src/snowflake/snowpark/column.py (1 change: 1 addition & 0 deletions)

    @@ -826,6 +826,7 @@ def bitxor(

         # Note: For the operator overrides we always emit ast, it simply gets ignored in a call chain.

    +    @publicapi
Collaborator:
Dunder/magic methods won't accept additional arguments, AFAIK, so I don't think we should decorate them. Is this working in our tests?

Contributor:
+1, no need to decorate dunder methods.

         def __neg__(self) -> "Column":
             """Unary minus."""
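The reviewers' point about operator dunders can be demonstrated with a small, hypothetical sketch (publicapi here is a trivial stand-in for the real decorator): Python's operator protocol invokes type(x).__neg__(x) with no extra arguments, so a decorator-injected keyword like _emit_ast can only ever take its default when the method is triggered by the unary-minus operator.

```python
import functools


def publicapi(func):
    # Simplified stand-in for the real @publicapi decorator: it injects a
    # keyword-only _emit_ast flag and records what callers passed.
    @functools.wraps(func)
    def wrapper(*args, _emit_ast=True, **kwargs):
        wrapper.last_emit_ast = _emit_ast
        return func(*args, **kwargs)

    wrapper.last_emit_ast = None
    return wrapper


class Column:
    def __init__(self, name: str) -> None:
        self.name = name

    @publicapi
    def __neg__(self) -> "Column":
        return Column(f"-{self.name}")


neg = -Column("a")  # operator syntax: there is no way to pass _emit_ast here
print(neg.name)  # -a
print(Column.__neg__.last_emit_ast)  # True -- always the decorator's default
```

The decoration does not break operator syntax, but the injected keyword is only reachable through an explicit call such as col.__neg__(_emit_ast=False), which is why decorating operator dunders buys little.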
src/snowflake/snowpark/dataframe.py (1 change: 1 addition & 0 deletions)

    @@ -4732,6 +4732,7 @@ def row_to_string(row: List[str]) -> str:

                 + line
             )

    +    @publicapi
Collaborator:
I understand why this was added here; I did the same myself in an earlier PR before removing it. Decorating an internal method with @publicapi is semantically incorrect, IMHO. I do think we need to protect internal methods with our decorator, but we should discuss renaming it if there is a strong case for using the decorator on many internal methods.

Collaborator (Author):
Yes, I agree the naming is bad. Perhaps we should have an @internalapi that aliases @publicapi. Would that be better? Either way, this is a bug fix, and we can address the naming separately.

Contributor:
This should be a public API. It has come up many, many times. A good name would be as_spark_string or similar. Let's do a proper job here; there is no issue in making this a public API, only upside. Let's document it and make it maintainable.

         def _show_string_spark(
             self,
             num_rows: int = 20,
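The @internalapi rename floated in the thread above could be a thin alias over the existing decorator. A hypothetical sketch, where publicapi is a trivial stand-in for the real Snowpark implementation:

```python
import functools
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)


def publicapi(func: F) -> F:
    # Trivial stand-in for the real decorator: injects a keyword-only
    # _emit_ast flag that explicit callers may override.
    @functools.wraps(func)
    def wrapper(*args, _emit_ast=True, **kwargs):
        return func(*args, **kwargs)

    return wrapper  # type: ignore[return-value]


def internalapi(func: F) -> F:
    """Alias of @publicapi whose name matches its use on internal methods."""
    return publicapi(func)


class DataFrame:
    @internalapi
    def _show_string_spark(self, num_rows: int = 20) -> str:
        # Hypothetical body; the real method renders a Spark-style table.
        return f"showing {num_rows} rows"
```

An alias keeps a single code path for the AST plumbing while letting the decorator's name document intent at each use site.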
src/snowflake/snowpark/table.py (4 changes: 2 additions & 2 deletions)

    @@ -81,7 +81,7 @@ class WhenMatchedClause:

             condition: An optional :class:`Column` object representing the
                 specified condition. For example, ``col("a") == 1``.
         """

    +    @publicapi
         def __init__(
             self, condition: Optional[Column] = None, _emit_ast: bool = True
         ) -> None:

    @@ -178,7 +178,7 @@ class WhenNotMatchedClause:

             condition: An optional :class:`Column` object representing the
                 specified condition.
         """

    +    @publicapi
         def __init__(
             self, condition: Optional[Column] = None, _emit_ast: bool = True
         ) -> None:
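Unlike operator dunders such as __neg__, decorating __init__ as done above is meaningful: constructors are invoked explicitly, so callers can actually pass the injected keyword. A hypothetical, simplified sketch (publicapi is a stand-in for the real decorator):

```python
import functools


def publicapi(func):
    # Simplified stand-in: injects _emit_ast and records the value passed.
    @functools.wraps(func)
    def wrapper(*args, _emit_ast=True, **kwargs):
        wrapper.last_emit_ast = _emit_ast
        return func(*args, **kwargs)

    wrapper.last_emit_ast = None
    return wrapper


class WhenMatchedClause:
    @publicapi
    def __init__(self, condition=None):
        self.condition = condition


# Constructor calls are explicit, so the extra keyword is reachable:
clause = WhenMatchedClause(condition="col_a == 1", _emit_ast=False)
print(WhenMatchedClause.__init__.last_emit_ast)  # False
```

This is why the @publicapi placement on these __init__ methods works where the same decoration on operator overloads is effectively inert.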