SNOW-2049599 Fix missing publicapi decorators and don't send dataframeAST if ast_enabled is set. #3285
```diff
@@ -826,6 +826,7 @@ def bitxor(

 # Note: For the operator overrides we always emit ast, it simply gets ignored in a call chain.
+@publicapi
 def __neg__(self) -> "Column":
     """Unary minus."""
```

Collaborator
Dunder/magic methods won't accept additional arguments afaik, so I don't think we should decorate them. Is this working in our tests?

Contributor
+1, no need to decorate dunder methods.
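The dunder-method concern above can be illustrated with a small sketch. The decorator and class here are simplified stand-ins, not the actual snowpark-python implementation: Python invokes `__neg__` through the operator protocol with no extra arguments, so a decorator that injects an `_emit_ast` keyword can only ever see its default value there.

```python
import functools

# Simplified stand-in for a publicapi-style decorator (an assumption for
# illustration, not the real snowpark-python decorator).
def publicapi(func):
    @functools.wraps(func)
    def wrapper(*args, _emit_ast: bool = True, **kwargs):
        # In the real decorator, _emit_ast would gate AST capture; here we
        # only demonstrate the call shapes that reach the wrapped function.
        return func(*args, **kwargs)
    return wrapper

class Column:
    def __init__(self, name):
        self.name = name

    @publicapi
    def __neg__(self):
        return Column(f"-{self.name}")

# The interpreter calls __neg__ with no extra arguments, so the keyword-only
# default is always used; callers cannot pass _emit_ast through the unary
# minus operator, which is the reviewers' point.
print((-Column("a")).name)  # prints -a
```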
```diff
@@ -4732,6 +4732,7 @@ def row_to_string(row: List[str]) -> str:
         + line
     )

+@publicapi
 def _show_string_spark(
     self,
     num_rows: int = 20,
```

Collaborator
I understand why this has been added here, I did so previously myself in an earlier PR before removing it. The semantics of decorating an internal method with

Author
Yes, I agree the naming is bad. Perhaps we should have an @internalapi which aliases onto @publicapi. Would that be better? Either way, this is a bug and we can also address the naming separately.

Contributor
This should be a public API. It came up many, many times. A good name would be
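The @internalapi alias floated in this thread could be as thin as the following sketch (assuming @publicapi is a plain decorator; the real one may take arguments, so this is only an illustration of the idea):

```python
import functools

# Stand-in for the real @publicapi decorator (assumption for illustration).
def publicapi(func):
    @functools.wraps(func)
    def wrapper(*args, _emit_ast: bool = True, **kwargs):
        return func(*args, **kwargs)
    return wrapper

# Hypothetical @internalapi: identical machinery, but a name that doesn't
# suggest the decorated method is part of the public surface.
def internalapi(func):
    return publicapi(func)

class DataFrame:
    @internalapi
    def _show_string_spark(self, num_rows: int = 20) -> str:
        return f"showing {num_rows} rows"
```

With this alias, internal helpers get the same _emit_ast plumbing while the decorator name documents intent.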
Why do we need is_ast_enabled() here? It feels a bit odd to have a check here. Much simpler to pass a parameter or not. If the parameter is not there, it won't be used. Callers are responsible for providing meaningful input to internal APIs.

cc: @sfc-gh-azhan
@sfc-gh-lspiegelberg and I had a discussion on what to do in the case when a dataframe (fragment or otherwise) AST is provided and ast_enabled=false. Based on the current design we don't expect this situation to occur except as a result of a bug in our code, likely around the top-level APIs improperly defaulting the _emit_ast parameter. There are a couple of options we discussed, depending on whether we treat the precondition violation as a hard failure or not.

1. Add an assertion (i.e., assert not (DATAFRAME_AST_PARAMETER in kwargs) or is_ast_enabled()) that raises a hard failure if the precondition is violated. The benefit here is that an immediate failure occurs which propagates back to the original pyspark client (actually, @sfc-gh-azhan, can you confirm this is the case, as it would need to get propagated back through several layers). Presumably customers would need to notify us, or we would monitor for this failure. While in PrPr it would be better to have hard failures so we don't miss the underlying issue and can fix it, avoiding the risk of this becoming a silent failure.

2. Add a check that only requests the dataframe AST if ast_enabled=true. This effectively forces the server connection to execute as SQL and avoids treating this as a hard failure. The benefit is that we don't fail the upstream callers (pyspark client) and the workload can make progress uninterrupted. We would need to augment this with a way to report the precondition violation so we know to fix the bug. @sfc-gh-azhan, do you know if there's a good way to do this, i.e. a Snowflake non-fatal incident log from Python code? Also, do we know how to monitor and create bugs from these reports?
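The two options might look roughly like the following sketch. DATAFRAME_AST_PARAMETER and is_ast_enabled are names taken from the thread; their real definitions in snowpark-python are not shown here, so the stand-ins below only illustrate the control flow being debated.

```python
DATAFRAME_AST_PARAMETER = "dataframe_ast"  # assumed value for illustration

def is_ast_enabled() -> bool:
    # Stand-in for the session-level ast_enabled flag.
    return False

# Option 1: treat the precondition violation as a hard failure.
def execute_strict(**kwargs):
    assert DATAFRAME_AST_PARAMETER not in kwargs or is_ast_enabled(), (
        "dataframe AST provided while ast_enabled is false"
    )
    return kwargs

# Option 2: drop the AST, fall back to plain SQL execution, and report.
def execute_lenient(**kwargs):
    if DATAFRAME_AST_PARAMETER in kwargs and not is_ast_enabled():
        kwargs.pop(DATAFRAME_AST_PARAMETER)
        # A real implementation would also emit a non-fatal incident report
        # here so the underlying bug can be found and fixed.
    return kwargs
```

Option 1 surfaces the bug immediately but interrupts the customer workload; option 2 keeps the workload running at the cost of needing a separate reporting channel.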
If the client disables ast, it should never send ast.
I like this bug fix, and we don't need other @publicapi annotation changes, right?
Yes, the @publicapi changes can be cleaned up (avoid using it for internal methods).

I think it would be less disruptive to ensure we don't unnecessarily turn this into a hard failure when there's a very safe alternative action here (execute as a SQL query). Anyone else who would like to chime in? @sfc-gh-oplaton @sfc-gh-azwiegincew