SNOW-2049599 Fix missing publicapi decorators and don't send dataframeAST if ast_enabled is not set. #3285
Conversation
🎉 Snyk checks have passed. No issues have been found so far.
✅ security/snyk check is complete. No issues have been found.
✅ license/snyk check is complete. No issues have been found.
# Note: For the operator overrides we always emit ast, it simply gets ignored in a call chain.
@publicapi
Dunder/magic methods won't accept additional arguments afaik, so I don't think we should decorate them. Is this working in our tests?
+1, no need to decorate dunder methods.
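To illustrate the point about dunder methods: operator syntax never passes extra arguments, so a decorator that threads an `_emit_ast` keyword through the call only ever sees its default when invoked via `+`. A minimal sketch, where `publicapi` below is a hypothetical stand-in and not the actual snowpark implementation:

```python
import functools

# Hypothetical publicapi-style decorator that threads an _emit_ast flag;
# names and behavior are assumptions for illustration only.
def publicapi(func):
    @functools.wraps(func)
    def wrapper(*args, _emit_ast=True, **kwargs):
        # Record what value the flag actually took on this call.
        wrapper.last_emit_ast = _emit_ast
        return func(*args, **kwargs)
    return wrapper

class Column:
    def __init__(self, name):
        self.name = name

    @publicapi
    def __add__(self, other):
        return Column(f"({self.name} + {other.name})")

a, b = Column("a"), Column("b")
c = a + b  # operator syntax cannot pass _emit_ast explicitly
print(c.name)                        # (a + b)
print(Column.__add__.last_emit_ast)  # True: always the default
```

Since the flag can never be overridden through the operator protocol, decorating dunder methods adds no control surface, which matches the "no need to decorate" conclusion above.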
+ line
)
@publicapi
I understand why this has been added here; I did so myself in an earlier PR before removing it. The semantics of decorating an internal method with @publicapi are incorrect imho. I do think we need to protect internal methods with our decorator, but we should discuss renaming it if there is a strong case for using the decorator on many internal methods.
Yes, I agree the naming is bad. Perhaps we should have an @internalapi which aliases onto @publicapi. Would that be better? Either way, this is a bug, and we can address the naming separately.
this should be a public API. It came up many, many times. A good name would be as_spark_string or so. Let's do a proper job here, no issue in making this a public API. There's only upside. Let's document it and make it maintainable.
) -> SnowflakeCursor:
    notify_kwargs = {}
-   if DATAFRAME_AST_PARAMETER in kwargs:
+   if DATAFRAME_AST_PARAMETER in kwargs and is_ast_enabled():
why do we need is_ast_enabled() here? It feels a bit odd to have a check here. Much simpler to pass a parameter or not. If the parameter is not there, it won't be used. Callers are responsible to provide meaningful input to internal APIs.
cc: @sfc-gh-azhan
@sfc-gh-lspiegelberg and I discussed what to do when a dataframe AST (fragment or otherwise) is provided while ast_enabled=false. Based on the current design we don't expect this situation to occur except as a result of a bug in our code, likely around the top-level APIs improperly defaulting the _emit_ast parameter. There are a couple of options we discussed, depending on whether we treat the precondition violation as a hard failure or not.
- Add an assertion (i.e., assert not (DATAFRAME_AST_PARAMETER in kwargs) or is_ast_enabled()) that raises a hard failure if the precondition is violated. The benefit is that an immediate failure occurs which propagates back to the original pyspark client (@sfc-gh-azhan, can you confirm this is the case, as it would need to be propagated back through several layers?). Presumably customers would notify us, or we would monitor for this failure. While in PrPr it would be better to have hard failures so we don't miss the underlying issue and can fix it, avoiding the risk of this becoming a silent failure.
- Add a check that only requests the dataframe AST if ast_enabled=true. This effectively forces the server connection to execute as SQL and avoids a hard failure. The benefit is that we don't fail the upstream callers (pyspark client) and the workload can make progress uninterrupted. We would need to augment this with a way to report the precondition violation so we can know to fix the bug. @sfc-gh-azhan, do you know if there's a good way to do this, i.e. a Snowflake non-fatal incident log from Python code? Also, do we know how to monitor and create bugs from these reports?
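The two options above, side by side, as a hedged sketch (the value of DATAFRAME_AST_PARAMETER, the flag wiring, and the function names are assumptions for illustration, not the actual connector code):

```python
DATAFRAME_AST_PARAMETER = "_dataframe_ast"  # actual constant value is an assumption

_ast_enabled = False  # stand-in for the session-level ast_enabled flag

def is_ast_enabled():
    return _ast_enabled

def notify_option1(**kwargs):
    # Option 1: hard precondition -- fail fast if an AST arrives while disabled.
    assert DATAFRAME_AST_PARAMETER not in kwargs or is_ast_enabled(), \
        "dataframe AST provided while ast_enabled is false"
    if DATAFRAME_AST_PARAMETER in kwargs:
        return {DATAFRAME_AST_PARAMETER: kwargs[DATAFRAME_AST_PARAMETER]}
    return {}

def notify_option2(**kwargs):
    # Option 2: soft guard -- forward the AST only when enabled; otherwise
    # fall back to plain SQL execution (and ideally report the violation).
    notify_kwargs = {}
    if DATAFRAME_AST_PARAMETER in kwargs and is_ast_enabled():
        notify_kwargs[DATAFRAME_AST_PARAMETER] = kwargs[DATAFRAME_AST_PARAMETER]
    return notify_kwargs

# With ast disabled, option 2 drops the AST and proceeds; option 1 raises.
print(notify_option2(_dataframe_ast="<ast-blob>"))  # {}
```

Option 2 matches the diff in this PR: the query still executes as SQL, and the dropped AST can be surfaced through whatever non-fatal reporting channel is chosen.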
- Option 1 (the assertion) may block queries, which can be a problem.
- Option 2 is much better, and I don't see a better way than this.
If the client disables ast, it should never send ast.
I like this bug fix, and we don't need the other @publicapi annotation changes, right?
Yes, the @publicapi changes can be cleaned up (avoid using it for internal methods).
I think it would be less disruptive to ensure we don't unnecessarily turn this into a hard failure when there's a very safe alternative action here (executing the SQL query). It would be good for anyone else to chime in. @sfc-gh-oplaton @sfc-gh-azwiegincew
Follow up in offline discussion thread.
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-2049599 Fix missing publicapi decorators and don't send dataframeAST if ast_enabled is not set
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.