Skip to content

SNOW-2049599 Fix missing publicapi decorators and don't send dataframeAST if ast_enabled is set.#3285

Closed
sfc-gh-evandenberg wants to merge 2 commits intomainfrom
evandenberg-SNOW-2049599-add-missing-publicapi-decorators
Closed

SNOW-2049599 Fix missing publicapi decorators and don't send dataframeAST if ast_enabled is set.#3285
sfc-gh-evandenberg wants to merge 2 commits intomainfrom
evandenberg-SNOW-2049599-add-missing-publicapi-decorators

Conversation

@sfc-gh-evandenberg
Copy link
Collaborator

@sfc-gh-evandenberg sfc-gh-evandenberg commented Apr 19, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-2049599 Fix missing publicapi decorators and don't send dataframeAST if ast_enabled is set

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

@sfc-gh-snowflakedb-snyk-sa
Copy link

sfc-gh-snowflakedb-snyk-sa commented Apr 19, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)


# Note: For the operator overrides we always emit ast, it simply gets ignored in a call chain.

@publicapi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunder/magic methods won't accept additional arguments afaik, so I don't think we should decorate them. Is this working in our tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, no need to decorate dunder methods.

+ line
)

@publicapi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why this has been added here, I did so previously myself in an earlier PR before removing it. The semantics of decorating an internal method with @publicapi is incorrect imho. I do think we need to protect internal methods with our decorator, but we should discuss renaming it if there is a strong case for using the decorator on many internal methods.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree the naming is bad. Perhaps we should have an @internalapi which alias onto @publicapi. Would that be better? Either way, this is a bug and we can also address the naming separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a public API. It came up many, many times. A good name would be as_spark_string or so. Let's do a proper job here, no issue in making this a public API. There's only upside. Let's document it and make it maintainable.

) -> SnowflakeCursor:
notify_kwargs = {}
if DATAFRAME_AST_PARAMETER in kwargs:
if DATAFRAME_AST_PARAMETER in kwargs and is_ast_enabled():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need is_ast_enabled() here? It feels a bit odd to have a check here. Much simpler to pass a parameter or not. If the parameter is not there, it won't be used. Callers are responsible to provide meaningful input to internal APIs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @sfc-gh-azhan

@sfc-gh-lspiegelberg and I had a discussion on what to do in the case when dataframe (fragment or otherwise) AST is provided and ast_enabled=false. Based on the current design we don't expect this situation to occur except as a result of a bug in our code likely around the top level APIs improperly defaulting _emit_ast parameter. There are couple options we discussed based on whether we treat precondition violation as a hard failure or not.

  1. Add an assertion (i.e., assert not (DATAFRAME_AST_PARAMETER in kwargs) or is_ast_enabled()) that raises a hard failure if the precondition is violated. The benefit here is an immediate failure occurs which propagates back to the original pyspark client (actually @sfc-gh-azhan can you confirm if this is the case as it would need to get propagated back through several layers). Presumably customers would need to notify or we would monitor for this failure. While in PrPr it would be better to have hard failures so we don't miss and can fix the underlying issue, avoid the risk of this becoming a silent failure.

  2. Add a check that only requests the dataframe AST if ast_enabled=true. This effectively forces the server connection to execute as sql and avoids treating as a hard failure. The benefit is that we don't fail the upstream callers (pyspark client) and the workload can make progress uninterrupted. We would need to augment this with a way to report the precondition violation to we can know to fix the bug. @sfc-gh-azhan Do you know if there's a good way to do this, i.e. snowflake non-fatal incident log from python code? Also, do we know how to monitor and create bugs from these reports.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. assertion may block query which can be a problem;
  2. is much better and I don't see a better way than this.

If the client disables ast, it should never send ast.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this bug fix and we don't need other @publicapi annotation changes right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the @publicapi changes can be cleaned up (avoid using for internal methods).

I think it would be less disruptive to ensure we don't unnecessarily make into a hard failure when there's a very safe alternative action here (execute sql query.). Anyone else would be good to chime in? @sfc-gh-oplaton @sfc-gh-azwiegincew

@sfc-gh-evandenberg
Copy link
Collaborator Author

Follow up in offline discussion thread.

@github-actions github-actions bot locked and limited conversation to collaborators May 19, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants