SNOW-2872192: Support targeted delete-insert in save_as_table #4031
I have two general questions about this PR:
- In Slack you mentioned this behavior is similar to Spark's DataFrameWriterV2.overwrite method. Why not implement that instead of adding a flag to saveAsTable? Doing so would keep code simpler, and avoid the potential semantic mismatch between the mode flag and overwrite_condition that @sfc-gh-aling mentioned.
- What happens if the schema of the new dataframe differs from that of the original table? My guess from looking at the code is that the INSERT query would fail, which I assume would fail the whole transaction, but if mode="overwrite" we would expect to drop the original table and the operation should succeed, as per our documentation:
"overwrite": Overwrite the existing table by dropping old table.
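To make the second question concrete, here is a rough sketch of the failure mode being guessed at. It uses Python's built-in sqlite3 as a stand-in, not Snowflake or the code in this PR. The table name, columns, and data are hypothetical. It only shows that when an INSERT with a mismatched column count fails inside a transaction, the preceding DELETE is rolled back too.

```python
import sqlite3

# Hypothetical two-column target table, standing in for the existing table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.commit()

failed = False
try:
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("DELETE FROM target WHERE id = ?", (1,))
        # Three values for a two-column table: a schema mismatch, so this raises.
        conn.execute("INSERT INTO target VALUES (?, ?, ?)", (1, "a2", "extra"))
except sqlite3.Error:
    failed = True

# The DELETE was rolled back together with the failed INSERT.
remaining = sorted(conn.execute("SELECT id, name FROM target").fetchall())
print(failed, remaining)
```

In this stand-in the original rows survive the failed write, which is the opposite of a drop-and-recreate "overwrite"; whether Snowflake behaves the same way for this PR is exactly what the question asks.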
…upport for append mode
Great questions @sfc-gh-joshi.
Now that we've restricted
If |
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-2872192
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Please write a short description of how your code change solves the related issue.
Added overwrite_condition parameter to DataFrameWriter.save_as_table() that enables atomic targeted delete-insert operations when used with mode="append". The delete and insert are wrapped in a transaction to ensure atomicity and protect tables from entering a bad state. This performs a similar operation to PySpark's DataFrameWriterV2.overwrite(condition), where rows matching the condition are deleted from the target table before inserting all rows from the DataFrame.
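The delete-then-insert semantics described above can be sketched outside Snowflake. The following is a minimal illustration using Python's built-in sqlite3, not the Snowpark implementation from this PR. The helper name targeted_delete_insert, the sales table, and its data are all hypothetical.

```python
import sqlite3


def targeted_delete_insert(conn, table, condition, params, rows):
    """Delete rows matching `condition`, then insert `rows`, atomically.

    Hypothetical helper for illustration only. `table` and `condition`
    are interpolated into SQL, so they must come from trusted code.
    """
    # `with conn` commits if the block succeeds and rolls back on error,
    # so the table never ends up with the delete applied but not the insert.
    with conn:
        conn.execute(f"DELETE FROM {table} WHERE {condition}", params)
        if rows:
            placeholders = ", ".join("?" for _ in rows[0])
            conn.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", rows
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 1), ("west", 2), ("east", 3)],
)
conn.commit()

# Replace only the "east" rows; "west" rows are left untouched.
targeted_delete_insert(
    conn, "sales", "region = ?", ("east",), [("east", 10), ("east", 20)]
)

result = sorted(conn.execute("SELECT region, amount FROM sales").fetchall())
print(result)
```

Note that every row passed in is inserted, whether or not it matches the condition, which mirrors the description above of inserting all rows from the DataFrame.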
For more details on this PR, refer to this JIRA, which contains the customer's code snippet.
Monorepo for AST: https://github.com/snowflake-eng/snowflake/pull/368680