
Conversation


@sfc-gh-mayliu sfc-gh-mayliu commented Dec 10, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-2872192

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

Added an overwrite_condition parameter to DataFrameWriter.save_as_table() that enables atomic, targeted delete-insert operations when used with mode="append". The delete and insert are wrapped in a transaction to ensure atomicity and to protect the table from being left in a bad state.

This performs a similar operation to PySpark's DataFrameWriterV2.overwrite(condition), where rows matching the condition are deleted from the target table before inserting all rows from the DataFrame.
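As a rough, hedged sketch of that pattern (not the actual implementation; the function, table name, and condition below are invented for illustration), the targeted delete-insert is conceptually equivalent to a DELETE followed by an append, both inside one transaction:

```python
# Illustration only: conceptual shape of the targeted delete-insert described above.
# The real logic lives inside DataFrameWriter.save_as_table(); names here are hypothetical.
from snowflake.snowpark import Session


def targeted_delete_insert(session: Session, df, table_name: str, condition_sql: str) -> None:
    session.sql("BEGIN").collect()
    try:
        # Delete only the rows matching the caller-supplied condition.
        session.sql(f"DELETE FROM {table_name} WHERE {condition_sql}").collect()
        # Insert every row from the incoming DataFrame.
        df.write.mode("append").save_as_table(table_name)
        session.sql("COMMIT").collect()
    except Exception:
        # Any failure (e.g. a schema mismatch on insert) rolls the whole operation back,
        # so the table is never left partially updated.
        session.sql("ROLLBACK").collect()
        raise
```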

For more details on this PR, refer to the JIRA issue, which contains the customer's code snippet.

Monorepo for AST: https://github.com/snowflake-eng/snowflake/pull/368680

@sfc-gh-mayliu sfc-gh-mayliu requested a review from a team December 11, 2025 02:47

@sfc-gh-joshi sfc-gh-joshi left a comment


I have two general questions about this PR:

  1. In Slack you mentioned this behavior is similar to Spark's DataFrameWriterV2.overwrite method. Why not implement that instead of adding a flag to saveAsTable? Doing so would keep the code simpler and avoid the potential semantic mismatch between the mode flag and overwrite_condition that @sfc-gh-aling mentioned.
  2. What happens if the schema of the new DataFrame differs from that of the original table? My guess from looking at the code is that the INSERT query would fail, which I assume would fail the whole transaction; but with mode="overwrite" we would expect the original table to be dropped and the operation to succeed, per our documentation:

"overwrite": Overwrite the existing table by dropping old table.
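For concreteness, the scenario behind question 2 might look roughly like this (a hedged sketch; the table, columns, and the condition-bearing call are invented, and the parameter shown is the one discussed in this PR):

```python
from snowflake.snowpark.functions import col

# `session` is an existing snowflake.snowpark.Session.
# Suppose the existing table "target" has columns (ID INT, NAME STRING),
# while the new DataFrame carries an extra SCORE column.
new_df = session.create_dataframe(
    [(1, "a", 10.0)], schema=["ID", "NAME", "SCORE"]
)

# Plain overwrite: the old table is dropped and recreated, so the schema
# difference is expected to succeed per the documentation quoted above.
new_df.write.mode("overwrite").save_as_table("target")

# With the condition parameter (hypothetical call): the question is whether the
# INSERT into the existing table fails and takes the whole transaction with it.
new_df.write.mode("overwrite").save_as_table(
    "target", overwrite_condition=col("NAME") == "a"
)
```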

@sfc-gh-mayliu
Collaborator Author

I have two general questions about this PR:

  1. In Slack you mentioned this behavior is similar to Spark's DataFrameWriterV2.overwrite method. Why not implement that instead of adding a flag to saveAsTable? Doing so would keep the code simpler and avoid the potential semantic mismatch between the mode flag and overwrite_condition that @sfc-gh-aling mentioned.
  2. What happens if the schema of the new DataFrame differs from that of the original table? My guess from looking at the code is that the INSERT query would fail, which I assume would fail the whole transaction; but with mode="overwrite" we would expect the original table to be dropped and the operation to succeed, per our documentation:

"overwrite": Overwrite the existing table by dropping old table.

Great questions @sfc-gh-joshi.

  1. The existing dataframe_writer.py APIs (save_as_table, copy_into_location, csv, json, etc.) each map to specific Snowflake SQL patterns, so we are cautious about adding new APIs. Since Snowflake SQL does not natively support a targeted delete-insert, this is only a client-level change to provide similar functionality to Spark users; adding a completely new API (e.g. overwrite()) would deviate from the existing pattern and expand the API surface unnecessarily.

Now that we've restricted overwrite_condition to only work with mode="overwrite", the semantics are clearer:

  • mode="overwrite" without overwrite_condition: Full table replacement (DROP + CREATE)
  • mode="overwrite" with overwrite_condition: Selective overwrite (DELETE matching rows + INSERT)

  Both are overwrite operations, just with different scopes.

  2. You are right that if the schema differs, the INSERT will fail and cause the entire transaction to roll back. But this is intentional behavior -- selective overwrite requires schema compatibility. If users explicitly provide the optional overwrite_condition, it is their responsibility to ensure that the new DataFrame's schema and the table's schema are compatible with each other and with overwrite_condition. This is similar to PySpark's DataFrameWriterV2.overwrite(condition), which also requires schema compatibility when selectively overwriting partitions.

If overwrite_condition is not provided, the original "overwrite the existing table by dropping old table" semantics are still preserved by default.
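To make the two scopes concrete, here is a hedged usage sketch of the merged behavior (the table and column names are invented, and whether the condition is passed as a Column expression or a SQL string is not shown in this thread, so the Column form below is an assumption):

```python
from snowflake.snowpark.functions import col

# `df` is an existing snowflake.snowpark.DataFrame.

# Full table replacement: existing behavior, drops and recreates the target table.
df.write.mode("overwrite").save_as_table("sales")

# Selective overwrite: deletes only the rows matching the condition, then inserts
# every row from df, with both steps wrapped in a single transaction.
df.write.mode("overwrite").save_as_table(
    "sales",
    overwrite_condition=col("REGION") == "EMEA",
)
```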

@sfc-gh-mayliu sfc-gh-mayliu merged commit 266334b into main Dec 11, 2025
29 checks passed
@sfc-gh-mayliu sfc-gh-mayliu deleted the SNOW-2872192-saveAsTable-targeted-delete-insert branch December 11, 2025 20:52
@github-actions github-actions bot locked and limited conversation to collaborators Dec 11, 2025