# feat: Complete hard_deletes='new_record' implementation for snapshots #1263
**Open**: benc-db wants to merge 6 commits into `main` from `feature/1176-hard-delete-processing`.
+591 −25
Fixes #1176

Implements full support for the `hard_deletes` configuration in snapshot materializations, enabling users to track deleted source records with dedicated deletion records marked by `dbt_is_deleted=True`.

**Root cause**

The dbt-core `snapshot_staging_table` macro generates a `deletion_records` CTE that relies on `get_column_schema_from_query()` for source columns, which returns proper column schema objects with `.name` attributes. However, when building the list of `snapshotted_cols` from the target table, it used `get_columns_in_relation()`, which returns `agate.Row` tuples like `('col_name', 'data_type', 'comment')`. The `deletion_records` CTE tried to iterate these tuples using the `.name` attribute (via `get_list_of_column_names()`), which doesn't exist on tuples. This caused the column-matching logic to fail silently, preventing deletion records from being constructed with the correct columns from the snapshotted table. As a result, deletion records were inserted with NULL values for all source columns (`id`, `name`, etc.) instead of the actual values from the deleted records, producing the malformed output reported in issue #1176.

**Fix**

Created a `databricks__snapshot_staging_table` override that extracts column names from `agate.Row` tuples by accessing index `[0]` instead of the `.name` attribute. This ensures the `deletion_records` CTE receives correct column lists for both source and target tables, allowing proper column matching when inserting deletion records. Additionally, overrode `databricks__build_snapshot_table` to include the `dbt_is_deleted` column in the initial snapshot table creation when `hard_deletes='new_record'`, so the column exists from the start and doesn't need to be added later.
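The mismatch can be illustrated in plain Python. This is a minimal sketch: the tuple shape and metadata-row filter follow the description above, and `ColumnSchema` here is a stand-in for the adapter's real column schema class, not its actual API.

```python
from collections import namedtuple

# get_column_schema_from_query() returns schema objects with a .name attribute
# (ColumnSchema is a hypothetical stand-in for the real class)
ColumnSchema = namedtuple("ColumnSchema", ["name", "dtype"])
source_cols = [ColumnSchema("id", "bigint"), ColumnSchema("name", "string")]

# get_columns_in_relation() returns agate.Row-style tuples:
# (col_name, data_type, comment) -- these have no .name attribute
snapshotted_rows = [
    ("id", "bigint", None),
    ("name", "string", None),
    ("# Partition Information", "", ""),  # Databricks metadata row
]

# Broken: iterating target rows with .name raises AttributeError
try:
    [row.name for row in snapshotted_rows]
except AttributeError as e:
    print(f"fails: {e}")

# Fixed: index [0], plus a filter for metadata rows starting with '#'
snapshotted_cols = [row[0] for row in snapshotted_rows if not row[0].startswith("#")]
print(snapshotted_cols)
```

In the real macro this failure was swallowed rather than raised, which is why the deletion records were built with no matched columns instead of erroring out.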
**New file: `dbt/include/databricks/macros/materializations/snapshot_helpers.sql`**
- `databricks__build_snapshot_table`: adds the `dbt_is_deleted` column for `new_record` mode
- `databricks__snapshot_staging_table`: complete override to fix column name extraction
  - Properly extracts column names from `agate.Row` tuples using index `[0]`
  - Filters out Databricks metadata rows (starting with `#`)
  - Generates a correct `deletion_records` CTE with proper column matching

**New file: `dbt/include/databricks/macros/materializations/snapshot_merge.sql`**
- `databricks__snapshot_merge_sql`: implements hard_deletes-aware MERGE logic
- Supports `invalidate` mode with a `WHEN NOT MATCHED BY SOURCE` clause
- Uses the `insert *` pattern to include all staging table columns, including `dbt_is_deleted`

**New file: `tests/functional/adapter/simple_snapshot/test_hard_deletes.py`**
- Comprehensive functional tests for all three `hard_deletes` modes
- `TestHardDeleteIgnore`: verifies deleted records remain unchanged (default)
- `TestHardDeleteInvalidate`: verifies `dbt_valid_to` is set for deleted records
- `TestHardDeleteNewRecord`: verifies new deletion records with `dbt_is_deleted=True`

**Behavior**

`hard_deletes='ignore'` (default)
- Deleted records remain unchanged in the snapshot
- `dbt_valid_to` stays NULL for records no longer in the source
- Maintains backward compatibility

`hard_deletes='invalidate'`
- Deleted records are invalidated by setting the `dbt_valid_to` timestamp
- Uses Delta Lake's `WHEN NOT MATCHED BY SOURCE` clause
- Original records are marked no longer valid when removed from the source

`hard_deletes='new_record'`
- Original records are invalidated (`dbt_valid_to` set)
- New deletion records are inserted with `dbt_is_deleted=True` and the actual source column values
- Provides a complete audit trail of deletions
- Resolves the malformed-output issue where deletion records had NULL values

**Testing**
- All 3 functional tests passing (ignore, invalidate, new_record)
- Code quality checks passing (ruff, ruff-format, mypy)
- No regressions in existing snapshot functionality
- Verified with Databricks Delta Lake MERGE operations
- Tested against a Unity Catalog cluster

**Files changed**
- `dbt/include/databricks/macros/materializations/snapshot_helpers.sql` (new, 221 lines)
- `dbt/include/databricks/macros/materializations/snapshot_merge.sql` (new, 32 lines)
- `tests/functional/adapter/simple_snapshot/test_hard_deletes.py` (new, 298 lines)
- `.gitignore` (added `docs/plans/` exclusion)

Signed-off-by: Randy Pitcher <[email protected]>
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Randy Pitcher <[email protected]>
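The `new_record` behavior described above can be sketched in plain Python. This is a toy simulation under assumed row shapes (dicts keyed by `id`, string timestamps); the real implementation is the Delta Lake MERGE generated by the macros, not this function.

```python
def apply_new_record_mode(snapshot, source_ids, now):
    """Sketch of hard_deletes='new_record': invalidate current rows whose key
    disappeared from the source, then append a deletion record carrying the
    original column values with dbt_is_deleted=True."""
    out = list(snapshot)
    for row in snapshot:
        is_deleted = row["dbt_valid_to"] is None and row["id"] not in source_ids
        if is_deleted:
            row["dbt_valid_to"] = now  # invalidate the current record
            # deletion record keeps the actual source column values (the #1176 fix)
            out.append({**row, "dbt_valid_from": now, "dbt_valid_to": None,
                        "dbt_is_deleted": True})
    return out

snapshot = [
    {"id": 1, "name": "a", "dbt_valid_from": "t0", "dbt_valid_to": None, "dbt_is_deleted": False},
    {"id": 2, "name": "b", "dbt_valid_from": "t0", "dbt_valid_to": None, "dbt_is_deleted": False},
]
# id=2 vanished from the source: its row is invalidated and a deletion
# record with its actual values is appended
rows = apply_new_record_mode(snapshot, source_ids={1}, now="t1")
```

Before the fix, the appended record would have carried NULLs instead of `name="b"`, because the column list for the target table was empty.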
Code quality improvements:
- Move SQL fixtures from inline to fixtures.py per project patterns
- Fix boolean literals: 'True'/'False' strings → true/false booleans
- Databricks has native BOOLEAN support; use it for type safety
- Aligns with test expectations (WHERE dbt_is_deleted = true)
- Standardize Jinja2 indentation to consistent 4-space increments
- Improve Jinja2 comment syntax consistency
- Add whitespace control (prefer {%- -%}) to prevent blank lines
Documentation:
- Add Jinja2 whitespace control guidance to AGENTS.md
- Document preference (not requirement) for using `-` in Jinja tags
Build config:
- Remove orphaned 'path = .hatch' line from pyproject.toml
Testing:
- All 701 unit tests pass ✓
- All 17 snapshot functional tests pass ✓
- New hard_deletes tests (ignore/invalidate/new_record) pass ✓
Add entry for PR #1263, which implements hard_deletes='new_record' support in snapshot materializations. Credits @randypitcherii for the original PR.
References:
- Issue #1176: SCD2 column check and timestamp check malformed output
- PR #1263: feat: Complete hard_deletes='new_record' implementation
Migration of #1244