Skip to content

[Bug] Snapshot with multiple unique key and hard_deletes = new_record #12415

@cespilez

Description

@cespilez

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When a snapshot is defined with multiple unique_keys and hard_deletes option is set to 'new_record', new columns called: "dbt_unique_key_1", "dbt_unique_key_2", ... , "dbt_unique_key_n" are added to target table.

First execution creates the table correctly with all metadata fields: dbt_scd_id, dbt_valid_from, dbt_valid_to, dbt_updated_at and dbt_is_deleted. Second execution adds those new fields to target table ("dbt_unique_key_1" and so on ...). From this point, execution fails because "dbt_unique_key_" fields are both on target and source table during joins so ambiguous error occurs.

Expected Behavior

"dbt_unique_key_" fields are not added to target table after snapshot operative when multiple unique_key is provided.

Steps To Reproduce

  1. Environment: dbt-core: 1.10.15, athena adapter: 1.9.5
  2. snapshots/dim_product_sources.yml:
    version: 2

snapshots:

  • name: dim_product_sources
    description: ""
    relation: ref('int_snapshot_dim_product_sources')
    config:
    strategy: check
    unique_key:
    - id_source
    - source
    check_cols: all
    hard_deletes: new_record
    alias: dim_product_sources

    columns:

    • name: id_master
    • name: id_source
    • name: desc_source
    • name: source
    • name: dbt_valid_from
    • name: dbt_valid_to
    • name: dbt_scd_id
    • name: dbt_updated_at
    • name: dbt_is_deleted
  1. Run dbt snapshot --select +dim_product_sources
  2. Make a change in ref('int_snapshot_dim_product_sources') manually
  3. Run again dbt snapshot --select +dim_product_sources

Relevant log output

snapshotted_data as (
    select *, 
    
        
            id_source as dbt_unique_key_1 ,
        
            source as dbt_unique_key_2
        
    
    from "awsdatacatalog"."db_test"."dim_product_sources"
        where
            
                dbt_valid_to is null
            
),

insertions as (
    select
            'insert' as dbt_change_type,
            source_data.*,'False' as dbt_is_deleted
    from insertions_source_data as source_data
        left outer join snapshotted_data
            on 
    
        
	    
	    
	    (snapshotted_data.dbt_unique_key_1 = source_data.dbt_unique_key_1)
 and
        
	    
	    
	    (snapshotted_data.dbt_unique_key_2 = source_data.dbt_unique_key_2)
    
    
        where 
    
        snapshotted_data.dbt_unique_key_1 is null
    
        or (
    
        snapshotted_data.dbt_unique_key_1 is not null
    
 and (
               (snapshotted_data."id_master" != source_data."id_master"
        or
        (
            ((snapshotted_data."id_master" is null) and not (source_data."id_master" is null))
            or
            ((not snapshotted_data."id_master" is null) and (source_data."id_master" is null))
        ) or snapshotted_data."id_source" != source_data."id_source"
        or
        (
            ((snapshotted_data."id_source" is null) and not (source_data."id_source" is null))
            or
            ((not snapshotted_data."id_source" is null) and (source_data."id_source" is null))
        ) or snapshotted_data."desc_source" != source_data."desc_source"
        or
        (
            ((snapshotted_data."desc_source" is null) and not (source_data."desc_source" is null))
            or
            ((not snapshotted_data."desc_source" is null) and (source_data."desc_source" is null))
        ) or snapshotted_data."source" != source_data."source"
        or
        (
            ((snapshotted_data."source" is null) and not (source_data."source" is null))
            or
            ((not snapshotted_data."source" is null) and (source_data."source" is null))
        ))or snapshotted_data.dbt_is_deleted = 'True' 
            )
    )
),
updates as (
    select
            'update' as dbt_change_type,
            source_data.*,
            snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted
    from updates_source_data as source_data
        join snapshotted_data
            on 
    
        
	    
	    
	    (snapshotted_data.dbt_unique_key_1 = source_data.dbt_unique_key_1)
 and
        
	    
	    
	    (snapshotted_data.dbt_unique_key_2 = source_data.dbt_unique_key_2)
    
    
    where (
            (snapshotted_data."id_master" != source_data."id_master"
        or
        (
            ((snapshotted_data."id_master" is null) and not (source_data."id_master" is null))
            or
            ((not snapshotted_data."id_master" is null) and (source_data."id_master" is null))
        ) or snapshotted_data."id_source" != source_data."id_source"
        or
        (
            ((snapshotted_data."id_source" is null) and not (source_data."id_source" is null))
            or
            ((not snapshotted_data."id_source" is null) and (source_data."id_source" is null))
        ) or snapshotted_data."desc_source" != source_data."desc_source"
        or
        (
            ((snapshotted_data."desc_source" is null) and not (source_data."desc_source" is null))
            or
            ((not snapshotted_data."desc_source" is null) and (source_data."desc_source" is null))
        ) or snapshotted_data."source" != source_data."source"
        or
        (
            ((snapshotted_data."source" is null) and not (source_data."source" is null))
            or
            ((not snapshotted_data."source" is null) and (source_data."source" is null))
        ))or snapshotted_data.dbt_is_deleted = 'True' 
        )
    )
    ,
    deletes as (
    select
            'delete' as dbt_change_type,
            source_data.*,
            
    cast(now() as timestamp(6))
 as dbt_valid_from,
            
    cast(now() as timestamp(6))
 as dbt_updated_at,
            
    cast(now() as timestamp(6))
 as dbt_valid_to,
            snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted
        from snapshotted_data
        left join deletes_source_data as source_data
            on 
    
        
	    
	    
	    (snapshotted_data.dbt_unique_key_1 = source_data.dbt_unique_key_1)
 and
        
	    
	    
	    (snapshotted_data.dbt_unique_key_2 = source_data.dbt_unique_key_2)
    
    
        where 
    
        source_data.dbt_unique_key_1 is null
    
        and not (
                --avoid updating the record's valid_to if the latest entry is marked as deleted
                snapshotted_data.dbt_is_deleted = 'True'
                and
                snapshotted_data.dbt_valid_to is null
            )
    )
        
        
    ,
    deletion_records as (
    select
            'insert' as dbt_change_type,
            snapshotted_data."id_master",snapshotted_data."id_source",snapshotted_data."desc_source",snapshotted_data."source",snapshotted_data.id_source as dbt_unique_key_1,
                snapshotted_data.source as dbt_unique_key_2,
                
    cast(now() as timestamp(6))
 as dbt_valid_from,
            
    cast(now() as timestamp(6))
 as dbt_updated_at,
            snapshotted_data.dbt_valid_to as dbt_valid_to,
            to_hex(md5(to_utf8(coalesce(cast(dbt_scd_id as varchar ), '')
         || '|' || coalesce(cast(
    cast(now() as timestamp(6))
 as varchar ), '')
        ))) as dbt_scd_id,
            'True' as dbt_is_deleted
        from snapshotted_data
        left join deletes_source_data as source_data
            on 
    
        
	    
	    
	    (snapshotted_data.dbt_unique_key_1 = source_data.dbt_unique_key_1)
 and
        
	    
	    
	    (snapshotted_data.dbt_unique_key_2 = source_data.dbt_unique_key_2)
    
    
    where 
    
        source_data.dbt_unique_key_1 is null
    
    and not (
            --avoid inserting a new record if the latest one is marked as deleted
            snapshotted_data.dbt_is_deleted = 'True'
            and
            snapshotted_data.dbt_valid_to is null
            )

Environment

- OS: Linux 
- Python: 3.12
- dbt: 1.10.15

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

Process run with adapter athena=1.9.5

The problems is that in "snapshotted_data" query definition, both dbt_unique_key_1 and dbt_unique_key_2 fields are created as it can be seen in log:

snapshotted_data as (
select ,
id_source as dbt_unique_key_1 ,
source as dbt_unique_key_2
from "awsdatacatalog"."db_test"."dim_product_sources"
where
dbt_valid_to is null
),
but after that, "insertions", "updates", "deletes" and "deletion_records" subqueries include these "dbt_unique_key_" fields again in their select statement by "source_data.
" or just referencing them directly.
They should not be included in final "insertions", "updates", "deletes" and "deletion_records" subqueries.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingtriage

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions