-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Description
Is this a new bug in dbt-core?
- I believe this is a new bug in dbt-core
- I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
When a snapshot is defined with multiple unique_keys and hard_deletes option is set to 'new_record', new columns called: "dbt_unique_key_1", "dbt_unique_key_2", ... , "dbt_unique_key_n" are added to target table.
First execution creates the table correctly with all metadata fields: dbt_scd_id, dbt_valid_from, dbt_valid_to, dbt_updated_at and dbt_is_deleted. Second execution adds those new fields to target table ("dbt_unique_key_1" and so on ...). From this point, execution fails because "dbt_unique_key_" fields are both on target and source table during joins so ambiguous error occurs.
Expected Behavior
"dbt_unique_key_" fields are not added to target table after snapshot operative when multiple unique_key is provided.
Steps To Reproduce
- Environment: dbt-core: 1.10.15, athena adapter: 1.9.5
- snapshots/dim_product_sources.yml:
version: 2
snapshots:
-
name: dim_product_sources
description: ""
relation: ref('int_snapshot_dim_product_sources')
config:
strategy: check
unique_key:
- id_source
- source
check_cols: all
hard_deletes: new_record
alias: dim_product_sourcescolumns:
- name: id_master
- name: id_source
- name: desc_source
- name: source
- name: dbt_valid_from
- name: dbt_valid_to
- name: dbt_scd_id
- name: dbt_updated_at
- name: dbt_is_deleted
- Run dbt snapshot --select +dim_product_sources
- Make a change in ref('int_snapshot_dim_product_sources') manually
- Run again dbt snapshot --select +dim_product_sources
Relevant log output
snapshotted_data as (
select *,
id_source as dbt_unique_key_1 ,
source as dbt_unique_key_2
from "awsdatacatalog"."db_test"."dim_product_sources"
where
dbt_valid_to is null
),
insertions as (
select
'insert' as dbt_change_type,
source_data.*,'False' as dbt_is_deleted
from insertions_source_data as source_data
left outer join snapshotted_data
on
(snapshotted_data.dbt_unique_key_1 = source_data.dbt_unique_key_1)
and
(snapshotted_data.dbt_unique_key_2 = source_data.dbt_unique_key_2)
where
snapshotted_data.dbt_unique_key_1 is null
or (
snapshotted_data.dbt_unique_key_1 is not null
and (
(snapshotted_data."id_master" != source_data."id_master"
or
(
((snapshotted_data."id_master" is null) and not (source_data."id_master" is null))
or
((not snapshotted_data."id_master" is null) and (source_data."id_master" is null))
) or snapshotted_data."id_source" != source_data."id_source"
or
(
((snapshotted_data."id_source" is null) and not (source_data."id_source" is null))
or
((not snapshotted_data."id_source" is null) and (source_data."id_source" is null))
) or snapshotted_data."desc_source" != source_data."desc_source"
or
(
((snapshotted_data."desc_source" is null) and not (source_data."desc_source" is null))
or
((not snapshotted_data."desc_source" is null) and (source_data."desc_source" is null))
) or snapshotted_data."source" != source_data."source"
or
(
((snapshotted_data."source" is null) and not (source_data."source" is null))
or
((not snapshotted_data."source" is null) and (source_data."source" is null))
))or snapshotted_data.dbt_is_deleted = 'True'
)
)
),
updates as (
select
'update' as dbt_change_type,
source_data.*,
snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted
from updates_source_data as source_data
join snapshotted_data
on
(snapshotted_data.dbt_unique_key_1 = source_data.dbt_unique_key_1)
and
(snapshotted_data.dbt_unique_key_2 = source_data.dbt_unique_key_2)
where (
(snapshotted_data."id_master" != source_data."id_master"
or
(
((snapshotted_data."id_master" is null) and not (source_data."id_master" is null))
or
((not snapshotted_data."id_master" is null) and (source_data."id_master" is null))
) or snapshotted_data."id_source" != source_data."id_source"
or
(
((snapshotted_data."id_source" is null) and not (source_data."id_source" is null))
or
((not snapshotted_data."id_source" is null) and (source_data."id_source" is null))
) or snapshotted_data."desc_source" != source_data."desc_source"
or
(
((snapshotted_data."desc_source" is null) and not (source_data."desc_source" is null))
or
((not snapshotted_data."desc_source" is null) and (source_data."desc_source" is null))
) or snapshotted_data."source" != source_data."source"
or
(
((snapshotted_data."source" is null) and not (source_data."source" is null))
or
((not snapshotted_data."source" is null) and (source_data."source" is null))
))or snapshotted_data.dbt_is_deleted = 'True'
)
)
,
deletes as (
select
'delete' as dbt_change_type,
source_data.*,
cast(now() as timestamp(6))
as dbt_valid_from,
cast(now() as timestamp(6))
as dbt_updated_at,
cast(now() as timestamp(6))
as dbt_valid_to,
snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted
from snapshotted_data
left join deletes_source_data as source_data
on
(snapshotted_data.dbt_unique_key_1 = source_data.dbt_unique_key_1)
and
(snapshotted_data.dbt_unique_key_2 = source_data.dbt_unique_key_2)
where
source_data.dbt_unique_key_1 is null
and not (
--avoid updating the record's valid_to if the latest entry is marked as deleted
snapshotted_data.dbt_is_deleted = 'True'
and
snapshotted_data.dbt_valid_to is null
)
)
,
deletion_records as (
select
'insert' as dbt_change_type,
snapshotted_data."id_master",snapshotted_data."id_source",snapshotted_data."desc_source",snapshotted_data."source",snapshotted_data.id_source as dbt_unique_key_1,
snapshotted_data.source as dbt_unique_key_2,
cast(now() as timestamp(6))
as dbt_valid_from,
cast(now() as timestamp(6))
as dbt_updated_at,
snapshotted_data.dbt_valid_to as dbt_valid_to,
to_hex(md5(to_utf8(coalesce(cast(dbt_scd_id as varchar ), '')
|| '|' || coalesce(cast(
cast(now() as timestamp(6))
as varchar ), '')
))) as dbt_scd_id,
'True' as dbt_is_deleted
from snapshotted_data
left join deletes_source_data as source_data
on
(snapshotted_data.dbt_unique_key_1 = source_data.dbt_unique_key_1)
and
(snapshotted_data.dbt_unique_key_2 = source_data.dbt_unique_key_2)
where
source_data.dbt_unique_key_1 is null
and not (
--avoid inserting a new record if the latest one is marked as deleted
snapshotted_data.dbt_is_deleted = 'True'
and
snapshotted_data.dbt_valid_to is null
)Environment
- OS: Linux
- Python: 3.12
- dbt: 1.10.15Which database adapter are you using with dbt?
other (mention it in "Additional Context")
Additional Context
Process run with adapter athena=1.9.5
The problems is that in "snapshotted_data" query definition, both dbt_unique_key_1 and dbt_unique_key_2 fields are created as it can be seen in log:
snapshotted_data as (
select ,
id_source as dbt_unique_key_1 ,
source as dbt_unique_key_2
from "awsdatacatalog"."db_test"."dim_product_sources"
where
dbt_valid_to is null
),
but after that, "insertions", "updates", "deletes" and "deletion_records" subqueries include these "dbt_unique_key_" fields again in their select statement by "source_data." or just referencing them directly.
They should not be included in final "insertions", "updates", "deletes" and "deletion_records" subqueries.