-
Notifications
You must be signed in to change notification settings - Fork 204
Description
Describe the bug
The concept of transaction for get_delete_and_insert_queries was introduced in this commit.
However as visible in the documentation:
If a transaction mutates (update or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently. Conflicting transactions are cancelled. Conflicting DML statements that run outside of a transaction are queued to run later, subject to queuing limits.
Therefore if we want to have the delete part we either have:
- have an outside system to limit concurrency so that only one dbt run can concurrently with that transaction
- remove the transaction
To Reproduce
Steps to reproduce the behavior:
- set up a dbt project using Elementary
- Create a simple model
- Run concurrently twice the model and hope that they finish around the same time
- See error generated by the concurrent transaction on
dbt_modelsthat would look like:
/* {"app": "dbt", "dbt_version": "1.6.0", "profile_name": "dbt", "target_name": "prod", "connection_name": "master"} */
begin transaction;
delete from `gcp_project`.`elementary`.`dbt_models`
where
metadata_hash is null
or metadata_hash in (select metadata_hash from `gcp_project`.`elementary`.`dbt_models__tmp_table_XXXX`);
insert into `gcp_project`.`elementary`.`dbt_models` select * from `gcp_project`.`elementary`.`dbt_models__tmp_table_YYY`;
commit;
Expected behavior
I would expect to be able to run concurrently 2 dbt run commands without them failing when they happen to start the Elementary data post hook executing get_delete_and_insert_queries macro query concurrently.
Environment (please complete the following information):
- edr Version: 0.9.2
- dbt package Version: 0.9.0
Additional context
Slack thread regarding other people affected: https://elementary-community.slack.com/archives/C02CTC89LAX/p1689186975724739
Potential workaround
Override the macro default__get_delete_and_insert_queries by a version without a transaction.
{% macro default__get_delete_and_insert_queries(relation, insert_relation, delete_relation, delete_column_key) %}
{% set query %}
{% if delete_relation %}
delete from {{ relation }}
where
{{ delete_column_key }} is null
or {{ delete_column_key }} in (select {{ delete_column_key }} from {{ delete_relation }});
{% endif %}
{% if insert_relation %}
insert into {{ relation }} select * from {{ insert_relation }};
{% endif %}
{% endset %}
{% do return([query]) %}
{% endmacro %}
However I'm not sure of all the consequences of removing the transaction but since Spark doesn't have transactions, I assume mostly works but it's not "safe"?