-
Notifications
You must be signed in to change notification settings - Fork 203
Description
I have the same issue than this one and it's blocking our Elementary deployment.
We use one run per model in our scheduler and sometimes two models end at the same time and we get the queries cancelled.
Describe the bug The concept of transaction for
get_delete_and_insert_querieswas introduced in this commit.However as visible in the documentation:
If a transaction mutates (update or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently. Conflicting transactions are cancelled. Conflicting DML statements that run outside of a transaction are queued to run later, subject to queuing limits.
Therefore if we want to have the
deletepart we either have:
- have an outside system to limit concurrency so that only one dbt run can concurrently with that transaction
- remove the transaction
To Reproduce Steps to reproduce the behavior:
- set up a dbt project using Elementary
- Create a simple model
- Run concurrently twice the model and hope that they finish around the same time
- See error generated by the concurrent transaction on
dbt_modelsthat would look like:/* {"app": "dbt", "dbt_version": "1.6.0", "profile_name": "dbt", "target_name": "prod", "connection_name": "master"} */ begin transaction; delete from `gcp_project`.`elementary`.`dbt_models` where metadata_hash is null or metadata_hash in (select metadata_hash from `gcp_project`.`elementary`.`dbt_models__tmp_table_XXXX`); insert into `gcp_project`.`elementary`.`dbt_models` select * from `gcp_project`.`elementary`.`dbt_models__tmp_table_YYY`; commit;Expected behavior I would expect to be able to run concurrently 2 dbt run commands without them failing when they happen to start the Elementary data post hook executing
get_delete_and_insert_queriesmacro query concurrently.Environment (please complete the following information):
- edr Version: 0.9.2
- dbt package Version: 0.9.0
Additional context Slack thread regarding other people affected: https://elementary-community.slack.com/archives/C02CTC89LAX/p1689186975724739
Potential workaround Override the macro
default__get_delete_and_insert_queriesby a version without a transaction.{% macro default__get_delete_and_insert_queries(relation, insert_relation, delete_relation, delete_column_key) %} {% set query %} {% if delete_relation %} delete from {{ relation }} where {{ delete_column_key }} is null or {{ delete_column_key }} in (select {{ delete_column_key }} from {{ delete_relation }}); {% endif %} {% if insert_relation %} insert into {{ relation }} select * from {{ insert_relation }}; {% endif %} {% endset %} {% do return([query]) %} {% endmacro %}However I'm not sure of all the consequences of removing the transaction but since Spark doesn't have transactions, I assume mostly works but it's not "safe"?