Skip to content

[Feature] Optimize incremental 'insert_overwrite' strategy #527

@AxelThevenot

Description

@AxelThevenot

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The MERGE statement is sub-optimized in BigQuery.
This is of course ok for unique keys as this is what we are looking for.

But for the insert_overwrite strategy where we are looking to rows at the partition-level there is a better solution and here is why:

  • a DELETE or INSERT statement is cheapest than a MERGE statement.
  • incremental tables are the most expensive tables in real-world projects.
  • The DELETE statement in BigQuery is free at the partition-level.

This has been tested at Carrefour which is my company.

  • On this replacement of the MERGE statement it reduces the cost by 50.4% and the elapsed time by 35.2%
  • On the overall procedure it reduces the cost by 26.1% and the elapsed time by 23.1%

This is wrapped in a transaction to avoid deleting rows if any error occurs.

Describe alternatives you've considered

I have considered an alternative to create an additional delete+insert incremental strategy.
But overriding the existing insert_overwrite is the more convenient as this is exactly what we are looking for and the features are the same.

Who will this benefit?

Of course, everyone without any changes.

Are you interested in contributing this feature?

Yes

Anything else?

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions