
[Feature] support copy multiple tables in parallel using copy_partitions #559

@Klimmy

Description

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The Python BigQuery client supports asynchronous copy jobs, but the dbt-bigquery adapter sends its copy requests one by one when using incremental_strategy = 'insert_overwrite' with copy_partitions = true.

We could achieve better performance by sending requests in small batches of partitions.

dbt-bigquery already supports parallel execution in the copy_bq_table function, but the bq_copy_partitions macro sends partitions one at a time (see the sketch below).
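For context, a minimal sketch of what "asynchronous" means with the Python client (the table names and partition decorators here are illustrative, not the adapter's actual code): copy_table returns a CopyJob immediately, so several copies can be in flight before the first result() call.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical source/destination pairs, using BigQuery's $ partition decorator.
pairs = [
    ("proj.dataset.model__tmp$20240101", "proj.dataset.model$20240101"),
    ("proj.dataset.model__tmp$20240102", "proj.dataset.model$20240102"),
]

config = bigquery.CopyJobConfig(write_disposition="WRITE_TRUNCATE")

# copy_table() only submits the job; nothing blocks yet.
jobs = [client.copy_table(src, dst, job_config=config) for src, dst in pairs]

# Wait only after all jobs are in flight, so the copies overlap server-side.
for job in jobs:
    job.result()  # raises if that copy failed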

We could probably implement this feature by introducing a batch_size argument to the partition_by config:

{{ config(
    materialized = 'incremental',
    incremental_strategy = 'insert_overwrite',
    partition_by = {
      "field": "day",
      "data_type": "date",
      "copy_partitions": true,
      "batch_size": 5
    }
) }}

The default value would be 1, and the bq_copy_partitions macro would send a list of partitions to copy_bq_table, where the size of the list equals batch_size.
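A rough sketch of that behaviour in plain Python rather than the actual Jinja macro (the function names, and submitting one copy job per partition within a batch, are my assumptions, not existing dbt-bigquery code):

from typing import Iterator, List

from google.cloud import bigquery


def chunks(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batch_size-sized slices of items."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]


def copy_partitions_batched(
    client: bigquery.Client,
    tmp_table: str,          # e.g. "proj.dataset.model__dbt_tmp"
    dest_table: str,         # e.g. "proj.dataset.model"
    partitions: List[str],   # partition ids, e.g. ["20240101", "20240102"]
    batch_size: int = 1,     # 1 reproduces today's one-at-a-time behaviour
) -> None:
    config = bigquery.CopyJobConfig(write_disposition="WRITE_TRUNCATE")
    for batch in chunks(partitions, batch_size):
        # Submit every copy in the batch before waiting on any of them.
        jobs = [
            client.copy_table(f"{tmp_table}${p}", f"{dest_table}${p}", job_config=config)
            for p in batch
        ]
        for job in jobs:
            job.result()  # surfaces per-partition copy errors

Since batch_size defaults to 1, existing models would keep their current sequential behaviour unless they opt in.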

Describe alternatives you've considered

No response

Who will this benefit?

Anyone who has a large number of heavy BigQuery partitions.

Are you interested in contributing this feature?

Definitely; I just need a green light to proceed.

Anything else?

No response
