Skip to content
This repository was archived by the owner on Sep 2, 2025. It is now read-only.
This repository was archived by the owner on Sep 2, 2025. It is now read-only.

[ADAP-781] [Feature] Add an option to use INFORMATION_SCHEMA for partition info retrieval #867

@github-christophe-oudar

Description

@github-christophe-oudar

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

So far there are few places where dbt-bigquery access partitions information:

  • In incremental materialization, _dbt_max_partition is declared (when used in a model) and stores the max partition
  • In get_partitions_metadata macro that returns the partitions information for input model
  • In incremental insert_overwrite materialization and before the MERGE statement with dynamic partitions to store them in dbt_partitions_for_replacement and use it in the predicate
  • In incremental insert_overwrite materialization and before the copy partitions statement with dynamic partitions to iterate on the partitions to replace

Allowing to use INFORMATION_SCHEMA data would reduce the amount of data scanned.
More details are available on https://cloud.google.com/bigquery/docs/information-schema-partitions.

As this table is still in Preview, it's likely safer to provide as an alternative to current method.

To access that new method, adding partition_information settings to partition_by config block so that without value the default is current behavior or with explicit value "model" or use information schema with value "information_schema" to use the new approach.

This issue is similar to #286 but intend to rely on existing macros and solutions as well replacing under the hood accesses to improve overall performances.

Describe alternatives you've considered

We could try to set the config at connection level (profiles.yml) but doing so would change the API for existing macros and wouldn't be as convenient as accessing model config block.

Who will this benefit?

Anyone using incremental strategy with fairly large partitions as the current approach can be costly and and slightly slow compared to the metadata approach.

Are you interested in contributing this feature?

yes

Anything else?

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions