Skip to content

[Feature] Allow changing the destination schema for the __dbt_tmp tables. #519

@borjavb

Description

@borjavb

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

make_temp_relation will always use the schema of the target table to create the __dbt_tmp table. Currently there's no way to change this.

To keep production datasets clean, it would be ideal to have the option to specify a different dataset for temporary tables.

Additionally, in BigQuery, storage can be billed based on physical or logical bytes, but this is only defined at dataset level. Depending on the nature of the table, it might be more beneficial to set the dataset at physical or logical bytes. Sometimes, it might be beneficial to have the __dbt_tmp on a logical dataset while keeping the target dataset on a physicial billing. to achieve this we require having two different datasets.

Without touching make_temp_relation and going into dbt-adapters (alternative solution), one option could be something like the following, where we extend incremental.sql with an extra config variable:

{%- set temp_schema = config.get('temp_schema') -%}
{%- if temp_schema is not none-%}
    {%- set temp_relation = this.incorporate(path={
      "schema": temp_schema
      }) -%}
      {%- do create_schema(temp_relation) -%}
{% endif %}
{%- set tmp_relation = make_temp_relation(temp_relation) %}

Describe alternatives you've considered

We could move this change to the dpt-adapters instead, extending the make_temp_relation with an extra parameter temp_schema. Something like:

{% macro make_temp_relation(base_relation, suffix='__dbt_tmp', temp_schema=none) %}
  {%- set temp_identifier = base_relation.identifier ~ suffix -%}

  {%- set temp_relation = base_relation.incorporate(
                                path={"identifier": temp_identifier}) -%}

  {%- if temp_schema is not none-%}
    {%- set temp_relation = temp_relation.incorporate(path={
      "schema": temp_schema
      }) -%}
      {%- do create_schema(temp_relation) -%}
  {%- elif temp_schema_suffix is not none-%}
    {%- set temp_schema = base_relation.schema ~ suffix -%}
    {%- set temp_relation = temp_relation.incorporate(path={
      "schema": temp_schema
      }) -%}
      {%- do create_schema(temp_relation) -%}
  {% endif %}

  {{ return(temp_relation) }}
{% endmacro %}

Who will this benefit?

Users in bigquery who want to optimise for storage costs and clean datasets.

Are you interested in contributing this feature?

Yes

Anything else?

No response

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions