Conversation

@Kayrnt
Contributor

@Kayrnt Kayrnt commented Oct 12, 2025

resolves #599

Problem

Previously, dbt-bigquery did not support schema changes for STRUCT (nested) fields in incremental models. When new fields were added to STRUCT columns or existing fields were removed, the table schema was not updated before incremental materialization operations (like MERGE), leading to potential data inconsistencies or failures.

Solution

This PR implements STRUCT field synchronization for BigQuery incremental models by:

  • Fixing the macro dispatch mechanism in the global on_schema_change.sql to properly use adapter.dispatch for adapter-specific overrides
  • Adding the @available decorator to the sync_struct_columns method in impl.py to expose it to Jinja templates
  • Implementing logic to detect and add missing STRUCT fields using BigQuery's schema update API before MERGE operations
  • Adding graceful error handling for BigQuery's limitation that prevents removing fields from STRUCT columns (documented as a known limitation)
  • Updating the BigQuery-specific on_schema_change.sql macro to call sync_struct_columns during schema change processing

The implementation fully supports append_new_columns mode, while sync_all_columns mode is skipped with an appropriate warning, since the BigQuery API does not support removing nested columns from STRUCTs.
Functional tests have been added to cover the new behavior.
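The detection step above (finding STRUCT fields present in the source but missing from the target before appending them) can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: plain dicts stand in for BigQuery SchemaField objects, and the function name is invented.

```python
def missing_struct_fields(target, source):
    """Return dotted paths of source fields (including fields nested
    inside RECORD/STRUCT columns) that are absent from the target schema.

    Each schema is a list of dicts: {"name", "type", "fields" (for RECORDs)}.
    """
    target_by_name = {f["name"]: f for f in target}
    missing = []
    for field in source:
        name = field["name"]
        if name not in target_by_name:
            # Whole column (or whole sub-field) is new.
            missing.append(name)
        elif field.get("type") == "RECORD":
            # Recurse into the STRUCT and prefix nested paths.
            nested = missing_struct_fields(
                target_by_name[name].get("fields", []),
                field.get("fields", []),
            )
            missing.extend(f"{name}.{sub}" for sub in nested)
    return missing

target = [
    {"name": "id", "type": "INT64"},
    {"name": "payload", "type": "RECORD",
     "fields": [{"name": "a", "type": "STRING"}]},
]
source = [
    {"name": "id", "type": "INT64"},
    {"name": "payload", "type": "RECORD",
     "fields": [{"name": "a", "type": "STRING"},
                {"name": "b", "type": "INT64"}]},
    {"name": "new_col", "type": "STRING"},
]
print(missing_struct_fields(target, source))  # ['payload.b', 'new_col']
```

In the real adapter the resulting field list would be appended to the table's schema via the BigQuery schema update API before the MERGE runs.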

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@github-actions
Contributor

Thank you for your pull request! We could not find a changelog entry for this change in the dbt-adapters package. For details on how to document a change, see the Contributing Guide.

```diff
 {% else %}

-{% set schema_changes_dict = check_for_schema_changes(source_relation, target_relation) %}
+{% set schema_changes_dict = adapter.dispatch('check_for_schema_changes', 'dbt')(source_relation, target_relation) %}
```
Contributor

let's update check_for_schema_changes to follow our standard pattern where it contains the dispatch logic + a default macro

```text
E         In dispatch: No macro named 'check_for_schema_changes' found within namespace: 'dbt'
E             Searched for: 'test.redshift__check_for_schema_changes', 'test.postgres__check_for_schema_changes', 'test.default__check_for_schema_changes', 'dbt.redshift__check_for_schema_changes', 'dbt.postgres__check_for_schema_changes', 'dbt.default__check_for_schema_changes'
```
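The standard pattern being asked for pairs a dispatching macro with a `default__` implementation, roughly like this (a sketch; the actual macro bodies live in dbt-adapters):

```jinja
{% macro check_for_schema_changes(source_relation, target_relation) %}
  {{ return(adapter.dispatch('check_for_schema_changes', 'dbt')(source_relation, target_relation)) }}
{% endmacro %}

{% macro default__check_for_schema_changes(source_relation, target_relation) %}
  {# cross-adapter default implementation goes here #}
{% endmacro %}
```

With a `default__check_for_schema_changes` in place, dispatch can always resolve a fallback when no adapter-specific override exists, which is exactly what the failing search above could not find.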

Contributor Author

@Kayrnt Kayrnt Oct 14, 2025

Oops! I didn't check other adapters 😱
Will do 👍

```python
if drop_candidates:
    relation_name = relation.render()
    for column in drop_candidates:
        drop_sql = f"ALTER TABLE {relation_name} DROP COLUMN {self.quote(column.name)}"
```
Contributor

@colin-rogers-dbt colin-rogers-dbt Oct 23, 2025
  1. Should we do this via a macro? (not really sure)
  2. Can we group these drops into a single call? i.e. can we do this:

```sql
ALTER TABLE `project.dataset.table_name` DROP COLUMN column_name_1, DROP COLUMN column_name_2, DROP COLUMN column_name_3;
```

Contributor Author

  1. Well, that's a design decision; where else would you do it?
  2. Right, that's also possible according to https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#alter_table_drop_column_statement
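Grouping the drops into a single statement, as suggested above, could look like the following. This is a hypothetical helper (names and signature invented, not the PR's code), building one ALTER TABLE with comma-separated DROP COLUMN clauses so a single API call replaces N separate ones:

```python
def build_drop_columns_sql(relation_name, column_names, quote_char="`"):
    """Build one ALTER TABLE statement dropping several columns at once.

    BigQuery accepts comma-separated DROP COLUMN clauses, so batching
    avoids issuing one query job per dropped column.
    """
    drops = ", ".join(
        f"DROP COLUMN {quote_char}{name}{quote_char}" for name in column_names
    )
    return f"ALTER TABLE {relation_name} {drops}"

sql = build_drop_columns_sql(
    "`project.dataset.table_name`",
    ["column_name_1", "column_name_2"],
)
print(sql)
# ALTER TABLE `project.dataset.table_name` DROP COLUMN `column_name_1`, DROP COLUMN `column_name_2`
```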

```python
logger.debug(
    'Dropping column `{}` from table "{}".'.format(column.name, relation_name)
)
client.query(drop_sql).result()
```
Contributor

Instead of calling client.query, we should probably use the execute method so we get consistent retry / exception-handling logic.


Labels

ci:approve-public-fork-ci cla:yes The PR author has signed the CLA


Development

Successfully merging this pull request may close these issues.

[CT-1710] [CT-1703] [Feature] on_schema_change should handle non-top-level schema changes
