Conversation

@Jeremy-Demlow

Summary

  • Add hybrid_table materialization and supporting macros
  • Relation config + changeset; adapter describe method
  • Tests for basic, incremental, schema changes, constraints
  • Remove local-only docs from repo (PR will include usage/summary)

Resolves

Problem
Snowflake Hybrid Tables are not currently supported as a first-class materialization in dbt-snowflake. Users must hand-roll DDL/DML for CTAS, constraints, indexes, and incremental upserts (MERGE), which is error-prone and inconsistent across projects.
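
For illustration, the create step alone that users currently have to hand-roll looks roughly like this (database, schema, table, and column names are hypothetical):

create hybrid table analytics.app.user_profiles (
    user_id   integer,
    username  varchar(100),
    email     varchar(255),
    primary key (user_id),
    unique (email),
    index idx_username (username)
) as
select user_id, username, email
from analytics.staging.stg_users;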

Solution
Implement a new hybrid_table materialization that:

  • Creates hybrid tables with CREATE HYBRID TABLE … AS SELECT (CTAS) using explicit schema (columns, PRIMARY KEY, optional UNIQUE/FOREIGN KEY, secondary indexes).
  • Supports incremental upsert via MERGE using the configured primary key.
  • Detects/handles schema changes with on_schema_change = fail | apply | continue (apply → full refresh).
  • Applies grants, persists docs, and integrates with existing dbt-snowflake patterns.

Implementation

  • Materialization and macros
    • src/dbt/include/snowflake/macros/materializations/hybrid_table.sql
    • src/dbt/include/snowflake/macros/relations/hybrid_table/create.sql
    • src/dbt/include/snowflake/macros/relations/hybrid_table/merge.sql
    • src/dbt/include/snowflake/macros/relations/hybrid_table/replace.sql
    • src/dbt/include/snowflake/macros/relations/hybrid_table/drop.sql
    • src/dbt/include/snowflake/macros/relations/hybrid_table/describe.sql
  • Relation config and adapter integration
    • src/dbt/adapters/snowflake/relation_configs/hybrid_table.py (config dataclass + changeset)
    • src/dbt/adapters/snowflake/impl.py (describe_hybrid_table)
    • src/dbt/adapters/snowflake/relation.py (register HybridTable, config changeset)
  • Behavior details and guardrails
    • CTAS projects columns in the configured order to prevent type mismatches.
    • Information schema detection uses IS_HYBRID so the relation type is reported correctly (see the query sketch after this list).
    • on_schema_change:
      • fail (default): raises when config/schema changes detected
      • apply: full-refresh path (drop + create)
      • continue: warn, skip applying change, proceed with MERGE
    • MERGE updates all non-PK columns by default or a configured subset via merge_update_columns.
    • Constraints supported at create-time: PRIMARY KEY (required), UNIQUE, FOREIGN KEY (enforced by Snowflake).
    • Secondary indexes supported (with optional INCLUDE columns).
    • Full refresh: dbt run --full-refresh.
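
A minimal sketch of the relation-type check this relies on, assuming the IS_HYBRID column exposed in Snowflake's INFORMATION_SCHEMA.TABLES (database, schema, and table names are hypothetical):

select table_name,
       iff(is_hybrid = 'YES', 'hybrid_table', 'table') as relation_type
from analytics.information_schema.tables
where table_schema = 'APP'
  and table_name = 'USER_PROFILES';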

Tests (15/15 passing)
Functional tests under tests/functional/relation_tests/hybrid_table_tests/:

  • Basic: creation, full refresh, composite PK, indexes, unique constraint
  • Incremental: MERGE-upsert updates existing rows and inserts new ones
  • Schema changes: fail/apply/continue behaviors
  • Constraints: PRIMARY KEY, UNIQUE, composite PK enforcement
Notes:
  • Relation type detection updated to use IS_HYBRID
  • CTAS column order enforced
  • Acceptance criteria validated against Snowflake’s documented behavior for hybrid tables

Backward Compatibility

  • No breaking changes; the new materialization is opt-in via materialized: hybrid_table.
  • No changes to existing table/view/incremental materializations.

Performance

  • Initial build uses CTAS and benefits from Snowflake’s optimized bulk loading for empty tables.
  • Incremental uses MERGE; users may fine-tune via merge_update_columns.
  • Secondary indexes supported for access-path optimization.

Docs

  • Open a docs issue at https://github.com/dbt-labs/docs.getdbt.com/issues/new/choose to add:
    • New hybrid_table materialization reference for dbt-snowflake
    • Configuration examples (columns, primary_key, indexes, include columns, foreign_keys, on_schema_change, merge_update_columns)
    • Known limitations (constraints at create time; most schema/index changes require rebuild)

Checklist

  • I have read the contributing guide and understand what’s expected
  • I have run this code in development, and tests pass locally
  • This PR includes tests
  • This PR has no breaking interface changes

Additional Notes

  • Hybrid table creation uses DROP TABLE for replacement (Snowflake has no separate DROP HYBRID TABLE command).
  • Constraint error assertions in tests accept Snowflake’s hybrid error variants (e.g., “A primary key already exists.”).
  • Limitations per Snowflake docs: constraints defined at create time; many schema/index changes require rebuild.

Branch/PR

References

@Jeremy-Demlow Jeremy-Demlow requested a review from a team as a code owner October 16, 2025 21:09
@cla-bot

cla-bot bot commented Oct 16, 2025

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @Jeremy-Demlow

@Jeremy-Demlow
Author

I have signed it

@Jeremy-Demlow
Author

Hybrid Tables in dbt-snowflake

Overview

Hybrid tables in Snowflake bring row-based, transactional storage to the platform, so operational workloads and analytical queries can run against the same data. They support:

  • Low-latency queries with row-based storage
  • ACID transactions with enforced constraints
  • Primary and secondary indexes for fast lookups
  • UPSERT patterns via incremental MERGE operations

This implementation follows the same patterns as dynamic tables in dbt-snowflake.

Quick Start

Basic Hybrid Table

-- models/my_hybrid_table.sql
{{ config(
    materialized='hybrid_table',
    columns={
        'user_id': 'INTEGER',
        'username': 'VARCHAR(100)',
        'email': 'VARCHAR(255)',
        'created_at': 'TIMESTAMP_NTZ'
    },
    primary_key='user_id'
) }}

select * from {{ ref('source_users') }}

Configuration Options

Required Configurations

  • materialized='hybrid_table': Specifies the materialization type
  • columns: Dictionary mapping column names to Snowflake data types
  • primary_key: Column(s) forming the primary key (can be string or list)

Optional Configurations

  • indexes: List of secondary index definitions
  • unique_key: Column(s) with UNIQUE constraint
  • foreign_keys: List of foreign key constraint definitions
  • on_schema_change: How to handle schema changes ('fail', 'apply', 'continue')
  • merge_update_columns: Specific columns to update during MERGE

Examples

Composite Primary Key

{{ config(
    materialized='hybrid_table',
    columns={
        'stream_id': 'VARCHAR(100)',
        'ad_campaign_id': 'VARCHAR(100)',
        'impressions': 'INTEGER',
        'watch_time': 'FLOAT'
    },
    primary_key=['stream_id', 'ad_campaign_id']
) }}

select * from {{ ref('aggregated_events') }}

With Secondary Indexes

{{ config(
    materialized='hybrid_table',
    columns={
        'order_id': 'INTEGER',
        'customer_id': 'INTEGER',
        'product_id': 'INTEGER',
        'order_date': 'DATE',
        'amount': 'DECIMAL(10,2)'
    },
    primary_key='order_id',
    indexes=[
        {'columns': ['customer_id']},
        {'columns': ['product_id']},
        {'name': 'idx_order_date', 'columns': ['order_date']}
    ]
) }}

select * from {{ ref('orders') }}

With INCLUDE Columns

Secondary indexes can include additional columns for covering index optimization:

{{ config(
    materialized='hybrid_table',
    columns={
        'sensor_id': 'INTEGER',
        'timestamp': 'TIMESTAMP_NTZ',
        'temperature': 'DECIMAL(6,4)',
        'pressure': 'DECIMAL(6,4)'
    },
    primary_key='sensor_id',
    indexes=[
        {
            'name': 'idx_timestamp_covering',
            'columns': ['timestamp'],
            'include': ['temperature', 'pressure']
        }
    ]
) }}

select * from {{ ref('sensor_readings') }}

With Constraints

{{ config(
    materialized='hybrid_table',
    columns={
        'user_id': 'INTEGER',
        'email': 'VARCHAR(255)',
        'account_id': 'INTEGER',
        'status': 'VARCHAR(20)'
    },
    primary_key='user_id',
    unique_key='email',
    foreign_keys=[
        {
            'columns': ['account_id'],
            'parent_table': 'accounts',
            'parent_columns': ['account_id']
        }
    ]
) }}

select * from {{ ref('users') }}

Incremental Behavior

Hybrid tables support incremental updates using MERGE:

  1. First run: Creates table using CTAS with optimized bulk loading
  2. Subsequent runs: Uses MERGE to UPDATE existing rows and INSERT new ones

The MERGE uses the primary_key to match rows.
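
On those subsequent runs, the generated statement has roughly this shape (all names, including the staging relation, are hypothetical; the actual macro builds the column lists from the configured columns and primary_key):

merge into analytics.app.user_profiles as target
using (select * from analytics.app.user_profiles__dbt_tmp) as source
    on target.user_id = source.user_id
when matched then update set
    target.username = source.username,
    target.email    = source.email
when not matched then insert (user_id, username, email)
    values (source.user_id, source.username, source.email);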

Custom Merge Columns

By default, all non-primary-key columns are updated. You can specify which columns to update:

{{ config(
    materialized='hybrid_table',
    columns={
        'id': 'INTEGER',
        'value': 'INTEGER',
        'updated_at': 'TIMESTAMP_NTZ',
        'created_at': 'TIMESTAMP_NTZ'
    },
    primary_key='id',
    merge_update_columns=['value', 'updated_at']
) }}

-- created_at is intentionally excluded from merge_update_columns, so MERGE never overwrites it
select * from {{ ref('source') }}

Schema Change Handling

Hybrid tables have limited ALTER support. Use on_schema_change to control behavior (the snippets below omit columns, primary_key, and other required config for brevity):

fail (default)

{{ config(
    materialized='hybrid_table',
    on_schema_change='fail'
) }}

Raises an error if schema changes are detected. This is the safest option.

apply

{{ config(
    materialized='hybrid_table',
    on_schema_change='apply'
) }}

Performs a full refresh (DROP + CREATE) when schema changes are detected.

continue

{{ config(
    materialized='hybrid_table',
    on_schema_change='continue'
) }}

Logs a warning but continues with incremental MERGE, ignoring schema changes.

Full Refresh

Force a full refresh using the --full-refresh flag:

dbt run --select my_hybrid_table --full-refresh

This will DROP and recreate the table.
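
Roughly, the full-refresh path is a drop followed by the same CTAS as the first run; note the plain DROP TABLE, since Snowflake has no separate DROP HYBRID TABLE command (names are hypothetical):

drop table if exists analytics.app.user_profiles;

create hybrid table analytics.app.user_profiles (
    user_id  integer,
    username varchar(100),
    primary key (user_id)
) as
select user_id, username from analytics.staging.stg_users;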

Performance Considerations

  1. Bulk Loading: Initial CTAS uses Snowflake's optimized bulk loading (up to 10x faster)
  2. MERGE Performance: MERGE operations may be slower than bulk loads. Consider batch sizes.
  3. Index Strategy: Add indexes on columns used in WHERE clauses and JOIN conditions
  4. INCLUDE Columns: Use INCLUDE for covering indexes to avoid table lookups

Limitations

Per Snowflake's hybrid table limitations:

  • Primary key is required
  • Constraints are enforced (unlike standard Snowflake tables)
  • Limited ALTER support (most changes require full refresh)
  • Cannot be shared across accounts
  • Some Snowflake features not supported (see Snowflake docs)

Resources

@Jeremy-Demlow Jeremy-Demlow changed the title feat(snowflake): add hybrid table materialization [Feature]: snowflake add hybrid table materialization Oct 16, 2025