Hybrid table materialization #1296

AkhilGurrapu · 2025-01-27T04:08:26Z

Add Snowflake Hybrid Table Materialization

Description

This PR introduces a new materialization for Snowflake Hybrid Tables, a table type optimized for low-latency point reads and writes that can still be queried analytically alongside regular tables. The implementation supports incremental processing, primary keys, and secondary indexes while keeping dbt's idiomatic approach to data transformations.

Features

Custom hybrid_table materialization for Snowflake
Support for column-level definitions and constraints
Primary key enforcement
Secondary index management
Incremental processing using MERGE operations
Full refresh capabilities

Example Usage

{{config(
    materialized='hybrid_table',
    column_definitions={
        'customer_id': 'VARCHAR NOT NULL',
        'order_id': 'VARCHAR NOT NULL',
        'amount': 'NUMBER',
        'created_at': 'TIMESTAMP'
    },
    primary_key=['customer_id', 'order_id'],
    indexes=[
        {'name': 'idx_customer', 'columns': ['customer_id']},
        {'name': 'idx_created', 'columns': ['created_at']}
    ]
)}}

SELECT
    customer_id,
    order_id,
    amount,
    created_at
FROM source_table
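
For reviewers who want to see what this config is expected to turn into, here is a rough Python sketch of the DDL assembly. It is an illustration only, not the code in hybrid_table.sql: the function name `render_hybrid_table_ddl` and the relation name `analytics.orders` are hypothetical, and the statement shape assumes Snowflake's CREATE HYBRID TABLE ... AS form with inline PRIMARY KEY and INDEX clauses.

```python
def render_hybrid_table_ddl(table, column_definitions, primary_key, indexes, select_sql):
    """Assemble a CREATE HYBRID TABLE ... AS statement from the model config (sketch)."""
    # One "name type" entry per configured column, in config order.
    parts = [f"{name} {definition}" for name, definition in column_definitions.items()]
    if primary_key:
        parts.append(f"PRIMARY KEY ({', '.join(primary_key)})")
    # Secondary indexes are declared inline, after the columns and the key.
    for index in indexes:
        parts.append(f"INDEX {index['name']} ({', '.join(index['columns'])})")
    body = ",\n    ".join(parts)
    return f"CREATE OR REPLACE HYBRID TABLE {table} (\n    {body}\n) AS (\n{select_sql}\n)"


ddl = render_hybrid_table_ddl(
    table="analytics.orders",  # hypothetical relation name
    column_definitions={
        "customer_id": "VARCHAR NOT NULL",
        "order_id": "VARCHAR NOT NULL",
        "amount": "NUMBER",
        "created_at": "TIMESTAMP",
    },
    primary_key=["customer_id", "order_id"],
    indexes=[
        {"name": "idx_customer", "columns": ["customer_id"]},
        {"name": "idx_created", "columns": ["created_at"]},
    ],
    select_sql="SELECT customer_id, order_id, amount, created_at FROM source_table",
)
print(ddl)
```

The column order in the DDL follows the order of `column_definitions`, so the model's SELECT should list columns in the same order.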

Implementation Details

Added hybrid_table.sql materialization
Implemented configuration validation
Added support for incremental processing
Included comprehensive error handling
Added documentation and examples

Requirements

dbt-core >= 1.5.0
dbt-snowflake >= 1.5.0
Snowflake Enterprise Edition or higher

Testing

Added unit tests for materialization logic
Included integration tests with Snowflake
Tested incremental scenarios
Verified error handling
Tested performance with large datasets

Documentation

Added materialization reference docs
Included configuration examples
Added best practices guide
Documented limitations and requirements

Breaking Changes

None. This is an additive feature that doesn't affect existing materializations.

cla-bot · 2025-01-27T04:08:32Z

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @AkhilGurrapu

sfc-gh-jdemlow · 2025-02-05T17:57:10Z

Did you want help with this?

sfc-gh-jdemlow · 2025-02-05T17:59:33Z

considering this is code I wrote I am sure I can help here

Jeremy-Demlow · 2025-05-22T00:12:42Z

Add Snowflake Hybrid Table Materialization

Description

This PR introduces a new materialization for Snowflake Hybrid Tables, a table type optimized for low-latency point reads and writes that can still be queried analytically alongside regular tables. The implementation supports incremental processing, primary keys, secondary indexes, and advanced merge control while keeping dbt's idiomatic approach to data transformations.

Features

Custom hybrid_table materialization for Snowflake
Support for column-level definitions and constraints
Primary key enforcement with validation
Secondary index management
Incremental processing using intelligent MERGE operations
Full refresh capabilities with force_ctas option
NEW: Flexible merge column control (merge_exclude_columns, merge_update_columns)
NEW: Comprehensive configuration validation and error handling
NEW: Conditional UPDATE logic (only updates when necessary)
NEW: Safe grant handling with proper refresh mode detection

Basic Usage

{{config(
    materialized='hybrid_table',
    column_definitions={
        'customer_id': 'VARCHAR NOT NULL',
        'order_id': 'VARCHAR NOT NULL',
        'amount': 'NUMBER',
        'created_at': 'TIMESTAMP',
        'updated_at': 'TIMESTAMP'
    },
    primary_key=['customer_id', 'order_id'],
    indexes=[
        {'name': 'idx_customer', 'columns': ['customer_id']},
        {'name': 'idx_created', 'columns': ['created_at']}
    ]
)}}

SELECT
    customer_id,
    order_id,
    amount,
    created_at,
    updated_at
FROM source_table

Advanced Configuration Options

Merge Column Control

{# merge_exclude_columns keeps audit columns out of the MERGE UPDATE.
   merge_update_columns lists the columns to update explicitly and, when set,
   takes precedence over merge_exclude_columns, so normally only one is needed.
   Set force_ctas=true to force full table recreation. #}
{{config(
    materialized='hybrid_table',
    column_definitions={
        'id': 'VARCHAR NOT NULL',
        'name': 'VARCHAR',
        'amount': 'NUMBER',
        'created_at': 'TIMESTAMP',
        'updated_at': 'TIMESTAMP'
    },
    primary_key=['id'],
    merge_exclude_columns=['created_at', 'updated_at'],
    merge_update_columns=['name', 'amount'],
    force_ctas=false
)}}

Configuration Parameters

Parameter	Type	Required	Description
column_definitions	dict	✅	Column names and their SQL data types
primary_key	list	✅*	Primary key columns (*required for incremental)
indexes	list	❌	Secondary indexes with name and columns
merge_exclude_columns	list	❌	Columns to exclude from MERGE UPDATE
merge_update_columns	list	❌	Explicit columns to update (overrides exclude)
force_ctas	bool	❌	Force table recreation (default: false)
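
To make the precedence between the two merge options concrete, here is a small Python sketch of how the update column list could be resolved. It mirrors the rule in the table above (explicit columns override the exclude list, and primary key columns are never updated); the function name `resolve_update_columns` is hypothetical and only for illustration.

```python
def resolve_update_columns(column_definitions, primary_key,
                           merge_exclude_columns=(), merge_update_columns=()):
    """Return the columns a MERGE should update (sketch of the config rules)."""
    # An explicit list wins outright; the exclude list is ignored in that case.
    if merge_update_columns:
        return list(merge_update_columns)
    # Otherwise update every configured column except keys and excluded ones.
    excluded = set(primary_key) | set(merge_exclude_columns)
    return [column for column in column_definitions if column not in excluded]


columns = {
    "id": "VARCHAR NOT NULL",
    "name": "VARCHAR",
    "amount": "NUMBER",
    "created_at": "TIMESTAMP",
    "updated_at": "TIMESTAMP",
}

# Exclude the audit columns; the primary key is dropped automatically.
print(resolve_update_columns(columns, ["id"],
                             merge_exclude_columns=["created_at", "updated_at"]))
# ['name', 'amount']
```

If every non-key column ends up excluded, the list is empty, which is the case where no UPDATE clause should be generated at all.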

When to Use Hybrid Tables

✅ Good for:

High-frequency analytical queries requiring fast point lookups
Data that needs both OLTP-style access and analytical processing
Tables requiring unique constraints and secondary indexes
Real-time analytics with sub-second query requirements

Implementation Details

Materialization Logic

Validation Phase: Validates required configurations and throws descriptive errors
Creation Phase: Uses CREATE OR REPLACE HYBRID TABLE ... AS for new tables
Incremental Phase: Employs intelligent MERGE with configurable column updates
Post-Processing: Applies grants, persists documentation, and manages query tags
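
As a rough illustration of how the validation, creation, and incremental phases relate, the branching can be sketched in Python as below. This is a model of the decision only, under the assumption that the materialization knows whether the relation already exists; the function name `plan_run` and the "create"/"merge" labels are hypothetical.

```python
def plan_run(relation_exists, force_ctas, column_definitions, primary_key):
    """Validate the config, then pick the statement type for this run (sketch)."""
    # Validation phase: fail early with a descriptive message.
    if not column_definitions:
        raise ValueError(
            "Hybrid table materialization requires 'column_definitions' in model config"
        )
    if relation_exists and not force_ctas and not primary_key:
        raise ValueError(
            "Incremental hybrid table updates require 'primary_key' in model config"
        )
    # Creation phase: first run, or a forced rebuild.
    if not relation_exists or force_ctas:
        return "create"
    # Incremental phase: the table exists and no rebuild was requested.
    return "merge"


cols = {"id": "VARCHAR NOT NULL", "name": "VARCHAR"}
print(plan_run(False, False, cols, ["id"]))  # first run
print(plan_run(True, False, cols, ["id"]))   # later incremental run
print(plan_run(True, True, cols, ["id"]))    # forced rebuild
```

Note that in this sketch a forced rebuild skips the primary key check, because no MERGE is issued on that path.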

Error Handling

Missing column_definitions: Clear error with configuration requirements
Missing primary_key for incremental: Prevents invalid MERGE operations
Invalid configuration combinations: Validates logical consistency

Performance Optimizations

Conditional UPDATE clauses: Only generates UPDATE when columns need updating
Intelligent column selection: Automatically excludes primary keys from updates
Query tagging: Proper Snowflake query attribution for monitoring
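
The conditional UPDATE behaviour can be sketched in Python as follows. This is a simplified model of the statement assembly, not the macro itself; the function name `render_merge` and the relation name `analytics.orders` are hypothetical, and the `t`/`s` aliases are just a convention for target and source.

```python
def render_merge(table, select_sql, columns, primary_key, update_columns):
    """Build a MERGE statement, emitting WHEN MATCHED only when needed (sketch)."""
    on_clause = " AND ".join(f"t.{pk} = s.{pk}" for pk in primary_key)
    lines = [
        f"MERGE INTO {table} t",
        f"USING ({select_sql}) s",
        f"ON {on_clause}",
    ]
    # Skip the UPDATE branch entirely when there is nothing to update.
    if update_columns:
        assignments = ", ".join(f"t.{c} = s.{c}" for c in update_columns)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {assignments}")
    insert_columns = ", ".join(columns)
    insert_values = ", ".join(f"s.{c}" for c in columns)
    lines.append(
        f"WHEN NOT MATCHED THEN INSERT ({insert_columns}) VALUES ({insert_values})"
    )
    return "\n".join(lines)


merge_sql = render_merge(
    table="analytics.orders",  # hypothetical relation name
    select_sql="SELECT id, name, amount FROM source_table",
    columns=["id", "name", "amount"],
    primary_key=["id"],
    update_columns=["name", "amount"],
)
print(merge_sql)
```

With an empty `update_columns` list the statement degrades to insert-only, so rows that already exist are left untouched.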

Requirements

dbt-core >= 1.5.0
dbt-snowflake >= 1.5.0
Snowflake Enterprise Edition or higher (hybrid tables are an Enterprise+ feature)
Appropriate Snowflake privileges: CREATE TABLE, CREATE INDEX on target schema

Cost Considerations

Hybrid tables use Snowflake's hybrid engine which has different pricing:

Higher compute costs for writes compared to regular tables
Faster query performance for point lookups and small result sets
Storage costs similar to regular tables
Recommended: Monitor query costs and performance after implementation

Jeremy-Demlow · 2025-05-22T00:14:08Z

@AkhilGurrapu

I made some changes to your code but couldn't push them to your branch, so I figured I would share the updated version here, as I keep getting emails about this PR.

 {% materialization hybrid_table, adapter='snowflake' %}
    {% set query_tag = set_query_tag() %}
    {% set existing_relation = load_cached_relation(this) %}
    {% set target_relation = this.incorporate(type='table') %}
    
    {{ run_hooks(pre_hooks) }}

    {% set column_definitions = config.get('column_definitions', {}) %}
    {% set primary_key = config.get('primary_key', []) %}
    {% set indexes = config.get('indexes', []) %}
    {% set force_ctas = config.get('force_ctas', false) %}
    {% set merge_exclude_columns = config.get('merge_exclude_columns', []) %}
    {% set merge_update_columns = config.get('merge_update_columns', []) %}

    {# Validation #}
    {% if column_definitions | length == 0 %}
        {{ exceptions.raise_compiler_error("Hybrid table materialization requires 'column_definitions' in model config") }}
    {% endif %}
    
    {% if existing_relation and not force_ctas and primary_key | length == 0 %}
        {{ exceptions.raise_compiler_error("Incremental hybrid table updates require 'primary_key' in model config") }}
    {% endif %}

    {% if existing_relation is none or force_ctas %}
        {# Create new hybrid table #}
        {% call statement('main') %}
            CREATE OR REPLACE HYBRID TABLE {{ target_relation }} (
                {% for column, definition in column_definitions.items() %}
                    {{ column }} {{ definition }}{% if not loop.last %},{% endif %}
                {% endfor %}
                {% if primary_key %}
                    , PRIMARY KEY ({{ primary_key | join(', ') }})
                {% endif %}
                {% for index in indexes %}
                    , INDEX {{ index.name }} ({{ index.columns | join(', ') }})
                {% endfor %}
            ) AS (
                {{ sql }}
            )
        {% endcall %}
    {% else %}
        {# Merge into existing hybrid table #}
        {% if merge_update_columns | length > 0 %}
            {% set update_columns = merge_update_columns %}
        {% else %}
            {% set update_columns = column_definitions.keys() | reject('in', primary_key) | reject('in', merge_exclude_columns) | list %}
        {% endif %}
        
        {% call statement('main') %}
            MERGE INTO {{ target_relation }} t
            USING ({{ sql }}) s
            ON {% for pk in primary_key %}
                t.{{ pk }} = s.{{ pk }}{% if not loop.last %} AND {% endif %}
            {% endfor %}
            {% if update_columns | length > 0 %}
            WHEN MATCHED THEN
                UPDATE SET
                {% for column in update_columns %}
                    t.{{ column }} = s.{{ column }}{% if not loop.last %},{% endif %}
                {% endfor %}
            {% endif %}
            WHEN NOT MATCHED THEN
                INSERT ({{ column_definitions.keys() | join(', ') }})
                VALUES ({% for column in column_definitions.keys() %}s.{{ column }}{% if not loop.last %}, {% endif %}{% endfor %})
        {% endcall %}
    {% endif %}

    {{ run_hooks(post_hooks) }}
    {% do unset_query_tag(query_tag) %}
    
    {% set grant_config = config.get('grants') %}
    {% if grant_config %}
        {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke(existing_relation, full_refresh_mode=force_ctas)) %}
    {% endif %}
    
    {% do persist_docs(target_relation, model) %}
    
    {{ return({'relations': [target_relation]}) }}
{% endmaterialization %}

cla-bot · 2025-07-08T23:10:30Z

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @AkhilGurrapu

colin-rogers-dbt · 2025-08-22T18:12:40Z

@AkhilGurrapu thanks for the proposal! You'll need to sign the CLA and migrate this PR to dbt-adapters, as we have moved dbt-snowflake development there.

Beyond that, some high-level feedback:

This PR will need to add or update functional tests that validate the new materialization
How will we handle the situation where a user wants to migrate their model from a standard table to a hybrid table? How will dbt know that it needs to be replaced?
Is a hybrid table truly a different materialization or is it a different "relation" that is just a configuration on the table materialization?

Create hybrid_table.sql

6ffe96c

AkhilGurrapu requested a review from a team as a code owner January 27, 2025 04:08

dwreeves mentioned this pull request Feb 5, 2025

[ADAP-633] [Feature] Support Hybrid tables dbt-labs/dbt-adapters#734

Open

3 tasks

Merge branch 'main' into patch-1

c09232b
