This repository was archived by the owner on Sep 2, 2025. It is now read-only.

@AkhilGurrapu

Add Snowflake Hybrid Table Materialization

Description

This PR introduces a new materialization for Snowflake Hybrid Tables, enabling dbt users to serve low-latency, row-level lookups and analytical queries from the same table in their data models. The implementation supports incremental processing, primary keys, and secondary indexes while maintaining dbt's idiomatic approach to data transformations.

Features

  • Custom hybrid_table materialization for Snowflake
  • Support for column-level definitions and constraints
  • Primary key enforcement
  • Secondary index management
  • Incremental processing using MERGE operations
  • Full refresh capabilities

Example Usage

{{config(
    materialized='hybrid_table',
    column_definitions={
        'customer_id': 'VARCHAR NOT NULL',
        'order_id': 'VARCHAR NOT NULL',
        'amount': 'NUMBER',
        'created_at': 'TIMESTAMP'
    },
    primary_key=['customer_id', 'order_id'],
    indexes=[
        {'name': 'idx_customer', 'columns': ['customer_id']},
        {'name': 'idx_created', 'columns': ['created_at']}
    ]
)}}

SELECT
    customer_id,
    order_id,
    amount,
    created_at
FROM source_table

Implementation Details

  • Added hybrid_table.sql materialization
  • Implemented configuration validation
  • Added support for incremental processing
  • Included comprehensive error handling
  • Added documentation and examples

Requirements

  • dbt-core >= 1.5.0
  • dbt-snowflake >= 1.5.0
  • Snowflake Enterprise Edition or higher

Testing

  • Added unit tests for materialization logic
  • Included integration tests with Snowflake
  • Tested incremental scenarios
  • Verified error handling
  • Tested performance with large datasets

Documentation

  • Added materialization reference docs
  • Included configuration examples
  • Added best practices guide
  • Documented limitations and requirements

Breaking Changes

None. This is an additive feature that doesn't affect existing materializations.

Checklist

  • Added new materialization file
  • Implemented configuration validation
  • Added comprehensive tests
  • Updated documentation
  • Tested with various scenarios
  • Followed dbt coding standards
  • Added error handling
  • Verified Snowflake compatibility

@AkhilGurrapu requested a review from a team as a code owner on January 27, 2025, 04:08
@cla-bot

cla-bot bot commented Jan 27, 2025

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @AkhilGurrapu

@sfc-gh-jdemlow

Did you want help with this?

@sfc-gh-jdemlow

Considering this is code I wrote, I'm sure I can help here.

@Jeremy-Demlow

Add Snowflake Hybrid Table Materialization

Description

This PR introduces a new materialization for Snowflake Hybrid Tables, enabling dbt users to serve low-latency, row-level lookups and analytical queries from the same table in their data models. The implementation supports incremental processing, primary keys, secondary indexes, and advanced merge control while maintaining dbt's idiomatic approach to data transformations.

Features

  • Custom hybrid_table materialization for Snowflake
  • Support for column-level definitions and constraints
  • Primary key enforcement with validation
  • Secondary index management
  • Incremental processing using intelligent MERGE operations
  • Full refresh capabilities with force_ctas option
  • NEW: Flexible merge column control (merge_exclude_columns, merge_update_columns)
  • NEW: Comprehensive configuration validation and error handling
  • NEW: Conditional UPDATE logic (only updates when necessary)
  • NEW: Safe grant handling with proper refresh mode detection

Basic Usage

{{config(
    materialized='hybrid_table',
    column_definitions={
        'customer_id': 'VARCHAR NOT NULL',
        'order_id': 'VARCHAR NOT NULL',
        'amount': 'NUMBER',
        'created_at': 'TIMESTAMP',
        'updated_at': 'TIMESTAMP'
    },
    primary_key=['customer_id', 'order_id'],
    indexes=[
        {'name': 'idx_customer', 'columns': ['customer_id']},
        {'name': 'idx_created', 'columns': ['created_at']}
    ]
)}}

SELECT
    customer_id,
    order_id,
    amount,
    created_at,
    updated_at
FROM source_table
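
For orientation, here is a rough sketch of the statement the first run of this model would issue, following the creation branch of the macro posted later in this thread. It is written in plain Python rather than Jinja so it can be run on its own; `render_create` and the relation name `analytics.orders` are hypothetical names used only for illustration, not part of dbt or this PR.

```python
def render_create(target, column_definitions, primary_key, indexes, select_sql):
    """Build a CREATE OR REPLACE HYBRID TABLE ... AS statement as a string."""
    # Column definitions come first, in the order given in the config.
    parts = [f"{name} {definition}" for name, definition in column_definitions.items()]
    # The primary key and secondary indexes are appended as table-level clauses.
    if primary_key:
        parts.append(f"PRIMARY KEY ({', '.join(primary_key)})")
    for index in indexes:
        parts.append(f"INDEX {index['name']} ({', '.join(index['columns'])})")
    body = ",\n    ".join(parts)
    return (
        f"CREATE OR REPLACE HYBRID TABLE {target} (\n    {body}\n) AS (\n{select_sql}\n)"
    )


ddl = render_create(
    target="analytics.orders",  # hypothetical relation name
    column_definitions={
        "customer_id": "VARCHAR NOT NULL",
        "order_id": "VARCHAR NOT NULL",
        "amount": "NUMBER",
        "created_at": "TIMESTAMP",
        "updated_at": "TIMESTAMP",
    },
    primary_key=["customer_id", "order_id"],
    indexes=[
        {"name": "idx_customer", "columns": ["customer_id"]},
        {"name": "idx_created", "columns": ["created_at"]},
    ],
    select_sql="SELECT customer_id, order_id, amount, created_at, updated_at FROM source_table",
)
print(ddl)
```

The sketch only concatenates strings; whether Snowflake accepts the resulting statement still depends on the account, privileges, and the column types chosen.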

Advanced Configuration Options

Merge Column Control

{#
  merge_exclude_columns keeps the audit columns out of the MERGE UPDATE.
  Alternative: merge_update_columns=['name', 'amount'] lists the columns to
  update explicitly and overrides merge_exclude_columns, so set only one.
  force_ctas=true forces full table recreation.
#}
{{config(
    materialized='hybrid_table',
    column_definitions={
        'id': 'VARCHAR NOT NULL',
        'name': 'VARCHAR',
        'amount': 'NUMBER',
        'created_at': 'TIMESTAMP',
        'updated_at': 'TIMESTAMP'
    },
    primary_key=['id'],
    merge_exclude_columns=['created_at', 'updated_at'],
    force_ctas=false
)}}

Configuration Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| column_definitions | dict | ✅ | Column names and their SQL data types |
| primary_key | list | ✅* | Primary key columns (*required for incremental) |
| indexes | list | | Secondary indexes with name and columns |
| merge_exclude_columns | list | | Columns to exclude from MERGE UPDATE |
| merge_update_columns | list | | Explicit columns to update (overrides exclude) |
| force_ctas | bool | | Force table recreation (default: false) |
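
The precedence between the two merge parameters can be illustrated with a small sketch. It mirrors, in plain Python, the column selection in the macro posted later in this thread; `resolve_update_columns` is a hypothetical helper name, not a dbt function.

```python
def resolve_update_columns(column_definitions, primary_key,
                           merge_exclude_columns=(), merge_update_columns=()):
    """Return the columns a MERGE ... UPDATE SET clause would touch."""
    # An explicit allow-list wins outright and is used as given.
    if merge_update_columns:
        return list(merge_update_columns)
    # Otherwise: every declared column except primary keys and exclusions.
    skipped = set(primary_key) | set(merge_exclude_columns)
    return [col for col in column_definitions if col not in skipped]


columns = {
    "id": "VARCHAR NOT NULL",
    "name": "VARCHAR",
    "amount": "NUMBER",
    "created_at": "TIMESTAMP",
    "updated_at": "TIMESTAMP",
}

excluded = resolve_update_columns(
    columns, ["id"], merge_exclude_columns=["created_at", "updated_at"]
)
explicit = resolve_update_columns(
    columns, ["id"], merge_exclude_columns=["created_at"], merge_update_columns=["name"]
)
print(excluded)  # ['name', 'amount']
print(explicit)  # ['name']
```

Note that an explicit `merge_update_columns` list is not filtered against the primary key in this sketch, which matches the posted macro; listing a key column there would put it in the UPDATE clause.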

When to Use Hybrid Tables

✅ Good for:

  • High-frequency analytical queries requiring fast point lookups
  • Data that needs both OLTP-style access and analytical processing
  • Tables requiring unique constraints and secondary indexes
  • Real-time analytics with sub-second query requirements

Implementation Details

Materialization Logic

  1. Validation Phase: Validates required configurations and throws descriptive errors
  2. Creation Phase: Uses CREATE OR REPLACE HYBRID TABLE ... AS for new tables
  3. Incremental Phase: Employs intelligent MERGE with configurable column updates
  4. Post-Processing: Applies grants, persists documentation, and manages query tags
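
As a minimal sketch of how the creation and incremental phases are chosen, the branch in the posted macro reduces to the following. `choose_strategy` and the two return labels are hypothetical names used only for this illustration.

```python
def choose_strategy(relation_exists: bool, force_ctas: bool = False) -> str:
    """Pick between rebuilding the table and merging into it."""
    # No existing relation, or an explicit force_ctas, means a full rebuild
    # via CREATE OR REPLACE HYBRID TABLE ... AS.
    if not relation_exists or force_ctas:
        return "create_or_replace"
    # Otherwise the existing table is updated in place with MERGE.
    return "merge"


for exists, force in [(False, False), (True, False), (True, True)]:
    print(exists, force, choose_strategy(exists, force))
```

This covers only the branch itself; in the macro shown in this thread the decision depends on `force_ctas` and on whether the relation already exists.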

Error Handling

  • Missing column_definitions: Clear error with configuration requirements
  • Missing primary_key for incremental: Prevents invalid MERGE operations
  • Invalid configuration combinations: Validates logical consistency
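
The first two checks can be sketched in plain Python as below, using the same messages as the posted macro. `ConfigError` and `validate_config` are hypothetical stand-ins for dbt's compiler error and the macro's validation block; the third bullet (configuration combinations) is not covered here.

```python
class ConfigError(ValueError):
    """Hypothetical stand-in for the compiler error dbt would raise."""


def validate_config(column_definitions, primary_key, relation_exists, force_ctas=False):
    """Raise ConfigError for the two cases the posted macro rejects."""
    if not column_definitions:
        raise ConfigError(
            "Hybrid table materialization requires 'column_definitions' in model config"
        )
    # A MERGE needs join keys, so an incremental run without a primary key is invalid.
    incremental = relation_exists and not force_ctas
    if incremental and not primary_key:
        raise ConfigError(
            "Incremental hybrid table updates require 'primary_key' in model config"
        )


def error_message(**kwargs):
    """Return the validation error text, or None when the config passes."""
    try:
        validate_config(**kwargs)
    except ConfigError as exc:
        return str(exc)
    return None


print(error_message(column_definitions={}, primary_key=["id"], relation_exists=False))
print(error_message(column_definitions={"id": "VARCHAR"}, primary_key=[], relation_exists=True))
print(error_message(column_definitions={"id": "VARCHAR"}, primary_key=["id"], relation_exists=True))
```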

Performance Optimizations

  • Conditional UPDATE clauses: Only generates UPDATE when columns need updating
  • Intelligent column selection: Automatically excludes primary keys from updates
  • Query tagging: Proper Snowflake query attribution for monitoring
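
To make the conditional UPDATE concrete, the sketch below builds the MERGE text in plain Python and omits the WHEN MATCHED branch when there is nothing to update, as the posted macro does. `render_merge` and the relation name `analytics.orders` are hypothetical; the aliases `t` and `s` follow the macro.

```python
def render_merge(target, source_sql, column_definitions, primary_key, update_columns):
    """Build a MERGE statement; skip the UPDATE branch when no columns need updating."""
    on_clause = " AND ".join(f"t.{pk} = s.{pk}" for pk in primary_key)
    lines = [
        f"MERGE INTO {target} t",
        f"USING ({source_sql}) s",
        f"ON {on_clause}",
    ]
    # Only emit WHEN MATCHED if at least one non-key column is left to update.
    if update_columns:
        assignments = ", ".join(f"t.{col} = s.{col}" for col in update_columns)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {assignments}")
    all_columns = list(column_definitions)
    insert_cols = ", ".join(all_columns)
    insert_vals = ", ".join(f"s.{col}" for col in all_columns)
    lines.append(f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})")
    return "\n".join(lines)


with_update = render_merge(
    "analytics.orders",  # hypothetical relation name
    "SELECT id, amount FROM source_table",
    {"id": "VARCHAR NOT NULL", "amount": "NUMBER"},
    ["id"],
    ["amount"],
)
keys_only = render_merge(
    "analytics.orders",
    "SELECT id FROM source_table",
    {"id": "VARCHAR NOT NULL"},
    ["id"],
    [],
)
print(with_update)
print(keys_only)
```

In the keys-only case the statement degrades to an insert-if-missing, so matched rows are left untouched.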

Requirements

  • dbt-core >= 1.5.0
  • dbt-snowflake >= 1.5.0
  • Snowflake Enterprise Edition or higher (hybrid tables are an Enterprise+ feature)
  • Appropriate Snowflake privileges: CREATE TABLE, CREATE INDEX on target schema

Cost Considerations

Hybrid tables use Snowflake's hybrid engine which has different pricing:

  • Higher compute costs for writes compared to regular tables
  • Faster query performance for point lookups and small result sets
  • Storage costs similar to regular tables
  • Recommended: Monitor query costs and performance after implementation

@Jeremy-Demlow

@AkhilGurrapu

I made some changes to your code but couldn't push them, so I figured I'd share them here, since I keep getting emails about this PR:

{% materialization hybrid_table, adapter='snowflake' %}
    {% set query_tag = set_query_tag() %}
    {% set existing_relation = load_cached_relation(this) %}
    {% set target_relation = this.incorporate(type='table') %}
    
    {{ run_hooks(pre_hooks) }}

    {% set column_definitions = config.get('column_definitions', {}) %}
    {% set primary_key = config.get('primary_key', []) %}
    {% set indexes = config.get('indexes', []) %}
    {% set force_ctas = config.get('force_ctas', false) %}
    {% set merge_exclude_columns = config.get('merge_exclude_columns', []) %}
    {% set merge_update_columns = config.get('merge_update_columns', []) %}

    {# Validation #}
    {% if column_definitions | length == 0 %}
        {{ exceptions.raise_compiler_error("Hybrid table materialization requires 'column_definitions' in model config") }}
    {% endif %}
    
    {% if existing_relation and not force_ctas and primary_key | length == 0 %}
        {{ exceptions.raise_compiler_error("Incremental hybrid table updates require 'primary_key' in model config") }}
    {% endif %}

    {% if existing_relation is none or force_ctas %}
        {# Create new hybrid table #}
        {% call statement('main') %}
            CREATE OR REPLACE HYBRID TABLE {{ target_relation }} (
                {% for column, definition in column_definitions.items() %}
                    {{ column }} {{ definition }}{% if not loop.last %},{% endif %}
                {% endfor %}
                {% if primary_key %}
                    , PRIMARY KEY ({{ primary_key | join(', ') }})
                {% endif %}
                {% for index in indexes %}
                    , INDEX {{ index.name }} ({{ index.columns | join(', ') }})
                {% endfor %}
            ) AS (
                {{ sql }}
            )
        {% endcall %}
    {% else %}
        {# Merge into existing hybrid table #}
        {% if merge_update_columns | length > 0 %}
            {% set update_columns = merge_update_columns %}
        {% else %}
            {% set update_columns = column_definitions.keys() | reject('in', primary_key) | reject('in', merge_exclude_columns) | list %}
        {% endif %}
        
        {% call statement('main') %}
            MERGE INTO {{ target_relation }} t
            USING ({{ sql }}) s
            ON {% for pk in primary_key %}
                t.{{ pk }} = s.{{ pk }}{% if not loop.last %} AND {% endif %}
            {% endfor %}
            {% if update_columns | length > 0 %}
            WHEN MATCHED THEN
                UPDATE SET
                {% for column in update_columns %}
                    t.{{ column }} = s.{{ column }}{% if not loop.last %},{% endif %}
                {% endfor %}
            {% endif %}
            WHEN NOT MATCHED THEN
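                INSERT ({{ column_definitions.keys() | join(', ') }})
                {# Jinja has no built-in 'prefix' filter, so qualify each source column in a loop #}
                VALUES ({% for column in column_definitions.keys() %}s.{{ column }}{% if not loop.last %}, {% endif %}{% endfor %})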
        {% endcall %}
    {% endif %}

    {{ run_hooks(post_hooks) }}
    {% do unset_query_tag(query_tag) %}
    
    {% set grant_config = config.get('grants') %}
    {% if grant_config %}
        {% do apply_grants(target_relation, grant_config, should_revoke=should_revoke(existing_relation, full_refresh_mode=force_ctas)) %}
    {% endif %}
    
    {% do persist_docs(target_relation, model) %}
    
    {{ return({'relations': [target_relation]}) }}
{% endmaterialization %}

@colin-rogers-dbt
Contributor

@AkhilGurrapu thanks for the proposal! You'll need to sign the CLA and migrate this PR to dbt-adapters as we have moved dbt-snowflake development there

Beyond that, some high-level feedback:

  1. This PR will need to add or update functional tests that validate the new materialization
  2. How will we handle the situation where a user wants to migrate their model from a standard table to a hybrid table? How will dbt know that it needs to be replaced?
  3. Is a hybrid table truly a different materialization or is it a different "relation" that is just a configuration on the table materialization?
