Add redshift support #271

Open

CoderCookE wants to merge 9 commits into Datavault-UK:master from CoderCookE:feat/add-redshift-support

Conversation

@CoderCookE

Add Amazon Redshift Support

Summary

This PR adds full Amazon Redshift support to automate-dv, enabling users to build Data Vault 2.0 warehouses on Redshift using all table types (hubs, links, satellites, PITs, bridges, etc.).

Motivation

Amazon Redshift is a widely-used cloud data warehouse platform, and many organizations need to implement Data Vault patterns on Redshift. This implementation provides native Redshift support with optimizations for Redshift's SQL dialect and performance characteristics.

Implementation Details

Supported Features

All 10 Data Vault Table Types:

  • Hubs (hub.sql)
  • Links (link.sql)
  • Satellites (sat.sql)
  • Effectivity Satellites (eff_sat.sql)
  • Multi-Active Satellites (ma_sat.sql)
  • Point-in-Time Tables (pit.sql)
  • Bridge Tables (bridge.sql)
  • Extended Tracking Satellites (xts.sql)
  • Non-Historized Links (nh_link.sql)
  • Reference Tables (ref_table.sql)

Hash Algorithms:

  • MD5: Full support via native MD5() function → VARCHAR(32)
  • SHA1: Full support via native SHA1() function → VARCHAR(40)
  • SHA256: Full support via native SHA2(string, 256) function → VARCHAR(64)
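The hex output lengths follow directly from the digest sizes (128, 160, and 256 bits). As a minimal illustrative query (not part of the PR's code):

```sql
-- Illustrative only: Redshift's native hash functions return hex strings.
SELECT
    MD5('example')       AS md5_hk,     -- 32 hex chars -> VARCHAR(32)
    SHA1('example')      AS sha1_hk,    -- 40 hex chars -> VARCHAR(40)
    SHA2('example', 256) AS sha256_hk;  -- 64 hex chars -> VARCHAR(64)
```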

Key Design Decisions

1. VARCHAR Hex Strings for Hash Storage

Decision: Store hashes as VARCHAR hex strings instead of binary types.

Rationale:

  • Significantly simplifies the PIT (Point-in-Time) macro: no PostgreSQL-specific ENCODE/DECODE operations are needed
  • Makes hashes human-readable for debugging
  • Follows proven pattern from BigQuery and Databricks implementations
  • Storage overhead is negligible for most use cases

Implementation:

  • MD5: VARCHAR(32) - 32 hex characters
  • SHA1: VARCHAR(40) - 40 hex characters
  • SHA256: VARCHAR(64) - 64 hex characters

2. QUALIFY Clause for Performance

Decision: Require Redshift version with QUALIFY support (July 2023+).

Rationale:

  • QUALIFY provides significant performance improvements over subquery patterns
  • Most production Redshift clusters are kept current for security and performance
  • Cleaner, more maintainable SQL

Example:

SELECT columns
FROM table
QUALIFY ROW_NUMBER() OVER(PARTITION BY pk ORDER BY ldts) = 1
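For comparison, a sketch of the subquery pattern that QUALIFY replaces (schematic, mirroring the example above):

```sql
SELECT columns
FROM (
    SELECT columns,
           ROW_NUMBER() OVER (PARTITION BY pk ORDER BY ldts) AS rn
    FROM table
) AS ranked
WHERE rn = 1
```

QUALIFY filters on the window function directly, keeping the statement flat; the PR reports this performs significantly better on Redshift.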

3. Inheritance Strategy

Approach: Mix of custom implementations and inheritance from existing adapters:

  • Custom: hub.sql (QUALIFY pattern), pit.sql (VARCHAR hex simplification)
  • PostgreSQL-based: link.sql, sat.sql (compatible window functions)
  • Default: ref_table.sql, xts.sql, eff_sat.sql, ma_sat.sql, nh_link.sql, bridge.sql

Rationale: Maximize code reuse while optimizing for Redshift-specific features.
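dbt selects platform implementations through adapter dispatch, so the three tiers above map onto macro naming. A minimal sketch (signatures abbreviated, macro bodies elided; follows dbt's `default__`/`redshift__` prefix convention, not the PR's actual code):

```sql
{# default__hub is used on platforms with no adapter-specific override #}
{% macro default__hub(src_pk, src_nk, src_ldts, src_source, source_model) %}
    {# generic implementation #}
{% endmacro %}

{# redshift__hub is picked up automatically when the target is Redshift #}
{% macro redshift__hub(src_pk, src_nk, src_ldts, src_source, source_model) %}
    {# QUALIFY-based implementation #}
{% endmacro %}
```

Under this scheme, the "PostgreSQL-based" macros can delegate to the postgres implementation rather than duplicating it.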

Files Changed

Supporting Macros (7 files modified)

  • macros/internal/metadata_processing/get_escape_characters.sql - Double quotes like PostgreSQL
  • macros/internal/metadata_processing/concat_ws.sql - Native CONCAT_WS support
  • macros/supporting/data_types/type_binary.sql - VARCHAR(32/64) for hashes
  • macros/supporting/data_types/type_string.sql - VARCHAR type
  • macros/supporting/data_types/type_timestamp.sql - TIMESTAMP (no timezone)
  • macros/supporting/hash_components/select_hash_alg.sql - MD5, SHA256, SHA1 handling
  • macros/supporting/casting/cast_binary.sql - Dynamic VARCHAR casting

Table Macros (10 files created)

  • macros/tables/redshift/*.sql - All 10 Data Vault table types

Documentation (1 file created)

  • macros/tables/redshift/README.md - Comprehensive implementation guide

Total: 17 files changed, 560 insertions(+)

Configuration

Users can configure their hash algorithm in dbt_project.yml:

vars:
  # Use MD5 (default)
  hash: 'md5'

  # Or uncomment to use SHA256 for stronger hashing
  # hash: 'sha'

Requirements

  • Minimum Redshift Version: July 2023 or later (QUALIFY support required)
  • dbt Version: >=1.0.0, <3.0.0
  • dbt-redshift adapter: Latest version recommended

Usage Example

-- models/raw_vault/hubs/hub_customer.sql
{{- config(
    materialized='incremental',
    schema='raw_vault'
) -}}

{%- set src_pk = 'CUSTOMER_PK' -%}
{%- set src_nk = 'CUSTOMER_ID' -%}
{%- set src_ldts = 'LOAD_DATETIME' -%}
{%- set src_source = 'RECORD_SOURCE' -%}

{{ automate_dv.hub(src_pk=src_pk,
                   src_nk=src_nk,
                   src_ldts=src_ldts,
                   src_source=src_source,
                   source_model='stg_customer') }}

Migration Notes

For users migrating from PostgreSQL to Redshift:

  1. Hash Storage: Redshift uses VARCHAR hex strings vs PostgreSQL's BYTEA binary. MD5 hash values remain compatible when compared as strings.

  2. Data Type Mapping:

    • PostgreSQL BYTEA → Redshift VARCHAR(32) or VARCHAR(64)
    • PostgreSQL TIMESTAMP → Redshift TIMESTAMP
    • PostgreSQL VARCHAR → Redshift VARCHAR
  3. Performance: QUALIFY clause provides better performance than PostgreSQL's DISTINCT ON for many queries.
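As an illustration of point 3, the two dialects' deduplication idioms side by side (table and column names illustrative):

```sql
-- PostgreSQL: first row per pk via DISTINCT ON
SELECT DISTINCT ON (pk) pk, ldts, payload
FROM staging
ORDER BY pk, ldts;

-- Redshift: same result via QUALIFY
SELECT pk, ldts, payload
FROM staging
QUALIFY ROW_NUMBER() OVER (PARTITION BY pk ORDER BY ldts) = 1;
```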

Documentation Updates Needed

This PR includes in-code documentation (macros/tables/redshift/README.md). External documentation updates are still needed.

Note: This PR is ready for review and testing. I'm happy to make adjustments based on feedback and add any additional test coverage or documentation as needed.

Commits

Add Redshift-specific data type definitions:

  • type_binary: VARCHAR(32) for MD5 hash hex strings
  • type_string: VARCHAR
  • type_timestamp: TIMESTAMP (no timezone)

Add Redshift-specific metadata processing:

  • get_escape_characters: Use double quotes like PostgreSQL
  • concat_ws: Use native CONCAT_WS function

Add Redshift hash function support:

  • MD5: Returns UPPER(MD5(...)) as VARCHAR(32) hex string
  • SHA256: Returns UPPER(SHA2(..., 256)) as VARCHAR(64) hex string
  • SHA1: Warns and falls back to MD5 (not supported)
  • cast_binary: Dynamic VARCHAR casting based on hash type
  • type_binary: Dynamic VARCHAR length (32 for MD5, 64 for SHA256)

Redshift supports MD5 and SHA256 (via the SHA2 function) natively. SHA1 is not supported and falls back to MD5 with a warning.
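A sketch of the fallback behaviour this commit describes (macro name and body illustrative, not the PR's actual code; `exceptions.warn` is dbt's built-in warning hook):

```sql
{%- macro redshift__select_hash_alg(hash) -%}
    {%- if hash | lower == 'sha1' -%}
        {# The commit describes SHA1 as unsupported: warn, then fall back to MD5 #}
        {%- do exceptions.warn("SHA1 not supported on Redshift; falling back to MD5.") -%}
        {%- set hash = 'md5' -%}
    {%- endif -%}
    {# ... emit MD5(...) or SHA2(..., 256) accordingly ... #}
{%- endmacro -%}
```

Note that the PR description above lists native SHA1 support, so a later commit may have superseded this fallback.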
Create the Redshift table macros directory and add simple table types that inherit from the default implementations:

  • ref_table: Reference tables
  • xts: Extended tracking satellite
  • eff_sat: Effectivity satellite
  • ma_sat: Multi-active satellite

These macros work with Redshift standard SQL and require no platform-specific optimizations.

Implement the hub macro using Redshift's QUALIFY clause for better performance. It uses a ROW_NUMBER() window function with QUALIFY to deduplicate records efficiently. Requires a Redshift version with QUALIFY support (July 2023+).

Add link and non-historized link macros:

  • link: Inherits from PostgreSQL (uses the ROW_NUMBER pattern)
  • nh_link: Inherits from the default implementation

The PostgreSQL link pattern works well on Redshift, as both platforms support similar window function syntax.

Implement the satellite macro by inheriting from PostgreSQL. It uses the LAG window function and ROW_NUMBER for change detection and deduplication; Redshift supports all required window functions (LAG, ROW_NUMBER, PARTITION BY).
Implement the Point-in-Time (PIT) macro for Redshift using the simplified VARCHAR hex string approach instead of PostgreSQL's BYTEA encoding.

Key differences from PostgreSQL:

  • Removed ENCODE/DECODE functions (PostgreSQL-specific)
  • Uses simple MAX() aggregation on VARCHAR(32) hex strings
  • Works because hashes are stored as VARCHAR, not binary

This simplification is possible because of the choice of VARCHAR(32) for hash storage, making the code cleaner and more maintainable.
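A minimal sketch of the resulting aggregation shape (table and column names illustrative, not the PR's actual code):

```sql
-- With hashes stored as VARCHAR hex, the PIT aggregates them directly:
SELECT MAX(customer_hk) AS customer_hk
FROM sat_customer
WHERE load_datetime <= '2024-01-01'
-- PostgreSQL's BYTEA storage would instead need a round-trip such as:
--   ENCODE(MAX(DECODE(customer_hk, 'hex')), 'hex')
```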
Implement the bridge macro by inheriting from the default implementation. No ENCODE/DECODE operations are needed (unlike PIT), so the default implementation works without modification.