6 changes: 6 additions & 0 deletions macros/internal/metadata_processing/concat_ws.sql
@@ -47,3 +47,9 @@ CONCAT(
{{ automate_dv.default__concat_ws(string_list=string_list, separator=separator) }}

{%- endmacro -%}

{%- macro redshift__concat_ws(string_list, separator="||") -%}

{{ automate_dv.default__concat_ws(string_list=string_list, separator=separator) }}

{%- endmacro -%}
4 changes: 4 additions & 0 deletions macros/internal/metadata_processing/get_escape_characters.sql
@@ -51,3 +51,7 @@
{%- macro postgres__get_escape_characters() %}
{%- do return (('"', '"')) -%}
{%- endmacro %}

{%- macro redshift__get_escape_characters() %}
{%- do return (('"', '"')) -%}
{%- endmacro %}
11 changes: 11 additions & 0 deletions macros/supporting/casting/cast_binary.sql
@@ -39,3 +39,14 @@

{%- endmacro -%}

{%- macro redshift__cast_binary(column_str, alias=none, quote=true) -%}

{%- if quote -%}
'{{ column_str }}'
{%- else -%}
CAST({{ column_str }} AS {{ automate_dv.type_binary() }})
{%- endif -%}

{%- if alias %} AS {{ alias }} {%- endif -%}

{%- endmacro -%}
18 changes: 18 additions & 0 deletions macros/supporting/data_types/type_binary.sql
@@ -48,4 +48,22 @@
{%- else -%}
BINARY
{%- endif -%}
{%- endmacro -%}

{%- macro redshift__type_binary(for_dbt_compare=false) -%}
{%- if for_dbt_compare -%}
VARCHAR
{%- else -%}
{%- set selected_hash = var('hash', 'MD5') | lower -%}

{%- if selected_hash == 'md5' -%}
VARCHAR(32)
{%- elif selected_hash == 'sha' -%}
VARCHAR(64)
{%- elif selected_hash == 'sha1' -%}
VARCHAR(40)
{%- else -%}
VARCHAR(32)
{%- endif -%}
{%- endif -%}
{%- endmacro -%}
4 changes: 4 additions & 0 deletions macros/supporting/data_types/type_string.sql
@@ -22,3 +22,7 @@
{%- macro databricks__type_string() -%}
STRING
{%- endmacro -%}

{%- macro redshift__type_string() -%}
VARCHAR
{%- endmacro -%}
4 changes: 4 additions & 0 deletions macros/supporting/data_types/type_timestamp.sql
@@ -13,4 +13,8 @@

{%- macro sqlserver__type_timestamp() -%}
DATETIME2
{%- endmacro -%}

{%- macro redshift__type_timestamp() -%}
TIMESTAMP
{%- endmacro -%}
20 changes: 20 additions & 0 deletions macros/supporting/hash_components/select_hash_alg.sql
@@ -180,3 +180,23 @@
{%- endif -%}

{% endmacro %}

{#- Redshift -#}

{% macro redshift__hash_alg_md5() -%}

{% do return("UPPER(MD5([HASH_STRING_PLACEHOLDER]))") %}

{% endmacro %}

{% macro redshift__hash_alg_sha256() -%}

{% do return("UPPER(SHA2([HASH_STRING_PLACEHOLDER], 256))") %}

{% endmacro %}

{% macro redshift__hash_alg_sha1() -%}

{% do return("UPPER(SHA1([HASH_STRING_PLACEHOLDER]))") %}

{% endmacro %}
130 changes: 130 additions & 0 deletions macros/tables/redshift/README.md
@@ -0,0 +1,130 @@
# Redshift Table Macros

This directory contains Amazon Redshift-specific implementations of Data Vault table macros.

## Requirements

- **Minimum Redshift Version**: July 2023 or later (QUALIFY clause support required)
- **dbt Version**: >=1.0.0, <3.0.0
- **dbt-redshift adapter**: Latest version recommended

## Supported Hash Algorithms

- **MD5**: Fully supported via native `MD5()` function → `VARCHAR(32)`
- **SHA1**: Fully supported via native `SHA1()` function → `VARCHAR(40)`
- **SHA256**: Fully supported via native `SHA2(string, 256)` function → `VARCHAR(64)`

## Configuration

Configure your hash algorithm in `dbt_project.yml`. Set `hash` exactly once; a YAML mapping cannot repeat the same key:

```yaml
vars:
  hash: 'md5'    # MD5 (default)
  # hash: 'sha1' # SHA1
  # hash: 'sha'  # SHA256, the strongest of the three
```

## Implementation Details

### Hash Storage
Hashes are stored as **VARCHAR hex strings** rather than binary types:
- MD5: `VARCHAR(32)` - 32 hexadecimal characters
- SHA1: `VARCHAR(40)` - 40 hexadecimal characters
- SHA256: `VARCHAR(64)` - 64 hexadecimal characters

This approach simplifies the PIT (Point-in-Time) macro by avoiding PostgreSQL-specific `ENCODE/DECODE` operations.
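
A quick way to see these widths for yourself (any Redshift session will do; the input literal is arbitrary):

```sql
-- Illustrative: hex digest lengths for each supported algorithm
SELECT UPPER(MD5('example'))       AS md5_hash,     -- 32 hex characters
       UPPER(SHA1('example'))      AS sha1_hash,    -- 40 hex characters
       UPPER(SHA2('example', 256)) AS sha256_hash;  -- 64 hex characters
```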

### Performance Optimizations

**QUALIFY Clause**: the Redshift macros filter window-function results with the `QUALIFY` clause, which keeps the SQL flat and can outperform the equivalent subquery rewrite:

```sql
SELECT columns
FROM table
WHERE conditions
QUALIFY ROW_NUMBER() OVER(PARTITION BY pk ORDER BY ldts) = 1
```

### Table Macro Inheritance

- **Custom implementations**: `hub.sql`, `pit.sql`
- **PostgreSQL-based**: `link.sql`, `sat.sql`
- **Default inheritance**: `ref_table.sql`, `xts.sql`, `eff_sat.sql`, `ma_sat.sql`, `nh_link.sql`, `bridge.sql`

## Data Types

| AutomateDV Type | Redshift Type |
|----------------|---------------|
| type_binary | VARCHAR(32) for MD5, VARCHAR(40) for SHA1, VARCHAR(64) for SHA256 |
| type_string | VARCHAR |
| type_timestamp | TIMESTAMP (no timezone) |

## SQL Dialect Features Used

- **QUALIFY**: Window function result filtering
- **ROW_NUMBER()**: Row deduplication
- **LAG()**: Change detection in satellites
- **CONCAT_WS()**: String concatenation
- **MD5()**: MD5 hash computation
- **SHA1()**: SHA1 hash computation
- **SHA2(str, 256)**: SHA256 hash computation
- **MAX()**: Aggregate functions (including on VARCHAR hashes)
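
Several of these features combine when a hub hash key is derived. The sketch below shows the general shape only; `stg_customer` and its columns are illustrative, not names the package generates:

```sql
-- Sketch: MD5 hash key over a trimmed, upper-cased, '||'-delimited input
SELECT UPPER(MD5(CONCAT_WS('||',
           UPPER(TRIM(customer_id)),
           UPPER(TRIM(customer_name))))) AS customer_pk
FROM stg_customer;
```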

## Example Usage

```sql
-- models/raw_vault/hubs/hub_customer.sql
{{- config(
materialized='incremental',
schema='raw_vault'
) -}}

{%- set src_pk = 'CUSTOMER_PK' -%}
{%- set src_nk = 'CUSTOMER_ID' -%}
{%- set src_ldts = 'LOAD_DATETIME' -%}
{%- set src_source = 'RECORD_SOURCE' -%}

{{ automate_dv.hub(src_pk=src_pk,
src_nk=src_nk,
src_ldts=src_ldts,
src_source=src_source,
source_model='stg_customer') }}
```

## Differences from PostgreSQL

1. **PIT Macro**: Simplified to use direct `MAX()` on VARCHAR instead of `ENCODE(MAX(ENCODE()))`
2. **Hub Macro**: Uses `QUALIFY` instead of `DISTINCT ON`
3. **Binary Type**: Uses `VARCHAR` instead of `BYTEA`
4. **Hash Output**: Returns uppercase hex strings

## Troubleshooting

### QUALIFY not recognized
**Error**: `syntax error at or near "QUALIFY"`

**Solution**: Upgrade to a Redshift release from July 2023 or later.
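
If upgrading is not an immediate option, the same filter can be expressed with a derived table. This is a conceptual sketch only (`my_table`, `pk` and `ldts` stand in for your table, key and load-date columns) and would require patching the macros to apply:

```sql
-- Pre-QUALIFY equivalent of the row-rank filter
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY pk ORDER BY ldts) AS rn
    FROM my_table AS t
    WHERE pk IS NOT NULL
) AS ranked
WHERE rn = 1;
```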

### Hash configuration not working
**Error**: Hash values don't match expected algorithm

**Solution**: Ensure you're using the correct var name:
- `hash: 'md5'` for MD5
- `hash: 'sha1'` for SHA1
- `hash: 'sha'` for SHA256 (note: use 'sha', not 'sha256')

### VARCHAR length errors
**Error**: Value too long for type VARCHAR(32) or similar

**Solution**: Check your hash configuration vs table definition:
- MD5 needs VARCHAR(32)
- SHA1 needs VARCHAR(40)
- SHA256 needs VARCHAR(64)

If you changed hash algorithms, recreate the table with the correct VARCHAR length.
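
To confirm what is actually stored, checking the hash column's length can help; the expected value is 32 (MD5), 40 (SHA1) or 64 (SHA256). Table and column names here are illustrative:

```sql
-- One distinct length should come back, matching the configured algorithm
SELECT DISTINCT LEN(customer_pk)
FROM raw_vault.hub_customer;
```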
16 changes: 16 additions & 0 deletions macros/tables/redshift/bridge.sql
@@ -0,0 +1,16 @@
/*
* Copyright (c) Business Thinking Ltd. 2019-2026
* This software includes code developed by the AutomateDV (f.k.a dbtvault) Team at Business Thinking Ltd. Trading as Datavault
*/

{%- macro redshift__bridge(src_pk, as_of_dates_table, bridge_walk, stage_tables_ldts, src_extra_columns, src_ldts, source_model) -%}

{{- automate_dv.default__bridge(src_pk=src_pk,
as_of_dates_table=as_of_dates_table,
bridge_walk=bridge_walk,
stage_tables_ldts=stage_tables_ldts,
src_extra_columns=src_extra_columns,
src_ldts=src_ldts,
source_model=source_model) -}}

{%- endmacro -%}
14 changes: 14 additions & 0 deletions macros/tables/redshift/eff_sat.sql
@@ -0,0 +1,14 @@
/*
* Copyright (c) Business Thinking Ltd. 2019-2026
* This software includes code developed by the AutomateDV (f.k.a dbtvault) Team at Business Thinking Ltd. Trading as Datavault
*/

{%- macro redshift__eff_sat(src_pk, src_dfk, src_sfk, src_extra_columns, src_start_date, src_end_date, src_eff, src_ldts, src_source, source_model) -%}

{{- automate_dv.default__eff_sat(src_pk=src_pk, src_dfk=src_dfk, src_sfk=src_sfk,
src_extra_columns=src_extra_columns,
src_start_date=src_start_date, src_end_date=src_end_date,
src_eff=src_eff, src_ldts=src_ldts, src_source=src_source,
source_model=source_model) -}}

{%- endmacro -%}
92 changes: 92 additions & 0 deletions macros/tables/redshift/hub.sql
@@ -0,0 +1,92 @@
/*
* Copyright (c) Business Thinking Ltd. 2019-2026
* This software includes code developed by the AutomateDV (f.k.a dbtvault) Team at Business Thinking Ltd. Trading as Datavault
*/

{%- macro redshift__hub(src_pk, src_nk, src_extra_columns, src_ldts, src_source, source_model) -%}

{%- set source_cols = automate_dv.expand_column_list(columns=[src_pk, src_nk, src_extra_columns, src_ldts, src_source]) -%}

{%- if model.config.materialized == 'vault_insert_by_rank' %}
{%- set source_cols_with_rank = source_cols + [automate_dv.config_meta_get('rank_column')] -%}
{%- endif %}

{{ 'WITH ' -}}

{%- set stage_count = source_model | length -%}

{%- set ns = namespace(last_cte= "") -%}

{%- for src in source_model -%}

{%- set source_number = loop.index | string -%}

row_rank_{{ source_number }} AS (
{%- if model.config.materialized == 'vault_insert_by_rank' %}
SELECT {{ automate_dv.prefix(source_cols_with_rank, 'rr') }}
{%- else %}
SELECT {{ automate_dv.prefix(source_cols, 'rr') }}
{%- endif %}
FROM {{ ref(src) }} AS rr
WHERE {{ automate_dv.multikey(src_pk, prefix='rr', condition='IS NOT NULL') }}
QUALIFY ROW_NUMBER() OVER(
PARTITION BY {{ automate_dv.prefix([src_pk], 'rr') }}
ORDER BY {{ automate_dv.prefix([src_ldts], 'rr') }}
) = 1
{%- set ns.last_cte = "row_rank_{}".format(source_number) %}
),{{ "\n" if not loop.last }}
{% endfor -%}
{% if stage_count > 1 %}
stage_union AS (
{%- for src in source_model %}
SELECT * FROM row_rank_{{ loop.index | string }}
{%- if not loop.last %}
UNION ALL
{%- endif %}
{%- endfor %}
{%- set ns.last_cte = "stage_union" %}
),
{%- endif -%}

{%- if model.config.materialized == 'vault_insert_by_period' %}
stage_mat_filter AS (
SELECT *
FROM {{ ns.last_cte }}
WHERE __PERIOD_FILTER__
{%- set ns.last_cte = "stage_mat_filter" %}
),
{%- elif model.config.materialized == 'vault_insert_by_rank' %}
stage_mat_filter AS (
SELECT *
FROM {{ ns.last_cte }}
WHERE __RANK_FILTER__
{%- set ns.last_cte = "stage_mat_filter" %}
),
{%- endif -%}

{%- if stage_count > 1 %}

row_rank_union AS (
SELECT ru.*
FROM {{ ns.last_cte }} AS ru
WHERE {{ automate_dv.multikey(src_pk, prefix='ru', condition='IS NOT NULL') }}
QUALIFY ROW_NUMBER() OVER(
PARTITION BY {{ automate_dv.prefix([src_pk], 'ru') }}
ORDER BY {{ automate_dv.prefix([src_ldts], 'ru') }}, {{ automate_dv.prefix([src_source], 'ru') }} ASC
) = 1
{%- set ns.last_cte = "row_rank_union" %}
),
{% endif %}
records_to_insert AS (
SELECT {{ automate_dv.prefix(source_cols, 'a', alias_target='target') }}
FROM {{ ns.last_cte }} AS a
{%- if automate_dv.is_any_incremental() %}
LEFT JOIN {{ this }} AS d
ON {{ automate_dv.multikey(src_pk, prefix=['a','d'], condition='=') }}
WHERE {{ automate_dv.multikey(src_pk, prefix='d', condition='IS NULL') }}
{%- endif %}
)

SELECT * FROM records_to_insert

{%- endmacro -%}
13 changes: 13 additions & 0 deletions macros/tables/redshift/link.sql
@@ -0,0 +1,13 @@
/*
* Copyright (c) Business Thinking Ltd. 2019-2026
* This software includes code developed by the AutomateDV (f.k.a dbtvault) Team at Business Thinking Ltd. Trading as Datavault
*/

{%- macro redshift__link(src_pk, src_fk, src_extra_columns, src_ldts, src_source, source_model) -%}

{{- automate_dv.postgres__link(src_pk=src_pk, src_fk=src_fk,
src_extra_columns=src_extra_columns,
src_ldts=src_ldts, src_source=src_source,
source_model=source_model) -}}

{%- endmacro -%}
14 changes: 14 additions & 0 deletions macros/tables/redshift/ma_sat.sql
@@ -0,0 +1,14 @@
/*
* Copyright (c) Business Thinking Ltd. 2019-2026
* This software includes code developed by the AutomateDV (f.k.a dbtvault) Team at Business Thinking Ltd. Trading as Datavault
*/

{%- macro redshift__ma_sat(src_pk, src_cdk, src_hashdiff, src_payload, src_extra_columns, src_eff, src_ldts, src_source, source_model) -%}

{{- automate_dv.default__ma_sat(src_pk=src_pk, src_cdk=src_cdk,
src_hashdiff=src_hashdiff, src_payload=src_payload,
src_extra_columns=src_extra_columns, src_eff=src_eff,
src_ldts=src_ldts, src_source=src_source,
source_model=source_model) -}}

{%- endmacro -%}
13 changes: 13 additions & 0 deletions macros/tables/redshift/nh_link.sql
@@ -0,0 +1,13 @@
/*
* Copyright (c) Business Thinking Ltd. 2019-2026
* This software includes code developed by the AutomateDV (f.k.a dbtvault) Team at Business Thinking Ltd. Trading as Datavault
*/

{%- macro redshift__nh_link(src_pk, src_fk, src_payload, src_extra_columns, src_eff, src_ldts, src_source, source_model) -%}

{{- automate_dv.default__nh_link(src_pk=src_pk, src_fk=src_fk, src_payload=src_payload,
src_extra_columns=src_extra_columns,
src_eff=src_eff, src_ldts=src_ldts, src_source=src_source,
source_model=source_model) -}}

{%- endmacro -%}