Key patterns and configurations for this Dune dbt repository.
NEVER declare a `schema` property in model configs.
Schema names are controlled automatically by `macros/dune_dbt_overrides/get_custom_schema.sql` based on:
- Target (`dev` or `prod`)
- `DUNE_TEAM_NAME` environment variable
- `DEV_SCHEMA_SUFFIX` (optional)
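For reference, a minimal sketch of what such an override can look like, assuming the standard dbt `generate_schema_name` override point (the actual logic lives in `macros/dune_dbt_overrides/get_custom_schema.sql` and may differ):

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {#- Illustrative sketch only; see macros/dune_dbt_overrides/get_custom_schema.sql -#}
    {%- set team = env_var('DUNE_TEAM_NAME') -%}
    {%- if target.name == 'prod' -%}
        {{ team }}
    {%- else -%}
        {{ team }}_dev{{ env_var('DEV_SCHEMA_SUFFIX', '') }}
    {%- endif -%}
{%- endmacro %}
```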
ALWAYS provide an alias config for every model.
```sql
{{ config(
    alias = 'my_model_name'
    , materialized = 'view'
) }}
```

Alias is how we differentiate models, since schema names are auto-managed.
The source() macro automatically uses database='delta_prod'.
To override:
```sql
{{ source('source_name', 'table_name', database='custom_db') }}
```

```sql
{{ config(
    alias = 'my_incremental_model'
    , materialized = 'incremental'
    , incremental_strategy = 'merge' -- ALWAYS specify strategy
    , unique_key = ['date', 'id'] -- ALWAYS specify for merge/delete+insert
) }}
```

Strategies:
- `merge` - Update/insert based on unique key
- `delete+insert` - Delete matching rows, then insert
- `append` - Append-only (use with deduplication)
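For example, a minimal append-only sketch (model, source, and column names are illustrative). Because append never updates or deletes, the select must avoid re-reading rows that already exist in the target:

```sql
{{ config(
    alias = 'my_append_model'
    , materialized = 'incremental'
    , incremental_strategy = 'append'
) }}

select
    block_date
    , transaction_hash
from {{ source('ethereum', 'transactions') }}
{% if is_incremental() %}
-- read strictly new rows only; append cannot deduplicate against the target
where block_date > (select max(block_date) from {{ this }})
{% endif %}
```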
Unique key columns MUST NOT contain NULL values.
In Trino, NULL comparisons never match, so merge lookups fail and duplicate rows get inserted.
Solutions:
- Filter out NULLs: `where key_column is not null`
- Generate a surrogate key: `{{ dbt_utils.generate_surrogate_key(['col1', 'col2']) }}`
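A sketch combining both fixes (column names are illustrative): NULL rows are filtered out, and a never-NULL surrogate key backs the merge:

```sql
{{ config(
    alias = 'my_merge_model'
    , materialized = 'incremental'
    , incremental_strategy = 'merge'
    , unique_key = ['surrogate_key']
) }}

select
    {{ dbt_utils.generate_surrogate_key(['block_date', 'transaction_hash']) }} as surrogate_key
    , block_date
    , transaction_hash
from {{ source('ethereum', 'transactions') }}
where transaction_hash is not null -- NULL keys would break the merge lookup
```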
Only use incremental materialization for time-series data where you ONLY need recent target records.
Use when:
- Daily aggregations, rolling metrics
- Data naturally partitioned by time
DO NOT use when:
- Need to check full history (e.g., DEX pool creation events)
- Reference/dimension tables
When in doubt, leave it out.
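For instance, a reference table that downstream models must check against full history is better served by a plain table (an illustrative config, not a prescription):

```sql
-- Dimension/reference data: rebuild fully instead of merging incrementally
{{ config(
    alias = 'dex_pool_registry'
    , materialized = 'table'
) }}
```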
Use a lookback period when reading from source tables in incremental models to handle late-arriving data and gaps.
This handles:
- Upstream delays: Source data may be delayed due to incidents
- Late-arriving data: Data points may arrive hours or days late
- Data gaps: Automatically backfills missing data within lookback window
```sql
{{ config(
    alias = 'my_incremental_model'
    , materialized = 'incremental'
    , incremental_strategy = 'merge'
    , unique_key = ['block_date', 'transaction_hash']
) }}

select
    block_date
    , block_time
    , transaction_hash
    , from_address
    , to_address
    , value
from
    {{ source('ethereum', 'transactions') }}
where
    block_date >= date('2020-01-01') -- Historical start date
    {% if is_incremental() %}
    -- Lookback period: adjust based on run frequency and reliability needs
    and block_date >= current_date - interval '7' day
    {% endif %}
```

The optimal lookback period depends on your run frequency and priorities:
| Run Frequency | Suggested Lookback | Reasoning |
|---|---|---|
| Hourly | 1-2 days | Frequent runs catch gaps quickly; shorter lookback keeps costs low |
| Daily | 3-7 days | Longer lookback needed since next run is 24 hours away |
| Weekly | 7-14 days | Must cover full week + buffer for late-arriving data |
Cost vs. Reliability Trade-offs:

**Larger Lookback Period (7+ days)**
- ✅ More reliable: catches all late-arriving data and handles longer delays
- ✅ Peace of mind: less worry about upstream source data delays
- ✅ Better for infrequent runs (daily/weekly)
- ❌ More data processed: longer query runs, higher credit spend

**Shorter Lookback Period (1-3 days)**
- ✅ Lower cost: less data scanned, faster query runs, lower credit spend
- ✅ Better for frequent runs (hourly)
- ❌ May miss data: late-arriving data beyond the lookback window requires manual backfills
- ❌ More maintenance: need to monitor for gaps
Recommendation: Find the right balance for your team:
- Consider your data SLAs and how critical up-to-date data is
- Factor in your credit budget and query costs
- Start with 7 days for daily runs, adjust based on observed delays
- Monitor for gaps and tune the lookback period accordingly
The lookback period is ONLY for source reads inside the `is_incremental()` block:

```sql
-- ✅ CORRECT: Lookback on source, full merge on target
select
    ...
from
    {{ source('ethereum', 'transactions') }}
where
    block_date >= date('2020-01-01')
    {% if is_incremental() %}
    and block_date >= current_date - interval '7' day -- Lookback on SOURCE
    {% endif %}
```

The target table merge still checks ALL existing records (unless you use `incremental_predicates`).
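If your adapter supports `incremental_predicates`, you can prune the target side of the merge as well. A hedged sketch (`DBT_INTERNAL_DEST` is the target alias used by several adapters' merge macros; verify support and the alias for your adapter):

```sql
{{ config(
    alias = 'my_incremental_model'
    , materialized = 'incremental'
    , incremental_strategy = 'merge'
    , unique_key = ['block_date', 'transaction_hash']
    -- limits which TARGET partitions the merge scans; alias may vary by adapter
    , incremental_predicates = ["DBT_INTERNAL_DEST.block_date >= current_date - interval '7' day"]
) }}
```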
What this pattern buys you:
- Automatic gap filling: If yesterday's data was missing, the next run picks it up
- Late data handling: Data arriving 2-3 days late is captured
- Upstream incident recovery: When sources come back online, gaps auto-fill
- No manual backfills: Most data quality issues resolve automatically
```sql
-- High-value, daily run, reliability critical → 7-day lookback
{% if is_incremental() %}
and block_date >= current_date - interval '7' day -- Handles weekly delays
{% endif %}

-- High-frequency hourly run, cost-sensitive → 1-2 day lookback
{% if is_incremental() %}
and block_date >= current_date - interval '1' day -- Minimal lookback, frequent runs
{% endif %}

-- Weekly run, must cover full period → 10-14 day lookback
{% if is_incremental() %}
and block_date >= current_date - interval '14' day -- Cover full week + buffer
{% endif %}
```

Only partition if each partition will have ~1M+ rows.
Common use cases:
- `properties = { "partitioned_by": "ARRAY['block_date']" }` for daily partitions
- `properties = { "partitioned_by": "ARRAY['block_month']" }` for large event tables
- Monthly transfers, swaps, liquidity events
Small tables: partitioning hurts more than helps.
If partitioned, ALWAYS include partition column in unique_key.
Example (partitioned table model):
```sql
{{ config(
    alias = 'daily_transaction_summary'
    , materialized = 'table'
    , properties = {
        "partitioned_by": "ARRAY['block_date']"
    }
) }}
```

Example (partitioned incremental model):
```sql
{{ config(
    alias = 'monthly_dex_swaps'
    , materialized = 'incremental'
    , incremental_strategy = 'merge'
    , unique_key = ['block_month', 'tx_hash', 'evt_index'] -- ✅ includes partition column
    , properties = {
        "partitioned_by": "ARRAY['block_month']"
    }
) }}
```

Why: Enables partition pruning during merge lookups. Dramatically faster.
One Model Per Protocol/Version/Blockchain
- Separate Uniswap V2, Uniswap V3
- Separate Ethereum, Arbitrum, Polygon
- Benefits: Clear lineage, easy to debug, flexible
Simple Union Models
- Union across versions: add a `version` column
- Union across chains: add a `blockchain` column
- Keep unions simple, no complex logic
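A minimal sketch of such a union (model names are illustrative):

```sql
{{ config(
    alias = 'dex_trades'
    , materialized = 'view'
) }}

select 'ethereum' as blockchain, t.* from {{ ref('dex_trades_ethereum') }} as t
union all
select 'arbitrum' as blockchain, t.* from {{ ref('dex_trades_arbitrum') }} as t
```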
Save Enrichments for Downstream
- Keep staging models simple (select, rename, basic filters)
- Add metadata, lookups, calculations downstream
- Easier to iterate without rebuilding upstream
Use Shared Macros When
- Logic repeats across many blockchains
- Protocols are commonly forked (Uniswap V2, Compound, ERC20)
Benefits: Single source of truth, easier maintenance.
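A hedged sketch of the macro pattern (macro, source, and column names are illustrative):

```sql
{% macro erc20_transfers(blockchain) %}
select
    '{{ blockchain }}' as blockchain
    , evt_block_time as block_time
    , "from" as from_address
    , "to" as to_address
    , value as amount_raw
from {{ source('erc20_' ~ blockchain, 'evt_transfer') }}
{% endmacro %}
```

Each per-chain model then reduces to a single call, e.g. `{{ erc20_transfers('ethereum') }}`.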
- Use `UINT256` and `INT256` for large numbers
- Use `VARBINARY` for binary data
- Hex without quotes: `0x039e2fb...` not `'0x039e2fb...'`
- `DATE '2025-10-08'` for block_date, `TIMESTAMP '2025-10-08'` for block_time
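For example (address and dates are illustrative):

```sql
where
    contract_address = 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2 -- VARBINARY literal, no quotes
    and block_date = DATE '2025-10-08'
    and block_time >= TIMESTAMP '2025-10-08'
```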
- Filter by partition columns - `block_date`, `block_time`, `evt_block_time`
- Never SELECT * - specify only the columns you need
- Filter cross-chain tables - both `blockchain` AND time
- Use CTEs - break complex queries into readable parts
- Avoid correlated subqueries - use window functions instead
- Use LIMIT during development - test with small result sets
- Never ORDER BY without LIMIT - expensive on large datasets
- Use curated tables - `dex.trades`, `tokens.transfers` vs raw logs
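For instance, a running total via a window function instead of a correlated subquery, combined with curated-table, chain, and partition filters (columns follow `dex.trades`):

```sql
select
    block_date
    , tx_hash
    , amount_usd
    , sum(amount_usd) over (partition by block_date order by block_time) as daily_running_usd
from dex.trades
where
    blockchain = 'ethereum' -- cross-chain filter
    and block_date >= date '2024-10-01' -- partition filter
limit 1000 -- keep development queries cheap
```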
Put time filters in both ON and WHERE:

```sql
select
    ...
from ethereum.transactions as t
inner join ethereum.logs as l
    on t.hash = l.tx_hash
    and t.block_date = l.block_date -- In ON clause
    and l.block_date >= timestamp '2024-10-01' -- In ON clause
where
    t.block_date >= timestamp '2024-10-01' -- In WHERE clause
```

Set globally in `dbt_project.yml`, do not override:
- `view_security: invoker` - Required for Dune views
- `require_certificate_validation: true` - Security requirement
See also:
- Development Workflow - Step-by-step process
- Testing - Test requirements
- SQL Style Guide - Formatting standards