Added new PumpFun AMM buy event to stg_decoded_swaps #9428
megavel wants to merge 18 commits into duneanalytics:main
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
Adds a new incremental staging model decoding Pumpswap "new buy" instructions, an intermediate UNION ALL view consolidating decoded swaps, updates the backfill to consume the union and advances project start dates; plus schema and source entries and minor formatting fixes.
Sequence Diagram(s)
sequenceDiagram
participant InstructionCalls as InstructionCalls\n(solana.instruction_calls)
participant PumpAmm as PumpAmm\n(pump_amm_evt_buyevent)
participant StgOld as StgOld\n(pumpswap_solana_stg_decoded_swaps)
participant StgNew as StgNew\n(pumpswap_solana_stg_decoded_newevent)
participant IntAll as IntAll\n(pumpswap_solana_int_all_swaps)
participant Backfill as Backfill\n(pumpswap_solana_base_trades_backfill)
InstructionCalls->>PumpAmm: join/filter by discriminator & execution/tx_success
InstructionCalls->>StgNew: decode new buy events, compute surrogate_key,\nblock_month/block_date, dedupe (rn=1)
InstructionCalls->>StgOld: existing decode flow (unchanged)
StgOld->>IntAll: supply decoded swap rows
StgNew->>IntAll: supply decoded new-event rows
IntAll->>Backfill: backfill reads consolidated swaps
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 inconclusive)
I have read the CLA Document and I hereby sign the CLA
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_swaps.sql`:
- Around line 95-113: The join between g and f is currently only on tx_id
causing N:N matches; change it to a 1:1 instruction-scoped join (add the
instruction-level key such as instruction_index or instruction_id: e.g., ON
g.tx_id = f.tx_id AND g.instruction_index = f.instruction_index) and include
block_date equality (AND g.block_date = f.block_date) so the relationship is
unique and partition-prunable; also switch any filters using block_time to use
block_date for partition pruning (replace incremental_predicate('f.block_time')
and incremental_predicate('g.block_time') with
incremental_predicate('f.block_date') and incremental_predicate('g.block_date')
and ensure WHERE includes AND g.block_date >= DATE '{{ project_start_date }}' /
f.block_date counterpart).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e7f67dfb-fb8d-46b4-8d85-a91942a3cef3
📒 Files selected for processing (2)
- dbt_subprojects/solana/models/_sector/dex/pumpswap/pumpswap_solana_pools.sql
- dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_swaps.sql
    JOIN {{ source('solana', 'instruction_calls') }} f
        ON g.tx_id = f.tx_id
        AND bytearray_substring(f.data, 1, 8) = 0xc62e1552b4d9e870
        AND f.executing_account = 'pAMMBay6oceH9fJKBRHGP5D4bD4sWpmSwMn52FMfXEA'
        AND f.tx_success = true
        {% if is_incremental() %}
        AND {{ incremental_predicate('f.block_time') }}
        {% else %}
        AND f.block_time >= TIMESTAMP '{{ project_start_date }}'
        {% endif %}
    WHERE bytearray_substring(g.data, 1, 16) = 0xe445a52e51cb9a1d67f4521f2cf57777
        AND g.executing_account = 'pAMMBay6oceH9fJKBRHGP5D4bD4sWpmSwMn52FMfXEA'
        AND g.is_inner = true
        AND g.tx_success = true
        AND length(g.data) = 447
        {% if is_incremental() %}
        AND {{ incremental_predicate('g.block_time') }}
        {% else %}
        AND g.block_time >= TIMESTAMP '{{ project_start_date }}'
Tighten the g↔f join to an instruction-scoped key.
Right now this branch matches g and f on tx_id only. If a transaction contains more than one qualifying PumpFun instruction, every matching f row can pair with every matching g row, which will duplicate swaps and attach the wrong account_pool_*_token_account values. Please join on the smallest instruction-level key that makes the relationship 1:1 for this tx, and include block_date in the join/filter so the raw-table self-join stays prunable.
As per coding guidelines, "Filter on block_date (not block_time) for partition pruning, and always include partition columns in WHERE and JOIN conditions".
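The fan-out described above can be reproduced on a toy dataset. The following is a minimal sketch, not the model's actual query: the two inline tables stand in for the `g` (inner event) and `f` (outer instruction) rows of a single transaction, and `outer_instruction_index` is assumed here to be the instruction-level key (the real key should be checked against `solana.instruction_calls`).

```sql
-- Hypothetical miniature of the g/f self-join: one transaction containing two swaps.
WITH g (tx_id, block_date, outer_instruction_index, base_amount) AS (
    VALUES
        ('tx1', DATE '2026-03-06', 2, 100)
        , ('tx1', DATE '2026-03-06', 5, 250)
),
f (tx_id, block_date, outer_instruction_index, pool_account) AS (
    VALUES
        ('tx1', DATE '2026-03-06', 2, 'poolA')
        , ('tx1', DATE '2026-03-06', 5, 'poolB')
)
-- Joining on tx_id alone pairs every g row with every f row in the transaction.
SELECT 'tx_id only' AS join_key, COUNT(*) AS row_count
FROM g
INNER JOIN f
    ON g.tx_id = f.tx_id
UNION ALL
-- Adding the partition column and the instruction-level key makes the join 1:1.
SELECT 'instruction-scoped' AS join_key, COUNT(*) AS row_count
FROM g
INNER JOIN f
    ON g.tx_id = f.tx_id
    AND g.block_date = f.block_date
    AND g.outer_instruction_index = f.outer_instruction_index
```

With two swaps in the transaction, the first branch counts four joined rows and the second counts two, so the `tx_id`-only join both duplicates swaps and attaches `poolB` to the first swap.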
- Create separate staging model for new buy event (0xe445a52e51cb9a1d)
- Implement 1:1 join using outer_instruction_index for deduplication
- Add ROW_NUMBER() to handle potential duplicate rows from join
- Improve partition pruning with block_date filters
- Create intermediate model to union old and new event staging models
- Update schema.yml with new model documentation
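The `ROW_NUMBER()` deduplication mentioned in this commit typically follows the pattern below. This is a sketch on inline data, and the partition key shown (`tx_id`, `outer_instruction_index`, `inner_instruction_index`) is an assumption for illustration rather than the key the model necessarily uses.

```sql
-- Hypothetical joined rows; the second row is an exact duplicate produced by the join.
WITH joined (tx_id, outer_instruction_index, inner_instruction_index, base_amount) AS (
    VALUES
        ('tx1', 2, 1, 100)
        , ('tx1', 2, 1, 100)
        , ('tx1', 5, 3, 250)
),
ranked AS (
    SELECT
        tx_id
        , outer_instruction_index
        , inner_instruction_index
        , base_amount
        -- Number the rows within each logical swap so that only one survives.
        , ROW_NUMBER() OVER (
            PARTITION BY tx_id, outer_instruction_index, inner_instruction_index
            ORDER BY base_amount
        ) AS rn
    FROM joined
)
SELECT
    tx_id
    , outer_instruction_index
    , inner_instruction_index
    , base_amount
FROM ranked
WHERE rn = 1
```

Deduplication of this kind only hides a fan-out after the fact. If the duplicates differ in any column, `rn = 1` keeps whichever row sorts first, so a unique join key remains the primary safeguard.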
Actionable comments posted: 1
🧹 Nitpick comments (2)
dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_newevent.sql (1)
40-40: Use explicit `INNER JOIN` instead of bare `JOIN`.
Per coding guidelines, explicit join types should be used for clarity.
♻️ Suggested fix
    - JOIN {{ source('solana', 'instruction_calls') }} f
    + INNER JOIN {{ source('solana', 'instruction_calls') }} f
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_newevent.sql` at line 40, The query uses a bare "JOIN" which needs to be made explicit; replace the occurrence of "JOIN {{ source('solana', 'instruction_calls') }} f" with "INNER JOIN {{ source('solana', 'instruction_calls') }} f" to follow the coding guideline requiring explicit join types (look for the token instruction_calls and alias f in the pumpswap_solana_stg_decoded_newevent SQL to locate the exact line).
dbt_subprojects/solana/models/_sector/dex/pumpswap/intermediate/pumpswap_solana_int_all_swaps.sql (1)
9-17: Avoid `SELECT *` - specify exact columns needed.
Using `SELECT *` on these staging tables violates the columnar storage guideline and can degrade query performance. Enumerate the required columns explicitly to benefit from Dune's columnar storage optimization. As per coding guidelines: "Never use `SELECT *` on large tables; specify exact columns needed due to Dune's columnar storage".
♻️ Suggested refactor
     SELECT
    -    *
    -FROM {{ ref('pumpswap_solana_stg_decoded_swaps')}}
    +    block_slot
    +    , block_month
    +    , block_date
    +    , block_time
    +    , inner_instruction_index
    +    , swap_inner_index
    +    , outer_instruction_index
    +    , outer_executing_account
    +    , tx_id
    +    , tx_index
    +    , pool
    +    , user_account
    +    , account_user_base_token_account
    +    , account_user_quote_token_account
    +    , account_pool_base_token_account
    +    , account_pool_quote_token_account
    +    , account_protocol_fee_recipient_token_account
    +    , base_amount
    +    , is_buy
    +    , surrogate_key
    +FROM {{ ref('pumpswap_solana_stg_decoded_swaps') }}
     UNION ALL
     SELECT
    -*
    -FROM {{ref('pumpswap_solana_stg_decoded_newevent')}}
    +    block_slot
    +    , block_month
    +    , block_date
    +    , block_time
    +    , inner_instruction_index
    +    , swap_inner_index
    +    , outer_instruction_index
    +    , outer_executing_account
    +    , tx_id
    +    , tx_index
    +    , pool
    +    , user_account
    +    , account_user_base_token_account
    +    , account_user_quote_token_account
    +    , account_pool_base_token_account
    +    , account_pool_quote_token_account
    +    , account_protocol_fee_recipient_token_account
    +    , base_amount
    +    , is_buy
    +    , surrogate_key
    +FROM {{ ref('pumpswap_solana_stg_decoded_newevent') }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dbt_subprojects/solana/models/_sector/dex/pumpswap/intermediate/pumpswap_solana_int_all_swaps.sql` around lines 9 - 17, Replace the two "SELECT *" uses with an explicit, matching column list: inspect the staging models referenced by ref('pumpswap_solana_stg_decoded_swaps') and ref('pumpswap_solana_stg_decoded_newevent'), enumerate the exact columns you need (e.g., tx_hash, block_number, timestamp, sender, recipient, amount, token_address, etc.), and use that same ordered column list in both SELECTs so UNION ALL aligns. Ensure you only include needed columns to respect Dune's columnar-storage guideline and update any downstream references that expect specific columns from this intermediate model.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_newevent.sql`:
- Line 9: The config currently sets incremental_predicates =
[incremental_predicate('DBT_INTERNAL_DEST.block_date')] but the model and its
WHERE filters use block_time; update the config to use incremental_predicates =
[incremental_predicate('DBT_INTERNAL_DEST.block_time')] so the incremental
predicate matches the query filters (see incremental_predicate usages in the
body) and ensures correct time-series incremental behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b06fce8d-c793-4a59-8088-52c468184966
📒 Files selected for processing (5)
- dbt_subprojects/solana/models/_sector/dex/pumpswap/_schema.yml
- dbt_subprojects/solana/models/_sector/dex/pumpswap/intermediate/pumpswap_solana_int_all_swaps.sql
- dbt_subprojects/solana/models/_sector/dex/pumpswap/pumpswap_solana_base_trades_backfill.sql
- dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_newevent.sql
- dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_swaps.sql
✅ Files skipped from review due to trivial changes (1)
- dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_swaps.sql
...projects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_newevent.sql
- Added explicit column list to int_all_swaps UNION ALL branches
- Change JOIN to explicit INNER JOIN in stg_decoded_newevent
- Switch new buy event from raw instruction_calls to decoded source table
- Changed project start date to 2026-03-06 for testing only
…book into pumpswap-new-event Merge remote changes.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@dbt_subprojects/solana/models/_sector/dex/pumpswap/pumpswap_solana_base_trades_backfill.sql`:
- Line 19: The fallback date assigned to the Jinja variable begin ('{% set begin
= ... %}') is out of sync with the model config; update the fallback value used
when model.batch is absent so it matches the model config(begin = '2025-02-20')
— i.e., change the begin fallback to '2025-02-20' in the
pumpswap_solana_base_trades_backfill.sql template so the query backfills from
the same date as the model config.
In
`@dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_swaps.sql`:
- Line 14: The project_start_date was narrowed to '2026-03-06', which will drop
legacy swaps (2025-02-20 through 2026-03-05); revert project_start_date in
pumpswap_solana_stg_decoded_swaps (the project_start_date variable) to the
previous/original value that covers 2025-02-20 (or set it to an earlier
universal start) so that pumpswap_solana_int_all_swaps can reconstruct full
history, or remove the artificial restriction entirely so legacy swaps are
preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: be24c4f3-6d4a-4206-ad3f-62d0bdce40ae
📒 Files selected for processing (6)
- dbt_subprojects/solana/models/_sector/dex/pumpswap/_schema.yml
- dbt_subprojects/solana/models/_sector/dex/pumpswap/intermediate/pumpswap_solana_int_all_swaps.sql
- dbt_subprojects/solana/models/_sector/dex/pumpswap/pumpswap_solana_base_trades_backfill.sql
- dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_newevent.sql
- dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_swaps.sql
- dbt_subprojects/solana/models/_sources.yml
🚧 Files skipped from review as they are similar to previous changes (2)
- dbt_subprojects/solana/models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_newevent.sql
- dbt_subprojects/solana/models/_sector/dex/pumpswap/intermediate/pumpswap_solana_int_all_swaps.sql
    }}

    -{% set begin = '2025-02-20' %}
    +{% set begin = '2026-03-06' %}
Keep the SQL fallback begin aligned with the model config.
This no longer matches Line 11's config(begin = '2025-02-20'). When model.batch is absent, the query now skips the first year of Pumpswap trades even though the microbatch config still says to backfill from February 20, 2025.
Suggested fix
-{% set begin = '2026-03-06' %}
+{% set begin = '2025-02-20' %}
    }}

    -{% set project_start_date = '2025-02-20' %}
    +{% set project_start_date = '2026-03-06' %}
Don't narrow the legacy swaps model to March 2026.
This staging model still carries the pre-new-event buy/sell history. Moving project_start_date to 2026-03-06 means any full refresh or first-time build will miss swaps from 2025-02-20 through 2026-03-05, and pumpswap_solana_int_all_swaps cannot reconstruct them because the new-event model only covers the new discriminator path.
Suggested fix
-{% set project_start_date = '2026-03-06' %}
+{% set project_start_date = '2025-02-20' %}
…book into pumpswap-new-event
…wap_with_transfers
- Split swaps_with_transfers into UNION of old and new events
- Old events: use INNER JOIN with transfers (preserves existing behavior)
- New events: bypass transfer join, use decoded quote_token_amount directly
…book into pumpswap-new-event
Thank you for contributing to Spellbook 🪄
Please open the PR in draft and mark as ready when you want to request a review.
Description:
Added support for a new PumpFun AMM buy event type that was previously not part of the pumpdotfun-solana schema.
Changes
- Added a third UNION to `pumpswap_solana_stg_decoded_swaps.sql` to include the new buy event
- Decodes the instruction with discriminator `0xe445a52e51cb9a1d67f4521f2cf57777` directly from `solana.instruction_calls`
- … (`0xc62e1552b4d9e870`) to extract the pool token account addresses from `account_arguments` needed for the new event
Details
- The new event is 447 bytes in size, unlike the first buy event, which was 432 bytes, with a specific byte-offset mapping
- ~~Executes on `pAMMBay6oceH9fJKBRHGP5D4bD4sWpmSwMn52FMfXEA` (PumpFun AMM)~~
- Follows the same schema as the existing buy/sell events for consistency
New Changes
Architecture:
- `stg_decoded_newevents.sql` - separate staging model for new event decoding
- `int_all_swaps.sql` - intermediate model that unions the existing and new event staging models
- … `int_all_swaps` instead of `stg_decoded_swaps`
Technical Improvements:
- … `outer_instruction_index` and `block_date` on the newevent model to avoid N:N matches, as recommended by the bot
- `ROW_NUMBER()` deduplication to handle potential duplicate rows from the join
Files Changed
- `models/pumpswap/solana/staging/stg_decoded_newevents.sql` (new)
- `models/pumpswap/solana/intermediate/int_all_swaps.sql` (new)
Latest Modified Files:
The new events have a `quote_amount` column built into the decoded source, while old events need to fetch it from the transfers table.
- `models/_sector/dex/pumpswap/staging/pumpswap_solana_stg_decoded_swaps.sql`
  - … `quote_amount` column (NULL)
- `models/_sector/dex/pumpswap/pumpswap_solana_base_trades_backfill.sql`
  - … `stg_decoded_swaps` to `int_all_swaps`
  - … `swaps` CTE to include `quote_amount`
This way:
- Old events have `quote_amount` = NULL and get the quote amount from the transfers source
- New events have `quote_amount` populated and skip the transfer join
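The two paths described above can be sketched on inline data. This is only an illustration of the pattern under stated assumptions: the `swaps` and `transfers` tables, their columns, and the amounts are hypothetical, and the real backfill joins transfers on more than `tx_id`.

```sql
-- Hypothetical consolidated swaps: old events carry a NULL quote_amount,
-- new events carry the decoded value.
WITH swaps (tx_id, quote_amount) AS (
    VALUES
        ('tx_old', CAST(NULL AS BIGINT))
        , ('tx_new', CAST(700 AS BIGINT))
),
-- Hypothetical transfers source, needed only for old events.
transfers (tx_id, amount) AS (
    VALUES
        ('tx_old', CAST(500 AS BIGINT))
)
-- Old events: take the quote amount from the matching transfer.
SELECT
    s.tx_id
    , t.amount AS quote_amount
FROM swaps s
INNER JOIN transfers t
    ON s.tx_id = t.tx_id
WHERE s.quote_amount IS NULL
UNION ALL
-- New events: use the decoded quote amount and skip the transfer join.
SELECT
    s.tx_id
    , s.quote_amount
FROM swaps s
WHERE s.quote_amount IS NOT NULL
```

Splitting on `quote_amount IS NULL` keeps the existing INNER JOIN behavior for old events untouched, while new events never touch the transfers table.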
Validated the modified models against recent data only (2026-03-06+) instead of the full history (from 2025-02-20). This start date is for testing purposes only and will be changed back.
Here's also a reference Dune query showing the entire decoded instruction data and transactions for the new PumpFun AMM buy event:
https://dune.com/queries/6777900/