
perf: streaming TxHash and block-level scratch buffer for deserialization#106

Merged
icellan merged 2 commits into master from perf/block-deser-and-txhash-optimization on Feb 27, 2026

Conversation


@icellan icellan commented Feb 27, 2026

Summary

Two optimizations targeting block deserialization and transaction hashing, measured against a real 3.64 GB testnet block (28,672 txs, block 1681787).

1. Streaming TxHash (eliminates per-tx buffer allocation)

MsgTx.TxHash() previously serialized the entire transaction into a bytes.Buffer and then hashed the buffer. For large transactions (100+ KB), this allocated a buffer proportional to the tx size just to compute a hash.

Fix: Write transaction data directly to sha256.New() (which implements io.Writer) instead of an intermediate buffer. The double-SHA256 is computed as sha256(sha256(tx)) using a stack-allocated [32]byte for the intermediate hash.

Also adds a cachedHash *chainhash.Hash field to MsgTx that lazily caches the computed hash, invalidated by AddTxIn()/AddTxOut(). Applied the same streaming approach to MsgExtendedTx.TxHash().
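A minimal sketch of the caching scheme, assuming a heavily simplified struct (the real MsgTx has many fields and serializes itself rather than holding raw bytes):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type hash [32]byte

type msgTx struct {
	raw        []byte // stand-in for the tx's serialized fields
	cachedHash *hash  // nil until the first TxHash() call
}

// TxHash lazily computes and caches the double-SHA256 of the tx.
func (tx *msgTx) TxHash() hash {
	if tx.cachedHash != nil {
		return *tx.cachedHash // cache hit: no hashing, no allocation
	}
	first := sha256.Sum256(tx.raw)
	h := hash(sha256.Sum256(first[:]))
	tx.cachedHash = &h
	return h
}

// Mutators invalidate the cache so a stale hash is never returned.
func (tx *msgTx) AddTxIn(in []byte) {
	tx.raw = append(tx.raw, in...)
	tx.cachedHash = nil
}

func main() {
	tx := &msgTx{raw: []byte{0x01}}
	h1 := tx.TxHash()
	tx.AddTxIn([]byte{0x02}) // invalidates the cache
	fmt.Println(h1 != tx.TxHash())
}
```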

Result: MsgTx.TxHash dropped from 3,758 MB to 0 MB in the allocation profile.

2. Block-level scratch buffer for deserialization (eliminates temporary script allocations)

During MsgTx.Bsvdecode, each script is read into a temporary buffer via scriptFreeList.Borrow(), then copied into a per-tx contiguous buffer. Scripts larger than 512 bytes bypass the pool and get a fresh make([]byte, size) allocation. For a 3.64 GB block, this produced ~3.7 GB of temporary allocations that were immediately discarded after copying.

Fix: Add bsvdecodeWithScratch() method that reads all scripts into a shared scratch buffer. The buffer is passed from MsgBlock.Bsvdecode and reused across all transactions:

  • Reset to len=0 for each tx (capacity preserved)
  • Grows only when a tx has more total script data than any previous tx
  • After reading, scripts are copied into the exact-size per-tx contiguous buffer (unchanged)
  • The scratch buffer's capacity stabilizes after the first few txs
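The scratch-buffer lifecycle above can be sketched as below. Names and shapes are illustrative (the real decoder reads scripts from an `io.Reader` inside `bsvdecodeWithScratch`), but the reuse, growth, and copy-out steps match the description:

```go
package main

import "fmt"

// decodeBlockScripts reuses one scratch buffer across all txs in a block.
// It grows only when a tx needs more script bytes than any previous tx.
func decodeBlockScripts(txScripts [][][]byte) [][]byte {
	var scratch []byte
	var perTx [][]byte
	for _, scripts := range txScripts {
		scratch = scratch[:0] // reset length, keep capacity
		total := 0
		for _, s := range scripts {
			total += len(s)
		}
		if cap(scratch) < total {
			scratch = make([]byte, 0, total) // grow only on a new maximum
		}
		for _, s := range scripts {
			scratch = append(scratch, s...) // "read" into the shared scratch
		}
		// Copy into an exact-size contiguous buffer that outlives the loop,
		// as the real decoder does for each tx.
		contiguous := make([]byte, len(scratch))
		copy(contiguous, scratch)
		perTx = append(perTx, contiguous)
	}
	return perTx
}

func main() {
	out := decodeBlockScripts([][][]byte{
		{[]byte("abc"), []byte("de")},
		{[]byte("x")},
	})
	fmt.Printf("%q\n", out)
}
```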

Also pre-allocates MsgTx structs contiguously in MsgBlock.Bsvdecode (make([]MsgTx, txCount)) instead of individual stack-escape allocations per tx.
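The pre-allocation amounts to the following pattern (with a stand-in struct for wire.MsgTx): one backing slice allocation replaces txCount individual heap allocations that would each escape per loop iteration.

```go
package main

import "fmt"

type msgTx struct{ index int } // stand-in for wire.MsgTx

// preallocTxs allocates all tx structs in one contiguous backing slice
// and hands out pointers into it, instead of new(msgTx) per iteration.
func preallocTxs(txCount int) []*msgTx {
	backing := make([]msgTx, txCount)
	txs := make([]*msgTx, txCount)
	for i := range backing {
		backing[i].index = i // decode into &backing[i] in the real code
		txs[i] = &backing[i]
	}
	return txs
}

func main() {
	txs := preallocTxs(4)
	fmt.Println(txs[0].index, txs[3].index)
}
```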

Single-tx Bsvdecode (non-block path) remains unchanged, using the existing script pool.

Result: Block deserialization allocations dropped from 7,514 MB to 3,790 MB (-49.6%). The remaining 3,716 MB is the irreducible per-tx contiguous script buffer (the actual script data that must live somewhere).

Test changes

  • msg_block_test.go: Added clearBlockTxCaches helper to clear cachedHash before reflect.DeepEqual comparisons
  • msg_tx_test.go: Clear cachedHash before reflect.DeepEqual comparisons
  • msg_extended_tx_test.go: No test changes needed (MsgExtendedTx has no cached hash field)

Test plan

  • All existing go-wire tests pass (go test -count=1 -short ./...)
  • Verified against 3.64 GB testnet block in teranode HandleBlockDirect test
  • TxHash correctness verified by existing TestTxTxHash and TestExtendedTxTxHash tests
  • Block serialization/deserialization roundtrip verified by existing TestBlockSerialize* tests

…tion

TxHash optimization:
- Write tx data directly to sha256.New() instead of serializing into
  an intermediate bytes.Buffer, eliminating a per-tx buffer allocation
  proportional to the transaction size.
- Cache computed hash in MsgTx.cachedHash field, invalidated by
  AddTxIn/AddTxOut. Applied to both MsgTx and MsgExtendedTx.

Block deserialization optimization:
- Add bsvdecodeWithScratch method that reads all scripts into a shared
  scratch buffer (reused across transactions) instead of allocating a
  fresh buffer per large script via scriptFreeList.Borrow.
- MsgBlock.Bsvdecode passes a shared scratch buffer to each tx,
  growing it only when a script larger than any previous one is
  encountered. After each tx, the buffer is reset (len=0) but
  capacity is preserved.
- Pre-allocate MsgTx structs contiguously in MsgBlock.Bsvdecode
  (one slice instead of per-tx heap allocations).

For a 3.64 GB testnet block (28,672 txs):
- Block deserialization: 7,514 MB → 3,790 MB (-49.6%)
- TxHash: 3,758 MB → 0 MB (eliminated from allocation profile)
@icellan icellan requested a review from mrz1836 as a code owner February 27, 2026 13:29
@github-actions github-actions bot added the size/L Large change (201–500 lines) label Feb 27, 2026
@github-actions github-actions bot added the performance Performance improvements or optimizations label Feb 27, 2026
Pre-declare scriptLen variable to avoid shadowing err from outer scope
in the output script reading loop.
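The shadowing pitfall this commit fixes looks roughly like the following: using `:=` inside the loop would create a new `err` scoped to the loop body, shadowing the outer one; pre-declaring the variables lets the loop assign with plain `=`. The function and values here are illustrative, not the library's code.

```go
package main

import "fmt"

// readScripts pre-declares scriptLen and err so the loop assigns to the
// outer variables with =, avoiding a shadowed err inside the loop body.
func readScripts(n int) (uint64, error) {
	var err error
	var scriptLen uint64
	readVarInt := func() (uint64, error) { return 3, nil } // stand-in reader
	for i := 0; i < n; i++ {
		scriptLen, err = readVarInt() // plain =, no new err is created
		if err != nil {
			return 0, err
		}
	}
	return scriptLen, nil
}

func main() {
	l, err := readScripts(2)
	fmt.Println(l, err)
}
```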
@icellan icellan merged commit 9aa279c into master Feb 27, 2026
44 checks passed
@github-actions github-actions bot deleted the perf/block-deser-and-txhash-optimization branch February 27, 2026 13:40
icellan added a commit to bsv-blockchain/teranode that referenced this pull request Feb 27, 2026
Replace subtreeData.Serialize() + storer.Write(bytes) with
subtreeData.WriteTransactionsToWriter(storer, 0, length), streaming
transaction data directly to the blob store FileStorer pipe instead
of serializing into a large intermediate byte slice.

The FileStorer already implements io.Writer via a pipe connected to
SetFromReader, and the file store guarantees atomic writes via
temp file + rename.

Add HandleBlockDirect memory profiling test for large testnet blocks.

Depends on:
- bsv-blockchain/go-bt#117 (SerializeTo)
- bsv-blockchain/go-subtree (streaming WriteTransactionsToWriter)
- bsv-blockchain/go-wire#106 (TxHash + block deserialization)
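The streaming pattern that commit describes, in miniature: the producer writes into one end of an io.Pipe while the consumer reads from the other, so no large intermediate byte slice is built. FileStorer and WriteTransactionsToWriter are the real names; this pipe wiring is an illustrative stand-in.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// streamThrough pushes data through an io.Pipe: the writer side plays the
// role of WriteTransactionsToWriter, the reader side the SetFromReader
// consumer inside the blob store.
func streamThrough(data string) string {
	pr, pw := io.Pipe()
	go func() {
		io.Copy(pw, strings.NewReader(data)) // producer: stream tx data
		pw.Close()                           // signal EOF to the reader
	}()
	var sb strings.Builder
	io.Copy(&sb, pr) // consumer: read as the data arrives
	return sb.String()
}

func main() {
	fmt.Println(streamThrough("tx-data"))
}
```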
@icellan icellan restored the perf/block-deser-and-txhash-optimization branch February 27, 2026 14:08