Skip to content

consensus: standardize slow block JSON output for cross-client metrics#10288

Open
CPerezz wants to merge 32 commits intoNethermindEth:masterfrom
CPerezz:feature/execution-metrics-standardization
Open

consensus: standardize slow block JSON output for cross-client metrics#10288
CPerezz wants to merge 32 commits intoNethermindEth:masterfrom
CPerezz:feature/execution-metrics-standardization

Conversation

@CPerezz
Copy link
Contributor

@CPerezz CPerezz commented Jan 21, 2026

Summary

Implement standardized JSON format for slow block logging to enable cross-client performance analysis and protocol research.

This change is part of the Cross-Client Execution Metrics initiative proposed by Gary Rong and CPerezz.

Motivation

Standardized execution metrics are critical for:

  • Cross-client performance comparison
  • Network health monitoring
  • Data-driven protocol research

Real-world example: The EIP-7907 analysis used execution metrics to measure code read latency, per-call overhead scaling, and block execution breakdown. Without standardized metrics across clients, such analysis cannot be validated cross-client.

References

Configuration

Two new flags in Blocks config section:

Flag Default Description
Blocks.SlowBlockThresholdMs -1 (disabled) Threshold in ms for logging slow blocks. Set to 1000 to log blocks slower than 1s. Set to 0 to log all blocks.
Blocks.SlowBlockPerTxThresholdMs -1 (disabled) Per-transaction threshold in ms. When enabled, individual transactions exceeding this threshold are included in the slow block JSON. Set to 0 to log all transactions.

Both flags default to -1 (disabled). When disabled, there is zero overhead on block processing — no metric captures, no delta computation, no JSON serialization.

When enabled, the overhead is ~31μs per block (~0.08% on a 40ms block) from reading thread-local counters. Per-tx timing adds ~6μs for a 150-tx block when enabled.

Example usage

# Log blocks slower than 1s with all transactions
--Blocks.SlowBlockThresholdMs=1000 --Blocks.SlowBlockPerTxThresholdMs=0

# Log blocks slower than 500ms, only transactions slower than 50ms
--Blocks.SlowBlockThresholdMs=500 --Blocks.SlowBlockPerTxThresholdMs=50

# Log every block, no per-tx breakdown
--Blocks.SlowBlockThresholdMs=0

JSON Output Format

{
  "level": "warn",
  "msg": "Slow block",
  "block": {
    "number": 22075123,
    "hash": "0xabc...",
    "gas_used": 29850000,
    "gas_limit": 30000000,
    "tx_count": 142,
    "blob_count": 6
  },
  "timing": {
    "execution_ms": 52.1,
    "evm_ms": 45.2,
    "blooms_ms": 1.3,
    "receipts_root_ms": 3.8,
    "commit_ms": 2.1,
    "storage_merkle_ms": 18.4,
    "state_root_ms": 5.2,
    "state_hash_ms": 23.6,
    "total_ms": 82.0
  },
  "throughput": {
    "mgas_per_sec": 364.1
  },
  "state_reads": {
    "accounts": 1842,
    "storage_slots": 3201,
    "code": 215,
    "code_bytes": 482310
  },
  "state_writes": {
    "accounts": 312,
    "accounts_deleted": 0,
    "storage_slots": 1805,
    "storage_slots_deleted": 12,
    "code": 3,
    "code_bytes": 8420,
    "eip7702_delegations_set": 0,
    "eip7702_delegations_cleared": 0
  },
  "cache": {
    "account": { "hits": 1520, "misses": 322, "hit_rate": 82.52 },
    "storage": { "hits": 2800, "misses": 401, "hit_rate": 87.48 },
    "code": { "hits": 180, "misses": 35, "hit_rate": 83.72 }
  },
  "evm": {
    "opcodes": 1542000,
    "sload": 3201,
    "sstore": 1805,
    "calls": 2412,
    "empty_calls": 48,
    "creates": 3,
    "self_destructs": 0,
    "contracts_analyzed": 118,
    "cached_contracts_used": 97
  },
  "transactions": [
    {
      "index": 42,
      "hash": "0xdef...",
      "gas_used": 8200000,
      "execution_ms": 12.3,
      "type": "Legacy",
      "to": "0x7a25..."
    },
    {
      "index": 107,
      "hash": "0x123...",
      "gas_used": 5100000,
      "execution_ms": 8.7,
      "type": "EIP1559",
      "to": "0xdead..."
    }
  ]
}

Timing field definitions

Field Definition
total_ms Wall-clock block processing time
execution_ms total - state_hash - commit (backwards-compatible)
evm_ms total - state_hash - commit - blooms - receipts_root (pure EVM)
blooms_ms Bloom filter calculation (parallel)
receipts_root_ms Receipts Patricia trie root computation
commit_ms State commit without root recalculation
storage_merkle_ms Storage trie merkleization (Commit(commitRoots: true))
state_root_ms State root recalculation + commit tree
state_hash_ms storage_merkle + state_root (total merkleization)

The transactions array is only present when SlowBlockPerTxThresholdMs >= 0.

Implementation

  • All metrics use ZeroContentionCounter (lock-free thread-local counters) — no locks, no contention
  • Timing uses Stopwatch.GetTimestamp() only at coarse block-phase boundaries (not per state read/write)
  • JSON is serialized via Utf8JsonWriter + ArrayBufferWriter<byte> on the ThreadPool, not the block-processing thread
  • Per-tx timing uses a thread-static array with one Stopwatch.GetTimestamp() per tx (~20ns)
  • Delta computation and JSON output are queued to ThreadPool via UnsafeQueueUserWorkItem

Implements execution metrics following the cross-client specification:
https://github.com/ethereum/execution-specs/blob/main/docs/execution-metrics-spec.md

- Add slow block JSON logging (threshold: 1000ms)
- Include timing, throughput, and EVM operation counts
- Leverage existing SLOAD/SSTORE/CALL metrics infrastructure
- Output format matches cross-client standardization spec
Implement standardized JSON format for slow block logging to enable
cross-client performance analysis and protocol research.

This change is part of the Cross-Client Execution Metrics initiative
proposed by Gary Rong and CPerezz: https://hackmd.io/dg7rizTyTXuCf2LSa2LsyQ

The standardized metrics enabled data-driven analysis like the EIP-7907
research: https://ethresear.ch/t/data-driven-analysis-on-eip-7907/23850

JSON format includes:
- block: number, hash, gas_used, tx_count
- timing: execution_ms, total_ms
- throughput: mgas_per_sec
- state_reads: accounts, storage_slots, code, code_bytes
- state_writes: accounts, storage_slots
- cache: account/storage/code hits, misses, hit_rate
- evm: sload, sstore, calls, creates

Also adds ThreadLocal accessors to Db.Metrics for per-block cache
statistics tracking without global synchronization overhead.
@CPerezz CPerezz force-pushed the feature/execution-metrics-standardization branch from b6dfcec to 8cdef27 Compare January 21, 2026 11:01
Convert timing calculations from long to double to preserve
sub-millisecond precision in slow block JSON output.

Changes:
- ProcessingStats: timing variables changed to double with float division
- Math.Round(..., 3) for consistent 3 decimal place output
- ZeroContentionCounter: add Increment(long) overload for type safety
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request implements standardized JSON format for slow block logging to enable cross-client performance analysis and protocol research. The changes add comprehensive execution metrics tracking and output them in a cross-client standardized JSON format when block processing exceeds a threshold.

Changes:

  • Added ThreadLocal accessor methods to Db.Metrics for per-block cache statistics
  • Defined new state access and timing metrics in Evm.Metrics (account/storage/code reads and writes)
  • Integrated metric tracking into StateProvider and PersistentStorageProvider for state access monitoring
  • Updated ProcessingStats to capture metrics and output slow block logs in standardized JSON format

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/Nethermind/Nethermind.State/StateProvider.cs Adds account read tracking via EvmMetrics.IncrementAccountReads()
src/Nethermind/Nethermind.State/PersistentStorageProvider.cs Adds storage read tracking via EvmMetrics.IncrementStorageReads()
src/Nethermind/Nethermind.Evm/Metrics.cs Defines new state access and timing metrics with ThreadLocal accessors
src/Nethermind/Nethermind.Evm/CodeInfoRepository.cs Adds code read and code bytes tracking
src/Nethermind/Nethermind.Db/Metrics.cs Adds ThreadLocal accessors for cache hit metrics
src/Nethermind/Nethermind.Core/Threading/ZeroContentionCounter.cs Adds long value support for metrics tracking
src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs Implements slow block JSON logging with comprehensive metrics capture

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Add tracking for EIP-7702 delegation set/cleared operations as part of
the cross-client execution metrics standardization effort.

New metrics in Nethermind.Evm.Metrics:
- Eip7702DelegationsSet: Number of EIP-7702 delegations set
- Eip7702DelegationsCleared: Number of EIP-7702 delegations cleared

Both metrics include thread-local variants for use in slow block
logging. The slow block JSON output now includes:
- state_writes.eip7702_delegations_set
- state_writes.eip7702_delegations_cleared

These fields will be 0 for pre-Pectra blocks (per spec).
ContractsAnalysed tracks jump destination analysis, not cache misses.
CodeReads is incremented when code is loaded from DB (cache miss).
Instead of duplicating total_ms, calculate:
execution_ms = total_ms - state_read_ms - state_hash_ms - commit_ms

Falls back to total_ms if timing metrics not fully captured.
- Add IncrementCodeWrites/IncrementCodeBytesWritten in InsertCode
- Add IncrementAccountWrites in SetState
- Add IncrementStorageWrites in Set
Move IncrementAccountReads inside GetState to only count trie/DB reads,
matching storage read semantics. Previously counted all reads including
in-memory cache hits.
…metrics

Pipe timing data to EvmMetrics for state_read_ms, state_hash_ms, and commit_ms:

- WorldStateMetricsDecorator: pipe existing timing to EvmMetrics for
  RecalculateStateRoot, Commit (both overloads), and CommitTree
- StateProvider.GetState: add timing around DB reads (not cache hits)
- PersistentStorageProvider.LoadFromTree: add timing for storage reads

Uses Stopwatch.GetTimestamp()/GetElapsedTime() with TimeSpan.Ticks
(100ns precision) for minimal overhead (~10-20ns per measurement).
@CPerezz CPerezz force-pushed the feature/execution-metrics-standardization branch from 147cd96 to cd54c85 Compare January 22, 2026 15:14
@CPerezz CPerezz force-pushed the feature/execution-metrics-standardization branch from cd54c85 to 49ba4db Compare January 25, 2026 21:27
…tandardization

Add new metrics for tracking account and storage deletions during EVM execution:
- AccountDeleted: incremented when SetState is called with null account
- StorageDeleted: incremented when SSTORE sets a slot to zero
- Both metrics flow through ProcessingStats to slow block JSON output
Wire up existing but unused EIP-7702 delegation metric increment methods:
- IncrementEip7702DelegationsSet() when setting delegation (non-zero address)
- IncrementEip7702DelegationsCleared() when clearing delegation (zero address)
Add comprehensive integration tests for all cross-client execution metrics:
- EIP-7702 delegation tests (set, clear, multiple)
- Account metrics tests (reads, writes, deleted)
- Storage metrics tests (reads, writes, deleted)
- Code metrics tests (loaded, updated)
Copy link
Member

@LukaszRozmej LukaszRozmej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, added some questions.

Food for thought @benaadams
With so many metrics, maybe it would be useful to have them continuous in memory, so we could make copies by copying memory region and potentially delta's using SIMD?

@benaadams
Copy link
Member

benaadams commented Jan 27, 2026

Number of additional thread locals bothers me; probably better to have as normal statics but only write on main block (i.e. are these metrics relevent to any other background or parallel activities?)

They are only interesting if mainnet block is slow?

Also they are expensive to write to, so rolling up counts via class field, and then writing batch change would be better

@LukaszRozmej
Copy link
Member

Number of additional thread locals bothers me; probably better to have as normal statics but only write on main block (i.e. are these metrics relevent to any other background or parallel activities?)

They are only interesting if mainnet block is slow?

Also they are expensive to write to, so rolling up counts via class field, and then writing batch change would be better

I'm kind of thinking it shouldn't be a static, or the only threadstatic should be an object (struct?). That would require bigger changes.

Dead code — the MetricsSnapshot and MetricsDelta classes were never
referenced anywhere in the codebase.
private void DeployCode(Address address, byte[] code)
{
_stateProvider.CreateAccountIfNotExists(address, 0);
_stateProvider.InsertCode(address, code, _specProvider.GetSpec(ForkActivation.TimestampOnly(MainnetSpecProvider.PragueBlockTimestamp)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can user just Prague.Instance or so instead of getting it from the spec provider

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but this will need to be maintained always. In fact, we would have to use Osaka here to be sure we are fwd. compatible no?

I updated anyways, but I'm not fully convinced..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just so using mainnet timestamps for single spec provider does not look perfect to me. Btw you can init a _releaseSpec some where with value like Prague.Instance or Osaka.Instance and reuse it.

If you mean that you would better use just the top mainnet spec, we might try to introduce Mainnet.Instance that points to the most recent fork and we change it when time comes. Can't recall of something existing like this

public void Set(in StorageCell storageCell, byte[] newValue)
{
// Track storage write for execution metrics
EvmMetrics.IncrementStorageWrites();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benaadams @LukaszRozmej maybe we can make some generic argument trick to avoid calling metric if don't need to(if MetricsOff is passed metrics are excluded due to unconditional if)?

…cumulator

Replace 16 separate ZeroContentionCounter instances (each with its own
ThreadLocal<BoxedLong>) with a single ThreadLocal<ExecutionMetricsAccumulator>.
This addresses Ben's review feedback about the number of additional thread
locals on hot execution paths.

The consolidated accumulator reduces ThreadLocal overhead by requiring only
one TLS lookup per increment instead of one per counter. Prometheus exposure
is preserved via SumExecutionMetric() which iterates the shared ThreadLocal's
Values collection. Call sites now use direct field increments
(e.g. ThreadExecutionMetrics.AccountReads++) instead of method calls.
@CPerezz
Copy link
Contributor Author

CPerezz commented Jan 27, 2026

@LukaszRozmej @benaadams Can yoou look at ead23b9 and see if that cuts it for you?

…simplify test spec

- Replace anonymous object in LogSlowBlock with named record types for
  structured JSON serialization (SlowBlockLog, SlowBlockInfo, etc.)
- Simplify spec retrieval in MetricsIntegrationTests to use Prague.Instance
  directly instead of resolving via MainnetSpecProvider
@LukaszRozmej
Copy link
Member

@CPerezz I think it is fine for now, but it will break when parallel execution comes in. Regardless if it is Block-STM or BAL based.

@asdacap
Copy link
Contributor

asdacap commented Jan 30, 2026

Maybe this can help. https://github.com/NethermindEth/nethermind/blob/master/src/Nethermind/Nethermind.Init/Modules/MainProcessingContext.cs#L39. There is a HintGet for get that is cached by prewarmer.

@CPerezz
Copy link
Contributor Author

CPerezz commented Jan 30, 2026

@CPerezz I think it is fine for now, but it will break when parallel execution comes in. Regardless if it is Block-STM or BAL based.

Hey @LukaszRozmej thanks! Indeed the plan is to also start metering based on BALs via: https://notes.ethereum.org/@ethpandaops/bal-otel

@CPerezz
Copy link
Contributor Author

CPerezz commented Feb 9, 2026

@benaadams @LukaszRozmej any way we can merge this this week?

So far Reth and Besu are almost done, so you're the last team we are waiting on such that the stateful tests come with more insights.

LMK if you need any help 🙏

…flicts with master

- Replace ExecutionMetricsAccumulator with ZeroContentionCounter pattern
- Remove Stopwatch.GetTimestamp() from per-read hot paths (StateProvider, PersistentStorageProvider)
- Use Utf8JsonWriter for zero-allocation slow block JSON serialization
- Collapse BlockData Start/Current pairs into pre-computed deltas
- Adapt to master's refactored CodeInfoRepository, WrappedWorldState, IProcessingStats
- Condense tests from ~1600 to ~530 lines
…block log

Add per-phase timing (blooms_ms, receipts_root_ms, storage_merkle_ms,
state_root_ms, evm_ms) and extended EVM/block fields (opcodes,
self_destructs, empty_calls, contracts_analyzed, cached_contracts_used,
blob_count, gas_limit) to the slow block JSON output.

execution_ms retains its original definition for backwards compatibility.
All new timing is captured via the same ZeroContentionCounter pattern
used by existing metrics — ~25μs overhead per block.
@kamilchodola kamilchodola force-pushed the feature/execution-metrics-standardization branch from 8be8415 to 8d7cb78 Compare March 20, 2026 16:38
Wire slow block threshold to IBlocksConfig so it can be set via config
file or CLI. Default: 1000ms (log blocks slower than 1s).
Set to 0 to log all blocks, -1 to disable slow block logging entirely.

When disabled (-1), CaptureStartStats and delta computation are skipped
on the block-processing thread — zero overhead from slow block analysis.
@kamilchodola kamilchodola force-pushed the feature/execution-metrics-standardization branch from 5d6022c to ae494cf Compare March 20, 2026 16:49
@kamilchodola kamilchodola requested a review from rubo as a code owner March 20, 2026 16:49
Slow block logging is opt-in. Set Blocks.SlowBlockThresholdMs=1000
to enable logging blocks slower than 1s.
New config flag Blocks.SlowBlockPerTxThresholdMs (default -1 = disabled).
When enabled alongside SlowBlockThresholdMs, each transaction exceeding
the per-tx threshold is included individually in the slow block JSON:

  "transactions": [
    { "index": 42, "hash": "0x...", "gas_used": 8200000,
      "execution_ms": 12.3, "type": "Legacy", "to": "0x..." }
  ]

Uses a thread-static PerTxTimingCollector with one Stopwatch.GetTimestamp
per tx (~20ns). When disabled, zero overhead — the collector is not even
checked in the tx loop.
When SlowBlockThresholdMs defaults to -1, processingMs >= -1 was always
true causing every block to be logged. But CaptureStartStats/UpdateStats
correctly skipped delta capture for -1, resulting in all-zero fields.
The async processing loop (RunProcessingLoop) can resume on a different
ThreadPool thread after each await WaitToReadAsync. Since _enabled is
[ThreadStatic], setting it once in Start() only affects the first thread.
Moving it to CaptureStartStats ensures it's set on the current block-
processing thread for every block.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants