consensus: standardize slow block JSON output for cross-client metrics#10288
consensus: standardize slow block JSON output for cross-client metrics#10288CPerezz wants to merge 32 commits intoNethermindEth:masterfrom
Conversation
Implements execution metrics following the cross-client specification: https://github.com/ethereum/execution-specs/blob/main/docs/execution-metrics-spec.md - Add slow block JSON logging (threshold: 1000ms) - Include timing, throughput, and EVM operation counts - Leverage existing SLOAD/SSTORE/CALL metrics infrastructure - Output format matches cross-client standardization spec
Implement standardized JSON format for slow block logging to enable cross-client performance analysis and protocol research. This change is part of the Cross-Client Execution Metrics initiative proposed by Gary Rong and CPerezz: https://hackmd.io/dg7rizTyTXuCf2LSa2LsyQ The standardized metrics enabled data-driven analysis like the EIP-7907 research: https://ethresear.ch/t/data-driven-analysis-on-eip-7907/23850 JSON format includes: - block: number, hash, gas_used, tx_count - timing: execution_ms, total_ms - throughput: mgas_per_sec - state_reads: accounts, storage_slots, code, code_bytes - state_writes: accounts, storage_slots - cache: account/storage/code hits, misses, hit_rate - evm: sload, sstore, calls, creates Also adds ThreadLocal accessors to Db.Metrics for per-block cache statistics tracking without global synchronization overhead.
b6dfcec to
8cdef27
Compare
Convert timing calculations from long to double to preserve sub-millisecond precision in slow block JSON output. Changes: - ProcessingStats: timing variables changed to double with float division - Math.Round(..., 3) for consistent 3 decimal place output - ZeroContentionCounter: add Increment(long) overload for type safety
There was a problem hiding this comment.
Pull request overview
This pull request implements standardized JSON format for slow block logging to enable cross-client performance analysis and protocol research. The changes add comprehensive execution metrics tracking and output them in a cross-client standardized JSON format when block processing exceeds a threshold.
Changes:
- Added ThreadLocal accessor methods to Db.Metrics for per-block cache statistics
- Defined new state access and timing metrics in Evm.Metrics (account/storage/code reads and writes)
- Integrated metric tracking into StateProvider and PersistentStorageProvider for state access monitoring
- Updated ProcessingStats to capture metrics and output slow block logs in standardized JSON format
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Nethermind/Nethermind.State/StateProvider.cs | Adds account read tracking via EvmMetrics.IncrementAccountReads() |
| src/Nethermind/Nethermind.State/PersistentStorageProvider.cs | Adds storage read tracking via EvmMetrics.IncrementStorageReads() |
| src/Nethermind/Nethermind.Evm/Metrics.cs | Defines new state access and timing metrics with ThreadLocal accessors |
| src/Nethermind/Nethermind.Evm/CodeInfoRepository.cs | Adds code read and code bytes tracking |
| src/Nethermind/Nethermind.Db/Metrics.cs | Adds ThreadLocal accessors for cache hit metrics |
| src/Nethermind/Nethermind.Core/Threading/ZeroContentionCounter.cs | Adds long value support for metrics tracking |
| src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs | Implements slow block JSON logging with comprehensive metrics capture |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs
Outdated
Show resolved
Hide resolved
src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs
Outdated
Show resolved
Hide resolved
src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs
Outdated
Show resolved
Hide resolved
src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs
Outdated
Show resolved
Hide resolved
Add tracking for EIP-7702 delegation set/cleared operations as part of the cross-client execution metrics standardization effort. New metrics in Nethermind.Evm.Metrics: - Eip7702DelegationsSet: Number of EIP-7702 delegations set - Eip7702DelegationsCleared: Number of EIP-7702 delegations cleared Both metrics include thread-local variants for use in slow block logging. The slow block JSON output now includes: - state_writes.eip7702_delegations_set - state_writes.eip7702_delegations_cleared These fields will be 0 for pre-Pectra blocks (per spec).
ContractsAnalysed tracks jump destination analysis, not cache misses. CodeReads is incremented when code is loaded from DB (cache miss).
Instead of duplicating total_ms, calculate: execution_ms = total_ms - state_read_ms - state_hash_ms - commit_ms Falls back to total_ms if timing metrics not fully captured.
- Add IncrementCodeWrites/IncrementCodeBytesWritten in InsertCode - Add IncrementAccountWrites in SetState - Add IncrementStorageWrites in Set
Move IncrementAccountReads inside GetState to only count trie/DB reads, matching storage read semantics. Previously counted all reads including in-memory cache hits.
…metrics Pipe timing data to EvmMetrics for state_read_ms, state_hash_ms, and commit_ms: - WorldStateMetricsDecorator: pipe existing timing to EvmMetrics for RecalculateStateRoot, Commit (both overloads), and CommitTree - StateProvider.GetState: add timing around DB reads (not cache hits) - PersistentStorageProvider.LoadFromTree: add timing for storage reads Uses Stopwatch.GetTimestamp()/GetElapsedTime() with TimeSpan.Ticks (100ns precision) for minimal overhead (~10-20ns per measurement).
147cd96 to
cd54c85
Compare
cd54c85 to
49ba4db
Compare
…tandardization Add new metrics for tracking account and storage deletions during EVM execution: - AccountDeleted: incremented when SetState is called with null account - StorageDeleted: incremented when SSTORE sets a slot to zero - Both metrics flow through ProcessingStats to slow block JSON output
Wire up existing but unused EIP-7702 delegation metric increment methods: - IncrementEip7702DelegationsSet() when setting delegation (non-zero address) - IncrementEip7702DelegationsCleared() when clearing delegation (zero address)
Add comprehensive integration tests for all cross-client execution metrics: - EIP-7702 delegation tests (set, clear, multiple) - Account metrics tests (reads, writes, deleted) - Storage metrics tests (reads, writes, deleted) - Code metrics tests (loaded, updated)
LukaszRozmej
left a comment
There was a problem hiding this comment.
Looks good, added some questions.
Food for thought @benaadams
With so many metrics, maybe it would be useful to have them continuous in memory, so we could make copies by copying memory region and potentially delta's using SIMD?
|
Number of additional thread locals bothers me; probably better to have as normal statics but only write on main block (i.e. are these metrics relevent to any other background or parallel activities?) They are only interesting if mainnet block is slow? Also they are expensive to write to, so rolling up counts via class field, and then writing batch change would be better |
I'm kind of thinking it shouldn't be a static, or the only threadstatic should be an object (struct?). That would require bigger changes. |
Dead code — the MetricsSnapshot and MetricsDelta classes were never referenced anywhere in the codebase.
src/Nethermind/Nethermind.Consensus/Processing/ProcessingStats.cs
Outdated
Show resolved
Hide resolved
| private void DeployCode(Address address, byte[] code) | ||
| { | ||
| _stateProvider.CreateAccountIfNotExists(address, 0); | ||
| _stateProvider.InsertCode(address, code, _specProvider.GetSpec(ForkActivation.TimestampOnly(MainnetSpecProvider.PragueBlockTimestamp))); |
There was a problem hiding this comment.
Can user just Prague.Instance or so instead of getting it from the spec provider
There was a problem hiding this comment.
Yes but this will need to be maintained always. In fact, we would have to use Osaka here to be sure we are fwd. compatible no?
I updated anyways, but I'm not fully convinced..
There was a problem hiding this comment.
It's just so using mainnet timestamps for single spec provider does not look perfect to me. Btw you can init a _releaseSpec some where with value like Prague.Instance or Osaka.Instance and reuse it.
If you mean that you would better use just the top mainnet spec, we might try to introduce Mainnet.Instance that points to the most recent fork and we change it when time comes. Can't recall of something existing like this
| public void Set(in StorageCell storageCell, byte[] newValue) | ||
| { | ||
| // Track storage write for execution metrics | ||
| EvmMetrics.IncrementStorageWrites(); |
There was a problem hiding this comment.
@benaadams @LukaszRozmej maybe we can make some generic argument trick to avoid calling metric if don't need to(if MetricsOff is passed metrics are excluded due to unconditional if)?
…cumulator Replace 16 separate ZeroContentionCounter instances (each with its own ThreadLocal<BoxedLong>) with a single ThreadLocal<ExecutionMetricsAccumulator>. This addresses Ben's review feedback about the number of additional thread locals on hot execution paths. The consolidated accumulator reduces ThreadLocal overhead by requiring only one TLS lookup per increment instead of one per counter. Prometheus exposure is preserved via SumExecutionMetric() which iterates the shared ThreadLocal's Values collection. Call sites now use direct field increments (e.g. ThreadExecutionMetrics.AccountReads++) instead of method calls.
|
@LukaszRozmej @benaadams Can yoou look at ead23b9 and see if that cuts it for you? |
…simplify test spec - Replace anonymous object in LogSlowBlock with named record types for structured JSON serialization (SlowBlockLog, SlowBlockInfo, etc.) - Simplify spec retrieval in MetricsIntegrationTests to use Prague.Instance directly instead of resolving via MainnetSpecProvider
|
@CPerezz I think it is fine for now, but it will break when parallel execution comes in. Regardless if it is Block-STM or BAL based. |
|
Maybe this can help. https://github.com/NethermindEth/nethermind/blob/master/src/Nethermind/Nethermind.Init/Modules/MainProcessingContext.cs#L39. There is a |
Hey @LukaszRozmej thanks! Indeed the plan is to also start metering based on BALs via: https://notes.ethereum.org/@ethpandaops/bal-otel |
|
@benaadams @LukaszRozmej any way we can merge this this week? So far Reth and Besu are almost done, so you're the last team we are waiting on such that the stateful tests come with more insights. LMK if you need any help 🙏 |
…flicts with master - Replace ExecutionMetricsAccumulator with ZeroContentionCounter pattern - Remove Stopwatch.GetTimestamp() from per-read hot paths (StateProvider, PersistentStorageProvider) - Use Utf8JsonWriter for zero-allocation slow block JSON serialization - Collapse BlockData Start/Current pairs into pre-computed deltas - Adapt to master's refactored CodeInfoRepository, WrappedWorldState, IProcessingStats - Condense tests from ~1600 to ~530 lines
Address PR review feedback from flcl42.
…block log Add per-phase timing (blooms_ms, receipts_root_ms, storage_merkle_ms, state_root_ms, evm_ms) and extended EVM/block fields (opcodes, self_destructs, empty_calls, contracts_analyzed, cached_contracts_used, blob_count, gas_limit) to the slow block JSON output. execution_ms retains its original definition for backwards compatibility. All new timing is captured via the same ZeroContentionCounter pattern used by existing metrics — ~25μs overhead per block.
8be8415 to
8d7cb78
Compare
Wire slow block threshold to IBlocksConfig so it can be set via config file or CLI. Default: 1000ms (log blocks slower than 1s). Set to 0 to log all blocks, -1 to disable slow block logging entirely. When disabled (-1), CaptureStartStats and delta computation are skipped on the block-processing thread — zero overhead from slow block analysis.
5d6022c to
ae494cf
Compare
Slow block logging is opt-in. Set Blocks.SlowBlockThresholdMs=1000 to enable logging blocks slower than 1s.
New config flag Blocks.SlowBlockPerTxThresholdMs (default -1 = disabled).
When enabled alongside SlowBlockThresholdMs, each transaction exceeding
the per-tx threshold is included individually in the slow block JSON:
"transactions": [
{ "index": 42, "hash": "0x...", "gas_used": 8200000,
"execution_ms": 12.3, "type": "Legacy", "to": "0x..." }
]
Uses a thread-static PerTxTimingCollector with one Stopwatch.GetTimestamp
per tx (~20ns). When disabled, zero overhead — the collector is not even
checked in the tx loop.
When SlowBlockThresholdMs defaults to -1, processingMs >= -1 was always true causing every block to be logged. But CaptureStartStats/UpdateStats correctly skipped delta capture for -1, resulting in all-zero fields.
The async processing loop (RunProcessingLoop) can resume on a different ThreadPool thread after each await WaitToReadAsync. Since _enabled is [ThreadStatic], setting it once in Start() only affects the first thread. Moving it to CaptureStartStats ensures it's set on the current block- processing thread for every block.
Summary
Implement standardized JSON format for slow block logging to enable cross-client performance analysis and protocol research.
This change is part of the Cross-Client Execution Metrics initiative proposed by Gary Rong and CPerezz.
Motivation
Standardized execution metrics are critical for:
Real-world example: The EIP-7907 analysis used execution metrics to measure code read latency, per-call overhead scaling, and block execution breakdown. Without standardized metrics across clients, such analysis cannot be validated cross-client.
References
Configuration
Two new flags in
Blocksconfig section:Blocks.SlowBlockThresholdMs-1(disabled)1000to log blocks slower than 1s. Set to0to log all blocks.Blocks.SlowBlockPerTxThresholdMs-1(disabled)0to log all transactions.Both flags default to
-1(disabled). When disabled, there is zero overhead on block processing — no metric captures, no delta computation, no JSON serialization.When enabled, the overhead is ~31μs per block (~0.08% on a 40ms block) from reading thread-local counters. Per-tx timing adds ~6μs for a 150-tx block when enabled.
Example usage
JSON Output Format
{ "level": "warn", "msg": "Slow block", "block": { "number": 22075123, "hash": "0xabc...", "gas_used": 29850000, "gas_limit": 30000000, "tx_count": 142, "blob_count": 6 }, "timing": { "execution_ms": 52.1, "evm_ms": 45.2, "blooms_ms": 1.3, "receipts_root_ms": 3.8, "commit_ms": 2.1, "storage_merkle_ms": 18.4, "state_root_ms": 5.2, "state_hash_ms": 23.6, "total_ms": 82.0 }, "throughput": { "mgas_per_sec": 364.1 }, "state_reads": { "accounts": 1842, "storage_slots": 3201, "code": 215, "code_bytes": 482310 }, "state_writes": { "accounts": 312, "accounts_deleted": 0, "storage_slots": 1805, "storage_slots_deleted": 12, "code": 3, "code_bytes": 8420, "eip7702_delegations_set": 0, "eip7702_delegations_cleared": 0 }, "cache": { "account": { "hits": 1520, "misses": 322, "hit_rate": 82.52 }, "storage": { "hits": 2800, "misses": 401, "hit_rate": 87.48 }, "code": { "hits": 180, "misses": 35, "hit_rate": 83.72 } }, "evm": { "opcodes": 1542000, "sload": 3201, "sstore": 1805, "calls": 2412, "empty_calls": 48, "creates": 3, "self_destructs": 0, "contracts_analyzed": 118, "cached_contracts_used": 97 }, "transactions": [ { "index": 42, "hash": "0xdef...", "gas_used": 8200000, "execution_ms": 12.3, "type": "Legacy", "to": "0x7a25..." }, { "index": 107, "hash": "0x123...", "gas_used": 5100000, "execution_ms": 8.7, "type": "EIP1559", "to": "0xdead..." } ] }Timing field definitions
total_msexecution_mstotal - state_hash - commit(backwards-compatible)evm_mstotal - state_hash - commit - blooms - receipts_root(pure EVM)blooms_msreceipts_root_mscommit_msstorage_merkle_msCommit(commitRoots: true))state_root_msstate_hash_msstorage_merkle + state_root(total merkleization)The
transactionsarray is only present whenSlowBlockPerTxThresholdMs >= 0.Implementation
ZeroContentionCounter(lock-free thread-local counters) — no locks, no contentionStopwatch.GetTimestamp()only at coarse block-phase boundaries (not per state read/write)Utf8JsonWriter+ArrayBufferWriter<byte>on the ThreadPool, not the block-processing threadStopwatch.GetTimestamp()per tx (~20ns)UnsafeQueueUserWorkItem