bench: decrease StdDev in benchmarks to get more reliable numbers #10641

Merged
smartprogrammer93 merged 13 commits into master from perf/increase-benchmark-iterations on Feb 27, 2026

@smartprogrammer93 (Contributor) commented Feb 25, 2026

Changes

  • Rewrite BlockProcessingBenchmark to use a single shared world state and BranchProcessor instead of a pre-built state pool, matching the live client's block processing path
  • Add an OperationsPerInvoke = N loop (N = 5000) to all 9 benchmark methods. BDN divides the measured time by N, so reported times remain per-operation while each iteration stays above the 100 ms MinIterationTime threshold. This eliminates MinIterationTime warnings for all scenarios, including EmptyBlock and SingleTransfer
  • Tune the BDN job config: 2 launches, 2 warmup iterations, and 10 measured iterations per launch (20 data points total), with GcForce, InvocationCount=1, and UnrollFactor=1. This keeps total runtime at about 15 min
  • Add statistical columns: Min, Max, Median, P90, P95
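
The settings listed above could be wired together roughly as follows. This is a minimal sketch, not the PR's actual code: the class names `BlockProcessingSketch` and `SketchConfig` and the `ProcessBlockPlaceholder` method are hypothetical stand-ins for the real benchmark, and the toolchain the PR uses is not configured here. Only the BenchmarkDotNet job, column, and attribute settings mirror the description.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

// Hypothetical stand-in for BlockProcessingBenchmark (assumption, not the PR's class).
[Config(typeof(SketchConfig))]
public class BlockProcessingSketch
{
    // N from the PR description: each invocation performs N operations.
    private const int N = 5000;

    private sealed class SketchConfig : ManualConfig
    {
        public SketchConfig()
        {
            // 2 launches x 10 measured iterations = 20 data points.
            AddJob(Job.Default
                .WithLaunchCount(2)
                .WithWarmupCount(2)
                .WithIterationCount(10)
                .WithInvocationCount(1)
                .WithUnrollFactor(1)
                .WithGcForce(true));

            // Extra statistical columns named in the description.
            AddColumn(
                StatisticColumn.Min,
                StatisticColumn.Max,
                StatisticColumn.Median,
                StatisticColumn.P90,
                StatisticColumn.P95);
        }
    }

    // BDN divides the measured time by OperationsPerInvoke,
    // so the reported numbers stay per-operation.
    [Benchmark(OperationsPerInvoke = N)]
    public long EmptyBlock()
    {
        long acc = 0;
        for (int i = 0; i < N; i++)
        {
            acc += ProcessBlockPlaceholder(i);
        }
        return acc; // returned so the loop is not optimized away
    }

    // Placeholder for the real block processing call (assumption).
    private static long ProcessBlockPlaceholder(int i) => i;
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<BlockProcessingSketch>();
}
```

With InvocationCount=1, one iteration is a single call of the benchmark method, so the loop length N is what keeps the iteration time above the threshold.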

Benchmark results (latest run)

| Method | Mean | Error | StdDev | Median | Ratio |
|---|---|---|---|---|---|
| EmptyBlock | 44.33 us | 16.646 us | 45.567 us | 23.81 us | 0.05 |
| SingleTransfer | 46.76 us | 0.796 us | 2.297 us | 46.07 us | 0.06 |
| Transfers_50 | 262.86 us | 2.863 us | 8.076 us | 261.04 us | 0.31 |
| Transfers_200 | 840.06 us | 4.999 us | 14.425 us | 838.20 us | 1.00 |
| Eip1559_200 | 826.78 us | 6.664 us | 18.905 us | 824.67 us | 0.98 |
| AccessList_50 | 303.45 us | 1.860 us | 5.308 us | 303.07 us | 0.36 |
| ContractDeploy_10 | 406.32 us | 22.942 us | 66.558 us | 408.68 us | 0.48 |
| ContractCall_200 | 868.79 us | 8.603 us | 24.406 us | 864.08 us | 1.03 |
| MixedBlock | 862.99 us | 9.609 us | 27.571 us | 864.87 us | 1.03 |

Seven of the nine benchmarks show Error/Mean under 2%. The exceptions are EmptyBlock (about 38%) and ContractDeploy_10 (about 5.6%).

Types of changes

  • Optimization

Testing

Run the benchmark suite locally:

dotnet run -c Release --project src/Nethermind/Nethermind.Evm.Benchmark/Nethermind.Evm.Benchmark.csproj \
  -- --filter "*BlockProcessing*"

Verify that no MinIterationTime warnings appear and that StdDev/Mean is low.

Documentation

  • No documentation update required
  • No release notes required

Increase from MediumRun (30 data points: 2 launches x 15 iterations)
to 500 data points (5 launches x 100 iterations, 20 warmup each)
for improved statistical confidence on noisy environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the evm label Feb 25, 2026
@smartprogrammer93 smartprogrammer93 requested a review from a team February 25, 2026 12:47
@smartprogrammer93 smartprogrammer93 changed the title from "bench: increase BlockProcessingBenchmark to 500 data points" to "bench: increase BlockProcessingBenchmark to 500 data points and add CI workflow" Feb 25, 2026
@github-actions bot (Contributor) commented Feb 25, 2026

Block Processing Benchmark Comparison

Run: View workflow run
Base: f2b17a59 | Head: 37930a05

| Method | Base (us) | PR (us) | Delta | Base CV | PR CV | Alloc Base | Alloc PR | Alloc Delta |
|---|---|---|---|---|---|---|---|---|
| AccessList_50 | 1,770.6 | 1,045.2 | -41.0% ⬇️ | 12.5% | 0.7% | 1,103.9 KB | 95.9 KB | -91.3% |
| ContractCall_200 | 3,791.8 | 2,115.4 | -44.2% ⬇️ | 12.3% | 0.5% | 2,014.6 KB | 399.8 KB | -80.2% |
| ContractDeploy_10 | 1,182.3 | 656.1 | -44.5% ⬇️ | 13.8% | 0.8% | 1,516.9 KB | 64.8 KB | -95.7% |
| Eip1559_200 | 5,847.2 | 2,117.6 | -63.8% ⬇️ | 29.0% | 0.6% | 1,516.6 KB | 381.7 KB | -74.8% |
| EmptyBlock | 235.7 | 48.0 | -79.6% ⬇️ | 11.8% | 44.8% | 13.4 KB | 5.5 KB | -59.1% |
| MixedBlock | 2,534.9 | 2,221.8 | -12.4% ⬇️ | 22.9% | 0.4% | 1,396.6 KB | 394.4 KB | -71.8% |
| SingleTransfer | 430.3 | 165.2 | -61.6% ⬇️ | 11.1% | 1.0% | 25.1 KB | 16.4 KB | -34.6% |
| Transfers_200 | 7,792.5 | 2,126.3 | -72.7% ⬇️ | 10.2% | 0.4% | 1,536.9 KB | 381.7 KB | -75.2% |
| Transfers_50 | 3,282.9 | 958.5 | -70.8% ⬇️ | 19.7% | 0.7% | 1,100.2 KB | 79.3 KB | -92.8% |

Detailed statistics

| Method | Metric | Base | PR | Delta |
|---|---|---|---|---|
| AccessList_50 | Mean | 1,770.6 us | 1,045.2 us | -41.0% |
| AccessList_50 | Median | 1,674.8 us | 1,044.4 us | -37.6% |
| AccessList_50 | P90 | 2,007.3 us | 1,056.3 us | -47.4% |
| AccessList_50 | P95 | 2,138.4 us | 1,057.9 us | -50.5% |
| AccessList_50 | Min | 1,516.7 us | 1,032.4 us | -31.9% |
| AccessList_50 | Max | 2,528.5 us | 1,058.9 us | -58.1% |
| AccessList_50 | StdDev | 221.7 us | 7.1 us | -96.8% |
| ContractCall_200 | Mean | 3,791.8 us | 2,115.4 us | -44.2% |
| ContractCall_200 | Median | 3,971.4 us | 2,117.2 us | -46.7% |
| ContractCall_200 | P90 | 4,284.7 us | 2,125.4 us | -50.4% |
| ContractCall_200 | P95 | 4,324.0 us | 2,127.7 us | -50.8% |
| ContractCall_200 | Min | 2,945.8 us | 2,094.0 us | -28.9% |
| ContractCall_200 | Max | 4,350.9 us | 2,133.5 us | -51.0% |
| ContractCall_200 | StdDev | 465.4 us | 10.5 us | -97.7% |
| ContractDeploy_10 | Mean | 1,182.3 us | 656.1 us | -44.5% |
| ContractDeploy_10 | Median | 1,203.9 us | 654.0 us | -45.7% |
| ContractDeploy_10 | P90 | 1,384.8 us | 662.2 us | -52.2% |
| ContractDeploy_10 | P95 | 1,483.8 us | 664.1 us | -55.2% |
| ContractDeploy_10 | Min | 938.3 us | 650.2 us | -30.7% |
| ContractDeploy_10 | Max | 1,556.1 us | 668.4 us | -57.0% |
| ContractDeploy_10 | StdDev | 163.2 us | 5.1 us | -96.9% |
| Eip1559_200 | Mean | 5,847.2 us | 2,117.6 us | -63.8% |
| Eip1559_200 | Median | 5,725.4 us | 2,118.2 us | -63.0% |
| Eip1559_200 | P90 | 7,960.0 us | 2,131.4 us | -73.2% |
| Eip1559_200 | P95 | 8,342.0 us | 2,132.2 us | -74.4% |
| Eip1559_200 | Min | 3,981.3 us | 2,098.1 us | -47.3% |
| Eip1559_200 | Max | 9,118.4 us | 2,134.2 us | -76.6% |
| Eip1559_200 | StdDev | 1,698.4 us | 12.7 us | -99.3% |
| EmptyBlock | Mean | 235.7 us | 48.0 us | -79.6% |
| EmptyBlock | Median | 238.4 us | 65.0 us | -72.8% |
| EmptyBlock | P90 | 276.6 us | 67.7 us | -75.5% |
| EmptyBlock | P95 | 280.7 us | 68.1 us | -75.7% |
| EmptyBlock | Min | 176.3 us | 19.2 us | -89.1% |
| EmptyBlock | Max | 284.5 us | 68.8 us | -75.8% |
| EmptyBlock | StdDev | 27.8 us | 21.5 us | -22.8% |
| MixedBlock | Mean | 2,534.9 us | 2,221.8 us | -12.4% |
| MixedBlock | Median | 2,437.1 us | 2,221.6 us | -8.8% |
| MixedBlock | P90 | 3,270.8 us | 2,230.4 us | -31.8% |
| MixedBlock | P95 | 3,597.6 us | 2,236.8 us | -37.8% |
| MixedBlock | Min | 1,822.1 us | 2,209.4 us | +21.3% |
| MixedBlock | Max | 4,044.9 us | 2,239.1 us | -44.6% |
| MixedBlock | StdDev | 581.5 us | 7.8 us | -98.7% |
| SingleTransfer | Mean | 430.3 us | 165.2 us | -61.6% |
| SingleTransfer | Median | 417.4 us | 165.2 us | -60.4% |
| SingleTransfer | P90 | 486.8 us | 167.3 us | -65.6% |
| SingleTransfer | P95 | 511.9 us | 167.5 us | -67.3% |
| SingleTransfer | Min | 380.9 us | 162.8 us | -57.3% |
| SingleTransfer | Max | 604.9 us | 168.0 us | -72.2% |
| SingleTransfer | StdDev | 47.7 us | 1.7 us | -96.5% |
| Transfers_200 | Mean | 7,792.5 us | 2,126.3 us | -72.7% |
| Transfers_200 | Median | 7,736.8 us | 2,125.0 us | -72.5% |
| Transfers_200 | P90 | 8,890.6 us | 2,134.8 us | -76.0% |
| Transfers_200 | P95 | 9,250.6 us | 2,137.0 us | -76.9% |
| Transfers_200 | Min | 6,479.5 us | 2,110.2 us | -67.4% |
| Transfers_200 | Max | 9,905.3 us | 2,148.9 us | -78.3% |
| Transfers_200 | StdDev | 792.6 us | 9.2 us | -98.8% |
| Transfers_50 | Mean | 3,282.9 us | 958.5 us | -70.8% |
| Transfers_50 | Median | 3,177.3 us | 958.6 us | -69.8% |
| Transfers_50 | P90 | 4,272.6 us | 963.8 us | -77.4% |
| Transfers_50 | P95 | 4,355.8 us | 966.3 us | -77.8% |
| Transfers_50 | Min | 2,396.1 us | 946.8 us | -60.5% |
| Transfers_50 | Max | 4,935.7 us | 975.7 us | -80.2% |
| Transfers_50 | StdDev | 645.2 us | 6.5 us | -99.0% |
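
The CV and Delta columns in the comparison can be reproduced from the detailed statistics. The sketch below assumes CV is StdDev divided by Mean and Delta is the relative change from Base to PR; the workflow's own formulas are not shown in this conversation, but the AccessList_50 figures agree with these definitions after rounding. The class and method names are illustrative only.

```csharp
using System;
using System.Globalization;

public static class BenchmarkStats
{
    // Coefficient of variation in percent (assumed definition: StdDev / Mean).
    public static double CoefficientOfVariationPercent(double stdDev, double mean)
        => 100.0 * stdDev / mean;

    // Relative change from base to PR in percent; negative means the PR is faster.
    public static double DeltaPercent(double baseValue, double prValue)
        => 100.0 * (prValue - baseValue) / baseValue;

    public static void Main()
    {
        // AccessList_50 values taken from the detailed statistics table.
        double baseCv = CoefficientOfVariationPercent(221.7, 1770.6);
        double prCv = CoefficientOfVariationPercent(7.1, 1045.2);
        double delta = DeltaPercent(1770.6, 1045.2);

        CultureInfo inv = CultureInfo.InvariantCulture;
        Console.WriteLine("Base CV: " + baseCv.ToString("F1", inv) + "%");
        Console.WriteLine("PR CV:   " + prCv.ToString("F1", inv) + "%");
        Console.WriteLine("Delta:   " + delta.ToString("F1", inv) + "%");
    }
}
```

For AccessList_50 this yields roughly 12.5%, 0.7%, and -41.0%, matching the summary row.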

Comment on lines +71 to +74
.WithUnrollFactor(1)
.WithLaunchCount(5)
.WithWarmupCount(20)
.WithIterationCount(100));
Member

What are the Job.Default defaults?

Contributor Author

Job.Default uses auto-tuned values (LaunchCount=1, WarmupCount=auto 6-50, IterationCount=auto 15-100). It does not matter here though - we override all three explicitly on the lines below (WithLaunchCount(10), WithWarmupCount(20), WithIterationCount(100)), so Job.Default is just a blank slate. We switched from Job.MediumRun because MediumRun bakes in LaunchCount=2 / WarmupCount=10 / IterationCount=15 which would silently conflict with our explicit values.

…0 data points

- Increase to 10 launches x 100 iterations (1000 data points, 20 warmup each)
- Pre-build state pool in GlobalSetup to remove allocation noise from
  measurement window (IterationSetup now just picks from pool)
- Pin process to single core via ProcessorAffinity to reduce scheduler jitter
- Enable GcForce to ensure GC collection between iterations
- Tiered JIT disabled note: not applicable with InProcessNoEmitToolchain,
  warmup iterations handle JIT tiering instead
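
Pinning a process to one core, as the commit message above describes, can be sketched with `System.Diagnostics.Process.ProcessorAffinity`. This is an assumption about the general approach, not the commit's actual code: the choice of core 0 and the helper name `SingleCoreMask` are illustrative, and the property is only supported on Windows and Linux.

```csharp
using System;
using System.Diagnostics;

public static class AffinitySketch
{
    // Affinity is a bit mask: bit i set means logical core i may run the process.
    public static long SingleCoreMask(int coreIndex) => 1L << coreIndex;

    public static void Main()
    {
        // ProcessorAffinity is not supported on macOS, so guard the call.
        if (OperatingSystem.IsWindows() || OperatingSystem.IsLinux())
        {
            using Process current = Process.GetCurrentProcess();
            current.ProcessorAffinity = (IntPtr)SingleCoreMask(0); // core 0 is an arbitrary choice
            Console.WriteLine("Affinity mask: 0x" + ((long)current.ProcessorAffinity).ToString("X"));
        }
        else
        {
            Console.WriteLine("Processor affinity is not supported on this platform.");
        }
    }
}
```

Restricting the process to a single core reduces migration between cores, at the cost of sharing that core with the runtime's background threads.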

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@smartprogrammer93 smartprogrammer93 force-pushed the perf/increase-benchmark-iterations branch from cb50c89 to d522e7e February 25, 2026 13:14
@smartprogrammer93 smartprogrammer93 changed the title from "bench: increase BlockProcessingBenchmark to 500 data points and add CI workflow" to "bench: increase BlockProcessingBenchmark to 1K data points and add CI workflow" Feb 25, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each benchmark method now loops N=500 times with OperationsPerInvoke=N,
keeping iteration time above BDN's 100ms minimum to eliminate
MinIterationTime warnings and reduce measurement noise. Also reduces
warmup from 10 to 5 and iterations from 40 to 20.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@smartprogrammer93 smartprogrammer93 changed the title from "bench: increase BlockProcessingBenchmark to 1K data points and add CI workflow" to "bench: decrease StdDev in benchmarks to get more reliable numbers" Feb 25, 2026
N=5000 clears BDN's 100ms MinIterationTime threshold for EmptyBlock
and SingleTransfer. Reduce to 2 launches, 2 warmup, 10 iterations
(20 data points) to keep total runtime ~15 min.
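
The choice of N can be checked with simple arithmetic. Assuming one invocation per iteration (InvocationCount=1), an iteration lasts about N times the per-operation time. The sketch below uses the EmptyBlock and SingleTransfer means from the latest-run table in the description as illustrative inputs; the method name and the default threshold parameter are hypothetical.

```csharp
using System;

public static class IterationSizing
{
    // Smallest N such that N * perOperationMicroseconds >= the minimum iteration time.
    public static int MinOperationsPerInvoke(
        double perOperationMicroseconds,
        double minIterationMilliseconds = 100.0)
        => (int)Math.Ceiling(minIterationMilliseconds * 1000.0 / perOperationMicroseconds);

    public static void Main()
    {
        Console.WriteLine(MinOperationsPerInvoke(44.33)); // EmptyBlock: 2256
        Console.WriteLine(MinOperationsPerInvoke(46.76)); // SingleTransfer: 2139
    }
}
```

At about 44 us per operation, N=500 gives an iteration of roughly 22 ms, which is below the threshold, while N=5000 gives roughly 222 ms.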

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kamilchodola kamilchodola self-requested a review February 25, 2026 23:00
@github-actions bot (Contributor) commented

EXPB Benchmark Comparison

Run: View workflow run

superblocks

No cached master baseline for superblocks. A baseline will be created from the next successful master push run.

| Metric | PR |
|---|---|
| AVG (ms) | 1481.563800 |
| MEDIAN (ms) | 1153.820000 |
| P90 (ms) | 2545.26 |
| P95 (ms) | 2695.31 |
| P99 (ms) | 3942.24 |
| MIN (ms) | 766.41 |
| MAX (ms) | 4442.00 |

realblocks

No cached master baseline for realblocks. A baseline will be created from the next successful master push run.

| Metric | PR |
|---|---|
| AVG (ms) | 40.407100 |
| MEDIAN (ms) | 23.580000 |
| P90 (ms) | 67.24 |
| P95 (ms) | 166.90 |
| P99 (ms) | 269.03 |
| MIN (ms) | 1.43 |
| MAX (ms) | 780.29 |

@smartprogrammer93 smartprogrammer93 mentioned this pull request Feb 25, 2026
16 tasks
@smartprogrammer93 smartprogrammer93 merged commit 25b52ea into master Feb 27, 2026
174 of 176 checks passed
@smartprogrammer93 smartprogrammer93 deleted the perf/increase-benchmark-iterations branch February 27, 2026 10:03
4 participants