bench: decrease StdDev in benchmarks to get more reliable numbers #10641
smartprogrammer93 merged 13 commits into master
Increase from MediumRun (30 data points: 2 launches x 15 iterations) to 500 data points (5 launches x 100 iterations, 20 warmup each) for improved statistical confidence on noisy environments.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    .WithUnrollFactor(1)
    .WithLaunchCount(5)
    .WithWarmupCount(20)
    .WithIterationCount(100));
what are default defaults?
Job.Default uses auto-tuned values (LaunchCount=1, WarmupCount=auto 6-50, IterationCount=auto 15-100). It does not matter here though - we override all three explicitly on the lines below (WithLaunchCount(10), WithWarmupCount(20), WithIterationCount(100)), so Job.Default is just a blank slate. We switched from Job.MediumRun because MediumRun bakes in LaunchCount=2 / WarmupCount=10 / IterationCount=15 which would silently conflict with our explicit values.
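For readers unfamiliar with BenchmarkDotNet jobs, the reply above can be illustrated with a minimal sketch. It assumes a `ManualConfig` subclass; the class name `ExplicitCountsConfig` is hypothetical and not taken from this PR. The counts are the ones quoted in the reply, and the benchmark's actual config may set further options.

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Hypothetical config class, for illustration only (not the PR's actual code).
public class ExplicitCountsConfig : ManualConfig
{
    public ExplicitCountsConfig()
    {
        // Start from Job.Default and set every count explicitly, so none of
        // the auto-tuned values (or a preset such as MediumRun) is in effect.
        AddJob(Job.Default
            .WithUnrollFactor(1)
            .WithLaunchCount(10)       // 10 separate launches
            .WithWarmupCount(20)       // 20 warmup iterations per launch
            .WithIterationCount(100)); // 100 measured iterations per launch
    }
}
```

With these values each benchmark yields 10 x 100 = 1000 measured data points, which matches the figure given in the follow-up commit message.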
…0 data points
- Increase to 10 launches x 100 iterations (1000 data points, 20 warmup each)
- Pre-build state pool in GlobalSetup to remove allocation noise from measurement window (IterationSetup now just picks from pool)
- Pin process to single core via ProcessorAffinity to reduce scheduler jitter
- Enable GcForce to ensure GC collection between iterations
- Tiered JIT disabled note: not applicable with InProcessNoEmitToolchain, warmup iterations handle JIT tiering instead
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cb50c89 to d522e7e
Each benchmark method now loops N=500 times with OperationsPerInvoke=N, keeping iteration time above BDN's 100ms minimum to eliminate MinIterationTime warnings and reduce measurement noise. Also reduces warmup from 10 to 5 and iterations from 40 to 20.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
N=5000 clears BDN's 100ms MinIterationTime threshold for EmptyBlock and SingleTransfer. Reduce to 2 launches, 2 warmup, 10 iterations (20 data points) to keep total runtime ~15 min.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
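The OperationsPerInvoke pattern described in the two commit messages above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the class, the `LoopedOperation` method, and the `ProcessOnce` stand-in are hypothetical names; only `OperationsPerInvoke` and N=5000 come from the text.

```csharp
using BenchmarkDotNet.Attributes;

// Hypothetical benchmark class, for illustration only.
public class LoopedBenchmarkSketch
{
    // N as quoted in the commit message above.
    private const int N = 5000;

    private int _counter;

    // Hypothetical stand-in for the real per-block processing work.
    private void ProcessOnce() => _counter++;

    // BenchmarkDotNet divides the measured time of one invocation by
    // OperationsPerInvoke, so the reported time stays per-operation
    // even though each invocation runs the work N times.
    [Benchmark(OperationsPerInvoke = N)]
    public int LoopedOperation()
    {
        for (int i = 0; i < N; i++)
        {
            ProcessOnce();
        }

        // Return a value so the loop's work is observable.
        return _counter;
    }
}
```

The point of the loop is that a very fast operation (such as the EmptyBlock case) would otherwise finish an iteration well under the 100ms minimum; running it N times per invocation lengthens the iteration without changing the unit of the reported result.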
EXPB Benchmark Comparison
superblocks: No cached master baseline
realblocks: No cached master baseline
Changes
- Change BlockProcessingBenchmark to use a single shared world state and BranchProcessor instead of a pre-built state pool, matching the live client's block processing path
- Add an OperationsPerInvoke = N loop (N=5000) to all 9 benchmark methods so BDN divides total time by N: reported times remain per-operation but iteration time stays above 100ms, eliminating MinIterationTime warnings for all scenarios including EmptyBlock and SingleTransfer
- Job settings GcForce, InvocationCount=1, UnrollFactor=1; keeps total runtime ~15 min
Benchmark results (latest run)
Most benchmarks show Error/Mean under 2%.
Testing
Run the benchmark suite locally:
    dotnet run -c Release --project src/Nethermind/Nethermind.Evm.Benchmark/Nethermind.Evm.Benchmark.csproj \
      -- --filter "*BlockProcessing*"

Verify no MinIterationTime warnings and that StdDev/Mean is low.