OpCode Benchmarks action by benaadams · Pull Request #10651 · NethermindEth/nethermind

benaadams · 2026-02-25T20:12:53Z

Changes

Add .github/workflows/evm-opcode-benchmark-diff.yml GitHub Actions workflow that automatically benchmarks EVM opcode performance on PRs touching Nethermind.Evm or upgrading Nethermind.Numerics.Int256
Add EvmOpcodesBenchmarkConfig with explicit BDN job configuration (3 launches × 15 iterations, 15 warmups, GC server mode, process isolation)
Add per-opcode regression detection thresholds (5% default, 15% CALL/LOG, 20% state opcodes) via GetThresholdPercent() and custom OpcodeThresholdColumn
Add EvmOpcodeGasColumnProvider with Gas, MGas/s, and Threshold custom BDN columns
Add transient storage pre-seeding in IterationSetup for TLOAD/TSTORE benchmark stability
Comparison script uses the Median (not the Mean), a CV-based noise guard, a BDN Error uncertainty floor, an absolute delta floor, and a hysteresis margin to eliminate false positives
Flagged opcodes with noisy results are automatically re-run, and their results are aggregated across runs
PR comment separates regressions, improvements, and new/removed opcodes
Lightweight check-trigger job on ubuntu-latest gates the expensive benchmark to avoid claiming the self-hosted runner unnecessarily
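A minimal Python sketch of how the per-opcode thresholds and the comparison guards listed above could combine into one verdict per opcode. Everything concrete here is an assumption for illustration: the opcode membership of each threshold tier, the guard constants (MAX_CV, ABS_FLOOR_NS, HYSTERESIS_PCT), the order in which the guards run, and the names threshold_percent, Stats and classify are hypothetical, not the PR's GetThresholdPercent() or its comparison script. The hysteresis margin is modeled simply as extra percentage added to the threshold, and Error is taken to be BenchmarkDotNet's Error column (half of the 99.9% confidence interval).

```python
from dataclasses import dataclass

# Hypothetical tier membership; the PR only states the percentages
# (5% default, 15% CALL/LOG, 20% state opcodes).
CALL_LOG_OPCODES = {"CALL", "CALLCODE", "DELEGATECALL", "STATICCALL",
                    "LOG0", "LOG1", "LOG2", "LOG3", "LOG4"}
STATE_OPCODES = {"SLOAD", "SSTORE", "TLOAD", "TSTORE", "BALANCE",
                 "EXTCODESIZE", "EXTCODEHASH", "EXTCODECOPY"}

# Hypothetical guard settings, chosen only for illustration.
MAX_CV = 0.10          # stddev/mean above this marks a result as noisy
ABS_FLOOR_NS = 0.5     # ignore shifts smaller than this, whatever the percentage
HYSTERESIS_PCT = 1.0   # extra margin on top of the per-opcode threshold


def threshold_percent(opcode: str) -> float:
    """Allowed relative change (in %) before an opcode is flagged."""
    if opcode in STATE_OPCODES:
        return 20.0
    if opcode in CALL_LOG_OPCODES:
        return 15.0
    return 5.0


@dataclass
class Stats:
    """Timings for one opcode from a BenchmarkDotNet summary row, in ns."""
    median_ns: float
    stddev_ns: float
    mean_ns: float
    error_ns: float  # BDN "Error" column: half of the 99.9% confidence interval


def classify(opcode: str, base: Stats, pr: Stats) -> str:
    """Return 'regression', 'improvement', 'noisy' or 'unchanged'."""
    delta_ns = pr.median_ns - base.median_ns          # compare medians, not means
    delta_pct = 100.0 * delta_ns / base.median_ns
    if abs(delta_pct) < threshold_percent(opcode) + HYSTERESIS_PCT:
        return "unchanged"
    if abs(delta_ns) < ABS_FLOOR_NS:                  # absolute delta floor
        return "unchanged"
    if abs(delta_ns) <= base.error_ns + pr.error_ns:  # uncertainty floor
        return "unchanged"
    cv = max(base.stddev_ns / base.mean_ns, pr.stddev_ns / pr.mean_ns)
    if cv > MAX_CV:                                   # too noisy to trust: re-run
        return "noisy"
    return "regression" if delta_ns > 0 else "improvement"
```

Running the percentage and floor checks first means the CV guard only fires for opcodes that would otherwise be reported, which is exactly where an automatic re-run pays off.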

Types of changes

What types of changes does your code introduce?

Bugfix (a non-breaking change that fixes an issue)
New feature (a non-breaking change that adds functionality)
Breaking change (a change that causes existing functionality not to work as expected)
Optimization
Refactoring
Documentation update
Build-related changes
Other: Description

Testing

Requires testing

Yes
No

If yes, did you write tests?

Yes
No

Notes on testing

Validated via multiple CI runs on the benchmarks-action branch against itself (no EVM implementation changes). The final run produced zero false positives with all noise guards active.

Documentation

Requires documentation update

Yes
No

Requires explanation in Release Notes

Yes
No

github-actions · 2026-02-25T20:15:59Z

EVM Opcode Benchmark Diff

Aggregated runs: base=1, pr=1

No significant regressions or improvements detected.
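The "Aggregated runs" line appears to count how many benchmark passes fed each side of the comparison. One way re-runs of noisy opcodes could be folded into a single figure per opcode is the median of the per-run medians, which keeps a single outlier run from dominating. The function aggregate_runs below is a hypothetical sketch of that idea, not the workflow's actual aggregation.

```python
from statistics import median


def aggregate_runs(runs: list[dict[str, float]]) -> dict[str, float]:
    """Combine per-run median timings (ns), keyed by opcode name.

    Each opcode's aggregated value is the median of the medians it has
    across runs; an opcode missing from some runs (e.g. because only
    flagged opcodes were re-run) uses just the runs that contain it.
    """
    per_opcode: dict[str, list[float]] = {}
    for run in runs:
        for opcode, median_ns in run.items():
            per_opcode.setdefault(opcode, []).append(median_ns)
    return {opcode: median(values) for opcode, values in per_opcode.items()}
```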

Copilot

Pull request overview

This pull request adds automated EVM opcode performance benchmarking to the CI pipeline, enabling automatic detection of performance regressions in the Nethermind.Evm module. When changes are made to the EVM implementation, the workflow runs benchmarks on both the base and PR branches, compares the results, and posts a summary comment on the PR if any opcode's mean execution time changes by ±2.5% or more. The PR also includes a minor comment formatting change in the EVM instructions file.

Changes:

Added a GitHub Actions workflow that automatically benchmarks EVM opcode performance for PRs affecting the Nethermind.Evm codebase
Implemented Python-based benchmark result comparison with configurable threshold (2.5%)
Removed a period from a comment in EvmInstructions.cs

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
.github/workflows/evm-opcode-benchmark-diff.yml	New workflow that runs EVM opcode benchmarks, compares base vs PR results, and posts PR comments with performance changes
src/Nethermind/Nethermind.Evm/Instructions/EvmInstructions.cs	Removed period from end of comment for consistency (line 36)


Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.


Introduce .github/scripts/evm_benchmark_utils.py containing shared parsing, aggregation, and comparison utilities for EVM opcode BenchmarkDotNet output (unit parsing, median aggregation, CV/noise/uncertainty floors, etc.). Refactor evm-opcode-benchmark-diff.yml to import and use the new module (setting PYTHONPATH), simplify the inline Python by delegating logic to the module, and improve handling of rerun filters and the dotnet invocation. Also update the workflow trigger to run when the workflow or the utility script changes, and unify output/table formatting. This reduces duplicated code, centralizes configuration via environment variables, and makes the benchmark diffing/rerun logic clearer and more maintainable.
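The unit parsing mentioned above has to turn BenchmarkDotNet's human-readable time cells into one common unit before medians can be compared. A sketch, assuming cells shaped like "1,234.5 ns" (comma thousands separators, as in en-US output) and accepting the Greek mu, the micro sign and the ASCII "us" for microseconds; parse_time_ns is a hypothetical name, not a function taken from evm_benchmark_utils.py.

```python
import re
from typing import Optional

# Nanoseconds per unit. Both the Greek mu (U+03BC) and the micro sign
# (U+00B5) are accepted for microseconds, plus the ASCII fallback "us".
_NS_PER_UNIT = {
    "ns": 1.0,
    "us": 1e3, "\u03bcs": 1e3, "\u00b5s": 1e3,
    "ms": 1e6,
    "s": 1e9,
}
# A number (optionally with comma thousands separators) followed by a unit.
_CELL_RE = re.compile(r"^\s*([-+]?[\d,]*\.?\d+)\s*([^\d\s]+)\s*$")


def parse_time_ns(cell: str) -> Optional[float]:
    """Convert a time cell such as '1,234.5 ns' to nanoseconds.

    Returns None for cells without a measurement (e.g. 'NA') or with an
    unrecognised unit, so callers can skip them instead of crashing.
    """
    match = _CELL_RE.match(cell)
    if match is None:
        return None
    number, unit = match.groups()
    factor = _NS_PER_UNIT.get(unit)
    if factor is None:
        return None
    return float(number.replace(",", "")) * factor
```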

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


Copilot · 2026-02-26T11:32:57Z

.github/workflows/evm-opcode-benchmark-diff.yml

+  compare-evm-opcodes:
+    name: Compare EVM opcode benchmarks (base vs PR)
+    needs: check-trigger
+    if: needs.check-trigger.outputs.should_run == 'true'


compare-evm-opcodes runs untrusted PR code on the benchmark runner. If that runner is self-hosted (as it appears to be for benchmark workloads), this is a security risk for PRs from forks. Add a guard to only run the benchmark job for same-repo PRs (e.g., github.event.pull_request.head.repo.full_name == github.repository) or switch to a safer model (e.g., require a label/manual trigger for fork PRs).

Suggested change

if: needs.check-trigger.outputs.should_run == 'true'

if: needs.check-trigger.outputs.should_run == 'true' && github.event.pull_request.head.repo.full_name == github.repository
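The suggested guard compares the PR's head repository with the base repository. For illustration, the same check over the webhook payload that GitHub Actions exposes via GITHUB_EVENT_PATH could look like the sketch below; is_same_repo_pr and load_event are hypothetical helpers (for example for a pre-flight step in check-trigger), not part of the workflow. A deleted fork can leave head.repo as null, which the sketch treats as "not the same repository".

```python
import json


def load_event(path: str) -> dict:
    """Read the webhook payload JSON that the runner writes for the job."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def is_same_repo_pr(event: dict) -> bool:
    """True only for pull_request events whose head branch lives in the
    base repository, i.e. not a fork. Missing fields count as 'not same'."""
    pr = event.get("pull_request")
    if not pr:
        return False
    head_repo = (pr.get("head") or {}).get("repo") or {}
    base_repo = event.get("repository") or {}
    base_name = base_repo.get("full_name")
    return base_name is not None and head_repo.get("full_name") == base_name
```

In a step this would be called as is_same_repo_pr(load_event(os.environ["GITHUB_EVENT_PATH"])), with the result written out as the trigger decision.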

Copilot · 2026-02-26T11:32:58Z

.github/workflows/evm-opcode-benchmark-diff.yml

+      - name: Publish PR benchmark comment
+        if: always() && github.event_name == 'pull_request'
+        uses: actions/github-script@v7
+        env:
+          COMMENT_BODY: ${{ steps.compare.outputs.comment_body }}
+        with:
+          script: |
+            const marker = '<!-- evm-opcode-benchmark-diff -->';
+            const body = process.env.COMMENT_BODY;
+            if (!body || body.trim().length === 0) {
+              core.setFailed('Missing COMMENT_BODY from compare step output.');
+              return;
+            }


Publish PR benchmark comment runs under if: always(), but it hard-fails when COMMENT_BODY is missing. If any earlier step fails and compare doesn't produce outputs, this step will fail with a secondary error and can obscure the real failure. Consider gating this step on steps.compare.outputs.comment_body being non-empty (or steps.compare.outcome == 'success') so failures are reported cleanly.
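One way to keep the publish step's failure report clean is for the compare step to always write comment_body through GitHub's multiline step-output syntax (a name<<DELIMITER line, the value, then the delimiter line, appended to the file named by $GITHUB_OUTPUT), so an empty output points back at the compare step rather than at the publish step. A minimal Python sketch; format_multiline_output and set_multiline_output are hypothetical names, and the random delimiter guards against a comment body that happens to contain the delimiter line.

```python
import os
import uuid


def format_multiline_output(name: str, value: str, delimiter: str) -> str:
    """Render one multiline step output in GitHub's `name<<DELIMITER` form."""
    if delimiter in value.splitlines():
        raise ValueError("delimiter must not appear as a line in the value")
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"


def set_multiline_output(name: str, value: str) -> None:
    """Append the output to the file named by $GITHUB_OUTPUT, using a random
    delimiter so the comment body cannot close the block early."""
    delimiter = f"EOF_{uuid.uuid4().hex}"
    with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as fh:
        fh.write(format_multiline_output(name, value, delimiter))
```

The publish step can then be gated as the review suggests, for example on steps.compare.outcome == 'success'.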

Copilot · 2026-02-26T11:32:58Z

src/Nethermind/Nethermind.Evm/Instructions/EvmInstructions.cs

        }

-        // Set basic control and arithmetic opcodes.
+        // Set basic control and arithmetic opcodes


This comment is missing the trailing period, while adjacent section comments in this file use periods (e.g., "Comparison and bitwise opcodes."). Consider adding the period back for consistent punctuation.

Suggested change

// Set basic control and arithmetic opcodes

// Set basic control and arithmetic opcodes.

Copilot AI review requested due to automatic review settings February 25, 2026 20:12

benaadams requested review from a team, Demuirgos, LukaszRozmej and rubo as code owners February 25, 2026 20:12

github-actions bot added devops evm labels Feb 25, 2026

Copilot started reviewing on behalf of benaadams February 25, 2026 20:13

Demuirgos approved these changes Feb 25, 2026

Copilot AI reviewed Feb 25, 2026

benaadams mentioned this pull request Feb 25, 2026

Raw opcode benchmarks #10650 (merged, 8 tasks)

benaadams force-pushed the benchmarks-action branch 6 times, most recently from 83661d8 to 5601b40 February 26, 2026 08:54

Base automatically changed from benchmarks to master February 26, 2026 09:43

Benchmarks action

40e6b81

benaadams force-pushed the benchmarks-action branch from 879171e to 40e6b81 February 26, 2026 10:16

Use correct runner

8403172

benaadams requested a review from Copilot February 26, 2026 10:35

Copilot started reviewing on behalf of benaadams February 26, 2026 10:36

Copilot AI reviewed Feb 26, 2026

github-actions bot added build changes new feature labels Feb 26, 2026

benaadams changed the title from Benchmarks action to OpCode Benchmarks action Feb 26, 2026

benaadams requested a review from Copilot February 26, 2026 11:28

Copilot started reviewing on behalf of benaadams February 26, 2026 11:28

kamilchodola approved these changes Feb 26, 2026

benaadams merged commit dfa5e91 into master Feb 26, 2026
118 checks passed

benaadams deleted the benchmarks-action branch February 26, 2026 11:31

Copilot AI reviewed Feb 26, 2026

benaadams merged 3 commits into master from benchmarks-action

4 participants
