OpCode Benchmarks action#10651

Merged
benaadams merged 3 commits into master from benchmarks-action
Feb 26, 2026

@benaadams benaadams commented Feb 25, 2026

Changes

  • Add .github/workflows/evm-opcode-benchmark-diff.yml GitHub Actions workflow that automatically benchmarks EVM opcode performance on PRs touching Nethermind.Evm or upgrading Nethermind.Numerics.Int256
  • Add EvmOpcodesBenchmarkConfig with explicit BDN job configuration (3 launches × 15 iterations, 15 warmups, GC server mode, process isolation)
  • Add per-opcode regression detection thresholds (5% default, 15% CALL/LOG, 20% state opcodes) via GetThresholdPercent() and custom OpcodeThresholdColumn
  • Add EvmOpcodeGasColumnProvider with Gas, MGas/s, and Threshold custom BDN columns
  • Add transient storage pre-seeding in IterationSetup for TLOAD/TSTORE benchmark stability
  • Comparison script uses Median (not Mean), CV-based noise guard, BDN Error uncertainty floor, absolute delta floor, and hysteresis margin to eliminate false positives
  • Noisy flagged opcodes are automatically re-run and results aggregated across runs
  • PR comment separates regressions, improvements, and new/removed opcodes
  • Lightweight check-trigger job on ubuntu-latest gates the expensive benchmark to avoid claiming self-hosted runner unnecessarily
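
The comparison guards listed above can be sketched roughly as follows. This is a minimal illustration and not the script added in this PR: the function names, the opcode groupings, the order of the checks, and the values chosen for the CV limit, absolute delta floor and hysteresis margin are all assumptions. Only the 5% / 15% / 20% thresholds come from the description.

```python
from statistics import mean, median, stdev

# Threshold percentages are from the PR description; which opcodes fall into
# each group is an assumption made for this sketch.
DEFAULT_THRESHOLD = 5.0
CALL_LOG_THRESHOLD = 15.0
STATE_THRESHOLD = 20.0
CALL_LOG_OPCODES = {"CALL", "CALLCODE", "DELEGATECALL", "STATICCALL",
                    "LOG0", "LOG1", "LOG2", "LOG3", "LOG4"}  # hypothetical
STATE_OPCODES = {"SLOAD", "SSTORE", "TLOAD", "TSTORE"}       # hypothetical


def threshold_percent(opcode):
    """Per-opcode regression threshold in percent."""
    if opcode in STATE_OPCODES:
        return STATE_THRESHOLD
    if opcode in CALL_LOG_OPCODES:
        return CALL_LOG_THRESHOLD
    return DEFAULT_THRESHOLD


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (0.0 if undefined)."""
    if len(samples) < 2:
        return 0.0
    m = mean(samples)
    return stdev(samples) / m if m else 0.0


def classify(opcode, base_ns, pr_ns, *, base_error_ns=0.0, pr_error_ns=0.0,
             max_cv=0.10, abs_floor_ns=0.5, hysteresis_pct=1.0):
    """Return 'regression', 'improvement', 'noisy' or 'unchanged'.

    base_ns / pr_ns are per-run timings in nanoseconds. All keyword
    defaults are placeholder values, not the ones used by the workflow.
    """
    base_med, pr_med = median(base_ns), median(pr_ns)
    if base_med <= 0:
        return "unchanged"
    delta = pr_med - base_med
    # Absolute delta floor: ignore differences too small to matter.
    if abs(delta) < abs_floor_ns:
        return "unchanged"
    # Uncertainty floor: the delta must exceed the combined reported error.
    if abs(delta) <= base_error_ns + pr_error_ns:
        return "unchanged"
    # Hysteresis: require the threshold plus a margin before flagging.
    pct = 100.0 * delta / base_med
    if abs(pct) < threshold_percent(opcode) + hysteresis_pct:
        return "unchanged"
    # CV noise guard: a flagged but noisy opcode is a candidate for a re-run.
    if max(coefficient_of_variation(base_ns),
           coefficient_of_variation(pr_ns)) > max_cv:
        return "noisy"
    return "regression" if delta > 0 else "improvement"
```

In this sketch a `"noisy"` result is what would feed a re-run, after which the extra samples are appended to `base_ns` / `pr_ns` and the opcode is classified again.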

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Validated via multiple CI runs on the benchmarks-action branch against itself (no EVM implementation changes). Final run produces zero false positives with all noise guards active.

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

Copilot AI review requested due to automatic review settings February 25, 2026 20:12

github-actions bot commented Feb 25, 2026

EVM Opcode Benchmark Diff

Aggregated runs: base=1, pr=1

No significant regressions or improvements detected.

Copilot AI left a comment

Pull request overview

This pull request adds automated EVM opcode performance benchmarking to the CI pipeline, enabling automatic detection of performance regressions in the Nethermind.Evm module. When changes are made to the EVM implementation, the workflow runs benchmarks on both the base and PR branches, compares the results, and posts a summary comment on the PR if any opcode's mean execution time changes by ±2.5% or more. The PR also includes a minor comment formatting change in the EVM instructions file.

Changes:

  • Added a GitHub Actions workflow that automatically benchmarks EVM opcode performance for PRs affecting the Nethermind.Evm codebase
  • Implemented Python-based benchmark result comparison with configurable threshold (2.5%)
  • Removed a period from a comment in EvmInstructions.cs

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
.github/workflows/evm-opcode-benchmark-diff.yml New workflow that runs EVM opcode benchmarks, compares base vs PR results, and posts PR comments with performance changes
src/Nethermind/Nethermind.Evm/Instructions/EvmInstructions.cs Removed period from end of comment for consistency (line 36)

@benaadams benaadams mentioned this pull request Feb 25, 2026
8 tasks
@benaadams benaadams force-pushed the benchmarks-action branch 6 times, most recently from 83661d8 to 5601b40 Compare February 26, 2026 08:54
Base automatically changed from benchmarks to master February 26, 2026 09:43

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.


Introduce .github/scripts/evm_benchmark_utils.py containing shared parsing, aggregation and comparison utilities for EVM opcode BenchmarkDotNet output (unit parsing, median aggregation, CV/noise/uncertainty floors, etc.). Refactor evm-opcode-benchmark-diff.yml to import and use the new module (set PYTHONPATH), simplify inline Python by delegating logic, and improve handling of rerun filters and dotnet invocation. Also update workflow trigger to run when the workflow or utility script changes and unify output/table formatting. This reduces duplicated code, centralizes configuration via environment variables, and makes the benchmark diffing/rerun logic clearer and more maintainable.
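
As a rough illustration of the unit parsing and median aggregation this commit describes, the sketch below converts time cells to nanoseconds and takes the per-opcode median across runs. It is not the contents of `evm_benchmark_utils.py`: the function names and the assumed cell format (for example `12.34 ns` or `1,234.5 us`) are hypothetical.

```python
import re
from statistics import median

# Conversion factors to nanoseconds. Both the Greek mu and the micro sign
# are accepted, since either may appear in benchmark output.
_UNIT_TO_NS = {"ns": 1.0, "us": 1e3, "μs": 1e3, "µs": 1e3, "ms": 1e6, "s": 1e9}
_CELL_RE = re.compile(r"^\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*(ns|us|μs|µs|ms|s)\s*$")


def parse_time_ns(cell):
    """Parse a cell such as '1,234.5 ns' or '1.5 us' into nanoseconds."""
    match = _CELL_RE.match(cell)
    if match is None:
        raise ValueError(f"unrecognised time value: {cell!r}")
    number = float(match.group(1).replace(",", ""))
    return number * _UNIT_TO_NS[match.group(2)]


def aggregate_runs(runs):
    """Combine several runs into one median time (ns) per opcode.

    `runs` is a list of dicts mapping opcode name to a time cell string;
    an opcode missing from some runs is aggregated over the runs it has.
    """
    samples = {}
    for run in runs:
        for opcode, cell in run.items():
            samples.setdefault(opcode, []).append(parse_time_ns(cell))
    return {opcode: median(values) for opcode, values in samples.items()}
```

Normalizing everything to one unit before aggregating avoids comparing, say, a base result reported in ns against a PR result reported in us.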
@benaadams benaadams changed the title Benchmarks action OpCode Benchmarks action Feb 26, 2026
@benaadams benaadams requested a review from Copilot February 26, 2026 11:28
@benaadams benaadams merged commit dfa5e91 into master Feb 26, 2026
118 checks passed
@benaadams benaadams deleted the benchmarks-action branch February 26, 2026 11:31

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


  compare-evm-opcodes:
    name: Compare EVM opcode benchmarks (base vs PR)
    needs: check-trigger
    if: needs.check-trigger.outputs.should_run == 'true'

Copilot AI Feb 26, 2026

compare-evm-opcodes runs untrusted PR code on the benchmark runner. If that runner is self-hosted (as it appears to be for benchmark workloads), this is a security risk for PRs from forks. Add a guard to only run the benchmark job for same-repo PRs (e.g., github.event.pull_request.head.repo.full_name == github.repository) or switch to a safer model (e.g., require a label/manual trigger for fork PRs).

Suggested change
-    if: needs.check-trigger.outputs.should_run == 'true'
+    if: needs.check-trigger.outputs.should_run == 'true' && github.event.pull_request.head.repo.full_name == github.repository

Comment on lines +305 to +317
- name: Publish PR benchmark comment
  if: always() && github.event_name == 'pull_request'
  uses: actions/github-script@v7
  env:
    COMMENT_BODY: ${{ steps.compare.outputs.comment_body }}
  with:
    script: |
      const marker = '<!-- evm-opcode-benchmark-diff -->';
      const body = process.env.COMMENT_BODY;
      if (!body || body.trim().length === 0) {
        core.setFailed('Missing COMMENT_BODY from compare step output.');
        return;
      }

Copilot AI Feb 26, 2026

Publish PR benchmark comment runs under if: always(), but it hard-fails when COMMENT_BODY is missing. If any earlier step fails and compare doesn't produce outputs, this step will fail with a secondary error and can obscure the real failure. Consider gating this step on steps.compare.outputs.comment_body being non-empty (or steps.compare.outcome == 'success') so failures are reported cleanly.

  }

- // Set basic control and arithmetic opcodes.
+ // Set basic control and arithmetic opcodes

Copilot AI Feb 26, 2026

This comment is missing the trailing period, while adjacent section comments in this file use periods (e.g., "Comparison and bitwise opcodes."). Consider adding the period back for consistent punctuation.

Suggested change
- // Set basic control and arithmetic opcodes
+ // Set basic control and arithmetic opcodes.
