Conversation
EVM Opcode Benchmark Diff (aggregated runs: base=1, pr=1). No significant regressions or improvements detected.
Pull request overview
This pull request adds automated EVM opcode performance benchmarking to the CI pipeline, enabling automatic detection of performance regressions in the Nethermind.Evm module. When changes are made to the EVM implementation, the workflow runs benchmarks on both the base and PR branches, compares the results, and posts a summary comment on the PR if any opcode's mean execution time changes by ±2.5% or more. The PR also includes a minor comment formatting change in the EVM instructions file.
Changes:
- Added a GitHub Actions workflow that automatically benchmarks EVM opcode performance for PRs affecting the Nethermind.Evm codebase
- Implemented Python-based benchmark result comparison with configurable threshold (2.5%)
- Removed a period from a comment in EvmInstructions.cs
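The ±2.5% threshold check described above can be sketched in Python. This is an illustrative sketch only: the function names and the nanosecond units are assumptions, not the actual API of the workflow's comparison script.

```python
# Hypothetical sketch of the benchmark diff threshold check; names are
# illustrative and do not come from the actual workflow scripts.
THRESHOLD_PERCENT = 2.5  # configurable regression/improvement threshold


def percent_change(base_mean_ns: float, pr_mean_ns: float) -> float:
    """Relative change of the PR mean vs the base mean, in percent."""
    return (pr_mean_ns - base_mean_ns) / base_mean_ns * 100.0


def classify(base_mean_ns: float, pr_mean_ns: float) -> str:
    """Flag an opcode only when the absolute change meets the threshold."""
    change = percent_change(base_mean_ns, pr_mean_ns)
    if change >= THRESHOLD_PERCENT:
        return "regression"
    if change <= -THRESHOLD_PERCENT:
        return "improvement"
    return "no significant change"
```

Under this scheme, an opcode whose mean moves from 100 ns to 103 ns (+3%) is reported, while 100 ns to 101 ns (+1%) is below the threshold and stays silent.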
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| .github/workflows/evm-opcode-benchmark-diff.yml | New workflow that runs EVM opcode benchmarks, compares base vs PR results, and posts PR comments with performance changes |
| src/Nethermind/Nethermind.Evm/Instructions/EvmInstructions.cs | Removed period from end of comment for consistency (line 36) |
Force-pushed from 83661d8 to 5601b40.
Force-pushed from 879171e to 40e6b81.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
Introduce .github/scripts/evm_benchmark_utils.py containing shared parsing, aggregation and comparison utilities for EVM opcode BenchmarkDotNet output (unit parsing, median aggregation, CV/noise/uncertainty floors, etc.). Refactor evm-opcode-benchmark-diff.yml to import and use the new module (set PYTHONPATH), simplify inline Python by delegating logic, and improve handling of rerun filters and dotnet invocation. Also update workflow trigger to run when the workflow or utility script changes and unify output/table formatting. This reduces duplicated code, centralizes configuration via environment variables, and makes the benchmark diffing/rerun logic clearer and more maintainable.
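The shared utilities described in this commit might look roughly like the following. The module's actual contents are not shown in this thread, so the function names (`parse_time_ns`, `aggregate_medians`) and the unit table are assumptions sketched from the commit message (unit parsing, median aggregation):

```python
# Illustrative sketch of evm_benchmark_utils.py-style helpers; the real
# module is not shown in this thread, so names and signatures are assumed.
from statistics import median

# BenchmarkDotNet reports times with units; normalize everything to ns.
_UNIT_TO_NS = {"ns": 1.0, "us": 1e3, "μs": 1e3, "ms": 1e6, "s": 1e9}


def parse_time_ns(value: str) -> float:
    """Parse a BenchmarkDotNet-style time like '12.5 us' into nanoseconds."""
    number, unit = value.split()
    return float(number.replace(",", "")) * _UNIT_TO_NS[unit]


def aggregate_medians(runs: dict[str, list[float]]) -> dict[str, float]:
    """Collapse repeated runs per opcode to the median, damping outliers."""
    return {opcode: median(samples) for opcode, samples in runs.items()}
```

Using the median rather than the mean across repeated runs is one way to keep a single noisy launch from tripping the regression threshold.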
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
```yaml
compare-evm-opcodes:
  name: Compare EVM opcode benchmarks (base vs PR)
  needs: check-trigger
  if: needs.check-trigger.outputs.should_run == 'true'
```
compare-evm-opcodes runs untrusted PR code on the benchmark runner. If that runner is self-hosted (as it appears to be for benchmark workloads), this is a security risk for PRs from forks. Add a guard to only run the benchmark job for same-repo PRs (e.g., github.event.pull_request.head.repo.full_name == github.repository) or switch to a safer model (e.g., require a label/manual trigger for fork PRs).
Suggested change:

```diff
- if: needs.check-trigger.outputs.should_run == 'true'
+ if: needs.check-trigger.outputs.should_run == 'true' && github.event.pull_request.head.repo.full_name == github.repository
```
```yaml
- name: Publish PR benchmark comment
  if: always() && github.event_name == 'pull_request'
  uses: actions/github-script@v7
  env:
    COMMENT_BODY: ${{ steps.compare.outputs.comment_body }}
  with:
    script: |
      const marker = '<!-- evm-opcode-benchmark-diff -->';
      const body = process.env.COMMENT_BODY;
      if (!body || body.trim().length === 0) {
        core.setFailed('Missing COMMENT_BODY from compare step output.');
        return;
      }
```
Publish PR benchmark comment runs under if: always(), but it hard-fails when COMMENT_BODY is missing. If any earlier step fails and compare doesn't produce outputs, this step will fail with a secondary error and can obscure the real failure. Consider gating this step on steps.compare.outputs.comment_body being non-empty (or steps.compare.outcome == 'success') so failures are reported cleanly.
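One possible way to apply this suggestion, assuming the compare step keeps the id `compare` as in the snippet above, is to fold the non-empty-output check into the step's condition so a missing body skips the step instead of failing it:

```yaml
- name: Publish PR benchmark comment
  if: always() && github.event_name == 'pull_request' && steps.compare.outputs.comment_body != ''
  uses: actions/github-script@v7
```

With this guard, an upstream failure that leaves `comment_body` unset no longer produces a secondary error from the publish step.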
```diff
- // Set basic control and arithmetic opcodes.
+ // Set basic control and arithmetic opcodes
```
This comment is missing the trailing period, while adjacent section comments in this file use periods (e.g., "Comparison and bitwise opcodes."). Consider adding the period back for consistent punctuation.
Suggested change:

```diff
- // Set basic control and arithmetic opcodes
+ // Set basic control and arithmetic opcodes.
```
Changes
- `.github/workflows/evm-opcode-benchmark-diff.yml`: GitHub Actions workflow that automatically benchmarks EVM opcode performance on PRs touching `Nethermind.Evm` or upgrading `Nethermind.Numerics.Int256`
- `EvmOpcodesBenchmarkConfig` with explicit BDN job configuration (3 launches × 15 iterations, 15 warmups, GC server mode, process isolation)
- `GetThresholdPercent()` and custom `OpcodeThresholdColumn`
- `EvmOpcodeGasColumnProvider` with Gas, MGas/s, and Threshold custom BDN columns
- `IterationSetup` for TLOAD/TSTORE benchmark stability
- `check-trigger` job on `ubuntu-latest` gates the expensive benchmark to avoid claiming the self-hosted runner unnecessarily

Types of changes
What types of changes does your code introduce?
Testing
Requires testing
If yes, did you write tests?
Notes on testing
Validated via multiple CI runs on the `benchmarks-action` branch against itself (no EVM implementation changes). The final run produces zero false positives with all noise guards active.

Documentation
Requires documentation update
Requires explanation in Release Notes