feat(benches): add benchmarks for historic and latest events scanning #254

Merged
0xNeshi merged 41 commits into OpenZeppelin:main from yug49:feat/benchmarks on Jan 21, 2026

Conversation

yug49 (Contributor) commented Dec 13, 2025

Related to #229

Hey!

This PR introduces a benchmarking system for the Event Scanner, using Criterion.rs to measure the performance impact of changes to the scanner. Currently a draft; Bencher CI integration is coming in a follow-up.


What's Included

New benches Crate Structure

benches/
├── Cargo.toml                           # Benchmark crate config
├── src/
│   └── lib.rs                           # Shared utilities (Anvil setup, contract deployment, event generation)
└── benches/
    ├── historic_scanning.rs             # Historic mode benchmarks
    └── latest_events_scanning.rs        # Latest events mode benchmarks
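
The shared setup in lib.rs boils down to spawning a local node, roughly like this (a minimal sketch assuming alloy's node-bindings feature; the real helper also deploys the contract and generates events):

use alloy::node_bindings::{Anvil, AnvilInstance};

// Spawn a throwaway Anvil node for a bench run. The returned instance
// shuts the node down when dropped.
fn spawn_anvil() -> AnvilInstance {
    Anvil::new()
        .block_time(1) // mine one block per second
        .spawn()
}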

Benchmarks Implemented

Mode          | Event Counts      | What It Measures
Historic      | 10K, 50K, 100K    | Time to scan all events from block 0 to latest
Latest Events | 100, 1K, 10K, 50K | Time to fetch the N most recent events from a 100K-event pool

Example of a Historic run (screenshot omitted).
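
For reference, a group like this is wired up with Criterion roughly as follows (a minimal sketch; scan_all_events is a hypothetical stand-in for the actual scanner call, which runs against the local Anvil node):

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

// Hypothetical stand-in for the scanner call under measurement; the real
// benches drive the Event Scanner against a local Anvil node instead.
fn scan_all_events(event_count: u64) -> u64 {
    (0..event_count).sum()
}

fn historic_scanning(c: &mut Criterion) {
    // Group name + parameter yields IDs like "historic_scanning/events/10000".
    let mut group = c.benchmark_group("historic_scanning/events");
    for &count in &[10_000u64, 50_000, 100_000] {
        // Declaring throughput makes Criterion print a `thrpt` line alongside `time`.
        group.throughput(Throughput::Elements(count));
        group.bench_with_input(BenchmarkId::from_parameter(count), &count, |b, &n| {
            b.iter(|| scan_all_events(n));
        });
    }
    group.finish();
}

criterion_group!(benches, historic_scanning);
criterion_main!(benches);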


How Regression Testing Works

Criterion stores baseline results in target/criterion/<benchmark>/base/. On subsequent runs, it compares new measurements against this baseline and reports:

historic_scanning/events/10000
                        time:   [30.963 ms 36.506 ms 40.598 ms]
                        thrpt:  [246.32 Kelem/s 273.93 Kelem/s 322.97 Kelem/s]
                 change: [-2.12% -1.01% +0.12%] (p = 0.12 > 0.05)
                        No change in performance detected.

If a change introduces a regression, you'll see something like:

                 change: [+15.2% +18.4% +21.1%] (p = 0.00 < 0.05)
                        Performance has regressed.

This makes it easy to catch slowdowns before merging.
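
Criterion's standard baseline flags also let you pin an explicit, named reference point instead of relying on the implicit base:

# Save the current results under a named baseline...
cargo bench --manifest-path benches/Cargo.toml -- --save-baseline main

# ...and compare any later run against that name
cargo bench --manifest-path benches/Cargo.toml -- --baseline main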


Running Benchmarks

# All benchmarks
cargo bench --manifest-path benches/Cargo.toml

# Specific benchmark
cargo bench --manifest-path benches/Cargo.toml --bench historic_scanning
cargo bench --manifest-path benches/Cargo.toml --bench latest_events_scanning

# Filter by event count
cargo bench --manifest-path benches/Cargo.toml -- "historic_scanning/events/10000"

Next Steps

  • Bencher CI integration: Add a GitHub Actions workflow for on-demand benchmarking with historical tracking via Bencher

yug49 (Contributor, Author) commented Dec 17, 2025

Hey @0xNeshi
I've made the requested changes to the Criterion part.
Please review, and if everything looks good I'll move forward with the Bencher integration.

0xNeshi (Collaborator) commented Dec 17, 2025

All good 👍

yug49 (Contributor, Author) commented Dec 21, 2025

GM @0xNeshi,
I have completed the Bencher integration with the following setup:

Workflow Architecture

Three GitHub Actions workflows handle different scenarios:

  1. benchmarks.yml - Runs on push to main and manual dispatch. This workflow uploads benchmark results to Bencher and establishes the baseline for regression detection. Path filters ensure benchmarks only run when relevant code changes (src/, benches/, Cargo.toml, Cargo.lock).

  2. pr_benchmarks_run.yml - Runs benchmarks on pull requests. This workflow does not have access to secrets, making it safe for fork PRs. Results are saved as artifacts.

  3. pr_benchmarks_track.yml - Triggered after the PR benchmark run completes. This workflow downloads the artifacts and uploads them to Bencher for comparison against the base branch. It posts comparison results as comments on the PR.

Required Setup

  • BENCHER_API_TOKEN as a repository secret
  • BENCHER_PROJECT as a repository variable

How It Works

When code is merged to main, benchmarks run and upload results to establish the baseline. For pull requests, benchmarks run in a fork-safe workflow and results are compared against the baseline. The --start-point-reset flag ensures PR branches remain ephemeral and do not accumulate historical data.
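
Concretely, the tracking step reduces to a bencher run invocation along these lines (a sketch, not the exact workflow step; BRANCH_NAME stands in for however the workflow derives the PR branch):

bencher run \
  --project "$BENCHER_PROJECT" \
  --token "$BENCHER_API_TOKEN" \
  --branch "$BRANCH_NAME" \
  --start-point main \
  --start-point-reset \
  --adapter rust_criterion \
  "cargo bench --manifest-path benches/Cargo.toml"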

@yug49 yug49 marked this pull request as ready for review December 21, 2025 22:10
0xNeshi (Collaborator) left a comment

Excellent work, let's polish now

yug49 and others added 4 commits December 22, 2025 19:32
LeoPatOZ (Collaborator) commented:
@0xNeshi what do you think about creating multiple dump files and starting Anvil from that state, instead of having to recreate it every time we start a bench?
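
For reference, Anvil supports this directly via its state flags (a sketch; state.json is a placeholder path):

anvil --dump-state state.json   # first run: generate the event pool, then shut down to write the snapshot
anvil --load-state state.json   # later bench runs boot pre-populated from the snapshot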

yug49 (Contributor, Author) commented Dec 29, 2025

Hey @0xNeshi
Extremely sorry for the delay; I was OOO over Christmas.
I have completed the dump-file integration and made all the changes you recommended. Please review.

@yug49 yug49 requested a review from 0xNeshi December 29, 2025 15:08
yug49 and others added 4 commits January 20, 2026 15:12
yug49 (Contributor, Author) commented Jan 20, 2026

Hey @0xNeshi
I am really sorry for missing this change earlier; it's fixed now.

I think this setup makes sense, since --err fails the workflow if a benchmark result exceeds the threshold. I agree with your earlier suggestion not to include it in the automatic push-to-main workflow: failing the workflow at that point won't prevent anything, since the code is already merged, and regressions will still be visible in the Bencher dashboard.
When someone runs the workflow manually, though, they are clearly investigating performance, so a clear pass/fail signal would be genuinely useful there, and --err provides exactly that feedback.

What do you think? Happy to implement a different approach if you prefer.

@yug49 yug49 requested a review from 0xNeshi January 20, 2026 17:18
0xNeshi (Collaborator) left a comment

@yug49 re: #254 (comment)

We actually need to make a distinction between:

  • running benches locally - means benches and/or the code being benched are being debugged, so makes sense to enable the flag to address any issues
  • manually triggering CI - means we're running benches outside the normal schedule

Let's actually remove the --err flag completely.

yug49 (Contributor, Author) commented Jan 21, 2026

> Let's actually remove the --err flag completely.

Yes, that sounds good.
I have removed the --err flags completely.

@yug49 yug49 requested a review from 0xNeshi January 21, 2026 08:01
0xNeshi (Collaborator) left a comment

Awesome work @yug49!

@0xNeshi 0xNeshi merged commit 55fd016 into OpenZeppelin:main Jan 21, 2026
7 checks passed