experiment: naive round-robin test splitting (compare vs #19423) by smartcontracts · Pull Request #19451 · ethereum-optimism/optimism

smartcontracts · 2026-03-09T17:22:05Z

Summary

Experiment to compare wall-clock time of naive vs curated test splitting.

- Merges #19423 ("ci: cut PR CI time from 26min to ~6min — caching + parallel test shards (EXPERIMENTAL DRAFT)", the curated shards), then replaces the shard assignments with naive round-robin
- Same 46 packages, same 8 shards, same CI structure; only the assignment strategy differs
- Alphabetical sort → i % 8 distribution (emulates infra#566)

What to compare

Run both this PR and #19423, then compare the memory-shard-* job durations:

- #19423 ("ci: cut PR CI time from 26min to ~6min — caching + parallel test shards (EXPERIMENTAL DRAFT)"): domain-aware grouping (interop-a, sync-b, protocol, etc.)
- This PR: naive alphabetical round-robin (shards 0-7)

The longest shard determines wall-clock time. Curated sharding should produce more balanced shards since it groups by test weight, while naive sharding distributes blindly.
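To make the "longest shard determines wall-clock time" point concrete, here is a minimal sketch. The durations are invented for illustration and are not measurements from either PR. The weight-aware assignment uses a greedy longest-first heuristic as a stand-in; it is not the curated grouping from #19423, which was assigned by hand per domain.

```python
import heapq


def makespan(shards):
    """Wall-clock time of a parallel run: the total of the slowest shard."""
    return max(sum(durations) for durations in shards)


def round_robin(durations, num_shards):
    """Assign job i to shard i % num_shards, ignoring job weight."""
    shards = [[] for _ in range(num_shards)]
    for i, duration in enumerate(durations):
        shards[i % num_shards].append(duration)
    return shards


def greedy_longest_first(durations, num_shards):
    """Place each job, longest first, on the currently lightest shard."""
    heap = [(0, idx) for idx in range(num_shards)]  # (current load, shard index)
    shards = [[] for _ in range(num_shards)]
    for duration in sorted(durations, reverse=True):
        load, idx = heapq.heappop(heap)
        shards[idx].append(duration)
        heapq.heappush(heap, (load + duration, idx))
    return shards


# Hypothetical per-package durations in minutes (NOT data from this PR).
durations = [9, 1, 8, 1, 7, 1, 6, 1]

naive = round_robin(durations, 2)
weighted = greedy_longest_first(durations, 2)

print("naive makespan:", makespan(naive))
print("weight-aware makespan:", makespan(weighted))
```

With this deliberately adversarial ordering, round-robin puts every heavy job on the same shard. Real package orderings may be far less skewed, which is what the experiment is meant to measure.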

Shard distribution

Shard	Packages
0	base, depreqres/reqressyncdisabled/elsync, interop/message, interop/upgrade-no-supervisor, rules, supernode/interop/same_timestamp_invalid
1	base/chain, depreqres/syncmodereqressync/clsync, interop/prep, interop/upgrade-singlechain, safeheaddb_clsync, sync/...
2	base/conductor, ecotone, interop/reorgs, isthmus, safeheaddb_elsync, sync_tester/sync_tester_e2e
3	base/deposit, fjord, interop/seqwindow, isthmus/erc20_bridge, sequencer, sync_tester/sync_tester_elsync
4	batcher/..., flashblocks, interop/smoke, isthmus/operator_fee, supernode, sync_tester/sync_tester_elsync_multi
5	custom_gas_token, fusaka, interop/sync/multisupervisor_interop, isthmus/pectra, supernode/interop, sync_tester/sync_tester_hfs
6	depreqres/reqressyncdisabled/clsync, interop/contract, interop/sync/simple_interop, isthmus/withdrawal_root, supernode/interop/follow_l2
7	depreqres/reqressyncdisabled/divergence, interop/loadtest, interop/upgrade, jovian/..., supernode/interop/reorg
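
The table above is consistent with sorting the 46 package names and assigning the package at sorted index i to shard i % 8. The sketch below reproduces that assignment. It uses Python's default string ordering as a stand-in for Go's `sort.Strings` (the two agree for these ASCII names), and `round_robin_shards` is an illustrative name, not a function from the repository.

```python
def round_robin_shards(packages, num_shards):
    """Sort package names, then assign sorted index i to shard i % num_shards."""
    shards = [[] for _ in range(num_shards)]
    for i, pkg in enumerate(sorted(packages)):
        shards[i % num_shards].append(pkg)
    return shards


# The 46 packages, copied from the shard table (listed shard by shard).
PACKAGES = [
    "base", "depreqres/reqressyncdisabled/elsync", "interop/message",
    "interop/upgrade-no-supervisor", "rules",
    "supernode/interop/same_timestamp_invalid",
    "base/chain", "depreqres/syncmodereqressync/clsync", "interop/prep",
    "interop/upgrade-singlechain", "safeheaddb_clsync", "sync/...",
    "base/conductor", "ecotone", "interop/reorgs", "isthmus",
    "safeheaddb_elsync", "sync_tester/sync_tester_e2e",
    "base/deposit", "fjord", "interop/seqwindow", "isthmus/erc20_bridge",
    "sequencer", "sync_tester/sync_tester_elsync",
    "batcher/...", "flashblocks", "interop/smoke", "isthmus/operator_fee",
    "supernode", "sync_tester/sync_tester_elsync_multi",
    "custom_gas_token", "fusaka", "interop/sync/multisupervisor_interop",
    "isthmus/pectra", "supernode/interop", "sync_tester/sync_tester_hfs",
    "depreqres/reqressyncdisabled/clsync", "interop/contract",
    "interop/sync/simple_interop", "isthmus/withdrawal_root",
    "supernode/interop/follow_l2",
    "depreqres/reqressyncdisabled/divergence", "interop/loadtest",
    "interop/upgrade", "jovian/...", "supernode/interop/reorg",
]

shards = round_robin_shards(PACKAGES, 8)
for index, pkgs in enumerate(shards):
    print(index, ", ".join(pkgs))
```

Shards 0-5 receive six packages each and shards 6-7 receive five, since 46 = 5 × 8 + 6.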

🤖 Generated with Claude Code

Replace the single serial memory-all job with 8 parallel shard jobs using CircleCI matrix. Each shard runs a non-overlapping subset of test packages defined in acceptance-tests.yaml. Wall-clock = longest shard, not sum.

Also:
- Move contracts-bedrock-coverage to develop-only (saves ~14min from PR path)
- Move contracts-bedrock-upload to develop-only
- Add check-shard-coverage.sh to catch orphan test packages that aren't in any shard (runs automatically in each shard job)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
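
The orphan check described in this commit amounts to a set difference between the packages on disk and the packages assigned to shards. The sketch below shows that idea together with a non-overlap check. It is not the contents of check-shard-coverage.sh (a shell script not shown here); the function names and shard names are hypothetical.

```python
def find_orphans(all_packages, shards):
    """Return packages that exist but are not assigned to any shard."""
    assigned = set()
    for pkgs in shards.values():
        assigned.update(pkgs)
    return sorted(set(all_packages) - assigned)


def find_overlaps(shards):
    """Return {package: [shards...]} for packages assigned more than once."""
    first_seen = {}
    overlaps = {}
    for shard, pkgs in shards.items():
        for pkg in pkgs:
            if pkg in first_seen:
                overlaps.setdefault(pkg, [first_seen[pkg]]).append(shard)
            else:
                first_seen[pkg] = shard
    return overlaps


# Hypothetical inputs for illustration only.
all_packages = ["base", "ecotone", "fjord", "rules"]
shards = {
    "shard-a": ["base", "ecotone"],
    "shard-b": ["ecotone", "fjord"],
}

print("orphans:", find_orphans(all_packages, shards))
print("overlaps:", find_overlaps(shards))
```

A CI step built on this idea would presumably fail the job whenever either result is non-empty.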

With coverage generation moved to develop-only, make the patch coverage status check informational so it cannot block PRs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move flaky RPC-dependent tests (op-deployer integration_test, op-validator, etc.) from TEST_PKGS to RPC_TEST_PKGS. These only run in go-tests-full on develop, not in go-tests-short on PRs.

Split op-e2e/system/... into 14 sub-packages and reorder TEST_PKGS for better round-robin distribution across 12 CI nodes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add new tests/rules package to ci-shard-misc gate
- Fix SC2295 shellcheck warnings: quote expansions inside ${..}

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


Emulates the naive sharding approach from ethereum-optimism/infra#566: all 46 acceptance test packages sorted alphabetically and distributed round-robin (i % 8) across 8 shards.

Same packages, same CI structure, just different assignment. Compare wall-clock time against #19423's curated shards to measure the cost of naive vs domain-aware splitting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replaces curated ci-shard gates with an exact emulation of the naive sharding from ethereum-optimism/infra#566:

1. Discover all 68 test packages on disk (loadGatelessValidators)
2. sort.Strings (Go lexicographic)
3. i % 8 == shardIndex (round-robin)
4. --exclude-gates flake-shake (post-shard, removes 4 packages)

Result: 64 packages across 8 shards (7-9 per shard). This includes fault-proof tests, external-network tests, and everything else that the curated shards in #19423 intentionally excluded. The naive approach has no awareness of test weight or prerequisites.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
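
The four steps above can be sketched as follows. Because the exclusion runs after the round-robin split, shard sizes depend on where the excluded packages happen to fall, which would explain the 7-9 spread. The package names and the excluded set below are synthetic placeholders; the real discovery step (loadGatelessValidators) and the flake-shake gate contents are not reproduced here.

```python
def packages_for_shard(all_packages, shard_index, num_shards, excluded):
    """Sort, keep indices where i % num_shards == shard_index, then drop exclusions."""
    ordered = sorted(all_packages)
    mine = [pkg for i, pkg in enumerate(ordered) if i % num_shards == shard_index]
    # Exclusion happens post-shard, so it does not rebalance the other shards.
    return [pkg for pkg in mine if pkg not in excluded]


# Synthetic stand-ins: 68 discovered packages, 4 of them excluded.
all_packages = [f"pkg{n:02d}" for n in range(68)]
excluded = {"pkg04", "pkg05", "pkg10", "pkg11"}  # hypothetical exclusion set

sizes = [len(packages_for_shard(all_packages, s, 8, excluded)) for s in range(8)]
print("shard sizes:", sizes, "total:", sum(sizes))
```

Before exclusion, 68 packages over 8 shards gives four shards of 9 and four of 8. Removing 4 packages afterwards can pull individual shards down to 7 while leaving others at 9.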

codecov · 2026-03-09T18:49:22Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.1%. Comparing base (470ae6e) to head (9c07e2d).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files

@@            Coverage Diff             @@
##           develop   #19451     +/-   ##
==========================================
- Coverage     76.5%    76.1%   -0.5%     
==========================================
  Files          729      591    -138     
  Lines        81441    74215   -7226     
==========================================
- Hits         62332    56504   -5828     
+ Misses       18965    17567   -1398     
  Partials       144      144

Flag	Coverage Δ
cannon-go-tests-64	`66.4% <ø> (ø)`
contracts-bedrock-tests	`?`
unit	`76.6% <ø> (-0.1%)`	⬇️

Flags with carried forward coverage won't be shown.
see 145 files with indirect coverage changes


Add path-filtering detection for rust/ changes. When no Rust files are modified on a feature branch, the kona-build-release job skips cargo build and uses cached binaries from the restored target cache instead. This saves ~9 minutes on PRs that don't touch Rust code.

Safety backstops:
- Always builds on develop/main regardless
- Falls through to full build if cached binaries are missing
- Default parameter value (true) means all existing invocations are unaffected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
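
The skip decision and its backstops can be summarized as a small predicate. This is a sketch of the logic as described in the commit message, not the actual CircleCI configuration; the function name and its parameters are hypothetical.

```python
def should_run_cargo_build(branch, changed_files, cached_binaries_present):
    """Decide whether the Rust build must run, following the stated backstops."""
    # Backstop 1: always build on the main integration branches.
    if branch in ("develop", "main"):
        return True
    # Backstop 2: without cached binaries there is nothing to reuse.
    if not cached_binaries_present:
        return True
    # Otherwise build only when something under rust/ changed.
    return any(path.startswith("rust/") for path in changed_files)


# Hypothetical file paths, for illustration only.
print(should_run_cargo_build("feature/x", ["op-node/main.go"], True))
print(should_run_cargo_build("feature/x", ["rust/kona/src/lib.rs"], True))
print(should_run_cargo_build("develop", ["op-node/main.go"], True))
```

Ordering the checks this way means an error in path detection can only skip a build when valid cached binaries already exist.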

smartcontracts and others added 7 commits March 6, 2026 17:46

ci: make codecov patch coverage advisory-only

1e98273


fix(ci): add tests/rules to shard, fix shellcheck warnings

cff926f


Merge remote-tracking branch 'origin/ci/speed-up-critical-path' into …

a74db8b

…chore/naive-test-splitting

smartcontracts closed this Mar 9, 2026

smartcontracts deleted the chore/naive-test-splitting branch March 9, 2026 22:56

smartcontracts wants to merge 8 commits into develop from chore/naive-test-splitting
