
experiment: naive round-robin test splitting (compare vs #19423) #19451

Closed
smartcontracts wants to merge 8 commits into develop from chore/naive-test-splitting

@smartcontracts (Contributor)

Summary

Experiment to compare wall-clock time of naive vs curated test splitting.

What to compare

Run both this PR and #19423, then compare the memory-shard-* job durations:

The longest shard determines wall-clock time. Curated sharding should produce more balanced shards because it groups packages by test weight, whereas naive sharding assigns packages without regard to how long they take to run.

Shard distribution

| Shard | Packages |
|-------|----------|
| 0 | base, depreqres/reqressyncdisabled/elsync, interop/message, interop/upgrade-no-supervisor, rules, supernode/interop/same_timestamp_invalid |
| 1 | base/chain, depreqres/syncmodereqressync/clsync, interop/prep, interop/upgrade-singlechain, safeheaddb_clsync, sync/... |
| 2 | base/conductor, ecotone, interop/reorgs, isthmus, safeheaddb_elsync, sync_tester/sync_tester_e2e |
| 3 | base/deposit, fjord, interop/seqwindow, isthmus/erc20_bridge, sequencer, sync_tester/sync_tester_elsync |
| 4 | batcher/..., flashblocks, interop/smoke, isthmus/operator_fee, supernode, sync_tester/sync_tester_elsync_multi |
| 5 | custom_gas_token, fusaka, interop/sync/multisupervisor_interop, isthmus/pectra, supernode/interop, sync_tester/sync_tester_hfs |
| 6 | depreqres/reqressyncdisabled/clsync, interop/contract, interop/sync/simple_interop, isthmus/withdrawal_root, supernode/interop/follow_l2 |
| 7 | depreqres/reqressyncdisabled/divergence, interop/loadtest, interop/upgrade, jovian/..., supernode/interop/reorg |

🤖 Generated with Claude Code

smartcontracts and others added 7 commits March 6, 2026 17:46
Replace the single serial memory-all job with 8 parallel shard jobs using a
CircleCI matrix. Each shard runs a non-overlapping subset of test packages
defined in acceptance-tests.yaml. Wall-clock time is now the longest shard's
duration, not the sum of all shards.

Also:
- Move contracts-bedrock-coverage to develop-only (saves ~14 min on the PR path)
- Move contracts-bedrock-upload to develop-only
- Add check-shard-coverage.sh to catch orphan test packages that aren't
  in any shard (runs automatically in each shard job)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
With coverage generation moved to develop-only, make the patch coverage
status check informational so it cannot block PRs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move flaky RPC-dependent tests (op-deployer integration_test, op-validator,
etc.) from TEST_PKGS to RPC_TEST_PKGS. These only run in go-tests-full on
develop, not in go-tests-short on PRs.

Split op-e2e/system/... into 14 sub-packages and reorder TEST_PKGS for
better round-robin distribution across 12 CI nodes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add new tests/rules package to ci-shard-misc gate
- Fix SC2295 shellcheck warnings: quote expansions inside ${..}

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Emulates the naive sharding approach from ethereum-optimism/infra#566:
all 46 acceptance test packages sorted alphabetically and distributed
round-robin (i % 8) across 8 shards. Same packages, same CI structure,
just different assignment — compare wall-clock time against #19423's
curated shards to measure the cost of naive vs domain-aware splitting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces curated ci-shard gates with an exact emulation of the naive
sharding from ethereum-optimism/infra#566:

1. Discover all 68 test packages on disk (loadGatelessValidators)
2. sort.Strings (Go lexicographic)
3. i % 8 == shardIndex (round-robin)
4. --exclude-gates flake-shake (post-shard, removes 4 packages)

Result: 64 packages across 8 shards (7-9 per shard). This includes
fault-proof tests, external-network tests, and everything else that
the curated shards in #19423 intentionally excluded. The naive approach
has no awareness of test weight or prerequisites.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codecov bot commented Mar 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.1%. Comparing base (470ae6e) to head (9c07e2d).
⚠️ Report is 1 commit behind head on develop.

Additional details and impacted files
@@            Coverage Diff             @@
##           develop   #19451     +/-   ##
==========================================
- Coverage     76.5%    76.1%   -0.5%     
==========================================
  Files          729      591    -138     
  Lines        81441    74215   -7226     
==========================================
- Hits         62332    56504   -5828     
+ Misses       18965    17567   -1398     
  Partials       144      144             
| Flag | Coverage Δ |
|------|------------|
| cannon-go-tests-64 | 66.4% <ø> (ø) |
| contracts-bedrock-tests | ? |
| unit | 76.6% <ø> (-0.1%) ⬇️ |

Flags with carried forward coverage won't be shown.
see 145 files with indirect coverage changes


Add path-filtering detection for rust/ changes. When no Rust files are
modified on a feature branch, the kona-build-release job skips cargo
build and uses cached binaries from the restored target cache instead.
This saves ~9 minutes on PRs that don't touch Rust code.

Safety backstops:
- Always builds on develop/main regardless
- Falls through to full build if cached binaries are missing
- Default parameter value (true) means all existing invocations are unaffected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
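The skip decision and its backstops reduce to a small predicate. This is a minimal Go sketch, assuming hypothetical names (`touchesRust`, `shouldBuildRust`, their parameters, and the example paths); the real logic lives in CircleCI configuration and is not shown here. The default-`true` job parameter is modeled by the `rustChanged` argument: a caller that never runs path filtering passes `true` and keeps building.

```go
package main

import (
	"fmt"
	"strings"
)

// touchesRust reports whether any changed path is under rust/, standing in
// for the path-filtering step.
func touchesRust(changedFiles []string) bool {
	for _, f := range changedFiles {
		if strings.HasPrefix(f, "rust/") {
			return true
		}
	}
	return false
}

// shouldBuildRust decides whether the job runs cargo build or reuses cached binaries.
func shouldBuildRust(branch string, rustChanged, cachedBinariesPresent bool) bool {
	switch {
	case branch == "develop" || branch == "main":
		return true // backstop: always build on develop/main
	case rustChanged:
		return true // Rust sources changed, or the default (true) was left in place
	case !cachedBinariesPresent:
		return true // backstop: fall through to a full build if the cache is missing
	default:
		return false // feature branch, no Rust changes, warm cache: reuse binaries
	}
}

func main() {
	changed := []string{"op-node/example.go"}
	fmt.Println(shouldBuildRust("feature/x", touchesRust(changed), true))  // false: skip cargo build
	fmt.Println(shouldBuildRust("feature/x", touchesRust(changed), false)) // true: cache missing
	fmt.Println(shouldBuildRust("develop", false, true))                   // true: develop always builds
}
```

Every backstop returns `true`, so the only path to skipping the build is the narrow case the commit message targets.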
smartcontracts deleted the chore/naive-test-splitting branch March 9, 2026 22:56

Labels: None yet
Projects: None yet
1 participant