feat(op-acceptor): add --shard-index/--shard-total for parallel gateless sharding#566

Open
smartcontracts wants to merge 1 commit into main from feat/acceptor-shard-flag

Conversation

@smartcontracts
Contributor

Summary

  • Adds --shard-index and --shard-total flags to op-acceptor for round-robin test package sharding in gateless mode
  • After package discovery, packages are sorted alphabetically (deterministic), then filtered round-robin: the package at sorted position i is kept when i % shardTotal == shardIndex
  • New test packages are automatically picked up by shards; removed packages disappear — no manual gate curation needed
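The sort-then-filter rule above can be sketched as follows. This is a minimal illustration, not the code in registry/registry.go: `shardPackages` and the example package paths are hypothetical, and treating the default values (-1 and 0) as "sharding disabled" is an assumption based on the flag defaults described in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// shardPackages is a hypothetical helper (not the PR's actual function).
// It sorts a copy of pkgs and keeps the entries whose sorted position i
// satisfies i % shardTotal == shardIndex.
func shardPackages(pkgs []string, shardIndex, shardTotal int) []string {
	if shardTotal <= 0 || shardIndex < 0 {
		return pkgs // assumed behavior: defaults mean sharding is disabled
	}
	sorted := append([]string(nil), pkgs...) // copy so the caller's slice is untouched
	sort.Strings(sorted)

	var out []string
	for i, p := range sorted {
		if i%shardTotal == shardIndex {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Example paths are made up for illustration.
	pkgs := []string{"./tests/d", "./tests/a", "./tests/c", "./tests/b", "./tests/e"}
	for s := 0; s < 2; s++ {
		fmt.Println(s, shardPackages(pkgs, s, 2))
	}
	// Prints:
	// 0 [./tests/a ./tests/c ./tests/e]
	// 1 [./tests/b ./tests/d]
}
```

Because assignment depends only on sorted position, adding or removing a package can shift later packages to a different shard, but every shard recomputes the same split from the same input.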

Motivation

The Optimism monorepo's memory-all CI job runs ~40 acceptance test packages in a single worker, which is the critical path bottleneck. With this change, CI can parallelize into N shards:

# .circleci matrix
matrix:
  parameters:
    shard: [0, 1, 2, 3, 4, 5, 6, 7]

op-acceptor \
  --testdir ./op-acceptance-tests/... \
  --shard-index $SHARD --shard-total 8 \
  --exclude-gates flake-shake \
  --allow-skips --timeout 120m \
  --orchestrator sysgo

This eliminates the need for manually maintained ci-shard-* gates in acceptance-tests.yaml and orphan detection scripts.

Changes

| File | Change |
| --- | --- |
| flags/flags.go | New --shard-index (default -1) and --shard-total (default 0) flags with env vars |
| config.go | Parse + validate: both must be set together, index in [0, total), gateless-only |
| nat.go | Thread shard config into registry |
| registry/registry.go | Sort packages, apply i % total == index filter in loadGatelessValidators() |
| registry/registry_test.go | 7 new tests covering basic sharding, coverage, determinism, no-overlap, edge cases |

Test plan

  • TestShardFiltering_Basic — 8 packages / 4 shards = 2 each
  • TestShardFiltering_Coverage — union of all shards == full package set
  • TestShardFiltering_Deterministic — same shard twice = same result
  • TestShardFiltering_NoOverlap — no package in multiple shards
  • TestShardFiltering_MoreShardsThanPackages — empty shards handled
  • TestShardFiltering_SinglePackage — 1 package / 2 shards
  • TestShardFiltering_Disabled — default values = no sharding
  • All 18 registry tests pass (11 existing + 7 new)
  • go vet clean on all changed packages
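The coverage and no-overlap properties in the test plan amount to checking that the shards form an exact partition of the package set. A rough sketch of that check, assuming unique package paths; `shardOf` and `checkPartition` are hypothetical names and are not taken from registry_test.go:

```go
package main

import (
	"fmt"
	"sort"
)

// shardOf mirrors the round-robin rule described in this PR.
// Hypothetical helper; total must be > 0.
func shardOf(pkgs []string, index, total int) []string {
	sorted := append([]string(nil), pkgs...)
	sort.Strings(sorted)
	var out []string
	for i, p := range sorted {
		if i%total == index {
			out = append(out, p)
		}
	}
	return out
}

// checkPartition returns an error unless every package lands in exactly
// one of the `total` shards (coverage plus no overlap).
func checkPartition(pkgs []string, total int) error {
	seen := make(map[string]int)
	for s := 0; s < total; s++ {
		for _, p := range shardOf(pkgs, s, total) {
			seen[p]++
		}
	}
	for _, p := range pkgs {
		if seen[p] != 1 {
			return fmt.Errorf("package %q assigned to %d shards, want 1", p, seen[p])
		}
	}
	return nil
}

func main() {
	pkgs := []string{"a", "b", "c", "d", "e", "f", "g", "h"}
	fmt.Println(checkPartition(pkgs, 4))         // 8 packages, 4 shards
	fmt.Println(checkPartition(pkgs[:1], 2))     // 1 package, 2 shards
	fmt.Println(len(shardOf(pkgs[:1], 1, 2)))    // the second shard is empty
}
```

The same check covers the edge cases listed above: with more shards than packages, the extra shards are simply empty and the partition still holds.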

🤖 Generated with Claude Code

…ess test execution

Adds round-robin sharding to gateless mode so CI can split test packages
across N workers without maintaining manual gate lists.

After test package discovery, packages are sorted alphabetically for
determinism, then filtered by `index % total == shardIndex`. New packages
are automatically picked up; removed packages disappear — no manual
curation needed.

Flags:
  --shard-index  Zero-based shard index (default: -1, disabled)
  --shard-total  Total number of shards (default: 0, disabled)

Env vars: OP_ACCEPTOR_SHARD_INDEX, OP_ACCEPTOR_SHARD_TOTAL

Validation:
  - Both must be set together
  - Index must be in [0, total)
  - Only works in gateless mode (errors with --gate)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
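The validation rules in the commit message can be sketched as a single function. This is an illustration under stated assumptions, not the contents of config.go: `validateShard`, its signature, and the error strings are hypothetical, and the handling of negative values other than the -1 default is a guess.

```go
package main

import (
	"errors"
	"fmt"
)

// validateShard is a hypothetical helper. It reports whether sharding is
// enabled, or returns an error if the flag combination is invalid.
// index == -1 and total == 0 are the documented "disabled" defaults;
// gate is the value of --gate ("" means gateless mode).
func validateShard(index, total int, gate string) (bool, error) {
	if index == -1 && total == 0 {
		return false, nil // both at defaults: sharding disabled
	}
	if index == -1 || total == 0 {
		return false, errors.New("--shard-index and --shard-total must be set together")
	}
	if total < 0 || index < 0 || index >= total {
		return false, fmt.Errorf("shard index %d out of range [0, %d)", index, total)
	}
	if gate != "" {
		return false, errors.New("sharding only works in gateless mode; remove --gate")
	}
	return true, nil
}

func main() {
	cases := []struct {
		index, total int
		gate         string
	}{
		{-1, 0, ""},    // disabled
		{3, 8, ""},     // valid
		{8, 8, ""},     // index out of range
		{2, 0, ""},     // only one flag set
		{0, 4, "base"}, // combined with a gate (gate name is made up)
	}
	for _, c := range cases {
		enabled, err := validateShard(c.index, c.total, c.gate)
		fmt.Printf("index=%d total=%d gate=%q enabled=%v err=%v\n",
			c.index, c.total, c.gate, enabled, err)
	}
}
```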
@smartcontracts smartcontracts requested a review from a team as a code owner March 6, 2026 18:03
@codecov-commenter

Codecov Report

❌ Patch coverage is 54.54545% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.79%. Comparing base (6b0f7b2) to head (3b46771).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| op-acceptor/config.go | 0.00% | 15 Missing ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #566      +/-   ##
==========================================
- Coverage   57.82%   57.79%   -0.03%     
==========================================
  Files          96       96              
  Lines       14477    14508      +31     
==========================================
+ Hits         8371     8385      +14     
- Misses       5591     5608      +17     
  Partials      515      515              
| Flag | Coverage Δ | |
| --- | --- | --- |
| op-acceptor | 58.97% <54.54%> (-0.07%) | ⬇️ |

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| op-acceptor/flags/flags.go | 81.81% <ø> (ø) | |
| op-acceptor/nat.go | 33.21% <100.00%> (+0.15%) | ⬆️ |
| op-acceptor/registry/registry.go | 82.17% <100.00%> (+0.68%) | ⬆️ |
| op-acceptor/config.go | 0.00% <0.00%> (ø) | |

... and 1 file with indirect coverage changes

@scharissis
Contributor

We have a very similar PR as WIP here: #553

And the discussion which led to it here:
https://oplabs-pbc.slack.com/archives/C09E7EJGN5P/p1770986805525969

Did you happen to see/consider any of this beforehand? I am wondering if we can streamline our efforts.

smartcontracts added a commit to ethereum-optimism/optimism that referenced this pull request Mar 9, 2026
Emulates the naive sharding approach from ethereum-optimism/infra#566:
all 46 acceptance test packages sorted alphabetically and distributed
round-robin (i % 8) across 8 shards. Same packages, same CI structure,
just different assignment — compare wall-clock time against #19423's
curated shards to measure the cost of naive vs domain-aware splitting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
smartcontracts added a commit to ethereum-optimism/optimism that referenced this pull request Mar 9, 2026
Replaces curated ci-shard gates with an exact emulation of the naive
sharding from ethereum-optimism/infra#566:

1. Discover all 68 test packages on disk (loadGatelessValidators)
2. sort.Strings (Go lexicographic)
3. i % 8 == shardIndex (round-robin)
4. --exclude-gates flake-shake (post-shard, removes 4 packages)

Result: 64 packages across 8 shards (7-9 per shard). This includes
fault-proof tests, external-network tests, and everything else that
the curated shards in #19423 intentionally excluded. The naive approach
has no awareness of test weight or prerequisites.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
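The per-shard counts before exclusion follow directly from the round-robin rule. A small sketch of that arithmetic, with `shardSizes` as a hypothetical helper; it models only the `i % 8` split of the 68 discovered packages, not the post-shard `--exclude-gates` step, since which four packages are removed from which shards is not stated here:

```go
package main

import "fmt"

// shardSizes returns how many of n sorted items land in each of `total`
// shards under the i % total rule. Hypothetical helper for illustration.
func shardSizes(n, total int) []int {
	sizes := make([]int, total)
	for i := 0; i < n; i++ {
		sizes[i%total]++
	}
	return sizes
}

func main() {
	// 68 = 8*8 + 4, so the first four shards get one extra package.
	fmt.Println(shardSizes(68, 8)) // [9 9 9 9 8 8 8 8]
}
```

Removing four packages after sharding then yields the 64 packages reported above, with shard sizes depending on where those four fall.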