experiment: naive round-robin test splitting (compare vs #19423) #19451
Closed
smartcontracts wants to merge 8 commits into develop
Replace the single serial memory-all job with 8 parallel shard jobs using CircleCI matrix. Each shard runs a non-overlapping subset of test packages defined in acceptance-tests.yaml. Wall-clock = longest shard, not sum.
Also:
- Move contracts-bedrock-coverage to develop-only (saves ~14min from PR path)
- Move contracts-bedrock-upload to develop-only
- Add check-shard-coverage.sh to catch orphan test packages that aren't in any shard (runs automatically in each shard job)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
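The orphan check described in this commit amounts to a set difference between the packages found on disk and the packages assigned to any shard. A minimal Go sketch of that idea follows. It is not the contents of check-shard-coverage.sh (which is a shell script not shown on this page); `findOrphans`, the shard names, and the package paths `tests/base` and `tests/interop` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// findOrphans returns every discovered package that is not assigned to any
// shard. Hypothetical helper: it only illustrates the set-difference idea
// behind an orphan check, not the real script's logic.
func findOrphans(discovered []string, shards map[string][]string) []string {
	assigned := make(map[string]bool)
	for _, pkgs := range shards {
		for _, p := range pkgs {
			assigned[p] = true
		}
	}

	var orphans []string
	for _, p := range discovered {
		if !assigned[p] {
			orphans = append(orphans, p)
		}
	}
	sort.Strings(orphans) // stable output regardless of input order
	return orphans
}

func main() {
	// Illustrative inputs only.
	discovered := []string{"tests/base", "tests/interop", "tests/rules"}
	shards := map[string][]string{
		"shard-0": {"tests/base"},
		"shard-1": {"tests/interop"},
	}

	if orphans := findOrphans(discovered, shards); len(orphans) > 0 {
		fmt.Println("orphan test packages:", orphans)
	}
}
```

Running the check inside every shard job, as the commit describes, means a newly added package that nobody assigned fails fast instead of silently never running.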
With coverage generation moved to develop-only, make the patch coverage status check informational so it cannot block PRs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move flaky RPC-dependent tests (op-deployer integration_test, op-validator, etc.) from TEST_PKGS to RPC_TEST_PKGS. These only run in go-tests-full on develop, not in go-tests-short on PRs.
Split op-e2e/system/... into 14 sub-packages and reorder TEST_PKGS for better round-robin distribution across 12 CI nodes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add new tests/rules package to ci-shard-misc gate
- Fix SC2295 shellcheck warnings: quote expansions inside ${..}
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…chore/naive-test-splitting
Emulates the naive sharding approach from ethereum-optimism/infra#566: all 46 acceptance test packages sorted alphabetically and distributed round-robin (i % 8) across 8 shards.
Same packages, same CI structure, just different assignment. Compare wall-clock time against #19423's curated shards to measure the cost of naive vs domain-aware splitting.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces curated ci-shard gates with an exact emulation of the naive sharding from ethereum-optimism/infra#566:
1. Discover all 68 test packages on disk (loadGatelessValidators)
2. sort.Strings (Go lexicographic)
3. i % 8 == shardIndex (round-robin)
4. --exclude-gates flake-shake (post-shard, removes 4 packages)
Result: 64 packages across 8 shards (7-9 per shard). This includes fault-proof tests, external-network tests, and everything else that the curated shards in #19423 intentionally excluded. The naive approach has no awareness of test weight or prerequisites.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
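The four steps above can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the code from infra#566: `naiveShard` and the `excluded` set are hypothetical stand-ins for the discovery step and the `--exclude-gates flake-shake` filter, and the package names in `main` are placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// naiveShard returns the packages assigned to shardIndex out of shardCount.
// It sorts lexicographically, keeps every package whose sorted position i
// satisfies i % shardCount == shardIndex, and only then drops excluded
// packages. Because exclusion happens after the index is assigned, shard
// sizes can end up uneven.
func naiveShard(pkgs []string, shardCount, shardIndex int, excluded map[string]bool) []string {
	sorted := append([]string(nil), pkgs...) // copy so the caller's slice is untouched
	sort.Strings(sorted)

	var out []string
	for i, p := range sorted {
		if i%shardCount != shardIndex {
			continue // belongs to a different shard
		}
		if excluded[p] {
			continue // post-shard exclusion (hypothetical stand-in for flake-shake)
		}
		out = append(out, p)
	}
	return out
}

func main() {
	// Placeholder package names, not the real test packages.
	pkgs := []string{"d", "b", "a", "c", "e"}
	excluded := map[string]bool{"c": true}

	for shard := 0; shard < 2; shard++ {
		fmt.Printf("shard %d: %v\n", shard, naiveShard(pkgs, 2, shard, excluded))
	}
}
```

With 68 packages and 8 shards, round-robin alone gives four shards of 9 and four of 8. Removing 4 packages afterwards is what produces the 7-9 spread quoted in the commit message.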
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #19451 +/- ##
==========================================
- Coverage 76.5% 76.1% -0.5%
==========================================
Files 729 591 -138
Lines 81441 74215 -7226
==========================================
- Hits 62332 56504 -5828
+ Misses 18965 17567 -1398
Partials 144 144
Flags with carried forward coverage won't be shown.
Add path-filtering detection for rust/ changes. When no Rust files are modified on a feature branch, the kona-build-release job skips cargo build and uses cached binaries from the restored target cache instead. This saves ~9 minutes on PRs that don't touch Rust code.
Safety backstops:
- Always builds on develop/main regardless
- Falls through to full build if cached binaries are missing
- Default parameter value (true) means all existing invocations are unaffected
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
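The skip decision and its backstops can be summarized as a small predicate. The sketch below is written in Go purely for illustration; the actual change lives in CI configuration and shell steps not shown here, and `shouldBuildRust` and its parameters are hypothetical names.

```go
package main

import "fmt"

// shouldBuildRust reports whether a full cargo build should run.
// Hypothetical helper mirroring the backstops in the commit message:
//   - develop/main always build
//   - any change under rust/ forces a build
//   - missing cached binaries fall through to a full build
func shouldBuildRust(branch string, rustChanged, cachedBinariesPresent bool) bool {
	if branch == "develop" || branch == "main" {
		return true
	}
	if rustChanged {
		return true
	}
	// Feature branch with no Rust changes: skip only if the cache is usable.
	return !cachedBinariesPresent
}

func main() {
	fmt.Println(shouldBuildRust("feature/x", false, true))  // false: reuse cached binaries
	fmt.Println(shouldBuildRust("feature/x", false, false)) // true: cache miss, full build
	fmt.Println(shouldBuildRust("feature/x", true, true))   // true: Rust files changed
	fmt.Println(shouldBuildRust("develop", false, true))    // true: always build on develop
}
```

The only path that skips the build requires all three conditions at once, so a cache eviction or an unexpected branch name costs time rather than correctness.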
Summary
Experiment to compare wall-clock time of naive vs curated test splitting.
i % 8 distribution (emulates infra#566)
What to compare
Run both this PR and #19423, then compare the memory-shard-* job durations.
The longest shard determines wall-clock time. Curated sharding should produce more balanced shards since it groups by test weight, while naive sharding distributes blindly.
Shard distribution
🤖 Generated with Claude Code