perf: Race parallel git subprocesses against filesystem walk for optimal index construction #12206
Merged
anthonyshew merged 1 commit into main on Mar 9, 2026
Build RepoGitIndex from parallel git subprocesses for the tracked index (ls-tree + diff-index) and a race between walk_candidate_files and git ls-files for untracked file discovery.

Four operations run concurrently:
- git ls-tree -r HEAD -z (blob OIDs, ~60-110ms)
- git diff-index HEAD -z (modified/deleted, ~95-150ms)
- walk_candidate_files (8-thread walk, ~440ms macOS / ~474ms Linux)
- git ls-files --others -z (single-thread, ~530ms macOS / ~231ms Linux)

The two untracked approaches race via mpsc channel. On macOS, the multi-threaded walk wins. On Linux, git's optimized subprocess wins. Using whichever finishes first guarantees no regressions on either platform.

Benchmark (110-package monorepo, 30 runs, sandboxed Linux):
- baseline: 878ms ± 27ms
- improved: 437ms ± 7ms (2.01x faster)
github-actions bot added a commit that referenced this pull request on Mar 9, 2026

## Release v2.8.15-canary.11

Versioned docs: https://v2-8-15-canary-11.turborepo.dev

### Changes

- release(turborepo): 2.8.15-canary.10 (#12205) (`a9bbb9e`)
- perf: Race parallel git subprocesses against filesystem walk for optimal index construction (#12206) (`9fef3f5`)
- perf: Two-phase HTTP client init to avoid macOS Keychain blocking (#12208) (`892cb1b`)

---------

Co-authored-by: Turbobot <turbobot@vercel.com>
Summary
- Replace `new_from_gix_index` (stat every tracked file) + `walk_candidate_files` with a faster hybrid approach
- Use `git ls-tree` + `git diff-index` subprocesses for the tracked index (simpler, fast everywhere)
- Race `walk_candidate_files` (8-thread ignore-crate walk) against `git ls-files --others` (git subprocess) for untracked file discovery; whichever finishes first wins

Why
No single untracked-file discovery method is fastest everywhere:
| Platform | `walk_candidate_files` (8 threads) | `git ls-files --others` (single thread) |
|---|---|---|
| macOS | ~440ms | ~530ms |
| Linux | ~474ms | ~231ms |

Racing both and using the winner eliminates regressions on either platform.
How the race works
Four operations spawn on separate threads:
1. `git ls-tree -r HEAD -z`: blob OIDs (~60-110ms)
2. `git diff-index HEAD -z`: modified/deleted (~95-150ms)
3. `walk_candidate_files`: 8-thread filesystem walk
4. `git ls-files --others -z`: git subprocess

Operations 3 and 4 send results through an `mpsc` channel. The first result wins. The losing thread runs to completion and its result is discarded.

- If `ls-files` wins: its output is the untracked file list directly.
- If `walk` wins: candidates are filtered against `ls-tree` hashes to find untracked files.

Benchmark (110-package monorepo, 30 runs, sandboxed Linux)

- baseline: 878ms ± 27ms
- improved: 437ms ± 7ms
2.01x faster on Linux. No regression on macOS (walk wins the race at ~440ms, same as the split-walk approach).
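For readers unfamiliar with the pattern, the race described under "How the race works" can be sketched roughly as follows. This is a minimal illustration, not the code in `repo_index.rs`: the `Winner` enum and the `race` function are hypothetical names, and the two closures stand in for `walk_candidate_files` and the `git ls-files --others -z` subprocess.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical tag recording which arm of the race produced the result.
#[derive(Debug, PartialEq)]
enum Winner {
    Walk(Vec<String>),
    LsFiles(Vec<String>),
}

/// Run both arms on their own threads and return whichever reports first.
fn race<F, G>(walk: F, ls_files: G) -> Winner
where
    F: FnOnce() -> Vec<String> + Send + 'static,
    G: FnOnce() -> Vec<String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let tx_walk = tx.clone();
    thread::spawn(move || {
        // A send error only means the receiver is already gone; ignore it.
        let _ = tx_walk.send(Winner::Walk(walk()));
    });
    thread::spawn(move || {
        let _ = tx.send(Winner::LsFiles(ls_files()));
    });
    // Blocks until the first message arrives. The slower thread still runs
    // to completion, but its message is never read.
    rx.recv().expect("both race arms exited without sending a result")
}

fn main() {
    let winner = race(
        || {
            // Stand-in for a slow filesystem walk.
            thread::sleep(Duration::from_millis(200));
            vec!["notes.md".to_string()]
        },
        // Stand-in for a fast git subprocess.
        || vec!["notes.md".to_string()],
    );
    println!("{:?}", winner);
}
```

Tagging the message with its origin matters because the two arms need different post-processing: the `ls-files` result is used directly, while the walk result still has to be filtered against the tracked set.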
Test Coverage
18 new regression tests across three categories:
Category 7 — Ground truth (8 tests): Establish correct per-package hashes across edge cases (staged changes/new files/deletions, unstaged modifications, no-commit repos, comprehensive mixed state).
Category 8 — Subprocess+race equivalence (5 tests): Verify the race-based constructor produces identical results to gix-index and no-index paths.
Category 9 — Race arm equivalence (5 tests): Independently verify each arm of the race (walk path and ls-files path) produces identical results, so the winner is always correct regardless of which arm wins.
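The property that Category 9 checks can be pictured with a small sketch. It assumes, for illustration only, that the walk arm filters candidates by path membership in the tracked set; the function names below are hypothetical and are not the `build_walk_arm_index` / `build_ls_files_arm_index` helpers from the PR.

```rust
use std::collections::BTreeSet;

/// Walk arm (sketch): drop every candidate that `ls-tree` reports as tracked.
fn untracked_from_walk(candidates: &[&str], tracked: &BTreeSet<&str>) -> BTreeSet<String> {
    candidates
        .iter()
        .filter(|path| !tracked.contains(*path))
        .map(|path| path.to_string())
        .collect()
}

/// ls-files arm (sketch): the subprocess output already is the untracked list.
fn untracked_from_ls_files(output: &[&str]) -> BTreeSet<String> {
    output.iter().map(|path| path.to_string()).collect()
}

fn main() {
    let tracked = BTreeSet::from(["Cargo.toml", "src/lib.rs"]);
    // The walk sees tracked and untracked files alike.
    let walk = ["Cargo.toml", "src/lib.rs", "src/new.rs", "notes.md"];
    // git lists only the untracked ones, in its own order.
    let ls_files = ["notes.md", "src/new.rs"];

    // Both arms must agree regardless of which one wins the race.
    assert_eq!(
        untracked_from_walk(&walk, &tracked),
        untracked_from_ls_files(&ls_files)
    );
}
```

Collecting into a sorted set makes the comparison independent of the order in which each arm discovers files.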
How to Review
- `repo_index.rs`: `new_from_subprocess_and_walk`, the race implementation with `mpsc` channel
- `ls_tree.rs`: `git_ls_tree_repo_root_sorted`, `git_diff_index_repo_root`, `git_ls_files_untracked` + parsers
- `lib.rs`: `build_repo_index_from_subprocesses` accepts prefixes, calls new constructor
- `builder.rs`: passes `all_package_prefixes` into the method
- `git_index_regression_tests.rs`: `build_walk_arm_index`, `build_ls_files_arm_index` helpers + 18 new tests
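When reviewing the parsers in `ls_tree.rs`, it may help to keep the record format in mind: with `-z`, `git ls-tree -r` emits NUL-terminated records of the form `<mode> <type> <object>\t<path>`. The sketch below parses that format. `TreeEntry` and `parse_ls_tree_z` are hypothetical names, and skipping non-blob entries (such as submodule `commit` entries) is an assumption of this example rather than something the PR states.

```rust
/// Hypothetical parsed record from `git ls-tree -r -z`.
#[derive(Debug, PartialEq)]
struct TreeEntry {
    oid: String,
    path: String,
}

/// Parse NUL-terminated `<mode> <type> <object>\t<path>` records.
fn parse_ls_tree_z(output: &[u8]) -> Vec<TreeEntry> {
    output
        .split(|b| *b == 0)
        .filter(|record| !record.is_empty())
        .filter_map(|record| {
            // Split on the first tab so paths containing spaces stay intact.
            let tab = record.iter().position(|b| *b == b'\t')?;
            let (meta, path) = (&record[..tab], &record[tab + 1..]);
            let mut fields = meta.split(|b| *b == b' ');
            let _mode = fields.next()?;
            let kind = fields.next()?;
            let oid = fields.next()?;
            // Assumption: only blobs are of interest; skip other entry types.
            if kind != &b"blob"[..] {
                return None;
            }
            Some(TreeEntry {
                oid: String::from_utf8_lossy(oid).into_owned(),
                path: String::from_utf8_lossy(path).into_owned(),
            })
        })
        .collect()
}

fn main() {
    let mut raw = Vec::new();
    raw.extend_from_slice(b"100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\tsrc/my file.rs\0");
    raw.extend_from_slice(b"160000 commit 0123456789abcdef0123456789abcdef01234567\tvendor/sub\0");

    let entries = parse_ls_tree_z(&raw);
    assert_eq!(entries.len(), 1);
    assert_eq!(entries[0].path, "src/my file.rs");
}
```

The `-z` flag is what makes this parse safe: without it, git quotes unusual path characters and separates records with newlines, which paths themselves may contain.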