
perf: Race parallel git subprocesses against filesystem walk for optimal index construction#12206

Merged
anthonyshew merged 1 commit into main from perf/subprocess-repo-index
Mar 9, 2026

@anthonyshew anthonyshew commented Mar 9, 2026

Summary

  • Replaces new_from_gix_index (stat every tracked file) + walk_candidate_files with a faster hybrid approach
  • Uses git ls-tree + git diff-index subprocesses for the tracked index (simpler, fast everywhere)
  • Races walk_candidate_files (8-thread ignore-crate walk) against git ls-files --others (git subprocess) for untracked file discovery — whichever finishes first wins
  • The race uses whichever method is faster on the current machine, so no platform-specific code paths are needed

Why

No single untracked-file discovery method is fastest everywhere:

| Method | macOS APFS | Linux ext4 |
| --- | --- | --- |
| walk_candidate_files (8 threads) | ~440ms | ~474ms |
| git ls-files --others (single thread) | ~530ms | ~231ms |

Racing both and using the winner eliminates regressions on either platform.

How the race works

Four operations spawn on separate threads:

  1. git ls-tree -r HEAD -z — blob OIDs (~60-110ms)
  2. git diff-index HEAD -z — modified/deleted (~95-150ms)
  3. walk_candidate_files — 8-thread filesystem walk
  4. git ls-files --others -z — git subprocess
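As a rough illustration of the first operation, the NUL-delimited output of `git ls-tree -r HEAD -z` consists of records shaped like `<mode> <type> <oid>\t<path>`. The sketch below shows one way such output could be parsed into a sorted path-to-OID map. The function name `parse_ls_tree` and the `BTreeMap<String, String>` return type are assumptions made for illustration; the PR's actual parsers in ls_tree.rs may be structured differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical parser for `git ls-tree -r HEAD -z` output.
/// Each NUL-terminated record looks like `<mode> <type> <oid>\t<path>`.
/// Returns a sorted map from path to blob OID; non-blob entries
/// (for example submodule `commit` entries) are skipped in this sketch.
fn parse_ls_tree(output: &[u8]) -> BTreeMap<String, String> {
    let mut entries = BTreeMap::new();
    for record in output.split(|&b| b == 0) {
        if record.is_empty() {
            continue; // the trailing NUL yields an empty final record
        }
        // Metadata and path are separated by the first tab.
        let Some(tab) = record.iter().position(|&b| b == b'\t') else {
            continue; // malformed record: ignored here
        };
        let (meta, path) = (&record[..tab], &record[tab + 1..]);
        let mut fields = meta.split(|&b| b == b' ');
        let (_mode, kind, oid) = (fields.next(), fields.next(), fields.next());
        if kind != Some(&b"blob"[..]) {
            continue;
        }
        if let Some(oid) = oid {
            entries.insert(
                String::from_utf8_lossy(path).into_owned(),
                String::from_utf8_lossy(oid).into_owned(),
            );
        }
    }
    entries
}
```

Because `-z` disables path quoting, paths containing spaces or other unusual characters arrive verbatim, which is why the split happens on NUL bytes rather than newlines.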

Operations 3 and 4 send results through an mpsc channel. The first result wins. The losing thread runs to completion and its result is discarded.

If ls-files wins: its output is the untracked file list directly.
If walk wins: candidates are filtered against ls-tree hashes to find untracked files.
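The race described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `race_untracked`, the `Arm` enum, and the closure parameters are hypothetical names, and the walk arm is filtered against a set of tracked paths rather than the ls-tree hash data the real constructor uses.

```rust
use std::collections::HashSet;
use std::sync::mpsc;
use std::thread;

/// Which arm of the race delivered the result (hypothetical type).
#[derive(Debug)]
enum Arm {
    /// Candidate paths from a filesystem walk; may still include tracked files.
    Walk(Vec<String>),
    /// Paths from `git ls-files --others`; already untracked-only.
    LsFiles(Vec<String>),
}

/// Run both discovery methods on their own threads and use the first result.
fn race_untracked<W, L>(walk: W, ls_files: L, tracked: &HashSet<String>) -> Vec<String>
where
    W: FnOnce() -> Vec<String> + Send + 'static,
    L: FnOnce() -> Vec<String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let tx_walk = tx.clone();
    // Each arm sends its result; a failed send (receiver gone) is ignored.
    thread::spawn(move || {
        let _ = tx_walk.send(Arm::Walk(walk()));
    });
    thread::spawn(move || {
        let _ = tx.send(Arm::LsFiles(ls_files()));
    });

    // The first message wins. The losing thread runs to completion and
    // its result is discarded once the receiver is dropped.
    let winner = rx.recv().expect("at least one arm should report");
    let mut untracked: Vec<String> = match winner {
        Arm::LsFiles(paths) => paths,
        Arm::Walk(candidates) => candidates
            .into_iter()
            .filter(|p| !tracked.contains(p))
            .collect(),
    };
    untracked.sort();
    untracked
}
```

Sorting the output makes the result independent of which arm won, which is the property the race relies on: either arm must yield the same untracked set.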

Benchmark (110-package monorepo, 30 runs, sandboxed Linux)

| Build | Mean | Min | Max |
| --- | --- | --- | --- |
| Baseline (main) | 878ms ± 27ms | 840ms | 953ms |
| This PR | 437ms ± 7ms | 427ms | 455ms |

2.01x faster on Linux. No regression on macOS (walk wins the race at ~440ms, same as the split-walk approach).

Test Coverage

18 new regression tests across three categories:

Category 7 — Ground truth (8 tests): Establish correct per-package hashes across edge cases (staged changes/new files/deletions, unstaged modifications, no-commit repos, comprehensive mixed state).

Category 8 — Subprocess+race equivalence (5 tests): Verify the race-based constructor produces identical results to gix-index and no-index paths.

Category 9 — Race arm equivalence (5 tests): Independently verify each arm of the race (walk path and ls-files path) produces identical results, so the winner is always correct regardless of which arm wins.
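The idea behind the race arm equivalence tests can be shown in miniature: compute the untracked set once per arm and require the two sets to match. The helpers below are hypothetical stand-ins operating on in-memory data; the PR's real tests use the `build_walk_arm_index` and `build_ls_files_arm_index` helpers, whose signatures are not shown here.

```rust
use std::collections::BTreeSet;

/// Walk arm (sketch): drop every candidate that is already tracked.
fn untracked_via_walk(candidates: &[String], tracked: &BTreeSet<String>) -> BTreeSet<String> {
    candidates
        .iter()
        .filter(|c| !tracked.contains(*c))
        .cloned()
        .collect()
}

/// ls-files arm (sketch): split NUL-delimited `git ls-files --others -z`
/// output into paths, skipping the empty record after the trailing NUL.
fn untracked_via_ls_files(output: &[u8]) -> BTreeSet<String> {
    output
        .split(|&b| b == 0)
        .filter(|r| !r.is_empty())
        .map(|r| String::from_utf8_lossy(r).into_owned())
        .collect()
}

/// Both arms must agree for the race winner to be interchangeable.
fn arms_agree(candidates: &[String], tracked: &BTreeSet<String>, ls_files_output: &[u8]) -> bool {
    untracked_via_walk(candidates, tracked) == untracked_via_ls_files(ls_files_output)
}
```

Comparing sorted sets rather than vectors keeps the check insensitive to the order in which each method happens to emit paths.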

How to Review

  1. repo_index.rs: new_from_subprocess_and_walk, the race implementation with mpsc channel
  2. ls_tree.rs: git_ls_tree_repo_root_sorted, git_diff_index_repo_root, git_ls_files_untracked + parsers
  3. lib.rs: build_repo_index_from_subprocesses accepts prefixes, calls new constructor
  4. builder.rs: passes all_package_prefixes into the method
  5. git_index_regression_tests.rs: build_walk_arm_index, build_ls_files_arm_index helpers + 18 new tests

@anthonyshew anthonyshew requested a review from a team as a code owner March 9, 2026 11:34
@anthonyshew anthonyshew requested review from tknickman and removed request for a team March 9, 2026 11:34

github-actions bot commented Mar 9, 2026

Coverage Report

| Metric | Coverage |
| --- | --- |
| Lines | 85.36% |
| Functions | 81.32% |
| Branches | 0.00% |

Build RepoGitIndex from parallel git subprocesses for the tracked
index (ls-tree + diff-index) and a race between walk_candidate_files
and git ls-files for untracked file discovery.

Four operations run concurrently:
  - git ls-tree -r HEAD -z       (blob OIDs, ~60-110ms)
  - git diff-index HEAD -z       (modified/deleted, ~95-150ms)
  - walk_candidate_files          (8-thread walk, ~440ms macOS / ~474ms Linux)
  - git ls-files --others -z     (single-thread, ~530ms macOS / ~231ms Linux)

The two untracked approaches race via mpsc channel. On macOS, the
multi-threaded walk wins. On Linux, git's optimized subprocess wins.
Using whichever finishes first guarantees no regressions on either
platform.

Benchmark (110-package monorepo, 30 runs, sandboxed Linux):
  baseline: 878ms ± 27ms
  improved: 437ms ± 7ms  (2.01x faster)
@anthonyshew anthonyshew force-pushed the perf/subprocess-repo-index branch from 84ecb26 to aaa38e9 Compare March 9, 2026 12:07
@anthonyshew anthonyshew changed the title from "perf: Replace gix-index + filesystem walk with parallel git subprocesses" to "perf: Race parallel git subprocesses against filesystem walk for optimal index construction" Mar 9, 2026
@anthonyshew anthonyshew merged commit 9fef3f5 into main Mar 9, 2026
55 checks passed
@anthonyshew anthonyshew deleted the perf/subprocess-repo-index branch March 9, 2026 12:22
github-actions bot added a commit that referenced this pull request Mar 9, 2026
## Release v2.8.15-canary.11

Versioned docs: https://v2-8-15-canary-11.turborepo.dev

### Changes

- release(turborepo): 2.8.15-canary.10 (#12205) (`a9bbb9e`)
- perf: Race parallel git subprocesses against filesystem walk for
optimal index construction (#12206) (`9fef3f5`)
- perf: Two-phase HTTP client init to avoid macOS Keychain blocking
(#12208) (`892cb1b`)

---------

Co-authored-by: Turbobot <turbobot@vercel.com>