Commit 9fef3f5

perf: Race parallel git subprocesses against filesystem walk for optimal index construction (#12206)
## Summary

- Replaces `new_from_gix_index` (stat every tracked file) + `walk_candidate_files` with a faster hybrid approach
- Uses `git ls-tree` + `git diff-index` subprocesses for the tracked index (simpler, fast everywhere)
- **Races** `walk_candidate_files` (8-thread ignore-crate walk) against `git ls-files --others` (git subprocess) for untracked file discovery; whichever finishes first wins
- The race guarantees optimal performance on every platform without platform-specific code paths

## Why

No single untracked-file discovery method is fastest everywhere:

| Method | macOS APFS | Linux ext4 |
|---|---|---|
| `walk_candidate_files` (8 threads) | **~440ms** | ~474ms |
| `git ls-files --others` (single thread) | ~530ms | **~231ms** |

Racing both and using the winner eliminates regressions on either platform.

## How the race works

Four operations spawn on separate threads:

1. `git ls-tree -r HEAD -z`: blob OIDs (~60-110ms)
2. `git diff-index HEAD -z`: modified/deleted (~95-150ms)
3. `walk_candidate_files`: 8-thread filesystem walk
4. `git ls-files --others -z`: git subprocess

Operations 3 and 4 send results through an `mpsc` channel. The first result wins. The losing thread runs to completion and its result is discarded.

If `ls-files` wins: its output is the untracked file list directly.
If `walk` wins: candidates are filtered against `ls-tree` hashes to find untracked files.

## Benchmark (110-package monorepo, 30 runs, sandboxed Linux)

| Variant | Mean | Min | Max |
|---|---|---|---|
| Baseline (main) | 878ms ± 27ms | 840ms | 953ms |
| This PR | **437ms ± 7ms** | 427ms | 455ms |

**2.01x faster** on Linux. No regression on macOS (walk wins the race at ~440ms, same as the split-walk approach).

## Test Coverage

18 new regression tests across three categories:

**Category 7 - Ground truth** (8 tests): Establish correct per-package hashes across edge cases (staged changes/new files/deletions, unstaged modifications, no-commit repos, comprehensive mixed state).

**Category 8 - Subprocess+race equivalence** (5 tests): Verify the race-based constructor produces identical results to gix-index and no-index paths.

**Category 9 - Race arm equivalence** (5 tests): Independently verify each arm of the race (walk path and ls-files path) produces identical results, so the winner is always correct regardless of which arm wins.

## How to Review

1. `repo_index.rs`: `new_from_subprocess_and_walk`, the race implementation with `mpsc` channel
2. `ls_tree.rs`: `git_ls_tree_repo_root_sorted`, `git_diff_index_repo_root`, `git_ls_files_untracked` + parsers
3. `lib.rs`: `build_repo_index_from_subprocesses` accepts prefixes, calls new constructor
4. `builder.rs`: Passes `all_package_prefixes` into the method
5. `git_index_regression_tests.rs`: `build_walk_arm_index`, `build_ls_files_arm_index` helpers + 18 new tests
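
The race between the two untracked-discovery arms can be illustrated with a minimal, std-only sketch. This is not the code from `repo_index.rs`: the names `Arm`, `race`, `parse_nul_list`, and `untracked` are hypothetical, the two arms are passed in as closures instead of a real filesystem walk and a real `git ls-files --others -z` subprocess, and the walk arm is filtered against a plain set of tracked paths standing in for the `ls-tree` output.

```rust
use std::collections::HashSet;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Which arm of the race reported first (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Arm {
    /// Candidate paths from the walk; these still need filtering.
    Walk(Vec<String>),
    /// Paths parsed from NUL-delimited output; already the untracked list.
    LsFiles(Vec<String>),
}

/// Split NUL-delimited bytes (the shape git emits with `-z`) into paths.
fn parse_nul_list(bytes: &[u8]) -> Vec<String> {
    bytes
        .split(|b| *b == 0)
        .filter(|s| !s.is_empty())
        .map(|s| String::from_utf8_lossy(s).into_owned())
        .collect()
}

/// Run both arms on their own threads and return whichever sends first.
/// The slower thread keeps running; its message is never read.
fn race<W, L>(walk: W, ls_files: L) -> Arm
where
    W: FnOnce() -> Vec<String> + Send + 'static,
    L: FnOnce() -> Vec<u8> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let tx_walk = tx.clone();
    thread::spawn(move || {
        // A send error only means the receiver is gone, i.e. this arm lost.
        let _ = tx_walk.send(Arm::Walk(walk()));
    });
    thread::spawn(move || {
        let _ = tx.send(Arm::LsFiles(parse_nul_list(&ls_files())));
    });
    rx.recv().expect("both arms exited without sending a result")
}

/// Turn the winning result into the untracked list.
/// Assumption: `tracked` is a set of tracked paths; the walk arm keeps only
/// candidates that are not in it.
fn untracked(winner: Arm, tracked: &HashSet<String>) -> Vec<String> {
    match winner {
        Arm::LsFiles(paths) => paths,
        Arm::Walk(candidates) => candidates
            .into_iter()
            .filter(|p| !tracked.contains(p))
            .collect(),
    }
}

fn main() {
    let tracked = HashSet::from(["a.rs".to_string()]);
    // Stand-ins for the real arms; the walk is made artificially slow here.
    let winner = race(
        || {
            thread::sleep(Duration::from_millis(200));
            vec!["a.rs".to_string(), "new.rs".to_string()]
        },
        || b"new.rs\0".to_vec(),
    );
    println!("{:?}", untracked(winner, &tracked));
}
```

Both arms must resolve to the same untracked list for the race to be safe, which is the property the Category 9 tests check by exercising each arm independently.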
1 parent a9bbb9e commit 9fef3f5

5 files changed, +1025 -52 lines changed

crates/turborepo-lib/src/run/builder.rs

Lines changed: 16 additions & 50 deletions
@@ -261,24 +261,12 @@ impl RunBuilder {
         );
         let start_at = Local::now();
 
-        let (tracked_index_tx, tracked_index_rx) =
-            tokio::sync::oneshot::channel::<Option<turborepo_scm::RepoGitIndex>>();
-        let (git_root_tx, git_root_rx) =
-            tokio::sync::oneshot::channel::<Option<turbopath::AbsoluteSystemPathBuf>>();
         let scm_task = {
             let repo_root = self.repo_root.clone();
             let git_root = self.opts.git_root.clone();
-            tokio::task::spawn_blocking(move || {
-                let scm = match git_root {
-                    Some(root) => SCM::new_with_git_root(&repo_root, root),
-                    None => SCM::new(&repo_root),
-                };
-                // Send git root immediately so the filesystem walk can start
-                // while index construction continues.
-                let _ = git_root_tx.send(scm.git_root().map(|r| r.to_owned()));
-                let repo_index = scm.build_tracked_repo_index_eager();
-                let _ = tracked_index_tx.send(repo_index);
-                scm
+            tokio::task::spawn_blocking(move || match git_root {
+                Some(root) => SCM::new_with_git_root(&repo_root, root),
+                None => SCM::new(&repo_root),
             })
         };
         let package_json_path = self.repo_root.join_component("package.json");
@@ -354,39 +342,24 @@ impl RunBuilder {
         repo_telemetry.track_size(pkg_dep_graph.len());
         run_telemetry.track_run_type(self.opts.run_opts.dry_run.is_some());
 
-        // Spawn the filesystem walk as soon as the git root is resolved.
-        // It only needs the git root and package prefixes, not the tracked
-        // index. The walk runs in parallel with new_from_gix_index (~267ms).
+        // Build the repo index using parallel git subprocesses for the tracked
+        // index (ls-tree + diff-index) and a race between walk_candidate_files
+        // and git ls-files for untracked discovery. The race ensures optimal
+        // performance: the walk wins on macOS, ls-files wins on Linux.
         let all_prefixes = Self::all_package_prefixes(&pkg_dep_graph);
-        let walk_task = if all_prefixes.is_empty() {
+        let scm = scm_task
+            .instrument(tracing::info_span!("scm_task_await"))
+            .await
+            .expect("detecting scm panicked");
+        let repo_index_task = if all_prefixes.is_empty() {
             None
         } else {
-            Some(tokio::task::spawn(async move {
-                let git_root = match git_root_rx.await {
-                    Ok(Some(root)) => root,
-                    _ => return None,
-                };
-                tokio::task::spawn_blocking(move || {
-                    let _span = tracing::info_span!("walk_candidate_files").entered();
-                    turborepo_scm::walk_candidate_files(git_root.as_std_path(), Some(&all_prefixes))
-                        .ok()
-                })
-                .await
-                .ok()?
+            let scm = scm.clone();
+            Some(tokio::task::spawn_blocking(move || {
+                let _span = tracing::info_span!("build_repo_index_subprocesses").entered();
+                scm.build_repo_index_from_subprocesses(&all_prefixes)
             }))
         };
-
-        // Combine the walk results with the tracked index once both are ready.
-        let repo_index_task = walk_task.map(|walk_task| {
-            tokio::task::spawn(async move {
-                let (candidates, tracked_index) = tokio::join!(walk_task, tracked_index_rx);
-                let candidates = candidates.ok()??;
-                let tracked_index = tracked_index.ok()??;
-                let mut repo_index = tracked_index;
-                repo_index.populate_untracked_from_candidates(candidates);
-                Some(repo_index)
-            })
-        });
         let micro_frontend_configs = {
             let _span = tracing::info_span!("micro_frontends_from_disk").entered();
             match MicrofrontendsConfigs::from_disk(&self.repo_root, &pkg_dep_graph) {
@@ -497,13 +470,6 @@ impl RunBuilder {
             turbo_json_loader.preload_all();
         }
 
-        // Await the SCM background task. The tracked index was already
-        // forwarded to the untracked walk via oneshot channel above.
-        let scm = scm_task
-            .instrument(tracing::info_span!("scm_task_await"))
-            .await
-            .expect("detecting scm panicked");
-
         let filtered_pkgs = {
             let _span = tracing::info_span!("calculate_filtered_packages").entered();
             Self::calculate_filtered_packages(