
perf: Decouple filesystem walk from git index construction #12204

Merged
anthonyshew merged 1 commit into main from perf/early-untracked-walk on Mar 9, 2026

@anthonyshew
Contributor

Summary

  • Splits untracked-file discovery into two phases: an I/O-bound filesystem walk that runs in parallel with git index construction, and a CPU-bound index filter that runs once the index is ready
  • The walk uses the ignore crate's native gitignore handling (reading .gitignore from disk), removing the dependency on the tracked git index
  • Git root is forwarded via a oneshot channel as soon as SCM::new() resolves it (~5ms), letting the walk start while new_from_gix_index (~267ms) continues
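The overlap described in the last two bullets can be modeled with the standard library alone. This is a minimal sketch under stated assumptions, not the turborepo implementation: `std::sync::mpsc` stands in for the tokio oneshot channel, `std::thread::spawn` for the spawned walk task, and joining both handles for `tokio::join!`. The functions `resolve_git_root`, `build_index`, and `walk` are hypothetical placeholders for `SCM::new()`, `new_from_gix_index`, and `walk_candidate_files`.

```rust
use std::path::{Path, PathBuf};
use std::sync::mpsc;
use std::thread;

// Hypothetical placeholder for SCM::new(): resolving the git root is cheap.
fn resolve_git_root() -> PathBuf {
    PathBuf::from("/repo")
}

// Hypothetical placeholder for new_from_gix_index: the slow, tracked-index step.
fn build_index(root: &Path) -> Vec<String> {
    vec![format!("{}/tracked.txt", root.display())]
}

// Hypothetical placeholder for walk_candidate_files: needs only the root path.
fn walk(root: &Path) -> Vec<String> {
    vec![
        format!("{}/tracked.txt", root.display()),
        format!("{}/new.txt", root.display()),
    ]
}

/// Returns (tracked index, walk candidates), computed concurrently.
fn run() -> (Vec<String>, Vec<String>) {
    // mpsc used as a one-shot: exactly one value (the git root) is sent.
    let (git_root_tx, git_root_rx) = mpsc::channel::<PathBuf>();

    // The walk blocks only until the git root arrives, not until the index is built.
    let walk_task = thread::spawn(move || {
        let root = git_root_rx.recv().expect("git root sender dropped");
        walk(&root)
    });

    let index_task = thread::spawn(move || {
        let root = resolve_git_root();
        // Forward the root as soon as it is known so the walk can start...
        git_root_tx.send(root.clone()).expect("walk task exited early");
        // ...while index construction continues on this thread.
        build_index(&root)
    });

    // Counterpart of tokio::join!: wait for both results.
    let index = index_task.join().expect("index thread panicked");
    let candidates = walk_task.join().expect("walk thread panicked");
    (index, candidates)
}
```

Because the walk depends only on the root path, its I/O cost overlaps index construction; only the later filter step has to wait for the index.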

Why

After #12201, the critical path ran new_from_gix_index (267ms) and then find_untracked_files (509ms) sequentially, for 776ms in total. The I/O-bound filesystem walk now overlaps index construction, so only the CPU-bound filter still waits for the index.
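As a rough critical-path model (a simplification: it neglects the ~5ms git root resolution and scheduling overhead; T_walk and T_filter are the two new phases, which this PR does not time separately):

```latex
\begin{aligned}
T_{\text{before}} &= T_{\text{index}} + T_{\text{untracked}} = 267\,\text{ms} + 509\,\text{ms} = 776\,\text{ms} \\
T_{\text{after}}  &\approx \max\left(T_{\text{index}},\ T_{\text{walk}}\right) + T_{\text{filter}} \\
\Delta T &= T_{\text{before}} - T_{\text{after}} \approx \min\left(T_{\text{index}},\ T_{\text{walk}}\right)
  \quad \text{if } T_{\text{untracked}} \approx T_{\text{walk}} + T_{\text{filter}}
\end{aligned}
```

Under this model the saving is bounded by T_index (267ms), which is consistent with the 234ms measured below.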

Benchmark (110-package monorepo, 30 runs, sandboxed)

| | Mean | Min | Max |
|---|---|---|---|
| Baseline (main + #12201) | 853ms ± 19ms | 834ms | 928ms |
| This PR | 619ms ± 6ms | 606ms | 635ms |

1.38x faster (234ms of wall-clock time saved). repo_index_untracked_await dropped from 505ms to 89ms.

Test Coverage

Four new regression tests verify that the split walk produces the same results as the original single-pass approach:

  • test_split_walk_matches_original_path — baseline equivalence verified against no-index subprocess path
  • test_split_walk_respects_gitignore — root, nested, and package-level .gitignore rules
  • test_split_walk_with_untracked_gitignore — untracked .gitignore files are applied during the walk
  • test_split_walk_nested_gitignore_scoping — scoped ignore rules don't leak across packages

How to Review

  1. repo_index.rs: walk_candidate_files (I/O phase), filter_untracked_from_candidates (CPU phase), populate_untracked_from_candidates (integration into RepoGitIndex)
  2. builder.rs: git_root_tx/git_root_rx channel, walk_task spawned from the channel, tokio::join! to combine walk + index
  3. lib.rs: SCM::git_root() accessor, re-export of walk_candidate_files
  4. git_index_regression_tests.rs: build_split_repo_index helper and 4 new tests

@anthonyshew anthonyshew requested a review from a team as a code owner March 9, 2026 02:41
@anthonyshew anthonyshew requested review from tknickman and removed request for a team March 9, 2026 02:41
@vercel
Contributor

vercel bot commented Mar 9, 2026

The latest updates on your projects.

| Project | Deployment | Updated (UTC) |
|---|---|---|
| examples-basic-web | Ready | Mar 9, 2026 2:46am |
| examples-designsystem-docs | Ready | Mar 9, 2026 2:46am |
| examples-gatsby-web | Ready | Mar 9, 2026 2:46am |
| examples-kitchensink-blog | Ready | Mar 9, 2026 2:46am |
| examples-nonmonorepo | Ready | Mar 9, 2026 2:46am |
| examples-svelte-web | Ready | Mar 9, 2026 2:46am |
| examples-tailwind-web | Ready | Mar 9, 2026 2:46am |
| examples-vite-web | Building | Mar 9, 2026 2:46am |
| turbo-site | Ready | Mar 9, 2026 2:46am |
| turborepo-agents | Ready | Mar 9, 2026 2:46am |
| turborepo-test-coverage | Ready | Mar 9, 2026 2:46am |

Split find_untracked_files into two phases so that the filesystem walk can run in parallel with index construction:

1. walk_candidate_files (I/O-bound): Enumerates all non-gitignored files
   within scope using the ignore crate's native gitignore support. Only
   needs the git root path and package prefixes — no tracked index.

2. filter_untracked_from_candidates (CPU-bound): Binary-searches each
   candidate against ls_tree_hashes and status_entries to identify truly
   untracked files. Runs after the tracked index is ready.
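The CPU-bound phase can be approximated as below. This is a hypothetical sketch, not the actual filter_untracked_from_candidates: it assumes ls_tree_hashes and status_entries are both sorted by repo-relative path, models each entry as a hypothetical (path, hash) pair, and treats a candidate as untracked when its path appears in neither list. The real status handling may distinguish more cases.

```rust
/// Hypothetical entry shape: (repo-relative path, object hash).
type Entry = (String, String);

/// Binary search for `path` in a slice sorted by path.
fn contains(entries: &[Entry], path: &str) -> bool {
    entries
        .binary_search_by(|(p, _)| p.as_str().cmp(path))
        .is_ok()
}

/// Keeps only candidates found in neither tracked list.
/// Both slices must be sorted by path for the binary search to be valid.
fn filter_untracked(
    candidates: &[String],
    ls_tree_hashes: &[Entry],
    status_entries: &[Entry],
) -> Vec<String> {
    candidates
        .iter()
        .filter(|c| {
            !contains(ls_tree_hashes, c.as_str()) && !contains(status_entries, c.as_str())
        })
        .cloned()
        .collect()
}
```

Each lookup costs O(log n) in the length of the list being searched, so once the index exists this phase is purely in-memory work with no filesystem access.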

The git root is sent via a oneshot channel as soon as SCM::new() resolves
it (~5ms), while new_from_gix_index continues (~267ms). The walk starts
immediately and runs in parallel with index construction.

Benchmark (110-package monorepo, 30 runs, sandbox):
  baseline: 853ms ± 19ms
  improved: 619ms ± 6ms  (1.38x faster)
@github-actions
Contributor

github-actions bot commented Mar 9, 2026

Coverage Report

| Metric | Coverage |
|---|---|
| Lines | 85.36% |
| Functions | 81.44% |
| Branches | 0.00% |

@anthonyshew anthonyshew merged commit 03f9749 into main Mar 9, 2026
54 checks passed
@anthonyshew anthonyshew deleted the perf/early-untracked-walk branch March 9, 2026 03:20
github-actions bot added a commit that referenced this pull request Mar 9, 2026
## Release v2.8.15-canary.10

Versioned docs: https://v2-8-15-canary-10.turborepo.dev

### Changes

- release(turborepo): 2.8.15-canary.9 (#12203) (`24bd765`)
- perf: Decouple filesystem walk from git index construction (#12204)
(`03f9749`)

---------

Co-authored-by: Turbobot <turbobot@vercel.com>