Add file-based caching for PackageGraph to speed up turbo runs#12172
Draft
anthonyshew wants to merge 3 commits intomainfrom
Draft
Add file-based caching for PackageGraph to speed up turbo runs#12172anthonyshew wants to merge 3 commits intomainfrom
anthonyshew wants to merge 3 commits intomainfrom
Conversation
Add a content-hash fingerprinted cache for the package graph that saves ~136ms on subsequent `turbo run` invocations when no input files have changed. Uses xxHash64 content hashes (not mtimes) for staleness detection, atomic writes for concurrency safety, and graceful fallback to full rebuild on any error. Key components: - cache.rs: Serialization/deserialization of PackageGraph with fingerprint validation (format version + turbo version + content hashes) - lazy_lockfile.rs: Defers lockfile parsing to a background task on cache hit, moving ~113ms off the critical path - builder.rs: Integration point that checks cache before building, saves cache after building on miss Correctness guarantees: - Content hashes immune to git checkout/cp --preserve/NFS clock skew - Workspace discovery always runs (new/removed packages detected) - All errors fall back silently to full rebuild - Cache versioned by format + turbo binary version - Atomic writes via tempfile + rename https://claude.ai/code/session_01HXQRNfNhJQu26zPuR1VppH
…_scope When the git index stats don't match the filesystem (common in CI, containers, and fresh checkouts), all tracked files were classified as "modified" and deferred to per-package hashing during hash_scope. This caused thousands of unnecessary SHA1 file hashes on the critical path. Fix: content-hash stat-mismatched files during the parallel index build (which overlaps with package graph construction) instead of deferring to hash_scope. If the computed hash matches the index entry, classify the file as clean. This moves I/O from the critical path to a background thread. Also resolve racy-git entries (mtime >= index timestamp) the same way instead of conservatively marking them as modified. Measured on this repo (~30 packages, clean working tree): - hash_scope: 370ms → 93ms (-75%) - scm_task_await: 400ms → 159ms (-60%) - Total TTFT: 1690ms → 1010ms (-40%) Add tracing spans to task hashing pipeline for ongoing perf monitoring. https://claude.ai/code/session_01HXQRNfNhJQu26zPuR1VppH
The package graph cache saved ~160ms of graph construction but that work runs in parallel with SCM indexing (~688ms), so it never reduced wall clock time. Remove it to keep the codebase simpler. The actual performance win comes from the stat-mismatch resolution in the previous commit, which targets hash_scope on the critical path. https://claude.ai/code/session_01HXQRNfNhJQu26zPuR1VppH
Contributor
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Contributor
Coverage Report
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR implements a file-based cache for the
PackageGraphto avoid rebuilding it from scratch on everyturbo run. The cache is keyed by content hashes of all input files that determine the package graph (workspace package.json files, lockfiles, workspace config, turbo.json files, etc.).Key features:
Implementation details:
cache.rsmodule inturborepo-repositorywith serialization/deserialization ofPackageGraphlazy_lockfile.rsmodule for background lockfile parsingRunBuilderto attempt cache load before building graph, and save after successful buildThe cache is stored at
.turbo/cache/package-graph-v1.jsonand is best-effort — failures during cache operations don't affect correctness.Testing Instructions
turbo runtwice in a monorepo without file changes — second run should load from cache (check logs for "package graph loaded from cache")turbo runstill works correctly with cache disabled or on first runhttps://claude.ai/code/session_01HXQRNfNhJQu26zPuR1VppH