Add file-based caching for PackageGraph to speed up turbo runs #12172

Draft
anthonyshew wants to merge 3 commits into main from claude/cache-graph-evaluation-A55zN

@anthonyshew (Contributor)

Description

This PR implements a file-based cache for the PackageGraph to avoid rebuilding it from scratch on every turbo run. The cache is keyed by content hashes of all input files that determine the package graph (workspace package.json files, lockfiles, workspace config, turbo.json files, etc.).

Key features:

  • Content-hash fingerprinting: Uses xxHash64 to fingerprint all input files, immune to git checkouts, file copies, and NFS clock skew
  • Workspace discovery always runs: New/removed packages are detected even with a cache hit
  • Version safety: Cache includes format version and turbo version to prevent cross-version skew
  • Atomic writes: Uses tempfile + rename to prevent partial reads
  • Graceful degradation: Any cache errors silently fall back to full rebuild
  • Lazy lockfile loading: Lockfile parsing happens in parallel with cache load to minimize overhead
  • Racy-git resolution: Enhanced git index handling to resolve stat mismatches via content hashing, avoiding re-hashing during per-package operations
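
The fingerprinting and version-safety points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Fingerprint` struct, its field names, and the `fingerprint` function are hypothetical, and `std`'s `DefaultHasher` stands in for xxHash64 so the sketch needs no external crates (its output is not guaranteed stable across Rust releases, so it would not be a good choice for a real on-disk format).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical fingerprint layout; names are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub struct Fingerprint {
    pub format_version: u32,
    pub turbo_version: String,
    /// Input path -> content hash. A BTreeMap keeps entries sorted, so
    /// comparison does not depend on the order inputs were discovered in.
    pub file_hashes: BTreeMap<PathBuf, u64>,
}

/// Hash file contents only; mtime and other stat fields are never consulted.
/// ASSUMPTION: DefaultHasher is a std-only stand-in for xxHash64.
fn hash_file(path: &Path) -> io::Result<u64> {
    let bytes = fs::read(path)?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&bytes);
    Ok(hasher.finish())
}

/// Build a fingerprint over every input file plus the two version fields.
pub fn fingerprint(
    format_version: u32,
    turbo_version: &str,
    inputs: &[PathBuf],
) -> io::Result<Fingerprint> {
    let mut file_hashes = BTreeMap::new();
    for path in inputs {
        file_hashes.insert(path.clone(), hash_file(path)?);
    }
    Ok(Fingerprint {
        format_version,
        turbo_version: turbo_version.to_string(),
        file_hashes,
    })
}
```

A cached graph would be considered fresh only when the stored fingerprint equals the one computed for the current run; rewriting a file with identical bytes leaves it unchanged, while a content edit or a different turbo version does not.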

Implementation details:

  • New cache.rs module in turborepo-repository with serialization/deserialization of PackageGraph
  • New lazy_lockfile.rs module for background lockfile parsing
  • Integration in RunBuilder to attempt cache load before building graph, and save after successful build
  • Enhanced git index classification to handle racy-git and stat mismatches more efficiently
  • Added tracing spans for observability of cache operations and file hashing

The cache is stored at .turbo/cache/package-graph-v1.json and is best-effort — failures during cache operations don't affect correctness.
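
As a rough sketch of the tempfile + rename and best-effort behavior described above, assuming only the standard library: `write_atomic` and `load_or_rebuild` are hypothetical helper names, and the temp-file naming scheme is an assumption made for illustration.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `dest` by writing a sibling temp file and renaming it
/// into place, so readers see either the old file or the complete new one.
pub fn write_atomic(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let dir = match dest.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    fs::create_dir_all(dir)?;
    let file_name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;
    // The temp file lives in the same directory so the rename stays on one
    // filesystem. ASSUMPTION: a pid suffix is enough to avoid name clashes.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".{}.tmp", std::process::id()));
    let tmp = dir.join(tmp_name);
    let mut file = fs::File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    drop(file);
    fs::rename(&tmp, dest)
}

/// Best-effort load: a missing, unreadable, or empty cache file is treated
/// as a cache miss and the caller-supplied rebuild runs instead.
pub fn load_or_rebuild(path: &Path, rebuild: impl FnOnce() -> Vec<u8>) -> Vec<u8> {
    match fs::read(path) {
        Ok(bytes) if !bytes.is_empty() => bytes,
        _ => rebuild(),
    }
}
```

In this sketch a failed write would simply be ignored by the caller, since the next run falls back to a rebuild when the file cannot be read.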

Testing Instructions

  • Existing unit tests for fingerprinting and serialization pass
  • Run `turbo run` twice in a monorepo without file changes. The second run should load from cache (check logs for "package graph loaded from cache")
  • Modify a workspace package.json and run again. The cache should invalidate and the graph should rebuild
  • Add or remove a workspace package. The cache should invalidate
  • Verify `turbo run` still works correctly with the cache disabled or on a first run

https://claude.ai/code/session_01HXQRNfNhJQu26zPuR1VppH

claude added 3 commits March 6, 2026 04:23
Add a content-hash fingerprinted cache for the package graph that saves
~136ms on subsequent `turbo run` invocations when no input files have
changed. Uses xxHash64 content hashes (not mtimes) for staleness
detection, atomic writes for concurrency safety, and graceful fallback
to full rebuild on any error.

Key components:
- cache.rs: Serialization/deserialization of PackageGraph with
  fingerprint validation (format version + turbo version + content hashes)
- lazy_lockfile.rs: Defers lockfile parsing to a background task on
  cache hit, moving ~113ms off the critical path
- builder.rs: Integration point that checks cache before building,
  saves cache after building on miss

Correctness guarantees:
- Content hashes immune to git checkout/cp --preserve/NFS clock skew
- Workspace discovery always runs (new/removed packages detected)
- All errors fall back silently to full rebuild
- Cache versioned by format + turbo binary version
- Atomic writes via tempfile + rename
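
The deferred lockfile parsing mentioned under lazy_lockfile.rs could look something like the sketch below. It is an assumption-laden illustration: `LazyLockfile` here is a hypothetical type built on `std::thread`, whereas the commit only says "background task" and does not say which runtime it uses.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical lazy holder: parsing starts immediately on a background
/// thread, and the caller only blocks if it asks for the result early.
pub struct LazyLockfile<T> {
    handle: Option<JoinHandle<T>>,
    value: Option<T>,
}

impl<T: Send + 'static> LazyLockfile<T> {
    /// Kick off `parse` in the background and return right away.
    pub fn spawn(parse: impl FnOnce() -> T + Send + 'static) -> Self {
        Self {
            handle: Some(thread::spawn(parse)),
            value: None,
        }
    }

    /// Return the parsed value, joining the background thread on first use.
    /// Later calls return the stored value without blocking.
    pub fn get(&mut self) -> &T {
        if let Some(handle) = self.handle.take() {
            self.value = Some(handle.join().expect("lockfile parser panicked"));
        }
        self.value.as_ref().expect("value is set after the first join")
    }
}
```

The point of the design is that a cache hit can proceed with the deserialized graph while parsing continues off the critical path, and only code that actually needs the lockfile pays the wait.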

https://claude.ai/code/session_01HXQRNfNhJQu26zPuR1VppH

…_scope

When the git index stats don't match the filesystem (common in CI,
containers, and fresh checkouts), all tracked files were classified as
"modified" and deferred to per-package hashing during hash_scope. This
caused thousands of unnecessary SHA1 file hashes on the critical path.

Fix: content-hash stat-mismatched files during the parallel index build
(which overlaps with package graph construction) instead of deferring to
hash_scope. If the computed hash matches the index entry, classify the
file as clean. This moves I/O from the critical path to a background
thread.

Also resolve racy-git entries (mtime >= index timestamp) the same way
instead of conservatively marking them as modified.
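
A simplified sketch of the classification rule described above. The types and names (`IndexEntry`, `FileStat`, `classify`) are hypothetical and are not the gix API, a `u64` stands in for the SHA-1 object ID, and the hashing step is passed in as a closure so the sketch stays self-contained.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Status {
    Clean,
    Modified,
}

/// Hypothetical view of one git index entry.
pub struct IndexEntry {
    pub mtime: u64,
    pub size: u64,
    /// ASSUMPTION: u64 stand-in for the SHA-1 object ID stored in the index.
    pub hash: u64,
}

/// Hypothetical view of the file's current stat data on disk.
pub struct FileStat {
    pub mtime: u64,
    pub size: u64,
}

/// Classify a tracked file during the index build.
/// `hash_content` is only invoked when stat data alone cannot be trusted.
pub fn classify(
    entry: &IndexEntry,
    stat: &FileStat,
    index_timestamp: u64,
    hash_content: impl FnOnce() -> u64,
) -> Status {
    let stat_matches = entry.mtime == stat.mtime && entry.size == stat.size;
    // Racy-git: the entry's mtime is not older than the index itself, so a
    // same-timestamp edit could hide behind matching stat data.
    let racy = entry.mtime >= index_timestamp;
    if stat_matches && !racy {
        return Status::Clean; // trust the stat data, no file I/O needed
    }
    // Stat mismatch or racy entry: resolve by content instead of deferring.
    if hash_content() == entry.hash {
        Status::Clean
    } else {
        Status::Modified
    }
}
```

On a fresh checkout every entry takes the hashing branch, but under this scheme that cost is paid once during the parallel index build rather than again per package.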

Measured on this repo (~30 packages, clean working tree):
- hash_scope: 370ms → 93ms (-75%)
- scm_task_await: 400ms → 159ms (-60%)
- Total TTFT: 1690ms → 1010ms (-40%)

Add tracing spans to task hashing pipeline for ongoing perf monitoring.

https://claude.ai/code/session_01HXQRNfNhJQu26zPuR1VppH

The package graph cache saved ~160ms of graph construction but that work
runs in parallel with SCM indexing (~688ms), so it never reduced wall
clock time. Remove it to keep the codebase simpler.

The actual performance win comes from the stat-mismatch resolution in
the previous commit, which targets hash_scope on the critical path.
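
The reasoning here is that two phases running concurrently finish when the slower one does. A tiny sketch of that arithmetic, where `parallel_wall_clock` is a hypothetical helper and the 300 ms graph build time in the usage note is an invented figure (only the ~160ms saving and ~688ms SCM indexing time come from the commit message):

```rust
/// Wall-clock time, in milliseconds, of two phases that run concurrently:
/// the pair finishes when the slower phase finishes.
pub fn parallel_wall_clock(a_ms: u64, b_ms: u64) -> u64 {
    a_ms.max(b_ms)
}
```

With a hypothetical 300 ms graph build, `parallel_wall_clock(300, 688)` and `parallel_wall_clock(300 - 160, 688)` both give 688, so shaving 160 ms off the shorter phase leaves the total unchanged.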

https://claude.ai/code/session_01HXQRNfNhJQu26zPuR1VppH
@anthonyshew requested a review from a team as a code owner March 6, 2026 12:51
@anthonyshew requested review from tknickman and removed request for a team March 6, 2026 12:51
@anthonyshew marked this pull request as draft March 6, 2026 12:52

@vercel vercel bot left a comment

Additional Suggestion:

Doc comment on new_from_gix_index states racy-git entries are "deferred to per-package hashing" when the implementation does the exact opposite — content-hashing them inline.

github-actions bot commented Mar 6, 2026

Coverage Report

| Metric | Coverage |
| --- | --- |
| Lines | 85.63% |
| Functions | 81.77% |
| Branches | 0.00% |