
[8.7.0] Cherry-pick remote repo contents cache feature #28997

Open
fmeum wants to merge 23 commits into bazelbuild:release-8.7.0 from fmeum:cherry-pick-rrcc-8.7.0

Conversation

Collaborator

@fmeum fmeum commented Mar 13, 2026

Summary

Cherry-picks the remote repo contents cache feature from master to release-8.7.0.

This consists of:

  • 3 manual prerequisite ports (recorded input ordering, repo env handling, predeclaredInputHash/ImmutableSortedMap changes)
  • 1 non-functional refactoring prerequisite
  • 20 feature commits implementing the remote repo contents cache

The remote repo contents cache allows caching external repository contents in a remote cache (HTTP/gRPC), served via an in-memory overlay filesystem (RemoteExternalOverlayFileSystem). This builds on the local repo contents cache already present on release-8.7.0.

Prerequisite commits (manually ported)

  • 5e3f0c8373 — Refactoring from: Make external repo file checking actually useful
  • 41ccfefb88 — Preserve order of recorded inputs (ImmutableMap to ImmutableSortedMap, Comparable on subclasses)
  • 01407ce758 — Fix and consolidate repo env handling (RepoEnvironmentFunction, EnvironmentVariableValue)
  • fe040a3271 — Fold environ into the predeclared inputs hash

Feature commits (cherry-picked with -x)

  1. Fix crash when repo contents cache is under the main repo
  2. Deduplicate identical repo contents cache entries during GC
  3. Make naming scheme for repo contents cache entries more reliable
  4. Prepare for the addition of a remote repo contents cache
  5. Add a remote repo contents cache
  6. Reproduce a Skyframe cycle with the repo contents cache in a test
  7. Fix NPE with remote repo contents cache
  8. Make remote repo contents cache less spammy
  9. Fix materialization edge cases in the remote repo contents cache
  10. Fix RemoteExternalOverlayFileSystem#resolveSymbolicLinks
  11. Allow more general exceptions in getConfiguration
  12. Prefetch .bzl files in the remote repo contents cache
  13. Show stack traces in the remote repo contents cache with --verbose_failures
  14. Fix repo contents cache FileValue staleness
  15. Get the local and remote repo contents cache to work together
  16. Clarify the invalidation of REPO_CONTENTS_CACHE_DIRS FileStateValues
  17. Fix cycles when checking the local repo contents cache
  18. Fix remote repo contents cache issues
  19. Materialize important outputs from remote external repos

Adaptation notes

Key structural differences from master:

  • On release-8.7.0, RepositoryDelegatorFunction.java + StarlarkRepositoryFunction.java exist separately; on master they were merged into RepositoryFetchFunction.java
  • DigestWriter is an inner class of RepositoryDelegatorFunction on release-8.7.0; it's a separate file on master
  • RepoEnvironmentFunction on release-8.7.0 checks PrecomputedValue.REPO_ENV first, then falls back to ClientEnvironmentFunction (vs master's consolidated approach)

Test plan

  • passes
  • Full CI

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

…seful

Non-functional changes only: remove Pair indirection in
ExternalFilesHelper, extract getExternalRepoName() and
getExternalDirectory() helpers, move addExternalFilesDependencies
into ExternalFilesHelper, modernize switch expression in
DirtinessCheckerUtils, formatting fixes.

Does not include the functional behavior change of refetching repos
on external modifications.

(cherry picked from commit 5e3f0c8)
@fmeum fmeum requested a review from a team as a code owner March 13, 2026 20:37
@github-actions github-actions bot added the team-Performance, team-Configurability, team-ExternalDeps, team-Rules-CPP, team-Remote-Exec, and awaiting-review labels Mar 13, 2026

google-cla bot commented Mar 13, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@iancha1992 iancha1992 added this to the 8.7.0 release blockers milestone Mar 13, 2026
@fmeum fmeum force-pushed the cherry-pick-rrcc-8.7.0 branch from 93f7825 to 8691de4 Compare March 14, 2026 19:41
fmeum and others added 7 commits March 14, 2026 20:43
… inputs

Ports the essential API changes from 41ccfef needed by later feature
commits:
- Add RepoRecordedInput.WithValue record with parse/toString/escape/unescape
- Add overloaded isAnyValueOutdated(Environment, BlazeDirectories, List<WithValue>)
- Remove Comparable<RepoRecordedInput> and COMPARATOR (replaced by order preservation)
- Change TreeMap to LinkedHashMap in RepositoryDelegatorFunction for order preservation

(cherry picked from commit 41ccfef)
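
The effect of switching from `TreeMap` to `LinkedHashMap` can be illustrated with a minimal sketch. The class name, method name, and key strings below are hypothetical and only stand in for recorded inputs; this is not code from `RepositoryDelegatorFunction`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Bazel. */
final class RecordedInputOrderSketch {
  /** Records each key into the given map and returns the keys in the map's iteration order. */
  static List<String> recordAndList(Map<String, String> map, String... keys) {
    for (String key : keys) {
      map.put(key, "value-of-" + key);
    }
    return new ArrayList<>(map.keySet());
  }
}
```

Calling `recordAndList(new TreeMap<>(), "FILE:b.bzl", "ENV:HOME", "FILE:a.bzl")` yields the keys in sorted order, whereas the same call with `new LinkedHashMap<>()` yields them in the order they were recorded.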
…nv handling

Ports the essential API changes from 01407ce needed by later feature
commits:
- Add EnvironmentVariableValue record type
- Add RepoEnvironmentFunction with REPO_ENV + client env fallback
- Register REPOSITORY_ENVIRONMENT_VARIABLE in SkyFunctions and SkyframeExecutor
- Update EnvVar.getSkyKey() to use RepoEnvironmentFunction
- Update EnvVar.isOutdated() to use EnvironmentVariableValue

On 8.7.0, RepoEnvironmentFunction checks --repo_env first, then falls
back to the client environment via ClientEnvironmentFunction, since the
consolidated repo env computation from CommandEnvironment is not present.

(cherry picked from commit 01407ce)
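
The lookup order described above (`--repo_env` first, client environment as a fallback) could be sketched roughly as follows. This is a simplified stand-in using plain maps: the class and method names are hypothetical, and the real `RepoEnvironmentFunction` operates on Skyframe values rather than on `Map` arguments.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration of the fallback order; not Bazel's actual implementation. */
final class RepoEnvLookupSketch {
  /**
   * Returns the value from repoEnv if it defines the variable, otherwise the value from
   * clientEnv, otherwise an empty Optional.
   */
  static Optional<String> resolve(
      String name, Map<String, String> repoEnv, Map<String, String> clientEnv) {
    String fromRepoEnv = repoEnv.get(name);
    if (fromRepoEnv != null) {
      return Optional.of(fromRepoEnv);
    }
    return Optional.ofNullable(clientEnv.get(name));
  }
}
```

With `repoEnv = {CC=clang}` and `clientEnv = {CC=gcc, PATH=/usr/bin}`, resolving `CC` gives `clang`, resolving `PATH` gives `/usr/bin`, and resolving an unknown name gives an empty result.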
Ports the essential changes from fe040a3:
- Rename DigestWriter.ruleKey to predeclaredInputHash and make it
  package-private (needed by later feature commits)
- Switch RepoRecordedInput.File, Dirents, DirTree, EnvVar types to
  implement Comparable and use ImmutableSortedMap
- Add ImmutableSortedMap Gson type adapter
- Update LockFileModuleExtension, RunnableExtension, and related types
  to use ImmutableSortedMap for recorded inputs

Does NOT include the change to fold environ values into the
predeclared input hash computation itself; that requires
CommandEnvironment changes not present on 8.7.0.

(cherry picked from commit fe040a3)
* Rename `RepoContentsCache` to `LocalRepoContentsCache`
* Generalize `RemoteRepositoryRemoteExecutorFactory` to `RemoteRepositoryHelperFactory`

Work towards bazelbuild#6359

Closes bazelbuild#27311.

PiperOrigin-RevId: 822553693
Change-Id: I1bad204340c06621cea806368d6bec99ca450a0f
(cherry picked from commit 32be423)
@fmeum fmeum force-pushed the cherry-pick-rrcc-8.7.0 branch 7 times, most recently from 0b7b9e5 to 2e832ec Compare March 15, 2026 19:58
fmeum added 5 commits March 15, 2026 21:36
…test

(cherry picked from commit 0336a868183ebcf27e3d4f7fdfac8c9f8b5b3ad3)
I haven't been able to reproduce this in a test, but this should fix the following crash observed while running `bazel info`:
```
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.NullPointerException: Cannot invoke "java.util.concurrent.ExecutorService.shutdownNow()" because "this.materializationExecutor" is null
	at com.google.devtools.build.lib.remote.RemoteExternalOverlayFileSystem.afterCommand(RemoteExternalOverlayFileSystem.java:145)
	at com.google.devtools.build.lib.remote.RemoteModule.afterCommand(RemoteModule.java:1034)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.afterCommand(BlazeRuntime.java:787)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:807)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:266)
	at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:608)
	at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$0(GrpcServerImpl.java:679)
	at io.grpc.Context$1.run(Context.java:566)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
```

Closes bazelbuild#27690.

PiperOrigin-RevId: 833722608
Change-Id: I88c485a01e5967657ec3b5529a47639b743b18e6
(cherry picked from commit a7d0e91)
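
The commit message does not say how the crash is avoided. One common way to make a cleanup hook safe when its executor was never created is a null guard, sketched below under that assumption. The class is hypothetical; only the field name `materializationExecutor` is taken from the stack trace.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical illustration of a null-safe cleanup hook; not the actual Bazel fix. */
final class LazyExecutorOwnerSketch {
  // Stays null if beforeCommand() never ran, e.g. for a command that skips setup.
  private ExecutorService materializationExecutor;

  void beforeCommand() {
    materializationExecutor = Executors.newSingleThreadExecutor();
  }

  /** Shuts the executor down if it exists; returns whether there was anything to shut down. */
  boolean afterCommand() {
    ExecutorService executor = materializationExecutor;
    if (executor == null) {
      return false;
    }
    executor.shutdownNow();
    materializationExecutor = null;
    return true;
  }
}
```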
Don't print a message when it's successful. Users can always look under `external` to verify which repo came from the cache.

Closes bazelbuild#27699.

PiperOrigin-RevId: 834096735
Change-Id: I3916fb240218a6b68ecf48417142b998ca281598
(cherry picked from commit 3ca9ce1)
Fixes the creation of empty directories and also contains a speculative fix for the following issue observed during a sequence of real builds:

```
Error in path: Failed to materialize remote repo @@protoc-gen-validate+: [unix_jni.cc:302] /home/ubuntu/.cache/bazel/_bazel_ubuntu/123/external/protoc-gen-validate+/example-workspace/.bazelrc (File exists)
ERROR: //:foo :: Error loading option //:foo: error evaluating module extension @@gazelle+//:extensions.bzl%go_deps
```

The mentioned file is a symlink.

Closes bazelbuild#27711.

PiperOrigin-RevId: 836122472
Change-Id: I8becd8c3640a659d28dc433340db962c18563d9f
(cherry picked from commit b27ea05)
fmeum added 2 commits March 15, 2026 21:36
Ensures that the returned `Path` is still in the overlay file system.

Also makes the error message emitted by `Path#checkSameFileSystem` more informative. The improved message was motivated by the following crash, observed when using the remote repo contents cache with an explicit `--sandbox_base`, and helped identify the change above as its fix:

```
Caused by: java.lang.IllegalArgumentException: Files are on different filesystems: /dev/shm/bazel-sandbox.b10976335efa519b0184f3091ac8e21f7beefb92142303f9ab2c3341f45a2f28/linux-sandbox/18/execroot/_main/external/c-ares+/configs/ares_build.h (on com.google.devtools.build.lib.unix.UnixFileSystem@5e0a8154), /home/ubuntu/.cache/bazel/_bazel_ubuntu/123/execroot/_main/external/c-ares+/configs/ares_build.h (on com.google.devtools.build.lib.remote.RemoteExternalOverlayFileSystem@6cd9bfda)
        at com.google.devtools.build.lib.vfs.Path.checkSameFileSystem(Path.java:964)
        at com.google.devtools.build.lib.vfs.Path.createSymbolicLink(Path.java:523)
        at com.google.devtools.build.lib.vfs.Path.createSymbolicLink(Path.java:535)
        at com.google.devtools.build.lib.sandbox.SymlinkedSandboxedSpawn.copyFile(SymlinkedSandboxedSpawn.java:129)
```

Alternative to bazelbuild#27721

Closes bazelbuild#27802.

PiperOrigin-RevId: 837832265
Change-Id: I3b73167496b011aef66954d59ca3804b4b64996f
(cherry picked from commit 8eaf6a9)
Fixes bazelbuild#27981

Fixes the following type of crash and, incidentally, a remote repo contents cache test that resulted in a related crash:
```
    FATAL: bazel crashed due to an internal error. Printing stack trace:
    java.lang.IllegalStateException: Unknown error during configuration creation evaluation
            at com.google.devtools.build.lib.skyframe.SkyframeExecutor.getConfiguration(SkyframeExecutor.java:2143)
            at com.google.devtools.build.lib.skyframe.SkyframeExecutor.createConfiguration(SkyframeExecutor.java:1876)
            at com.google.devtools.build.lib.analysis.BuildView.update(BuildView.java:281)
            at com.google.devtools.build.lib.buildtool.AnalysisPhaseRunner.runAnalysisPhase(AnalysisPhaseRunner.java:399)
            at com.google.devtools.build.lib.buildtool.AnalysisPhaseRunner.execute(AnalysisPhaseRunner.java:144)
            at com.google.devtools.build.lib.buildtool.BuildTool.buildTargetsWithoutMergedAnalysisExecution(BuildTool.java:512)
            at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:414)
            at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:907)
            at com.google.devtools.build.lib.runtime.commands.CqueryCommand.exec(CqueryCommand.java:197)
            at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:783)
            at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:266)
            at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:608)
            at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$0(GrpcServerImpl.java:679)
            at io.grpc.Context$1.run(Context.java:566)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
            at java.base/java.lang.Thread.run(Unknown Source)
    Caused by: com.google.devtools.build.lib.skyframe.toolchains.PlatformLookupUtil$InvalidPlatformException: com.google.devtools.build.lib.packages.BuildFileNotFoundException: no such package '@@[unknown repo 'toolchains_llvm_boostrapped' requested from @@ (did you mean 'toolchains_llvm_bootstrapped'?)]//platforms': The repository '@@[unknown repo 'toolchains_llvm_boostrapped' requested from @@ (did you mean 'toolchains_llvm_bootstrapped'?)]' could not be resolved: No repository visible as '@toolchains_llvm_boostrapped' from main repository
            at com.google.devtools.build.lib.analysis.platform.PlatformFunction.compute(PlatformFunction.java:75)
            at com.google.devtools.build.lib.analysis.platform.PlatformFunction.compute(PlatformFunction.java:43)
            at com.google.devtools.build.skyframe.ParallelEvaluator.bubbleErrorUp(ParallelEvaluator.java:414)
            at com.google.devtools.build.skyframe.ParallelEvaluator.waitForCompletionAndConstructResult(ParallelEvaluator.java:207)
            at com.google.devtools.build.skyframe.ParallelEvaluator.doMutatingEvaluation(ParallelEvaluator.java:173)
            at com.google.devtools.build.skyframe.ParallelEvaluator.eval(ParallelEvaluator.java:672)
            at com.google.devtools.build.skyframe.AbstractInMemoryMemoizingEvaluator.evaluate(AbstractInMemoryMemoizingEvaluator.java:182)
            at com.google.devtools.build.lib.skyframe.SkyframeExecutor.evaluate(SkyframeExecutor.java:4279)
            at com.google.devtools.build.lib.skyframe.SkyframeExecutor.lambda$evaluateSkyKeys$0(SkyframeExecutor.java:2278)
            at com.google.devtools.build.lib.concurrent.Uninterruptibles.callUninterruptibly(Uninterruptibles.java:35)
            at com.google.devtools.build.lib.skyframe.SkyframeExecutor.evaluateSkyKeys(SkyframeExecutor.java:2274)
            at com.google.devtools.build.lib.skyframe.SkyframeExecutor.getConfiguration(SkyframeExecutor.java:2126)
            ... 16 more
```

Closes bazelbuild#28004.

PiperOrigin-RevId: 845941915
Change-Id: I6ead8dd1662efe90f529a6e21041a225882415dc
(cherry picked from commit d6dc631)
@fmeum fmeum force-pushed the cherry-pick-rrcc-8.7.0 branch from 2e832ec to 40a98b0 Compare March 15, 2026 20:37
fmeum and others added 8 commits March 15, 2026 22:14
`.bzl` files are typically small, but can form deep DAGs that require a large number of sequential cache requests to fetch lazily. By prefetching them (as well as `REPO.bazel` files) eagerly, the wall time of one particular fully cached cold `--nobuild` build of Bazel itself decreased by a factor of 5.

Along the way, make remote repo contents cache failures non-fatal, matching the behavior of the remote cache.

Closes bazelbuild#27910.

PiperOrigin-RevId: 853153815
Change-Id: I368a14a845a8d9fb543f473d8c0c2178a4590c78
(cherry picked from commit 361c420)
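
As a rough illustration of the eager selection described above, the following sketch picks `.bzl` and `REPO.bazel` files out of a list of repo-relative paths so that they could be requested in one batch instead of one by one. The class and method names are hypothetical, and matching `REPO.bazel` by file name is an assumption; the actual prefetching logic is not shown in this PR description.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of choosing files to prefetch; not Bazel's actual code. */
final class PrefetchSelectionSketch {
  /** Returns true for Starlark files and REPO.bazel files, judged by file name only. */
  static boolean shouldPrefetch(String repoRelativePath) {
    String fileName = repoRelativePath.substring(repoRelativePath.lastIndexOf('/') + 1);
    return fileName.endsWith(".bzl") || fileName.equals("REPO.bazel");
  }

  /** Returns the paths worth fetching eagerly, sorted for a deterministic request order. */
  static List<String> selectForPrefetch(Collection<String> repoRelativePaths) {
    return repoRelativePaths.stream()
        .filter(PrefetchSelectionSketch::shouldPrefetch)
        .sorted()
        .collect(Collectors.toList());
  }
}
```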
…erbose_failures`

Makes it easier to debug issues with this experimental feature and also matches the behavior of remote execution/caching.

Work towards bazelbuild#27965

Closes bazelbuild#27970.

PiperOrigin-RevId: 853238791
Change-Id: Id46ccbb105d93fd17114fab13b086d0b46139fb4
(cherry picked from commit fc5f160)
Ensures that files under repo contents cache entries are not reported as missing after the cache has been deleted while the Bazel server is running. See the long comment in `RepositoryFetchFunction` for why this happens and how it is fixed.

Fixes bazelbuild#26450

Closes bazelbuild#28147.

PiperOrigin-RevId: 853622194
Change-Id: Ifba953b72258030e0a640ac49947ac5c5fc7620a
(cherry picked from commit 7019132)
* Also upload to the remote cache when the local cache is in use. The fix is simple but subtle: the logic for the two caches in `RepositoryFetchFunction` has to be flipped since the Skyframe restart after adding an entry to the local cache meant that the same code path would not be taken again.
* Fix a crash when using both by ensuring that the local repo contents cache uses the file system backing the output base, not the workspace directory:
```
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: Unrecoverable error while evaluating node 'REPOSITORY_DIRECTORY:@@rules_python+' (requested by nodes 'REPO_FILE:@@rules_python+')
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:552)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:435)
	at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Files are on different filesystems: C:/users/runneradmin/_bazel_runneradmin/ebfu7cpi/external/@rules_python+.marker (on com.google.devtools.build.lib.remote.RemoteExternalOverlayFileSystem@79583b9), C:/Users/runneradmin/.cache/bazel-repo/contents/_trash/26a5feef-bf8c-4326-bf3d-500997c7362e (on com.google.devtools.build.lib.windows.WindowsFileSystem@24180f0f)
	at com.google.devtools.build.lib.vfs.Path.checkSameFileSystem(Path.java:964)
	at com.google.devtools.build.lib.vfs.Path.renameTo(Path.java:630)
	at com.google.devtools.build.lib.vfs.FileSystemUtils.moveFile(FileSystemUtils.java:456)
	at com.google.devtools.build.lib.bazel.repository.cache.LocalRepoContentsCache.moveToCache(LocalRepoContentsCache.java:172)
	at com.google.devtools.build.lib.bazel.repository.RepositoryFetchFunction.compute(RepositoryFetchFunction.java:297)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:471)
```

Closes bazelbuild#28002.

PiperOrigin-RevId: 855211557
Change-Id: I2f3c40a6aef594682fba989853f7ee982f30c294
(cherry picked from commit b143070)
…eValues

Since this behavior is quite surprising (it definitely was to the author), this change also improves the test coverage for repo contents cache deletion by asserting that non-BUILD files within it actually exist on disk rather than merely existing from Skyframe's point of view.

Also fix a crash observed while working on the test improvements.

Closes bazelbuild#28222.

PiperOrigin-RevId: 855225639
Change-Id: Ie4a88e93d14a4f4b7bb5217fc924e998a1779ccd
(cherry picked from commit 4839f46)
Fixes bazelbuild#27517 by checking Skyframe deps in batches that stop right before any dep that may cause a cycle if checked while previous deps are out-of-date.

This is accompanied by a restructuring of `RepoRecordedInput` that consolidates all Skyframe logic associated with the computation of the corresponding value exclusively within that class. This will also be helpful in adding support for dynamic inputs to the remote repo contents cache in future work.

Also made the entirety of `RepositoryFetchFunction` use skyframe workers, so that checking the up-to-dateness of local repo contents cache entries isn't quadratic.

Closes bazelbuild#28206.

Co-authored-by: Xudong Yang <wyverald@gmail.com>
PiperOrigin-RevId: 855252657
Change-Id: Ica18760ae79da5155fc0f3d8cd4f24c52a034c86
(cherry picked from commit 72a25a9)
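
The batching idea can be sketched generically: split an ordered list of deps into batches so that every dep flagged as potentially cycle-inducing starts a new batch and is only checked after all earlier batches were found up to date. The class, method, and predicate below are hypothetical and do not reflect the actual `RepoRecordedInput` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of cycle-safe batching; not Bazel's actual code. */
final class CycleSafeBatchingSketch {
  /**
   * Splits deps into consecutive batches, closing the current batch right before any dep
   * for which mayCauseCycle returns true. Order is preserved and no dep is dropped.
   */
  static <T> List<List<T>> splitIntoBatches(List<T> deps, Predicate<T> mayCauseCycle) {
    List<List<T>> batches = new ArrayList<>();
    List<T> current = new ArrayList<>();
    for (T dep : deps) {
      if (mayCauseCycle.test(dep) && !current.isEmpty()) {
        batches.add(current);
        current = new ArrayList<>();
      }
      current.add(dep);
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

A caller would then check the batches in order and stop at the first batch containing an out-of-date dep, so that a risky dep is never requested while an earlier dep is still stale.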
* The cache was always written to, even if not enabled.
* Google RBE doesn't accept `Command`s without the (deprecated) `Platform` field set. We set it both on `Command` and `Action`, just to be safe.

Fixes bazelbuild#28294 (comment)

Closes bazelbuild#28295.

PiperOrigin-RevId: 856169835
Change-Id: I2479119a173e325a7d39643a36536569f5f831fc

(cherry picked from commit a9946096847e22de98e0e11b1f5dfbb6ec6ecdbb)
…elbuild#28308)

Important outputs and runfiles from external repos that are remote repo contents cache hits got stuck at various levels of the materialization pipeline because they are source artifacts. This is fixed by consolidating the skip logic in a `RemoteOutputChecker` static helper.

Closes bazelbuild#28308.

PiperOrigin-RevId: 881618604
Change-Id: Ifaae8e39b0bcab3803653ca82bcf00d26c487316

(cherry picked from commit 16613f1)
@fmeum fmeum force-pushed the cherry-pick-rrcc-8.7.0 branch from 40a98b0 to 4daa6c5 Compare March 15, 2026 21:14
