Fix OSS RE client: FindMissingCache staleness, TTL, and silent upload skip #1273

Open
mattparks wants to merge 1 commit into facebook:main from mattparks:patch-1

@mattparks

Three related bugs in the OSS RE gRPC client cause remote_upload_error when deferred-materialized outputs from one action are used as inputs to another action.

Bug 1: Hardcoded TTL of 0 in OSS shim

convert_action_result() hardcodes ttl: 0 for all output files and action_result_ttl: 0 for execute responses. The standard REv2 protocol has no TTL field — TTL is a buck2 extension. With ttl=0, the deferred materializer treats every blob as immediately expired, so get_digests_ttl() always queries FindMissingBlobs instead of trusting that recently-produced outputs still exist.

Fix: use cas_ttl_secs from runtime options (configured via cas_ttl_secs buckconfig key, default 3 hours) instead of hardcoding 0.
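
As a rough illustration of the Bug 1 fix, the sketch below threads a configured TTL into the converted output record instead of a literal 0. `RuntimeOpts`, `OutputFile`, `convert_output`, and `is_expired` are simplified, hypothetical stand-ins, not the actual buck2 types or signatures; only the `cas_ttl_secs` name and the 3-hour default come from the description above.

```rust
/// Hypothetical stand-in for the RE client's runtime options.
#[derive(Debug, Clone, Copy)]
pub struct RuntimeOpts {
    /// Mirrors the `cas_ttl_secs` buckconfig key.
    pub cas_ttl_secs: i64,
}

impl Default for RuntimeOpts {
    fn default() -> Self {
        // The PR description states a default of 3 hours.
        Self { cas_ttl_secs: 3 * 60 * 60 }
    }
}

/// Hypothetical, simplified output-file record carrying a TTL in seconds.
#[derive(Debug, PartialEq)]
pub struct OutputFile {
    pub path: String,
    pub ttl: i64,
}

/// Before the fix this would have set `ttl: 0`; here the configured value is used.
pub fn convert_output(path: &str, opts: &RuntimeOpts) -> OutputFile {
    OutputFile {
        path: path.to_owned(),
        ttl: opts.cas_ttl_secs,
    }
}

/// Assumed expiry rule for illustration: a non-positive TTL means "already expired",
/// which forces a FindMissingBlobs lookup for that blob.
pub fn is_expired(ttl: i64) -> bool {
    ttl <= 0
}
```

With a zero TTL every output is expired on arrival; with the configured value a freshly produced output is trusted for the CAS retention window.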

Bug 2: FindMissingCache caches "Missing" state

get_digests_ttl() caches DigestRemoteState::Missing in the FindMissingCache LRU (500K entries, 12-hour TTL). "Missing" is a transient state — the blob may be uploaded or produced by an RE action at any time. A stale "Missing" entry causes get_digests_ttl() to return ttl=0 without sending a FindMissingBlobs RPC, even though the blob now exists in CAS. Since deferred materialization never downloaded the blob, buck2 cannot upload it and fails with remote_upload_error.

Fix: stop caching DigestRemoteState::Missing. Only cache ExistsOnRemote. Additionally, mark top-level output file and directory digests as ExistsOnRemote when processing action results from Execute, GetActionResult, and WriteActionResult responses.
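
The caching rule for Bug 2 can be sketched as follows. This is a minimal model under stated assumptions: the real FindMissingCache is an LRU with a TTL, whereas this stand-in uses a plain `HashMap`, and the `record` and `get` methods are hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DigestRemoteState {
    ExistsOnRemote,
    Missing,
}

/// Simplified stand-in for FindMissingCache (no LRU eviction, no TTL).
#[derive(Default)]
pub struct FindMissingCache {
    entries: HashMap<String, DigestRemoteState>,
}

impl FindMissingCache {
    /// Remember only positive results. `Missing` is transient, so it is
    /// never stored and the next lookup falls through to the remote.
    pub fn record(&mut self, digest: &str, state: DigestRemoteState) {
        if state == DigestRemoteState::ExistsOnRemote {
            self.entries.insert(digest.to_owned(), state);
        }
    }

    /// `None` means "unknown": the caller must send FindMissingBlobs.
    pub fn get(&self, digest: &str) -> Option<DigestRemoteState> {
        self.entries.get(digest).copied()
    }
}
```

Because a miss is never stored, a blob that is uploaded or produced later is rediscovered on the next query rather than being masked by a stale entry.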

Bug 3: Silent skip of missing CAS artifacts during upload

When the uploader encounters a RequiresCasDownload file that FindMissingBlobs reports as missing, it logs a soft_error!("cas_missing") and silently skips the file (continue). The downstream action then fails with incomplete inputs.

Fix: instead of silently skipping, materialize the file locally (which downloads from CAS, bypassing FindMissingCache) and add it to the upload list. If CAS truly doesn't have the blob, ensure_materialized fails with a clear error instead of a mysterious downstream action failure.
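
The control-flow change for Bug 3 might look roughly like the sketch below. It models CAS as an in-memory set, and `UploadError`, `plan_uploads`, and this `ensure_materialized` signature are hypothetical simplifications for illustration; the real uploader and materializer are asynchronous and considerably more involved.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum UploadError {
    /// The blob could not be fetched from CAS at all.
    CasBlobMissing(String),
}

/// Hypothetical stand-in for the materializer: succeeds only if CAS has the blob.
pub fn ensure_materialized(digest: &str, cas: &HashSet<String>) -> Result<(), UploadError> {
    if cas.contains(digest) {
        Ok(())
    } else {
        Err(UploadError::CasBlobMissing(digest.to_owned()))
    }
}

/// For each RequiresCasDownload digest that FindMissingBlobs reported missing,
/// materialize it and queue it for upload, or fail immediately.
pub fn plan_uploads(
    reported_missing: &[&str],
    cas: &HashSet<String>,
) -> Result<Vec<String>, UploadError> {
    let mut uploads = Vec::new();
    for &digest in reported_missing {
        // Old behaviour: log a soft error and `continue`, dropping the input.
        // New behaviour: surface the failure here instead of downstream.
        ensure_materialized(digest, cas)?;
        uploads.push(digest.to_owned());
    }
    Ok(uploads)
}
```

The design choice is to fail early: an error naming the missing digest at upload time is easier to diagnose than an action that runs with incomplete inputs.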

@meta-cla

meta-cla bot commented Mar 18, 2026

Hi @mattparks!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla meta-cla bot added the CLA Signed label Mar 18, 2026
@meta-codesync

meta-codesync bot commented Mar 18, 2026

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this in D97149108. (Because this pull request was imported automatically, there will not be any future comments.)

@mattparks mattparks marked this pull request as ready for review March 18, 2026 18:39
… skip

Five bugs in the OSS RE gRPC client cause CasArtifactNotFound errors
when concurrent actions share input digests.

Bug 1: convert_action_result() hardcodes ttl: 0, causing the deferred
materializer to treat every output as immediately expired. Fix: use the
configured cas_ttl_secs.

Bug 2: get_digests_ttl() caches ALL checked digests as ExistsOnRemote
BEFORE processing the FindMissing response. A concurrent action that
hits the cache between caching and response processing sees a premature
ExistsOnRemote for a digest that is actually missing, skips upload, and
its Execute fails. Fix: build the missing set first, then only cache
digests confirmed to exist.

Bug 3: get_digests_ttl() also caches DigestRemoteState::Missing. A
"Missing" entry is transient — the blob may appear at any time. Caching
it prevents future FindMissingBlobs RPCs from being sent. Fix: do not
cache Missing entries.

Bug 4: FindMissingCache global TTL is 12 hours, but cas_ttl_secs
defaults to 3 hours (since PR facebook#1248). Cache entries outlive the blobs
they describe. Fix: set cache TTL to cas_ttl_secs.

Bug 5: Uploader silently skips RequiresCasDownload files reported
missing by FindMissingBlobs. This leaves downstream actions with
incomplete inputs. Fix: materialize locally and re-upload instead of
skipping.
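
The ordering fix described under Bug 2 of the commit message (build the missing set first, then cache only confirmed digests) can be sketched as a pure function. `confirmed_present` is a hypothetical helper written for this example, not a function in buck2.

```rust
use std::collections::HashSet;

/// Given the digests that were sent in a FindMissingBlobs request and the
/// digests the response reported as missing, return only those confirmed
/// to exist. Only this result is safe to cache as ExistsOnRemote.
pub fn confirmed_present<'a>(checked: &[&'a str], missing_response: &[&str]) -> Vec<&'a str> {
    // Build the missing set first, before touching any shared cache.
    let missing: HashSet<&str> = missing_response.iter().copied().collect();
    checked
        .iter()
        .copied()
        .filter(|d| !missing.contains(*d))
        .collect()
}
```

Computing the confirmed set before any cache write means a concurrent reader never observes ExistsOnRemote for a digest the response is about to report as missing.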