Description
When running parallel tasks in a monorepo, `git hash-object --stdin-paths` can fail intermittently with "No such file or directory" errors. This happens because one task's input hashing phase overlaps with another task's execution phase, and the executing task deletes/rewrites files that the hashing task is trying to read.
This is the same underlying issue reported by @kikones34 in a comment on #2313, who concluded:
"After a bit of debugging, I'm confident it's a race condition between git trying to hash a file and another process modifying the file."
That issue was closed as setup-specific, but the root cause is a general TOCTOU (time-of-check to time-of-use) race in `get_file_hashes()`.
Reproduction
- Have two independent tasks in the same project that run in parallel:
  - Task A (e.g., a linter): has broad input globs that include files Task B writes
  - Task B (e.g., a test runner): deletes and rewrites files during execution (e.g., vitest with `update: true` rewriting snapshot files)
- Run both tasks repeatedly (4-5 times)
- Intermittently observe:

```
fatal: could not open '<path>' for reading: No such file or directory
Process git failed: exit code 128
```
The specific file varies between failures. Roughly 2 out of 5 consecutive runs fail.
Root Cause
There is a TOCTOU race in `get_file_hashes()` at `crates/vcs/src/git/git_client.rs`:
1. Line 337: `abs_file.exists()` returns `true` — the file exists at this moment
2. The file path is added to the `objects` list
3. Lines 365-372: all collected paths are piped to `git hash-object --stdin-paths`
4. Between steps 1 and 3, another task's process (running in parallel via the action pipeline) deletes the file
5. `git hash-object` tries to read the file and fails with exit code 128
The race window exists because:
- Independent tasks overlap freely — Task A's `generate_hash` phase can run simultaneously with Task B's `execute` phase
- `is_valid_input_source()` only excludes the current task's outputs from input hashing, not outputs declared by other tasks that may be running in parallel
- `GIT_OPTIONAL_LOCKS=0` (set at `git_client.rs:78`) disables git's advisory locking, further reducing synchronization between concurrent git operations
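The check-then-use window can be shown with a minimal, self-contained sketch (the temp paths are hypothetical, and a plain `File::open` stands in for `git hash-object` reading a batched path):

```rust
use std::fs::{self, File};
use std::io::ErrorKind;

fn main() {
    let dir = std::env::temp_dir().join("toctou_demo");
    fs::create_dir_all(&dir).unwrap();
    let path = dir.join("snapshot.txt");
    fs::write(&path, "old contents").unwrap();

    // Step 1 (check): the hashing task sees the file and queues it.
    assert!(path.exists());

    // A parallel task deletes the file before the batch is hashed.
    fs::remove_file(&path).unwrap();

    // Step 3 (use): reading the queued path now fails, just as
    // `git hash-object` does when it reaches the vanished path.
    match File::open(&path) {
        Err(e) if e.kind() == ErrorKind::NotFound => {
            println!("race window hit: {}", e);
        }
        _ => unreachable!("the file was just removed"),
    }
}
```

A batch API amplifies this: one vanished file fails the entire `--stdin-paths` invocation, discarding the hashes of every other file in the batch.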
Contributing Factors
- Remote cache with read-only local mode: More cache misses means more local task executions, increasing the chance of concurrent file modifications
- Test frameworks that rewrite files unconditionally: Tools like vitest with `update: true` delete and rewrite snapshot files even when content hasn't changed
- Broad input globs / VCS walk strategy: When using VCS-based input collection, `get_file_tree()` returns all tracked files under a project directory — any of these could be hashed as inputs for one task while another task modifies them
Suggested Fix
Some options (not mutually exclusive):
- Handle `git hash-object` failure gracefully — if exit code 128 is due to a missing file, identify which file(s) failed, remove them from the batch, and retry. This is the simplest fix that addresses the symptom.
- Hash files directly in Rust instead of shelling out to git — eliminates the batch-failure problem since each file can be opened and hashed individually with proper error recovery (skip files that vanish between check and read).
- Cross-task output awareness — during input aggregation, also exclude files that are declared as outputs of any currently-executing parallel task, not just the current task's own outputs. This would reduce the set of files exposed to the race.
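A minimal sketch of the second option, assuming per-file hashing with skip-on-vanish semantics (`hash_inputs` is a hypothetical helper, not moon's API, and `DefaultHasher` merely stands in for git's SHA-1 blob hashing):

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io::ErrorKind;
use std::path::Path;

/// Hash each input file individually, skipping files that vanish
/// between collection and read (the TOCTOU window described above).
fn hash_inputs(paths: &[&Path]) -> Vec<(String, u64)> {
    let mut hashes = Vec::new();
    for path in paths {
        match fs::read(path) {
            Ok(bytes) => {
                let mut hasher = DefaultHasher::new();
                bytes.hash(&mut hasher);
                hashes.push((path.display().to_string(), hasher.finish()));
            }
            // A parallel task deleted the file mid-hash: drop it from
            // the manifest instead of failing the whole batch.
            Err(e) if e.kind() == ErrorKind::NotFound => continue,
            Err(e) => panic!("unexpected I/O error for {}: {}", path.display(), e),
        }
    }
    hashes
}

fn main() {
    let dir = std::env::temp_dir().join("hash_inputs_demo");
    fs::create_dir_all(&dir).unwrap();
    let kept = dir.join("kept.txt");
    fs::write(&kept, "stable input").unwrap();
    // "gone.txt" was collected as an input but no longer exists.
    let gone = dir.join("gone.txt");
    fs::remove_file(&gone).ok();

    let hashes = hash_inputs(&[kept.as_path(), gone.as_path()]);
    // Only the surviving file is hashed; the batch does not fail.
    assert_eq!(hashes.len(), 1);
    println!("hashed {} of 2 collected inputs", hashes.len());
}
```

Compared with retrying the whole batch, this shrinks the failure domain to a single file: a vanished input is simply dropped from the hash manifest and will be re-collected on the next run if it reappears.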
Environment
- OS: macOS and Linux
- Monorepo with ~50 projects, parallel task execution, remote caching enabled