Copy Task on macOS Copy-on-Write filesystems is not resilient to concurrent copies #13463
Issue Description
The Copy task includes Retries as a core concept to help buffer against point-in-time availability issues taking down the entire build. However, it seems that in some concurrent-access cases, especially on macOS under copy-on-write file systems, the Copy task doesn't recover from temporary access denial. This leads to macOS builds being more fragile than those on other platforms, especially under testing or multi-agent scenarios.
Steps to Reproduce
The SDK's tests that copy files from the NuGet cache to app publish directories are flaky on macOS under contention.
Expected Behavior
Short-duration locks related to CoW-style copies shouldn't skip the Copy task's retry mechanisms.
Actual Behavior
Under contention, copies from the NuGet package directory to project output directories appear to fail and stop the build.
Analysis
The macOS File.Copy path goes through two stages, each with distinct failure modes under concurrency:
Stage 1: clonefile() (macOS-only fast path)
The runtime first tries clonefile(src, dst) (FileSystem.TryCloneFile.OSX.cs). This is an atomic kernel call that CoW-clones a file. Under concurrent copying to the same destination:
- clonefile → EEXIST: When the destination already exists, clonefile fails immediately with EEXIST. This maps to IOException with message "The file '{path}' already exists." (IO_FileExists_Name). This is what a racing second thread will see.
- With overwrite: true, the runtime then tries to take an exclusive flock(LOCK_EX | LOCK_NB) on the destination and unlink() it before re-attempting clonefile. If another thread already holds the fd open and locked, this is where the concurrency race becomes visible in Stage 2.
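
The Stage 1 flow described above can be modelled roughly as follows. This is a sketch in Python, not the runtime's code: an exclusive create (O_CREAT | O_EXCL) stands in for clonefile because it also fails with EEXIST when the destination exists, and the names create_exclusive and copy_stage1 are hypothetical.

```python
import errno
import fcntl
import os
import shutil
import tempfile


def create_exclusive(src, dst):
    """Stand-in for clonefile(src, dst): fails with EEXIST if dst already exists."""
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)


def copy_stage1(src, dst, overwrite):
    """Rough model of the Stage 1 flow; not the actual runtime implementation."""
    try:
        create_exclusive(src, dst)
        return "cloned"
    except FileExistsError:
        # EEXIST: what a racing second copier sees.
        if not overwrite:
            raise
    # Overwrite path: lock the existing destination without blocking,
    # unlink it, then retry the exclusive create.
    fd = os.open(dst, os.O_RDONLY)
    try:
        # Raises BlockingIOError (EWOULDBLOCK) if another open file holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        os.unlink(dst)
    finally:
        os.close(fd)
    create_exclusive(src, dst)
    return "replaced"
```

Under this model, a second copier that arrives while the first still holds its lock fails at the flock call rather than at the create, which is the handoff to the Stage 2 failure modes.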
Stage 2: open() + flock() + CopyFile (fallback path)
When clonefile is unavailable or fails, the runtime falls back to SafeFileHandle.Open(dest, FileMode.Create, FileShare.None) followed by Interop.Sys.CopyFile(src, dst). The FileShare.None causes the runtime to call flock(fd, LOCK_EX | LOCK_NB) immediately after open() succeeds.
- flock(LOCK_NB) → EWOULDBLOCK (= EAGAIN): If another process/thread holds the file open with an exclusive or conflicting lock, the non-blocking lock attempt fails with EWOULDBLOCK. This maps to IOException: "The process cannot access the file '{path}' because it is being used by another process." (IO_SharingViolation_File, with RawErrno as the HResult; this is EAGAIN = errno 35 on macOS).
- open() → EACCES/EPERM: If file permissions prevent opening (e.g., readonly bit set after one thread already started writing), these map to UnauthorizedAccessException (UnauthorizedAccess_IODenied_Path).
- open() → ETXTBSY: On macOS, trying to write to a file that is currently being executed (e.g., copying over a binary that's running) returns ETXTBSY. This is not explicitly handled in Interop.IOErrors.cs; it falls through to the default case, producing a plain IOException with the raw strerror(ETXTBSY) message ("Text file busy") and RawErrno as the HResult.
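
The flock(LOCK_EX | LOCK_NB) failure can be reproduced outside of .NET with two independent opens of the same file. The following is a small illustration in Python on a POSIX system; try_exclusive_lock and the dest.dll file name are hypothetical, and the numeric errno differs by platform (35 on macOS, 11 on Linux), so the check compares against the symbolic constants.

```python
import errno
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Open path and attempt a non-blocking exclusive flock.

    Returns (fd, None) on success or (None, errno) if the lock is held elsewhere.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        return None, e.errno
    return fd, None


with tempfile.TemporaryDirectory() as d:
    dest = os.path.join(d, "dest.dll")

    # First copier opens the destination and takes the lock.
    first_fd, first_err = try_exclusive_lock(dest)

    # Second copier opens the same file: open() succeeds, flock() does not.
    second_fd, second_err = try_exclusive_lock(dest)
    contended = second_err in (errno.EAGAIN, errno.EWOULDBLOCK)

    # Once the first copier closes its descriptor, the lock is released
    # and a later attempt succeeds, which is why a retry can help here.
    os.close(first_fd)
    third_fd, third_err = try_exclusive_lock(dest)
    os.close(third_fd)
```

The third attempt succeeding after the first descriptor is closed is the property that makes this failure a candidate for retrying rather than failing the build.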
Copy task interaction
The IOException from EWOULDBLOCK (sharing violation) is the error that MSBuild's Copy task is designed to retry for (ERROR_SHARING_VIOLATION). However, on Unix, the RawErrno HResult baked into that IOException is EAGAIN (errno 35 on macOS), not the Windows ERROR_SHARING_VIOLATION (0x20 / 32). Copy's DoCopyWithRetries checks Marshal.GetHRForException(e) == NativeMethods.ERROR_SHARING_VIOLATION — a Windows-specific constant. On macOS, that HRESULT check simply doesn't match, so the exception falls through to the generic IOException retry path instead.
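
To make the mismatch concrete, here is a minimal sketch in Python of a retry loop whose retry decision is a pluggable error-code check. It is not MSBuild's DoCopyWithRetries: windows_only_check, portable_check, and copy_with_retries are hypothetical names, and the platform-aware check is only one possible shape for a fix.

```python
import errno
import os
import time

ERROR_SHARING_VIOLATION = 32  # Win32 code (0x20) quoted in the analysis above


def windows_only_check(code):
    """Matches only the Windows sharing-violation code."""
    return code == ERROR_SHARING_VIOLATION


def portable_check(code, os_name=os.name):
    """Hypothetical platform-aware check: also treats EAGAIN/EWOULDBLOCK as contention."""
    if os_name == "nt":
        return code == ERROR_SHARING_VIOLATION
    return code in (errno.EAGAIN, errno.EWOULDBLOCK)


def copy_with_retries(copy_once, is_retryable, retries=3, delay_s=0.01):
    """Call copy_once(); retry on OSError only while is_retryable(errno) holds."""
    attempt = 0
    while True:
        try:
            return copy_once()
        except OSError as e:
            if attempt >= retries or not is_retryable(e.errno):
                raise
            attempt += 1
            time.sleep(delay_s)
```

With windows_only_check, an error carrying EAGAIN is never classified as a sharing violation on macOS or Linux, so it is re-raised on the first failure; with portable_check, the same error is retried until the lock holder lets go or the retry budget runs out.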
Versions & Configurations
No response