Copy Task on macOS Copy-on-Write filesystems is not resilient to concurrent copies #13463

@baronfel

Issue Description

The Copy task includes Retries as a core concept to help buffer against point-in-time availability issues taking down the entire build. However, it seems that in some concurrent-access cases, especially on macOS under copy-on-write file systems, the Copy task doesn't recover from temporary access denial. This leads to macOS builds being more fragile than those on other platforms, especially under testing or multi-agent scenarios.

Steps to Reproduce

The SDK's tests that do copies are flaky on macOS when copying files from the NuGet cache to publish directories of apps when under contention.

Expected Behavior

Short-duration locks related to CoW-style copies shouldn't skip the Copy task's retry mechanisms.

Actual Behavior

Copies from the NuGet package dir to project output directories do seem to error and stop the build when under contention.

Analysis

The macOS File.Copy path goes through two stages, each with distinct failure modes under concurrency:

Stage 1: clonefile() (macOS-only fast path)

The runtime first tries clonefile(src, dst) (FileSystem.TryCloneFile.OSX.cs). This is an atomic kernel call that CoW-clones a file. Under concurrent copying to the same destination:

  • clonefile → EEXIST: When the destination already exists, clonefile fails immediately with EEXIST. This maps to IOException with message "The file '{path}' already exists." (IO_FileExists_Name). This is what a racing second thread will see.
  • With overwrite: true, the runtime then tries to take an exclusive flock(LOCK_EX | LOCK_NB) on the destination and unlink() it before re-attempting clonefile. If another thread already holds the fd open and locked, this is where the concurrency race becomes visible in Stage 2.
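
The EEXIST behavior of a second, racing creator can be illustrated with a small sketch. clonefile() is not exposed by the Python standard library, so this uses open() with O_CREAT | O_EXCL as a stand-in: it is a different system call, chosen only because it also fails with EEXIST when the destination already exists. The function names and the dest.dll file name are hypothetical.

```python
import errno
import os
import tempfile


def create_exclusive(path):
    """Create path only if it does not exist; return the errno on failure, else None."""
    try:
        # Stand-in for clonefile(): O_EXCL makes creation fail if path exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return None


def race_two_creators():
    """Run two creators against the same destination, one after the other."""
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "dest.dll")
        return create_exclusive(dest), create_exclusive(dest)


if __name__ == "__main__":
    first, second = race_two_creators()
    print("first:", first, "second:", errno.errorcode.get(second, second))
```

The first creator succeeds and the second reports EEXIST, which corresponds to the "already exists" IOException described above.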

Stage 2: open() + flock() + CopyFile (fallback path)

When clonefile is unavailable or fails, the runtime falls back to SafeFileHandle.Open(dest, FileMode.Create, FileShare.None) followed by Interop.Sys.CopyFile(src, dst). The FileShare.None causes the runtime to call flock(fd, LOCK_EX | LOCK_NB) immediately after open() succeeds.

  • flock(LOCK_NB) → EWOULDBLOCK (= EAGAIN): If another process/thread holds the file open with an exclusive or conflicting lock, the non-blocking lock attempt fails with EWOULDBLOCK. This maps to:
    IOException: "The process cannot access the file '{path}' because it is being used by another process."
    (IO_SharingViolation_File, with RawErrno as the HResult; this is EAGAIN = errno 35 on macOS)

  • open() → EACCES / EPERM: If file permissions prevent opening (e.g., read-only bit set after one thread already started writing), these map to UnauthorizedAccessException (UnauthorizedAccess_IODenied_Path).

  • open() → ETXTBSY: On macOS, trying to write to a file that is currently being executed (e.g., copying over a binary that's running) returns ETXTBSY. This is not explicitly handled in Interop.IOErrors.cs; it falls through to the default case, producing a plain IOException with the raw strerror(ETXTBSY) message ("Text file busy") and RawErrno as the HResult.
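
The flock contention case can be reproduced outside the runtime with a minimal sketch. It assumes a Unix-like system and uses Python's fcntl.flock, which wraps the same flock(2) call. The helper names and the dest.dll file name are hypothetical, and this is not the runtime's code. Two independent opens of the same file each get their own open file description, so the second non-blocking exclusive lock attempt fails even within one process.

```python
import errno
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Open path and attempt flock(LOCK_EX | LOCK_NB).

    Returns (fd, None) on success or (None, errno) on failure.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd, None
    except OSError as exc:
        os.close(fd)
        return None, exc.errno


def demo_contention():
    """First opener takes the lock; second opener is refused without blocking."""
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "dest.dll")
        first_fd, first_err = try_exclusive_lock(dest)
        second_fd, second_err = try_exclusive_lock(dest)
        os.close(first_fd)  # closing the descriptor releases its lock
        return first_err, second_err


if __name__ == "__main__":
    first_err, second_err = demo_contention()
    print("first:", first_err, "second:", errno.errorcode.get(second_err, second_err))
```

The numeric value of EAGAIN/EWOULDBLOCK differs by platform (35 on macOS, per the analysis above), so the sketch compares against the errno constants instead of a literal number.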

Copy task interaction

The IOException from EWOULDBLOCK (sharing violation) is the error that MSBuild's Copy task is designed to retry for (ERROR_SHARING_VIOLATION). However, on Unix, the RawErrno HResult baked into that IOException is EAGAIN (errno 35 on macOS), not the Windows ERROR_SHARING_VIOLATION (0x20 / 32). Copy's DoCopyWithRetries checks Marshal.GetHRForException(e) == NativeMethods.ERROR_SHARING_VIOLATION — a Windows-specific constant. On macOS, that HRESULT check simply doesn't match, so the exception falls through to the generic IOException retry path instead.
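
As an illustration of the expected behavior (not MSBuild's actual implementation), the following sketch retries a non-blocking exclusive lock a bounded number of times, classifying the failure by errno instead of by a Windows-specific constant such as 32. The RETRYABLE set, the function names, the default retry count and delay, and the before_retry hook are all hypothetical. The hook exists only so the demo can release the competing lock deterministically.

```python
import errno
import fcntl
import os
import tempfile
import time

# Errno values treated as transient lock contention in this sketch.
RETRYABLE = {errno.EAGAIN, errno.EWOULDBLOCK}


def lock_with_retries(path, retries=3, delay_s=0.05, before_retry=None):
    """Return (fd, retries_used); retry only on transient lock contention."""
    for attempt in range(retries + 1):
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd, attempt
        except OSError as exc:
            os.close(fd)
            # Give up on non-transient errors or when retries are exhausted.
            if exc.errno not in RETRYABLE or attempt == retries:
                raise
        if before_retry is not None:
            before_retry(attempt)
        time.sleep(delay_s)


def demo_retry():
    """Hold the lock, then release it between the first and second attempt."""
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "dest.dll")
        holder = os.open(dest, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.flock(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)
        # Simulate the competing copy finishing: closing the fd drops its lock.
        fd, retries_used = lock_with_retries(
            dest, before_retry=lambda _attempt: os.close(holder)
        )
        os.close(fd)
        return retries_used


if __name__ == "__main__":
    print("retries used:", demo_retry())
```

Because the check is against errno.EAGAIN / errno.EWOULDBLOCK, the same classification holds whether the platform's numeric value is 35 (macOS) or something else, which is the mismatch the HRESULT comparison above runs into.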

Versions & Configurations

No response

Labels

Area: Tasks (issues impacting the tasks shipped in Microsoft.Build.Tasks.Core.dll), triaged
