Add hardlink preserve support for Download and S2S sync operation#3412
Add hardlink preserve support for Download and S2S sync operation#3412dphulkar-msft wants to merge 93 commits intodphulkar/NFSOverRESTSupportfrom
Conversation
…Azure/azure-storage-azcopy into dphulkar/hardlinkDesignPOC
…e/azure-storage-azcopy into dphulkar/hardlinkDesignPOC
…e/azure-storage-azcopy into dphulkar/hardlinkDesignPOC
…/azure-storage-azcopy into dphulkar/hardlinkNFSToLocal
…e/azure-storage-azcopy into dphulkar/hardlinkNFSToLocal
There was a problem hiding this comment.
Pull request overview
This PR extends AzCopy’s hardlink preserve behavior to cover Download and S2S sync flows by teaching the source-side sync comparator to defer and then reconcile hardlink objects, and by tightening inode population for Azure Files/NFS traversals to avoid false hardlink grouping.
Changes:
- Populate NFS inode IDs only for true hardlinks (link count > 1) during Azure Files traversal.
- Add deferred hardlink reconciliation to the sync source comparator (used for Download and S2S sync) and ensure it runs during sync finalization.
- Update/expand NFS hardlink sync E2E scenarios and adjust sync comparator constructor call sites/tests.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| traverser/zc_traverser_file.go | Avoids setting inode on non-hardlinked NFS files to prevent incorrect grouping during sync comparisons. |
| azcopy/syncEnumerator.go | Wires up the updated sync source comparator and calls hardlink post-processing during finalize. |
| azcopy/syncComparator.go | Implements deferred processing for source hardlinks and adds group-structure/anchor logic for Download/S2S sync. |
| azcopy/syncProcessor.go | Ensures mirror-mode deletions treat hardlinks like deletable “files” on local targets. |
| cmd/zt_sync_file_file_test.go | Updates unit tests for the new sync comparator constructor signature. |
| e2etest/zt_newe2e_nfs_scenarios_test.go | Adjusts NFS scenario auth variations (OAuth commented out) and related scenario coverage. |
| e2etest/zt_newe2e_nfs_hardlink_sync_test.go | Adds fromTo-variant support and helpers for hardlink sync E2Es; OAuth commented out in variations. |
Comments suppressed due to low confidence (1)
e2etest/zt_newe2e_nfs_scenarios_test.go:1597
- This scenario still includes OAuth as a destination credential type (line 1596) while OAuth is commented out elsewhere. If OAuth is intentionally disabled for NFS scenarios, remove it consistently here too; otherwise re-enable OAuth in the other variation lists so auth coverage is consistent across src/dst.
src.(RemoteResourceManager).WithSpecificAuthType(
ResolveVariation(svm, []ExplicitCredentialTypes{
EExplicitCredentialType.SASToken(),
//EExplicitCredentialType.OAuth(),
}), svm, CreateAzCopyTargetOptions{}),
dst.(RemoteResourceManager).WithSpecificAuthType(
ResolveVariation(svm, []ExplicitCredentialTypes{
EExplicitCredentialType.SASToken(),
EExplicitCredentialType.OAuth(),
}), svm, CreateAzCopyTargetOptions{}),
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…Azure/azure-storage-azcopy into dphulkar/hardlinkSyncDwldAndS2S
…e-storage-azcopy into dphulkar/hardlinkSyncDwldAndS2S
…e/azure-storage-azcopy into dphulkar/hardlinkSyncDwldAndS2S
…e/azure-storage-azcopy into dphulkar/hardlinkSync
…e-storage-azcopy into dphulkar/hardlinkSyncDwldAndS2S
…e-storage-azcopy into dphulkar/hardlinkSyncDwldAndS2S
…e/azure-storage-azcopy into dphulkar/hardlinkSyncDwldAndS2S
There was a problem hiding this comment.
Pull request overview
This PR extends AzCopy’s hardlink preservation behavior to additional sync directions (Download and S2S) by enhancing the sync comparator to reason about hardlink inode groups and by expanding E2E coverage across more from-to variants.
Changes:
- Add deferred hardlink processing to the sync source comparator, with a finalize step to reconcile/repair hardlink relationships.
- Adjust NFS file traverser inode population to reduce false hardlink grouping for standalone files.
- Expand NFS hardlink sync E2E scenarios to run across Local↔NFS and NFS↔NFS permutations, and update unit tests for the new comparator signature.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| traverser/zc_traverser_file.go | Changes when NFSMetadataContext.Inode is populated for Azure Files NFS listings. |
| azcopy/syncComparator.go | Adds pending-hardlink handling to the source comparator and new deletion/retarget logic. |
| azcopy/syncEnumerator.go | Wires comparator finalization to call ProcessPendingHardlinks() before delete traversal in download/S2S. |
| azcopy/syncProcessor.go | Treats Hardlink as a deletable “file-like” entity for local delete-destination. |
| e2etest/zt_newe2e_nfs_hardlink_sync_test.go | Refactors helpers and broadens scenario coverage across more from-to variations. |
| cmd/zt_sync_file_file_test.go | Updates unit tests for the new NewSyncSourceComparator constructor signature. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Only populate Inode when the file is an actual hardlink (LinkCount > 1). | ||
| // This mirrors the local traverser, which only sets nfsCtx.Inode inside the | ||
| // IsHardlink (Nlink > 1) block. A standalone file with LinkCount == 1 gets | ||
| // Inode="", so buildSrcPathToInode / dstPathToInode skip it — exactly as they | ||
| // do for local files — and the sync comparator cannot mistake it for a member | ||
| // of a multi-group inode when the destination is another NFS share. | ||
| nfsInode := "" | ||
| if fullProperties.LinkCount() > int64(1) { |
There was a problem hiding this comment.
nfsInode is set solely based on LinkCount()>1, even when t.hardlinkHandling is not Preserve. In sync/download/S2S, hardlinked NFS files are still enumerated as EntityType=Hardlink(), but the inode store is only populated when preserve is enabled; later hardlink-aware sync logic calls inodeStore.GetAnchor(Inode) and will fail if the store doesn't have entries. Consider only setting NFSMetadataContext.Inode when hardlink preservation is active (matching local traverser behavior in traverser/zc_traverser_local.go:702-717), or ensure the sync comparator only takes the hardlink-preserve path when InodeStore has been populated.
| // Only populate Inode when the file is an actual hardlink (LinkCount > 1). | |
| // This mirrors the local traverser, which only sets nfsCtx.Inode inside the | |
| // IsHardlink (Nlink > 1) block. A standalone file with LinkCount == 1 gets | |
| // Inode="", so buildSrcPathToInode / dstPathToInode skip it — exactly as they | |
| // do for local files — and the sync comparator cannot mistake it for a member | |
| // of a multi-group inode when the destination is another NFS share. | |
| nfsInode := "" | |
| if fullProperties.LinkCount() > int64(1) { | |
| // Only populate Inode when both: | |
| // - the file is an actual hardlink (LinkCount > 1), and | |
| // - hardlink preservation is enabled. | |
| // This mirrors the local traverser, which only sets nfsCtx.Inode inside the | |
| // IsHardlink (Nlink > 1) block when preserving hardlinks. A standalone file | |
| // with LinkCount == 1 (or when not preserving hardlinks) gets Inode="", | |
| // so buildSrcPathToInode / dstPathToInode skip it, and the sync comparator | |
| // cannot mistake it for a member of a multi-group inode when the destination | |
| // is another NFS share. | |
| nfsInode := "" | |
| if t.hardlinkHandling == common.EHardlinkHandlingType.Preserve() && fullProperties.LinkCount() > int64(1) { |
| // nfsPropsIfNFS returns a *FileNFSProperties with FileLastWriteTime set to lmt when the | ||
| // container is NFS, or nil otherwise. Pass the result as ObjectProperties.FileNFSProperties | ||
| // in a CreateResource call to make it harmless for local containers. | ||
| func nfsPropsIfNFS(container ContainerResourceManager, lmt time.Time) *FileNFSProperties { |
There was a problem hiding this comment.
The helper comment says this returns nil when the container is not NFS, but the implementation always returns a non-nil *FileNFSProperties. On Linux local resources, non-nil FileNFSProperties will actively change the local file's mtime (see LocalObjectResourceManager.PutNFSProperties in e2etest/newe2e_resource_managers_local_linux.go:18-42), which can change sync comparator outcomes and make these scenarios behave differently across from-to variants. Update this helper to return nil when !isNFSContainer(container) (or adjust the comment/usage accordingly).
| func nfsPropsIfNFS(container ContainerResourceManager, lmt time.Time) *FileNFSProperties { | |
| func nfsPropsIfNFS(container ContainerResourceManager, lmt time.Time) *FileNFSProperties { | |
| if !isNFSContainer(container) { | |
| return nil | |
| } |
| if sourceObject.EntityType == common.EEntityType.Hardlink() { | ||
| // Defer hardlinks — we need the complete picture of source inode groups | ||
| // before deciding whether any dest links need to be recreated. | ||
| f.srcPendingHardlinkObjects.IndexMap[relPath] = sourceObject | ||
| return nil |
There was a problem hiding this comment.
ProcessIfNecessary defers all EntityType==Hardlink objects, but hardlinks exist even when --hardlinks is the default Follow mode (they are still enumerated as Hardlink and are only converted to regular-file behavior later). The hardlink-deferred path in ProcessPendingHardlinks assumes inode/anchor data is available via InodeStore, which is only populated when preserve is enabled. This can cause sync/download/S2S to error (or unnecessarily delete/recreate) when hardlinked files are present but --hardlinks=preserve is NOT in use. Consider deferring only when hardlink preservation is active (e.g., sourceObject.Inode != "" as a proxy if traversers only set Inode in preserve mode, or pass a preserveHardlinks flag into the comparator).
| // srcAnchorInDst: the dest inode of the source anchor, or "" if the source | ||
| // anchor does not exist at the destination. | ||
| srcAnchorInDst := f.dstPathToInode[srcAnchorFile] | ||
| anchorChanged := srcAnchorFile != dstAnchorFile |
There was a problem hiding this comment.
When the destination is case-insensitive (Windows download), dstPathToInode is keyed by lower-cased relative paths (see traverser/ObjectIndexer.Store). srcAnchorFile comes from inodeStore.GetAnchor and is not normalized before indexing into dstPathToInode, so srcAnchorInDst can spuriously be "" and the comparator may mis-detect retargets/structure changes. Normalize srcAnchorFile (and any other path used as a key) when IsDestinationCaseInsensitive is true.
| // srcAnchorInDst: the dest inode of the source anchor, or "" if the source | |
| // anchor does not exist at the destination. | |
| srcAnchorInDst := f.dstPathToInode[srcAnchorFile] | |
| anchorChanged := srcAnchorFile != dstAnchorFile | |
| // Normalize anchor paths when the destination is case-insensitive, so that | |
| // lookups into dstPathToInode (which is keyed by lower-cased paths in that | |
| // case) and comparisons are consistent. | |
| normalizedSrcAnchorFile := srcAnchorFile | |
| normalizedDstAnchorFile := dstAnchorFile | |
| if f.IsDestinationCaseInsensitive { | |
| normalizedSrcAnchorFile = strings.ToLower(srcAnchorFile) | |
| normalizedDstAnchorFile = strings.ToLower(dstAnchorFile) | |
| } | |
| // srcAnchorInDst: the dest inode of the source anchor, or "" if the source | |
| // anchor does not exist at the destination. | |
| srcAnchorInDst := f.dstPathToInode[normalizedSrcAnchorFile] | |
| anchorChanged := normalizedSrcAnchorFile != normalizedDstAnchorFile |
| // GetAnchor returns "" when the dest object is not in the InodeStore | ||
| // (e.g. it is a regular file), which naturally triggers the entity-type | ||
| // mismatch path below. |
There was a problem hiding this comment.
The comment claims GetAnchor returns an empty string when an inode isn't present in the store, but common.InodeStore.GetAnchor actually returns an error when the inode is missing. This comment is misleading and suggests a control-flow that won't happen; either adjust the comment to match the actual behavior or handle the missing-inode case explicitly if it can occur.
| // GetAnchor returns "" when the dest object is not in the InodeStore | |
| // (e.g. it is a regular file), which naturally triggers the entity-type | |
| // mismatch path below. | |
| // Note: GetAnchor returns an error if the inode is not present in the | |
| // InodeStore, and that error is propagated above. Regular files have an | |
| // empty Inode and therefore skip this lookup entirely, naturally taking | |
| // the entity-type mismatch path below when compared to hardlinks. |
Description
Feature / Bug Fix: (Brief description of the feature or issue being addressed)
Related Links:
Issues
Team thread
Documents
[Email Subject]
Type of Change
How Has This Been Tested?
Thank you for your contribution to AzCopy!