[Breaking change]: TarWriter now uses hard link entries for files with the same inode
Description
Starting in .NET 11 Preview 3, the System.Formats.Tar library introduces a change in how the TarWriter handles files that are hard-linked to the same inode. Previously, when creating a tar archive, the TarWriter would write a full file entry for each hard-linked file, duplicating the file's content in the archive. With this change, the TarWriter now generates a HardLink entry for subsequent files that are hard-linked to the same inode, instead of duplicating the file content.
This change aligns the behavior of TarWriter with the behavior of GNU tar and other common tar implementations, which use hard link entries to save space and preserve the hard link relationships in the archive.
Version
.NET 11 Preview 3
Previous behavior
When creating a tar archive with TarWriter, files that were hard-linked to the same inode were treated as separate files, and their content was duplicated in the archive. For example:
using System.Formats.Tar;
using System.IO;
string filePath1 = "file1.txt";
string filePath2 = "file2.txt";
// Create two hard-linked files
File.WriteAllText(filePath1, "Hello, world!");
File.CreateHardLink(filePath2, filePath1);
using (var stream = File.Create("archive.tar"))
using (var writer = new TarWriter(stream, TarEntryFormat.Pax, leaveOpen: false))
{
    writer.WriteEntry(filePath1, "file1.txt");
    writer.WriteEntry(filePath2, "file2.txt");
}
In the resulting archive.tar, both file1.txt and file2.txt would have separate file entries, each containing the full file content.
New behavior
Starting in .NET 11 Preview 3, the TarWriter will detect when multiple files are hard-linked to the same inode and will write a HardLink entry for subsequent files instead of duplicating the file content.
With the same code as in the previous section, the resulting archive.tar contains a full file entry (including content) for file1.txt and a HardLink entry for file2.txt that points to file1.txt.
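One way to check which behavior an archive reflects is to list its entries with TarReader. The following is a minimal sketch and is not part of the original report. It builds a small Pax archive in memory with an explicit HardLink entry, which mimics what the new TarWriter behavior is described as producing for file2.txt, and then reports each entry's type. DescribeEntries is a hypothetical helper name chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;
using System.Text;

// Hypothetical helper: returns one description line per entry in the archive.
static List<string> DescribeEntries(Stream archive)
{
    var result = new List<string>();
    using var reader = new TarReader(archive, leaveOpen: true);
    while (reader.GetNextEntry() is TarEntry entry)
    {
        result.Add(entry.EntryType == TarEntryType.HardLink
            ? $"{entry.Name}: HardLink -> {entry.LinkName}"
            : $"{entry.Name}: {entry.EntryType}, {entry.Length} bytes");
    }
    return result;
}

// Build a sample archive by hand: one regular file and one hard link entry.
// This stands in for an archive written from two hard-linked files on disk.
using var sample = new MemoryStream();
using (var writer = new TarWriter(sample, TarEntryFormat.Pax, leaveOpen: true))
{
    writer.WriteEntry(new PaxTarEntry(TarEntryType.RegularFile, "file1.txt")
    {
        DataStream = new MemoryStream(Encoding.UTF8.GetBytes("Hello, world!"))
    });
    writer.WriteEntry(new PaxTarEntry(TarEntryType.HardLink, "file2.txt")
    {
        LinkName = "file1.txt"
    });
}
sample.Position = 0;

List<string> lines = DescribeEntries(sample);
foreach (string line in lines)
{
    Console.WriteLine(line);
}
```

Running the same helper over an archive produced by the code in the previous section should show whether file2.txt was stored as a full file entry or as a HardLink entry.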
Reason for change
This change was introduced to improve the efficiency of tar archives created using the System.Formats.Tar library. By using HardLink entries for files that share the same inode, the size of the resulting tar archive is reduced, and the hard link relationships between files are preserved. This behavior is consistent with the behavior of GNU tar and other widely used tar implementations.
Recommended action
Developers who rely on the previous behavior of duplicating file content for hard-linked files should update their code to account for the new behavior. If the previous behavior is required, developers can use the new TarWriterOptions class to configure the TarWriter to always dereference hard links and write full file entries for all files.
For example:
using System.Formats.Tar;
using System.IO;
string filePath1 = "file1.txt";
string filePath2 = "file2.txt";
// Create two hard-linked files
File.WriteAllText(filePath1, "Hello, world!");
File.CreateHardLink(filePath2, filePath1);
var options = new TarWriterOptions(TarEntryFormat.Pax)
{
    HardLinkMode = TarHardLinkMode.CopyContents
};
using (var stream = File.Create("archive.tar"))
using (var writer = new TarWriter(stream, options, leaveOpen: false))
{
    writer.WriteEntry(filePath1, "file1.txt");
    writer.WriteEntry(filePath2, "file2.txt");
}
In this example, the HardLinkMode property of TarWriterOptions is set to TarHardLinkMode.CopyContents, which restores the previous behavior of duplicating file content for hard-linked files.
Affected APIs
System.Formats.Tar.TarWriter
System.Formats.Tar.TarWriter.WriteEntry
System.Formats.Tar.TarWriter.WriteEntryAsync
Type of breaking change
- Behavioral change: Existing binaries might behave differently at runtime.
Additional notes
- Extracting a tar archive that contains hard link entries to a file system that does not support hard links results in an IOException. This behavior is consistent with GNU tar.
- Developers can handle such exceptions and implement custom fallback logic, such as copying the file content instead of creating a hard link.
- .NET 11 also introduces TarExtractOptions, which lets users specify whether hard links are extracted as hard links or as copies of the file. This keeps the archive small while still allowing it to be extracted to file systems that don't support or permit hard links.
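The custom fallback mentioned in the notes above could look roughly like the following. This is a sketch under stated assumptions, not the library's own extraction logic. It reads entries manually with TarReader and copies the already-extracted link target whenever it meets a HardLink entry, so no hard link is ever created. ExtractCopyingHardLinks and Resolve are hypothetical helper names, the sample archive is built in memory for illustration, and entry types other than directories, regular files, and hard links are skipped. It deliberately does not use TarExtractOptions.

```csharp
using System;
using System.Formats.Tar;
using System.IO;
using System.Text;

// Hypothetical helper: extracts regular files and directories, and writes each
// hard link entry as an independent copy of its (already extracted) target.
static void ExtractCopyingHardLinks(Stream archive, string destinationDir)
{
    string root = Path.GetFullPath(destinationDir);
    Directory.CreateDirectory(root);

    // Resolve an entry path and reject anything that escapes the destination.
    string Resolve(string entryPath)
    {
        string full = Path.GetFullPath(Path.Combine(root, entryPath));
        if (!full.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
        {
            throw new IOException($"Entry '{entryPath}' is outside the destination directory.");
        }
        return full;
    }

    using var reader = new TarReader(archive, leaveOpen: true);
    while (reader.GetNextEntry() is TarEntry entry)
    {
        string target = Resolve(entry.Name);
        switch (entry.EntryType)
        {
            case TarEntryType.Directory:
                Directory.CreateDirectory(target);
                break;
            case TarEntryType.RegularFile:
            case TarEntryType.V7RegularFile:
                Directory.CreateDirectory(Path.GetDirectoryName(target)!);
                entry.ExtractToFile(target, overwrite: true);
                break;
            case TarEntryType.HardLink:
                // Copy the link target's content instead of creating a hard link.
                Directory.CreateDirectory(Path.GetDirectoryName(target)!);
                File.Copy(Resolve(entry.LinkName), target, overwrite: true);
                break;
            // Other entry types are intentionally skipped in this sketch.
        }
    }
}

// Sample archive: one regular file plus a hard link entry that points to it.
using var archive = new MemoryStream();
using (var writer = new TarWriter(archive, TarEntryFormat.Pax, leaveOpen: true))
{
    writer.WriteEntry(new PaxTarEntry(TarEntryType.RegularFile, "file1.txt")
    {
        DataStream = new MemoryStream(Encoding.UTF8.GetBytes("Hello, world!"))
    });
    writer.WriteEntry(new PaxTarEntry(TarEntryType.HardLink, "file2.txt")
    {
        LinkName = "file1.txt"
    });
}
archive.Position = 0;

string outputDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
ExtractCopyingHardLinks(archive, outputDir);
Console.WriteLine(File.ReadAllText(Path.Combine(outputDir, "file2.txt")));
```

The trade-off of this approach is that the extracted files no longer share an inode, so disk usage grows and later edits to one file do not affect the other.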
Associated WorkItem - 564677