This is a bit awkward. For writing a tar entry, we need to know both
the name and size of the file ahead of time. The implementation in
this commit accomplishes that by reading the Put content into a
buffer, hashing and sizing the buffer, and then calling
WriteTarEntryByName to create the entry. With a filesystem-backed CAS
engine, we could avoid the buffer by writing the file to a temporary
location with rolling hash and size tracking and then renaming the
temporary file to the appropriate path.
WriteTarEntryByName itself has awkward buffering to avoid dropping
anything onto disk. It reads through its current file and writes the
new tar into a buffer, and then writes that buffer back over its
current file. There are a few issues with this:
* It's a lot more work than you need if you're just appending a new
entry to the end of the tarball. But writing the whole file into a
buffer means we don't have to worry about the trailing blocks that
mark the end of the tarball; that's all handled transparently for us
by the Go implementation. And this implementation doesn't have to
be performant (folks should not be using tarballs to back
write-heavy engines).
* It could leave you with a corrupted tarball if the caller dies
mid-overwrite. Again, I expect folks will only ever write to a
tarball when building a tarball for publishing. If the caller dies,
you can just start over. Folks looking for a more reliable
implementation should use a filesystem-backed engine.
* It could leave you with dangling bytes at the end of the tarball. I
couldn't find a way to truncate the file through the
io.Reader/io.Writer/... interfaces. Go does have an ftruncate(2)
wrapper [1], and os.File has a Truncate method, but neither is
exposed at that level. So if you write a shorter file with the same
name as the original, you may end up with some dangling bytes.
cas.Engine.Put protects against excessive writes with a Get guard;
after hashing the new data, Put tries to Get it from the tarball and
only writes a new entry if it can't find an existing entry. This also
protects the CAS engine from the dangling-bytes issue.
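The Get guard can be sketched like this. The store type is a
hypothetical in-memory stand-in for the tarball, and SHA-256 is an
assumed digest algorithm; neither is taken from this commit.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// store is a hypothetical in-memory stand-in for the tarball.
type store struct {
	blobs  map[string][]byte
	writes int // number of entries actually written
}

// Get returns the blob stored under digest, if any.
func (s *store) Get(digest string) ([]byte, bool) {
	data, ok := s.blobs[digest]
	return data, ok
}

// Put hashes data and writes a new entry only when Get cannot find
// an existing entry with the same digest.
func (s *store) Put(data []byte) string {
	sum := sha256.Sum256(data)
	digest := hex.EncodeToString(sum[:])
	if _, ok := s.Get(digest); ok {
		return digest // already present, skip the expensive write
	}
	s.blobs[digest] = append([]byte(nil), data...)
	s.writes++
	return digest
}

func main() {
	s := &store{blobs: map[string][]byte{}}
	s.Put([]byte("hello"))
	s.Put([]byte("hello")) // guarded: no second write
	fmt.Println("writes:", s.writes)
}
```

Because an entry's name is derived from its content, an existing
entry is never rewritten with shorter data, which is how the guard
sidesteps the dangling-bytes issue.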
The 0666 file modes and 0777 directory modes rely on the caller's
umask to appropriately limit user/group/other permissions for the
tarball itself and any content extracted to the filesystem from the
tarball.
The trailing slash manipulation (stripping before comparison and
injecting before creation) is based on part of libarchive's
description of old-style archives [2]:
  name
    Pathname, stored as a null-terminated string. Early tar
    implementations only stored regular files (including hardlinks to
    those files). One common early convention used a trailing "/"
    character to indicate a directory name, allowing directory
    permissions and owner information to be archived and restored.
and POSIX ustar archives [3]:
  name, prefix
    ... The standard does not require a trailing / character on
    directory names, though most implementations still include this
    for compatibility reasons.
[1]: https://golang.org/pkg/syscall/#Ftruncate
[2]: https://github.com/libarchive/libarchive/wiki/ManPageTar5#old-style-archive-format
[3]: https://github.com/libarchive/libarchive/wiki/ManPageTar5#posix-ustar-archives
Signed-off-by: W. Trevor King <[email protected]>