
Commit d1657b5

peff authored and gitster committed
archive-tar: write extended headers for file sizes >= 8GB
The ustar format has a fixed-length field for the size of each file entry which is supposed to contain up to 11 bytes of octal-formatted data plus a NUL or space terminator.

This means that the largest size we can represent is 077777777777, or 1 byte short of 8GB. The correct solution for a larger file, according to POSIX.1-2001, is to add an extended pax header, similar to how we handle long filenames. This patch does that, and writes zero for the size field in the ustar header (the last bit is not mentioned by POSIX, but it matches how GNU tar behaves with --format=pax).

This should be a strict improvement over the current behavior, which is to die in xsnprintf with a "BUG". However, there's some interesting history here.

Prior to f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), we silently overflowed the "size" field. The extra bytes ended up in the "mtime" field of the header, which was then immediately written itself, overwriting our extra bytes. What that means depends on how many bytes we wrote.

If the size was 64GB or greater, then we actually overflowed digits into the mtime field, meaning our value was effectively right-shifted by those lost octal digits. And this patch is again a strict improvement over that.

But if the size was between 8GB and 64GB, then our 12-byte field held all of the actual digits, and only our NUL terminator overflowed. According to POSIX, there should be a NUL or space at the end of the field. However, GNU tar seems to be lenient here, and will correctly parse a size up to 64GB (minus one) from the field. So sizes in this range might have just worked, depending on the implementation reading the tarfile.

This patch is mostly still an improvement there, as the 8GB limit is specifically mentioned in POSIX as the correct limit. But it's possible that it could be a regression (versus the pre-f2f0267 state) if all of the following are true:

  1. You have a file between 8GB and 64GB.

  2. Your tar implementation _doesn't_ know about pax extended headers.

  3. Your tar implementation _does_ parse 12-byte sizes from the ustar header without a delimiter.

It's probably not worth worrying about such an obscure set of conditions, but I'm documenting it here just in case.

Helped-by: René Scharfe <[email protected]>
Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
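
The limits quoted in the message follow from the width of the field. As a rough sketch of that arithmetic (the helper names and the SIZE_FIELD_LEN, MAX_11_DIGITS and MAX_12_DIGITS constants are hypothetical and are not taken from archive-tar.c), 11 octal digits hold at most 2^33 - 1 bytes and 12 digits hold at most 2^36 - 1 bytes:

```c
#include <stdio.h>

/* Hypothetical names for this sketch; none of them come from archive-tar.c. */
#define SIZE_FIELD_LEN 12                /* bytes in the ustar size field */
#define MAX_11_DIGITS  077777777777ULL   /* 2^33 - 1, one byte short of 8GB */
#define MAX_12_DIGITS  0777777777777ULL  /* 2^36 - 1, one byte short of 64GB */

/* Count the octal digits needed to print v (at least one, even for 0). */
static int octal_digits(unsigned long long v)
{
	int n = 1;

	while (v >>= 3)
		n++;
	return n;
}

/*
 * Format size as 11 zero-padded octal digits plus a NUL terminator.
 * Returns 1 if the whole value fit in the field, 0 if snprintf had to
 * truncate it (the case that needs a pax extended header instead).
 */
static int format_size_field(char field[SIZE_FIELD_LEN],
			     unsigned long long size)
{
	int len = snprintf(field, SIZE_FIELD_LEN, "%011llo", size);

	return len > 0 && len < SIZE_FIELD_LEN;
}
```

Under these assumptions, format_size_field() succeeds for MAX_11_DIGITS and fails for MAX_11_DIGITS + 1, which needs 12 digits and leaves no room for the terminator. The 64GB-plus-one blob used by the tests (68719476737 bytes) needs 13 octal digits, so it exceeds even the lenient 12-digit reading described above.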
1 parent e51217e commit d1657b5

2 files changed, 31 additions and 4 deletions

archive-tar.c

Lines changed: 29 additions & 2 deletions
@@ -18,6 +18,13 @@ static int tar_umask = 002;
 static int write_tar_filter_archive(const struct archiver *ar,
 				    struct archiver_args *args);
 
+/*
+ * This is the max value that a ustar size header can specify, as it is fixed
+ * at 11 octal digits. POSIX specifies that we switch to extended headers at
+ * this size.
+ */
+#define USTAR_MAX_SIZE 077777777777UL
+
 /* writes out the whole block, but only if it is full */
 static void write_if_needed(void)
 {
@@ -137,6 +144,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
 	strbuf_addch(sb, '\n');
 }
 
+/*
+ * Like strbuf_append_ext_header, but for numeric values.
+ */
+static void strbuf_append_ext_header_uint(struct strbuf *sb,
+					  const char *keyword,
+					  uintmax_t value)
+{
+	char buf[40]; /* big enough for 2^128 in decimal, plus NUL */
+	int len;
+
+	len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value);
+	strbuf_append_ext_header(sb, keyword, buf, len);
+}
+
 static unsigned int ustar_header_chksum(const struct ustar_header *header)
 {
 	const unsigned char *p = (const unsigned char *)header;
@@ -208,7 +229,7 @@ static int write_tar_entry(struct archiver_args *args,
 	struct ustar_header header;
 	struct strbuf ext_header = STRBUF_INIT;
 	unsigned int old_mode = mode;
-	unsigned long size;
+	unsigned long size, size_in_header;
 	void *buffer;
 	int err = 0;

@@ -267,7 +288,13 @@ static int write_tar_entry(struct archiver_args *args,
 		memcpy(header.linkname, buffer, size);
 	}
 
-	prepare_header(args, &header, mode, size);
+	size_in_header = size;
+	if (S_ISREG(mode) && size > USTAR_MAX_SIZE) {
+		size_in_header = 0;
+		strbuf_append_ext_header_uint(&ext_header, "size", size);
+	}
+
+	prepare_header(args, &header, mode, size_in_header);
 
 	if (ext_header.len > 0) {
 		err = write_extended_header(args, sha1, ext_header.buf,
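
For readers unfamiliar with the pax format, the "size" entry appended above ends up as a text record of the form "LEN keyword=value\n", where LEN is the decimal length of the whole record including the digits of LEN itself. The following is a standalone sketch of that record layout, not the code in archive-tar.c; pax_record() and pax_size_record() are hypothetical helpers, and the fixed-point loop is just one way to settle the self-referential length:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write one pax record, "LEN keyword=value\n", to out.
 * Returns the record length, or -1 if it does not fit in outlen bytes.
 */
static int pax_record(char *out, size_t outlen,
		      const char *keyword, const char *value)
{
	/* Everything except the length digits: space, '=' and newline. */
	size_t body = strlen(keyword) + strlen(value) + 3;
	size_t len = body + 1; /* first guess: a one-digit length */
	int n;

	/* Adding digits can push the length into another decimal place,
	 * so repeat until the digit count agrees with the total. */
	for (;;) {
		size_t digits = 0, tmp;

		for (tmp = len; tmp; tmp /= 10)
			digits++;
		if (body + digits == len)
			break;
		len = body + digits;
	}

	n = snprintf(out, outlen, "%zu %s=%s\n", len, keyword, value);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return n;
}

/* Hypothetical helper: the "size" record for a file of the given size. */
static int pax_size_record(char *out, size_t outlen, unsigned long long size)
{
	char value[32]; /* room for any 64-bit value in decimal, plus NUL */

	snprintf(value, sizeof(value), "%llu", size);
	return pax_record(out, outlen, "size", value);
}
```

With these helpers, the 64GB-plus-one blob from the test suite would produce the 20-byte record "20 size=68719476737\n". A pax-aware reader takes the size from that record, which is why the patch can write zero into the ustar size field.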

t/t5000-tar-tree.sh

Lines changed: 2 additions & 2 deletions
@@ -360,7 +360,7 @@ test_expect_success 'set up repository with huge blob' '
 
 # We expect git to die with SIGPIPE here (otherwise we
 # would generate the whole 64GB).
-test_expect_failure 'generate tar with huge size' '
+test_expect_success 'generate tar with huge size' '
 	{
 		git archive HEAD
 		echo $? >exit-code
@@ -369,7 +369,7 @@ test_expect_failure 'generate tar with huge size' '
 	test_cmp expect exit-code
 '
 
-test_expect_failure TAR_HUGE 'system tar can read our huge size' '
+test_expect_success TAR_HUGE 'system tar can read our huge size' '
 	echo 68719476737 >expect &&
 	tar_info huge.tar | cut -d" " -f1 >actual &&
 	test_cmp expect actual
