Skip to content

Commit b897bf5

Browse files
peffgitster
authored andcommitted
fast-export: use xmemdupz() for anonymizing oids
Our anonymize_mem() function is careful to take a ptr/len pair to allow storing binary tokens like object ids, as well as partial strings (e.g., just "foo" of "foo/bar"). But it duplicates the hash key using xstrdup()! That means that: - for a partial string, we'd store all bytes up to the NUL, even though we'd never look at anything past "len". This didn't produce wrong behavior, but was wasteful. - for a binary oid that doesn't contain a zero byte, we'd copy garbage bytes off the end of the array (though as long as nothing complained about reading uninitialized bytes, further reads would be limited by "len", and we'd produce the correct results) - for a binary oid that does contain a zero byte, we'd copy _fewer_ bytes than intended into the hashmap struct. When we later try to look up a value, we'd access uninitialized memory and potentially falsely claim that a particular oid is not present. The most common reason to store an oid is an anonymized gitlink, but our test case doesn't have any gitlinks at all. So let's add one whose oid contains a NUL and is present at two different paths. ASan catches the memory error, but even without it we can detect the bug because the oid is not anonymized the same way for both paths. And of course the fix is to copy the correct number of bytes. We don't technically need the appended NUL from xmemdupz(), but it doesn't hurt as an extra protection against anybody treating it like a string (plus a future patch will push us more in that direction). Signed-off-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent b8c0689 commit b897bf5

File tree

2 files changed

+16
-1
lines changed

2 files changed

+16
-1
lines changed

builtin/fast-export.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -162,7 +162,7 @@ static const void *anonymize_mem(struct hashmap *map,
162162
if (!ret) {
163163
ret = xmalloc(sizeof(*ret));
164164
hashmap_entry_init(&ret->hash, key.hash.hash);
165-
ret->orig = xstrdup(orig);
165+
ret->orig = xmemdupz(orig, *len);
166166
ret->orig_len = *len;
167167
ret->anon = generate(orig, len);
168168
ret->anon_len = *len;

t/t9351-fast-export-anonymize.sh

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,10 @@ test_expect_success 'setup simple repo' '
1010
mkdir subdir &&
1111
test_commit subdir/bar &&
1212
test_commit subdir/xyzzy &&
13+
fake_commit=$(echo $ZERO_OID | sed s/0/a/) &&
14+
git update-index --add --cacheinfo 160000,$fake_commit,link1 &&
15+
git update-index --add --cacheinfo 160000,$fake_commit,link2 &&
16+
git commit -m "add gitlink" &&
1317
git tag -m "annotated tag" mytag
1418
'
1519

@@ -26,6 +30,12 @@ test_expect_success 'stream omits path names' '
2630
! grep xyzzy stream
2731
'
2832

33+
test_expect_success 'stream omits gitlink oids' '
34+
# avoid relying on the whole oid to remain hash-agnostic; this is
35+
# plenty to be unique within our test case
36+
! grep a000000000000000000 stream
37+
'
38+
2939
test_expect_success 'stream allows master as refname' '
3040
grep master stream
3141
'
@@ -89,6 +99,11 @@ test_expect_success 'paths in subdir ended up in one tree' '
8999
test_cmp expect actual
90100
'
91101

102+
test_expect_success 'identical gitlinks got identical oid' '
103+
awk "/commit/ { print \$3 }" <root | sort -u >commits &&
104+
test_line_count = 1 commits
105+
'
106+
92107
test_expect_success 'tag points to branch tip' '
93108
git rev-parse $other_branch >expect &&
94109
git for-each-ref --format="%(*objectname)" | grep . >actual &&

0 commit comments

Comments
 (0)