Skip to content

Commit 613bef5

Browse files
peffgitster
authored andcommitted
shorten_unambiguous_ref(): avoid sscanf()
To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just "foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match that against the refname, pulling the "%s" content into a separate buffer. This has a few downsides: - sscanf("%s") reportedly misbehaves on macOS with some input and locale combinations, returning a partial or garbled string. See this thread: https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/ - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD" rule would never pull "origin" out of "refs/remotes/origin/HEAD". Instead it always produced "origin/HEAD", which is redundant with the "refs/remotes/%s" rule. - scanf in general is an error-prone interface. For example, scanning for "%s" will copy bytes into a destination string, which must have been correctly sized ahead of time to avoid a buffer overflow. In this case, the code is OK (the buffer is pessimistically sized to match the original string, which should give us a maximum). But in general, we do not want to encourage people to use scanf at all. So instead, let's note that our lookup rules are not arbitrary format strings, but all contain exactly one "%.*s" placeholder. We already rely on this, both for lookup (we feed the lookup format along with exactly one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s" to "%s", and then insist that sscanf() finds exactly one result). We can parse this manually by just matching the bytes that occur before and after the "%.*s" placeholder. While we have a few extra lines of parsing code, the result is arguably simpler, as can skip the preprocessing step and its tricky memory management entirely. The in-code comments should explain the parsing strategy, but there's one subtle change here. The original code allocated a single buffer, and then overwrote it in each loop iteration, since that's the only option sscanf() gives us. But our parser can actually return a ptr/len combo for the matched string, which is all we need (since we just feed it back to the lookup rules with "%.*s"), and then copy it only when returning to the caller. There are a few new tests here, all using symbolic-ref (the code can be triggered in many ways, but symrefs are convenient in that we don't need to create a real ref, which avoids any complications from the filesystem munging the name): - the first covers the real-world case which misbehaved on macOS. Setting LC_ALL is required to trigger the problem there (since otherwise our tests use LC_ALL=C), and hopefully is at worst simply ignored on other systems (and doesn't cause libc to complain, etc, on systems without that locale). - the second covers the "origin/HEAD" case as discussed above, which is now fixed - the remainder are for "weird" cases that work both before and after this patch, but would be easy to get wrong with off-by-one problems in the parsing (and came out of discussions and earlier iterations of the patch that did get them wrong). - absent here are tests of boring, expected-to-work cases like "refs/heads/foo", etc. Those are covered all over the test suite both explicitly (for-each-ref's refname:short) and implicitly (in the output of git-status, etc). Reported-by: 孟子易 <[email protected]> Helped-by: Eric Sunshine <[email protected]> Signed-off-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 8f416f6 commit 613bef5

File tree

2 files changed

+77
-35
lines changed

2 files changed

+77
-35
lines changed

refs.c

Lines changed: 43 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -1310,53 +1310,62 @@ int update_ref(const char *msg, const char *refname,
13101310
old_oid, flags, onerr);
13111311
}
13121312

1313-
char *refs_shorten_unambiguous_ref(struct ref_store *refs,
1314-
const char *refname, int strict)
1313+
/*
1314+
* Check that the string refname matches a rule of the form
1315+
* "{prefix}%.*s{suffix}". So "foo/bar/baz" would match the rule
1316+
* "foo/%.*s/baz", and return the string "bar".
1317+
*/
1318+
static const char *match_parse_rule(const char *refname, const char *rule,
1319+
size_t *len)
13151320
{
1316-
int i;
1317-
static char **scanf_fmts;
1318-
char *short_name;
1319-
struct strbuf resolved_buf = STRBUF_INIT;
1320-
1321-
if (!scanf_fmts) {
1322-
/*
1323-
* Pre-generate scanf formats from ref_rev_parse_rules[].
1324-
* Generate a format suitable for scanf from a
1325-
* ref_rev_parse_rules rule by interpolating "%s" at the
1326-
* location of the "%.*s".
1327-
*/
1328-
size_t total_len = 0;
1329-
size_t offset = 0;
1321+
/*
1322+
* Check that rule matches refname up to the first percent in the rule.
1323+
* We can bail immediately if not, but otherwise we leave "rule" at the
1324+
* %-placeholder, and "refname" at the start of the potential matched
1325+
* name.
1326+
*/
1327+
while (*rule != '%') {
1328+
if (!*rule)
1329+
BUG("rev-parse rule did not have percent");
1330+
if (*refname++ != *rule++)
1331+
return NULL;
1332+
}
13301333

1331-
for (i = 0; i < NUM_REV_PARSE_RULES; i++)
1332-
/* -2 for strlen("%.*s") - strlen("%s"); +1 for NUL */
1333-
total_len += strlen(ref_rev_parse_rules[i]) - 2 + 1;
1334+
/*
1335+
* Check that our "%" is the expected placeholder. This assumes there
1336+
* are no other percents (placeholder or quoted) in the string, but
1337+
* that is sufficient for our rev-parse rules.
1338+
*/
1339+
if (!skip_prefix(rule, "%.*s", &rule))
1340+
return NULL;
13341341

1335-
scanf_fmts = xmalloc(st_add(st_mult(sizeof(char *), NUM_REV_PARSE_RULES), total_len));
1342+
/*
1343+
* And now check that our suffix (if any) matches.
1344+
*/
1345+
if (!strip_suffix(refname, rule, len))
1346+
return NULL;
13361347

1337-
offset = 0;
1338-
for (i = 0; i < NUM_REV_PARSE_RULES; i++) {
1339-
assert(offset < total_len);
1340-
scanf_fmts[i] = (char *)&scanf_fmts[NUM_REV_PARSE_RULES] + offset;
1341-
offset += xsnprintf(scanf_fmts[i], total_len - offset,
1342-
ref_rev_parse_rules[i], 2, "%s") + 1;
1343-
}
1344-
}
1348+
return refname; /* len set by strip_suffix() */
1349+
}
13451350

1346-
/* buffer for scanf result, at most refname must fit */
1347-
short_name = xstrdup(refname);
1351+
char *refs_shorten_unambiguous_ref(struct ref_store *refs,
1352+
const char *refname, int strict)
1353+
{
1354+
int i;
1355+
struct strbuf resolved_buf = STRBUF_INIT;
13481356

13491357
/* skip first rule, it will always match */
13501358
for (i = NUM_REV_PARSE_RULES - 1; i > 0 ; --i) {
13511359
int j;
13521360
int rules_to_fail = i;
1361+
const char *short_name;
13531362
size_t short_name_len;
13541363

1355-
if (1 != sscanf(refname, scanf_fmts[i], short_name))
1364+
short_name = match_parse_rule(refname, ref_rev_parse_rules[i],
1365+
&short_name_len);
1366+
if (!short_name)
13561367
continue;
13571368

1358-
short_name_len = strlen(short_name);
1359-
13601369
/*
13611370
* in strict mode, all (except the matched one) rules
13621371
* must fail to resolve to a valid non-ambiguous ref
@@ -1394,12 +1403,11 @@ char *refs_shorten_unambiguous_ref(struct ref_store *refs,
13941403
*/
13951404
if (j == rules_to_fail) {
13961405
strbuf_release(&resolved_buf);
1397-
return short_name;
1406+
return xmemdupz(short_name, short_name_len);
13981407
}
13991408
}
14001409

14011410
strbuf_release(&resolved_buf);
1402-
free(short_name);
14031411
return xstrdup(refname);
14041412
}
14051413

t/t1401-symbolic-ref.sh

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -189,4 +189,38 @@ test_expect_success 'symbolic-ref pointing at another' '
189189
test_cmp expect actual
190190
'
191191

192+
test_expect_success 'symbolic-ref --short handles complex utf8 case' '
193+
name="测试-加-增加-加-增加" &&
194+
git symbolic-ref TEST_SYMREF "refs/heads/$name" &&
195+
# In the real world, we saw problems with this case only
196+
# when the locale includes UTF-8. Set it here to try to make things as
197+
# hard as possible for us to pass, but in practice we should do the
198+
# right thing regardless (and of course some platforms may not even
199+
# have this locale).
200+
LC_ALL=en_US.UTF-8 git symbolic-ref --short TEST_SYMREF >actual &&
201+
echo "$name" >expect &&
202+
test_cmp expect actual
203+
'
204+
205+
test_expect_success 'symbolic-ref --short handles name with suffix' '
206+
git symbolic-ref TEST_SYMREF "refs/remotes/origin/HEAD" &&
207+
git symbolic-ref --short TEST_SYMREF >actual &&
208+
echo "origin" >expect &&
209+
test_cmp expect actual
210+
'
211+
212+
test_expect_success 'symbolic-ref --short handles almost-matching name' '
213+
git symbolic-ref TEST_SYMREF "refs/headsXfoo" &&
214+
git symbolic-ref --short TEST_SYMREF >actual &&
215+
echo "headsXfoo" >expect &&
216+
test_cmp expect actual
217+
'
218+
219+
test_expect_success 'symbolic-ref --short handles name with percent' '
220+
git symbolic-ref TEST_SYMREF "refs/heads/%foo" &&
221+
git symbolic-ref --short TEST_SYMREF >actual &&
222+
echo "%foo" >expect &&
223+
test_cmp expect actual
224+
'
225+
192226
test_done

0 commit comments

Comments
 (0)