Commit 03818a4
peff authored and gitster committed
split_ident: parse timestamp from end of line

Split_ident currently parses left to right. Given this input:

  Your Name <[email protected]> 123456789 -0500\n

We assume the name starts the line and runs until the first "<". That starts the email address, which runs until the first ">". Everything after that is assumed to be the timestamp.

This works fine in the normal case, but is easily broken by corrupted ident lines that contain an extra ">". Some examples seen in the wild are:

  1. Name <email>-<> 123456789 -0500\n
  2. Name <email> <Name<email>> 123456789 -0500\n
  3. Name1 <email1>, Name2 <email2> 123456789 -0500\n

Currently each of these produces some email address (which is not necessarily the one the user intended) and ends up with a NULL date (which is generally interpreted as the epoch by "git log" and friends). But in each case we could get the correct timestamp simply by parsing from the right-hand side, looking backwards for the final ">", and then reading the timestamp from there.

In general, it's a losing battle to try to automatically guess what the user meant with their broken crud. But this particular workaround is probably worth doing. One, it's dirt simple, and can't impact non-broken cases. Two, it doesn't catch a single breakage we've seen, but rather a large class of errors (i.e., any breakage inside the email angle brackets may affect the email, but won't spill over into the timestamp parsing). And three, the timestamp is arguably more valuable to get right, because it can affect correctness (e.g., in --until cutoffs).

This patch implements the right-to-left scheme described above. We adjust the tests in t4212, which generate a commit with such a broken ident and now get the timestamp right. We also add a test that fsck continues to detect the breakage.

For reference, here are pointers to the breakages seen (as numbered above):

[1] http://article.gmane.org/gmane.comp.version-control.git/221441
[2] http://article.gmane.org/gmane.comp.version-control.git/222362
[3] http://perl5.git.perl.org/perl.git/commit/13b79730adea97e660de84bbe67f9d7cbe344302

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
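
To make the difference between the two scan directions concrete, here is a minimal sketch, not git's actual code. The two helper names are hypothetical, the input is assumed to be a NUL-terminated string containing at least one ">", and strtoul() stands in for git's real date handling. It reads the number that follows the first ">" versus the last ">" on the three broken lines quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): mimic the old left-to-right
 * scheme by reading a number right after the FIRST ">" in the line.
 * strtoul() returns 0 when no digits follow, which stands in for the
 * "NULL date" outcome described in the commit message.
 */
static unsigned long timestamp_after_first_gt(const char *line)
{
	const char *gt = strchr(line, '>');

	assert(gt != NULL); /* assumption: the line has a closing bracket */
	return strtoul(gt + 1, NULL, 10);
}

/*
 * Hypothetical helper (not part of git): mimic the right-to-left
 * scheme by reading a number right after the LAST ">" in the line.
 */
static unsigned long timestamp_after_last_gt(const char *line)
{
	const char *gt = strrchr(line, '>');

	assert(gt != NULL); /* assumption: the line has a closing bracket */
	return strtoul(gt + 1, NULL, 10);
}
```

Under these assumptions, both helpers agree on a well-formed line such as "Name <email> 123456789 -0500\n", while on each of the three broken examples only the last-bracket variant recovers 123456789.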
1 parent 02a110a commit 03818a4

2 files changed: +22 -3 lines changed

ident.c

Lines changed: 15 additions & 1 deletion
@@ -233,7 +233,21 @@ int split_ident_line(struct ident_split *split, const char *line, int len)
 	if (!split->mail_end)
 		return status;
 
-	for (cp = split->mail_end + 1; cp < line + len && isspace(*cp); cp++)
+	/*
+	 * Look from the end-of-line to find the trailing ">" of the mail
+	 * address, even though we should already know it as split->mail_end.
+	 * This can help in cases of broken idents with an extra ">" somewhere
+	 * in the email address. Note that we are assuming the timestamp will
+	 * never have a ">" in it.
+	 *
+	 * Note that we will always find some ">" before going off the front of
+	 * the string, because will always hit the split->mail_end closing
+	 * bracket.
+	 */
+	for (cp = line + len - 1; *cp != '>'; cp--)
+		;
+
+	for (cp = cp + 1; cp < line + len && isspace(*cp); cp++)
 		;
 	if (line + len <= cp)
 		goto person_only;
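
The backward loop in the hunk above has no lower bound. As its comment explains, it relies on split->mail_end having already located a ">" earlier in the function. For readers who want to try the idea outside that context, the following is a hedged sketch of a standalone, bounded variant. Both function names are hypothetical and the NULL-returning behavior is an assumption of this sketch, not something the patch does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: return a pointer to the last '>' in
 * line[0..len), or NULL if the buffer contains none. Unlike the loop
 * in the patch, this never reads before the start of the buffer.
 */
static const char *last_closing_bracket(const char *line, size_t len)
{
	const char *cp = line + len;

	while (cp > line) {
		cp--;
		if (*cp == '>')
			return cp;
	}
	return NULL;
}

/*
 * Hypothetical helper: return a pointer to the first non-space byte
 * after the last '>', or NULL when there is no bracket or nothing but
 * whitespace follows it (the "person_only" situation in the patch).
 */
static const char *date_start(const char *line, size_t len)
{
	const char *end = line + len;
	const char *cp = last_closing_bracket(line, len);

	if (!cp)
		return NULL;
	for (cp++; cp < end && isspace((unsigned char)*cp); cp++)
		; /* skip whitespace between '>' and the timestamp */
	return cp < end ? cp : NULL;
}
```

The explicit bound costs one extra comparison per step. The patch can skip it because the earlier check on split->mail_end guarantees a ">" exists.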

t/t4212-log-corrupt.sh

Lines changed: 7 additions & 2 deletions
@@ -13,11 +13,16 @@ test_expect_success 'setup' '
 	git update-ref refs/heads/broken_email $(cat broken_email.hash)
 '
 
+test_expect_success 'fsck notices broken commit' '
+	git fsck 2>actual &&
+	test_i18ngrep invalid.author actual
+'
+
 test_expect_success 'git log with broken author email' '
 	{
 		echo commit $(cat broken_email.hash)
 		echo "Author: A U Thor <[email protected]>"
-		echo "Date: Thu Jan 1 00:00:00 1970 +0000"
+		echo "Date: Thu Apr 7 15:13:13 2005 -0700"
 		echo
 		echo " foo"
 	} >expect.out &&
@@ -30,7 +35,7 @@ test_expect_success 'git log with broken author email' '
 '
 
 test_expect_success 'git log --format with broken author email' '
-	echo "A U [email protected]+" >expect.out &&
+	echo "A U [email protected]+Thu Apr 7 15:13:13 2005 -0700" >expect.out &&
 	: >expect.err &&
 
 	git log --format="%an+%ae+%ad" broken_email >actual.out 2>actual.err &&