Commit ea1615d

peff authored and gitster committed
parse_commit(): parse timestamp from end of line
To find the committer timestamp, we parse left-to-right looking for the closing ">" of the email, and then expect the timestamp right after that. But we've seen some broken cases in the wild where this fails, even though we _could_ find the timestamp with a little extra work. E.g.:

  Name <Name<email>> 123456789 -0500

This means that features that rely on the committer timestamp, like --since or --until, will treat the commit as happening at time 0 (i.e., 1970).

This is doubly confusing because the pretty-print parser learned to handle these in 03818a4 (split_ident: parse timestamp from end of line, 2013-10-14). So printing them via "git show", etc., makes everything look normal, but --until, etc., are still broken (despite the fact that that commit explicitly mentioned --until!).

So let's use the same trick as 03818a4: find the end of the line, and parse back to the final ">". In theory we could use split_ident_line() here, but it's actually a bit more strict. In particular, it requires a valid time-zone token, too. That should be present, of course, but we wouldn't want to break --until for cases that are working currently.

We might want to teach split_ident_line() to become more lenient there, but it would require checking its many callers (since right now they can assume that if date_start is non-NULL, so is tz_start).

So for now we'll just reimplement the same trick in the commit parser.

The test is in t4212, which already covers similar cases, courtesy of 03818a4. We'll just adjust the broken commit to munge the email in both the author and committer lines. Note that we could match (author|committer) here, but alternation can't be used portably in sed. Since we wouldn't expect to see ">" except as part of an ident line, we can just match that character on any line.

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 2063b86 commit ea1615d

2 files changed, 22 insertions(+), 9 deletions(-)

commit.c

Lines changed: 16 additions & 8 deletions
@@ -91,6 +91,7 @@ struct commit *lookup_commit_reference_by_name(const char *name)
 static timestamp_t parse_commit_date(const char *buf, const char *tail)
 {
 	const char *dateptr;
+	const char *eol;
 
 	if (buf + 6 >= tail)
 		return 0;
@@ -102,16 +103,23 @@ static timestamp_t parse_commit_date(const char *buf, const char *tail)
 		return 0;
 	if (memcmp(buf, "committer", 9))
 		return 0;
-	while (buf < tail && *buf++ != '>')
-		/* nada */;
-	if (buf >= tail)
+
+	/*
+	 * Jump to end-of-line so that we can walk backwards to find the
+	 * end-of-email ">". This is more forgiving of malformed cases
+	 * because unexpected characters tend to be in the name and email
+	 * fields.
+	 */
+	eol = memchr(buf, '\n', tail - buf);
+	if (!eol)
 		return 0;
-	dateptr = buf;
-	while (buf < tail && *buf++ != '\n')
-		/* nada */;
-	if (buf >= tail)
+	dateptr = eol;
+	while (dateptr > buf && dateptr[-1] != '>')
+		dateptr--;
+	if (dateptr == buf || dateptr == eol)
 		return 0;
-	/* dateptr < buf && buf[-1] == '\n', so parsing will stop at buf-1 */
+
+	/* dateptr < eol && *eol == '\n', so parsing will stop at eol */
 	return parse_timestamp(dateptr, NULL, 10);
 }

t/t4212-log-corrupt.sh

Lines changed: 6 additions & 1 deletion
@@ -9,7 +9,7 @@ test_expect_success 'setup' '
 	test_commit foo &&
 
 	git cat-file commit HEAD >ok.commit &&
-	sed "/^author /s/>/>-<>/" <ok.commit >broken_email.commit &&
+	sed "s/>/>-<>/" <ok.commit >broken_email.commit &&
 
 	git hash-object --literally -w -t commit broken_email.commit >broken_email.hash &&
 	git update-ref refs/heads/broken_email $(cat broken_email.hash)
@@ -44,6 +44,11 @@ test_expect_success 'git log --format with broken author email' '
 	test_must_be_empty actual.err
 '
 
+test_expect_success '--until handles broken email' '
+	git rev-list --until=1980-01-01 broken_email >actual &&
+	test_must_be_empty actual
+'
+
 munge_author_date () {
 	git cat-file commit "$1" >commit.orig &&
 	sed "s/^\(author .*>\) [0-9]*/\1 $2/" <commit.orig >commit.munge &&
