Commit fb62eb7

René Scharfe authored and gitster committed
grep -w: forward to next possible position after rejected match
grep -w accepts matches between non-word characters, only.  If a match
from regexec() doesn't meet these criteria, grep continues its search
after the first character of that match.

We can be a bit smarter here and skip all positions that follow a word
character first, as they can't match our criteria.  This way we can
consume characters quite cheaply and don't need to special-case the
handling of the beginning of a line.

Here's a contrived example command on msysgit (best of five runs):

	$ time git grep -w ...... v1.6.1 >/dev/null

	real    0m1.611s
	user    0m0.000s
	sys     0m0.015s

With the patch it's quite a bit faster:

	$ time git grep -w ...... v1.6.1 >/dev/null

	real    0m1.179s
	user    0m0.000s
	sys     0m0.015s

More common search patterns will gain a lot less, but it's a nice clean
up anyway.

Signed-off-by: Rene Scharfe <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
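The forwarding idea described in the message can be tried outside of git with a small standalone sketch. This is not the code from grep.c: `is_word_char()` and `word_match()` are hypothetical names, the word-character definition (alphanumerics plus `_`) is an assumption, and passing `REG_NOTBOL` on retries is a choice made for this sketch rather than something the commit does.

```c
#include <ctype.h>
#include <regex.h>
#include <string.h>

/* Assumption: a "word character" is an alphanumeric or '_'. */
static int is_word_char(char ch)
{
	return isalnum((unsigned char)ch) || ch == '_';
}

/*
 * Hypothetical helper: returns 1 if `line` contains a match of `pattern`
 * that is delimited by non-word characters or the ends of the line,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
static int word_match(const char *pattern, const char *line)
{
	regex_t re;
	regmatch_t m;
	const char *bol = line;
	const char *eol = line + strlen(line);
	int hit = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	while (bol < eol) {
		/* Sketch-only choice: keep '^' from matching mid-line on retries. */
		int eflags = bol > line ? REG_NOTBOL : 0;
		const char *so, *eo;

		if (regexec(&re, bol, 1, &m, eflags))
			break;
		so = bol + m.rm_so;
		eo = bol + m.rm_eo;

		/*
		 * rm_so == 0 needs no special case: bol is either the true
		 * start of the line or a position that follows a non-word
		 * character, so a match starting there has a valid left edge.
		 */
		if ((m.rm_so == 0 || !is_word_char(so[-1])) &&
		    (eo == eol || !is_word_char(*eo))) {
			hit = 1;
			break;
		}

		/* Rejected: move past the first character of the match ... */
		bol = so + 1;
		/* ... then skip every position that follows a word character. */
		while (bol < eol && is_word_char(bol[-1]))
			bol++;
	}

	regfree(&re);
	return hit;
}
```

For example, `word_match("foo", "foobar foo")` rejects the first match (it is followed by `b`), forwards straight to the position after the space, and accepts the second `foo`.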
1 parent c123b7c commit fb62eb7

1 file changed: 7 additions, 4 deletions

grep.c

Lines changed: 7 additions & 4 deletions
@@ -294,7 +294,6 @@ static struct {
 static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol, char *eol, enum grep_context ctx)
 {
 	int hit = 0;
-	int at_true_bol = 1;
 	int saved_ch = 0;
 	regmatch_t pmatch[10];
 
@@ -337,7 +336,7 @@ static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol
 		 * either end of the line, or at word boundary
 		 * (i.e. the next char must not be a word char).
 		 */
-		if ( ((pmatch[0].rm_so == 0 && at_true_bol) ||
+		if ( ((pmatch[0].rm_so == 0) ||
 		      !word_char(bol[pmatch[0].rm_so-1])) &&
 		     ((pmatch[0].rm_eo == (eol-bol)) ||
 		      !word_char(bol[pmatch[0].rm_eo])) )
@@ -349,10 +348,14 @@ static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol
 			/* There could be more than one match on the
 			 * line, and the first match might not be
 			 * strict word match. But later ones could be!
+			 * Forward to the next possible start, i.e. the
+			 * next position following a non-word char.
 			 */
 			bol = pmatch[0].rm_so + bol + 1;
-			at_true_bol = 0;
-			goto again;
+			while (word_char(bol[-1]) && bol < eol)
+				bol++;
+			if (bol < eol)
+				goto again;
 		}
 	}
 	if (p->token == GREP_PATTERN_HEAD && saved_ch)
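
To get a feel for why the contrived `......` pattern benefits and common patterns gain less, the sketch below counts regexec() calls with and without the forwarding step. It is an illustration under assumptions, not a benchmark of git: `count_regexec_calls()` and `is_word_char()` are hypothetical names, the word-character definition is assumed, the `bol < eol` loop guard simplifies the pre-patch control flow, and `REG_NOTBOL` is again a sketch-only choice.

```c
#include <ctype.h>
#include <regex.h>
#include <string.h>

/* Assumption: a "word character" is an alphanumeric or '_'. */
static int is_word_char(char ch)
{
	return isalnum((unsigned char)ch) || ch == '_';
}

/*
 * Hypothetical helper: count how many times regexec() runs while looking
 * for a word-delimited match of `pattern` in `line`.
 * skip_word_chars == 0 models the pre-patch retry (advance one character),
 * skip_word_chars != 0 models the patched retry (also skip positions that
 * follow a word character). Returns -1 if the pattern fails to compile.
 */
static int count_regexec_calls(const char *pattern, const char *line,
			       int skip_word_chars)
{
	regex_t re;
	regmatch_t m;
	const char *bol = line;
	const char *eol = line + strlen(line);
	int calls = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	while (bol < eol) {
		const char *so, *eo;

		calls++;
		if (regexec(&re, bol, 1, &m, bol > line ? REG_NOTBOL : 0))
			break;		/* no further candidate on this line */
		so = bol + m.rm_so;
		eo = bol + m.rm_eo;
		if ((so == line || !is_word_char(so[-1])) &&
		    (eo == eol || !is_word_char(*eo)))
			break;		/* word match found */

		bol = so + 1;		/* retry after the first matched char */
		if (skip_word_chars)
			while (bol < eol && is_word_char(bol[-1]))
				bol++;
	}

	regfree(&re);
	return calls;
}
```

On the 12-character line `abcdefghijkl`, the pattern `......` needs 8 calls when advancing one character at a time and a single call with forwarding, because every later position follows a word character. For `foo` on `foobar foo` both strategies need 2 calls, which fits the message's remark that more common patterns gain a lot less.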
