Commit 5fdb3e5

toke.c: Add conditional to skip memcmp()
S_scan_str() calls memEQ() a number of times. In general the strings being compared may be several bytes long, but most of the time at least one of the operands is just a single byte. We can avoid a libc call by comparing the single bytes directly when the length is 1.

Tony Cook is of the opinion that other uses of memEQ() are more likely to involve multiple bytes, so adding this check to all calls would not be beneficial. I looked through the core source, and agree.

So this adds a macro that compares the first byte directly and, if there are more bytes, calls plain memEQ() on the rest.

Spotted by Daniel Dragan.

Since the tokenizer is not hot code, this won't make a noticeable difference in performance. I think the reason to do it is mainly to show that it has been done, for people in the future who would otherwise come along and notice this.
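To make the idea concrete, here is a minimal standalone sketch of the macro outside the Perl source tree. The definitions of LIKELY and memEQ below are stand-ins assumed for illustration (Perl's own headers provide the real ones), and same_answer() is a hypothetical helper that is not part of the commit. The macro always reads the first byte of each operand, so the sketch assumes len >= 1, which is the case for a delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins (assumptions) for definitions that Perl's headers supply. */
#ifndef LIKELY
#  if defined(__GNUC__)
#    define LIKELY(cond) __builtin_expect(!!(cond), 1)
#  else
#    define LIKELY(cond) (cond)
#  endif
#endif
#ifndef memEQ
#  define memEQ(a, b, len) (memcmp((a), (b), (len)) == 0)
#endif

/* The macro as added by the commit: compare the first byte inline and
 * only fall back to memEQ() for the remaining bytes when len > 1. */
#define memEQ_1(a, b, len) \
    ((*(a) == *(b)) && ((LIKELY((len) <= 1) || memEQ((a)+1, (b)+1, (len)-1))))

/* Hypothetical helper: returns 1 when memEQ_1() and memEQ() agree.
 * Only meaningful for len >= 1, since memEQ_1() dereferences both
 * operands unconditionally. */
static int
same_answer(const char *a, const char *b, size_t len)
{
    assert(len >= 1);
    return (memEQ_1(a, b, len) != 0) == (memEQ(a, b, len) != 0);
}
```

For a one-byte delimiter such as '{', memEQ_1() reduces to a single byte comparison and never reaches memcmp(); for a multi-byte delimiter the first byte acts as a cheap filter before the library call.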
1 parent 2cb0034 commit 5fdb3e5

1 file changed: +9 -5 lines changed

toke.c

Lines changed: 9 additions & 5 deletions
@@ -11564,6 +11564,11 @@ S_scan_inputsymbol(pTHX_ char *start)
 return s;
 }
 
+/* In the function below, it's quite likely that the calls to memEQ() will have
+ * 'len' == 1. So create a new macro that adds a conditional to skip the libc
+ * call. */
+#define memEQ_1(a, b, len) \
+((*(a) == *(b)) && ((LIKELY((len) <= 1) || memEQ((a)+1, (b)+1, (len)-1))))
 
 /* scan_str
 takes:
@@ -11616,7 +11621,6 @@ S_scan_inputsymbol(pTHX_ char *start)
 For convenience, the terminating delimiter character is stuffed into
 SvIVX of the SV.
 */
-
 char *
 Perl_scan_str(pTHX_ char *start, int keep_bracketed_quoted, int keep_delims, int re_reparse,
 char **delimp
@@ -11796,18 +11800,18 @@ Perl_scan_str(pTHX_ char *start, int keep_bracketed_quoted, int keep_delims, int
 * discard those that escape the closing delimiter, just
 * discard this one */
 if ( ! keep_bracketed_quoted
- && ( memEQ(s + 1, open_delim_str, delim_byte_len)
+ && ( memEQ_1(s + 1, open_delim_str, delim_byte_len)
 || ( PL_multi_open == PL_multi_close
 && re_reparse && s[1] == '\\')
- || memEQ(s + 1, close_delim_str, delim_byte_len)))
+ || memEQ_1(s + 1, close_delim_str, delim_byte_len)))
 {
 s++;
 }
 else /* any other escapes are simply copied straight through */
 *to++ = *s++;
 }
 else if ( s < PL_bufend - (delim_byte_len - 1)
- && memEQ(s, close_delim_str, delim_byte_len)
+ && memEQ_1(s, close_delim_str, delim_byte_len)
 && --brackets <= 0)
 {
 /* Found unescaped closing delimiter, unnested if we care about
@@ -11836,7 +11840,7 @@ Perl_scan_str(pTHX_ char *start, int keep_bracketed_quoted, int keep_delims, int
 /* No nesting if open eq close */
 else if ( PL_multi_open != PL_multi_close
 && s < PL_bufend - (delim_byte_len - 1)
- && memEQ(s, open_delim_str, delim_byte_len))
+ && memEQ_1(s, open_delim_str, delim_byte_len))
 {
 brackets++;
 }
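
The hunks above pair each delimiter comparison with a bounds guard, `s < PL_bufend - (delim_byte_len - 1)`, so that a full delimiter is known to fit before it is compared. The following is a heavily simplified sketch of that pattern, not the code in toke.c: find_close() and its parameters are hypothetical names, escapes are handled by skipping the backslash and one following byte, and the guard is written as the equivalent remaining-length test to avoid forming a pointer before the start of a short buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, assumed for illustration only. Scans [s, bufend)
 * after an opening delimiter has been consumed and returns a pointer to
 * the closing delimiter that balances it, or NULL if there is none. */
static const char *
find_close(const char *s, const char *bufend,
           const char *open_delim, const char *close_delim,
           size_t delim_len)
{
    int brackets = 1;  /* the opening delimiter was already seen */
    int nests;

    assert(delim_len >= 1);
    /* No nesting if the open and close delimiters are the same. */
    nests = memcmp(open_delim, close_delim, delim_len) != 0;

    while (s < bufend) {
        if (*s == '\\' && s + 1 < bufend) {
            s += 2;  /* simplification: skip the escaped byte */
            continue;
        }
        /* Same condition as s < bufend - (delim_len - 1): only compare
         * when a whole delimiter fits in what is left of the buffer. */
        if ((size_t)(bufend - s) >= delim_len) {
            if (memcmp(s, close_delim, delim_len) == 0) {
                if (--brackets <= 0)
                    return s;
                s += delim_len;
                continue;
            }
            if (nests && memcmp(s, open_delim, delim_len) == 0) {
                brackets++;
                s += delim_len;
                continue;
            }
        }
        s++;
    }
    return NULL;
}
```

In the real function these memcmp() calls are the spots where the commit substitutes memEQ_1(), since delim_len is usually 1 there.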
