Skip to content

Conversation

@khwilliamson
Copy link
Contributor

S_scan_str() calls memEQ() a bunch of times. The general case is that the strings are multiple bytes long, but most of the time, at least one of the operands will be just a single byte. We can avoid a libc call by just comparing the single bytes when the length is 1.

Tony Cook is of the opinion that other uses of memEQ() will more likely be multiple bytes, so adding this check to all calls would not be beneficial. I looked through the core source, and agree. So this adds a macro that tests for a single byte, and if multiple calls plain memEQ().

Spotted by Daniel Dragan

  • This set of changes does not require a perldelta entry.

@xenu
Copy link
Member

xenu commented Sep 2, 2025

Is this worth complicating the code? Is this particular part of the lexer a perfomance bottleneck? Hell, is the lexer in its entirety a bottleneck?

@khwilliamson
Copy link
Contributor Author

The lexer is not a bottleneck, in its entirety.

So, its a judgment call whether this complication is worth it. The original p.r. this idea is taken from #23533, and which was approved at one point, added tests at every place. I thought that made the code too hard to read, and am offering this alternative, which moves the complexity into a macro.

I don't think this p.r. will have any noticeable impact on performance, but I don't think it makes things harder to comprehend. It might keep future code readers from making the same change.

S_scan_str() calls memEQ() a bunch of times.  The general case is that
the strings are multiple bytes long, but most of the time, at least one
of the operands will be just a single byte.  We can avoid a libc call by
just comparing the single bytes when the length is 1.

Tony Cook is of the opinion that other uses of memEQ() will more likely
be multiple bytes, so adding this check to all calls would not be
beneficial.  I looked through the core source, and agree.  So this adds
a macro that tests for a single byte, and if multiple calls plain
memEQ().

Spotted by Daniel Dragan

Since the tokenizer is not hot code, this won't make a noticeable
difference in performance.  I think the reason to do it is mainly to
show it has been done for people in the future who would otherwise come
along and notice this
@khwilliamson khwilliamson merged commit 5fdb3e5 into Perl:blead Sep 20, 2025
33 checks passed
khwilliamson added a commit that referenced this pull request Sep 22, 2025
Somehow this got left out of the rebase for GH #23667
@khwilliamson khwilliamson deleted the memEQ1 branch September 23, 2025 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants