toke.c: Add conditional to skip memcmp() #23667

khwilliamson · 2025-09-01T20:08:58Z

S_scan_str() calls memEQ() a bunch of times. The general case is that the strings are multiple bytes long, but most of the time, at least one of the operands will be just a single byte. We can avoid a libc call by just comparing the single bytes when the length is 1.

Tony Cook is of the opinion that other uses of memEQ() are more likely to involve multiple bytes, so adding this check to all calls would not be beneficial. I looked through the core source, and agree. So this adds a macro that tests for the single-byte case and compares the bytes directly; for multiple bytes it calls plain memEQ().
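For readers who want to see the shape of the idea, here is a minimal sketch in C. It is not the macro this p.r. adds: the names `sketch_memEQ`, `SKETCH_MEM_EQ_1_OR_N` and `sketch_at_terminator` are hypothetical, and `sketch_memEQ` is only a local stand-in, assumed to behave like core's memEQ() (a memcmp() result compared against zero), so that the fragment compiles without the perl headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for perl's memEQ(); an assumption made so that this
 * sketch compiles on its own, outside the perl source tree. */
#define sketch_memEQ(s1, s2, len) (memcmp((s1), (s2), (len)) == 0)

/* Hypothetical macro name.  When len is 1, compare the two bytes
 * directly and skip the libc call; otherwise fall back to the plain
 * comparison.  Note that the arguments may be evaluated more than
 * once, so they must not have side effects. */
#define SKETCH_MEM_EQ_1_OR_N(s1, s2, len)                              \
    (((len) == 1)                                                      \
        ? (*(const char *)(s1) == *(const char *)(s2))                 \
        : sketch_memEQ((s1), (s2), (len)))

/* Hypothetical helper: does the terminator `term` (term_len bytes)
 * start at position s, given that the buffer ends at `end`? */
static int
sketch_at_terminator(const char *s, const char *end,
                     const char *term, size_t term_len)
{
    assert(term_len > 0);
    assert(s <= end);

    /* Not enough bytes left for the terminator to fit. */
    if ((size_t)(end - s) < term_len)
        return 0;

    return SKETCH_MEM_EQ_1_OR_N(s, term, term_len);
}
```

A caller scanning a delimited string would call `sketch_at_terminator(s, end, term, term_len)` at each candidate position. With a one-byte terminator such as `}` the comparison never reaches memcmp(). A multi-byte terminator takes the same path as before.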

Spotted by Daniel Dragan

This set of changes does not require a perldelta entry.

toke.c

xenu · 2025-09-02T04:41:24Z

Is this worth complicating the code? Is this particular part of the lexer a performance bottleneck? Hell, is the lexer in its entirety a bottleneck?

khwilliamson · 2025-09-02T18:23:01Z

The lexer is not a bottleneck, in its entirety.

So, it's a judgment call whether this complication is worth it. The original p.r. this idea is taken from, #23533, which was approved at one point, added tests at every place. I thought that made the code too hard to read, and am offering this alternative, which moves the complexity into a macro.

I don't think this p.r. will have any noticeable impact on performance, but I don't think it makes things harder to comprehend. It might keep future code readers from making the same change.

S_scan_str() calls memEQ() a bunch of times. The general case is that the strings are multiple bytes long, but most of the time, at least one of the operands will be just a single byte. We can avoid a libc call by just comparing the single bytes when the length is 1.

Tony Cook is of the opinion that other uses of memEQ() will more likely be multiple bytes, so adding this check to all calls would not be beneficial. I looked through the core source, and agree. So this adds a macro that tests for a single byte, and for multiple bytes calls plain memEQ().

Spotted by Daniel Dragan

Since the tokenizer is not hot code, this won't make a noticeable difference in performance. I think the reason to do it is mainly to show it has been done for people in the future who would otherwise come along and notice this.


khwilliamson force-pushed the memEQ1 branch from 9b2c1a3 to 9c06390 on September 1, 2025 at 20:34

tonycoz reviewed Sep 2, 2025


khwilliamson force-pushed the memEQ1 branch from 9c06390 to b16c505 on September 2, 2025 at 18:40

khwilliamson force-pushed the memEQ1 branch from b16c505 to 6eb0d35 on September 20, 2025 at 19:02

khwilliamson merged commit 5fdb3e5 into Perl:blead Sep 20, 2025
33 checks passed

khwilliamson added a commit that referenced this pull request Sep 22, 2025

toke.c: Add missing assert()

90f96e7

Somehow this got left out of the rebase for GH #23667

khwilliamson deleted the memEQ1 branch September 23, 2025 02:43

khwilliamson mentioned this pull request Oct 1, 2025

toke.c dont call libc's memcmp() to test 1 byte in Perl_scan_str() #23533

Closed
