make pregexec() handle zero-length strings again #23980
GH #23903

In embed.fnc, commit v5.43.3-167-g45ea12db26 added SPTR, EPTR parameter modifiers to (amongst other API functions) Perl_pregexec(). These cause assert constraints to be added to the effect that SPTR < EPTR (since the latter is supposed to be a pointer to the byte after the last character in the string). This falls down for an empty string, since in that case pregexec() is called with strbeg == strend. This was causing an assert failure in the test suite for Package-Stash-XS.

The reason it wasn't noticed before is that:

1) pregexec() is a thin wrapper over regexec_flags();
2) The perl core (e.g. pp_match()) calls regexec_flags() rather than pregexec();
3) Package::Stash::XS has XS code which calls pregexec() directly rather than using CALLREGEXEC() (which would call regexec_flags());
4) In embed.fnc, regexec_flags()'s strend parameter is declared as NN rather than EPTR, so it doesn't get the assert added.

So very little code was actually using pregexec().

This commit, for now, changes pregexec()'s strend parameter from EPTR to EPTRQ, which has the net effect of allowing zero-length strings to be passed, and thus fixes the CPAN issue.

But longer term, we need to decide: is the general logic for EPTR wrong? Should the assert be SPTR <= EPTR? And should EPTR be applied to regexec_flags()'s strend parameter too?
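To illustrate why the problem only surfaced through Package::Stash::XS, here is a minimal standalone C sketch. It does not use perl's headers or the code embed.fnc actually generates: `inner_exec()` and `outer_exec()` are hypothetical stand-ins for regexec_flags() (strend only NN, i.e. non-NULL) and pregexec() (a thin wrapper whose strend was marked EPTR), and the "work" they do is just a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for regexec_flags(): strend is only NN, so the
 * only check is that both pointers are non-NULL. An empty string
 * (strbeg == strend) passes. Returns the subject length as a placeholder
 * for the real matching work. */
int inner_exec(const char *strbeg, const char *strend)
{
    assert(strbeg != NULL);   /* NN-style check */
    assert(strend != NULL);   /* NN-style check */
    return (int)(strend - strbeg);
}

/* Hypothetical stand-in for pregexec(): a thin wrapper whose strend is
 * marked EPTR, which adds a strict strbeg < strend assertion. With
 * strend = strbeg + strlen(""), this assertion fails in a build without
 * NDEBUG, even though inner_exec() would have accepted the call. */
int outer_exec(const char *strbeg, const char *strend)
{
    assert(strbeg < strend);  /* EPTR-style check: rejects "" */
    return inner_exec(strbeg, strend);
}
```

Code that goes straight to the inner function (as the perl core does with regexec_flags()) never reaches the strict check, so only direct callers of the wrapper, such as Package::Stash::XS, can trip it.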
Most of the EPTR cases in … I suspect the main problem is naming. My inclination is that APIs that take strings generally accept empty strings, so I can see that the apparent default case …
So I don't think it's a problem that we have a pointer decorator that requires non-empty strings, but perhaps the names could change. To avoid a default case, maybe `EPTRE` (allows empty) and `EPTRNE` (requires non-empty), though NE is kind of overloaded.
I think I had a quick look through the … I'll do a PR for that.
This was incorrectly asserting that the supplied string had at least one character, which could trigger an assertion failure instead of a useful error message for the user of pack(). Related to Perl#23980
On 12/3/25 05:02, David Mitchell wrote:
> But longer term, we need to decide: is the general logic for EPTR wrong?
> Should the assert be SPTR <= EPTR? And should EPTR be applied to
> regexec_flags()'s strend parameter too?
The reason I chose strictly less than for the default case is that it
corresponds to statements all over the perl core:

    while (s < e)

It's comparatively rare to see <=.
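A minimal sketch of that loop idiom, using a hypothetical helper `count_byte()` (not a perl API). The guard `s < e` is what keeps the loop safe for empty input: when `s == e` the body runs zero times. A function built around such a loop therefore only needs `s <= e` as its precondition; a strict `s < e` assertion would additionally forbid the empty string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper written in the `while (s < e)` style:
 * count occurrences of byte c in the half-open range [s, e),
 * where e points one past the last byte. */
size_t count_byte(const char *s, const char *e, char c)
{
    size_t n = 0;

    /* Precondition matching the loop: an empty range (s == e) is fine.
     * Asserting s < e here would reject zero-length strings. */
    assert(s <= e);

    while (s < e) {           /* runs zero times when s == e */
        if (*s == c)
            n++;
        s++;
    }
    return n;
}
```

For example, with `const char buf[] = "banana";`, counting 'a' over `[buf, buf + 6)` gives 3, and over the empty range `[buf, buf)` gives 0 without any special-casing.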
But I have no investment in the current names. And this is the only
file the names exist in, so it's no problem changing them. I was trying
to use a minimum of characters in the spellings while still being
meaningful. Apparently something else is needed.
When choosing < vs <= in the initial commit, I tried < for all.
I changed those that failed the test suite to <=. Then I looked
manually at the remainder for clues. I deferred the ones where I
didn't see an obvious clue to some future point in time. Obviously I
misread the clues in (at least) these two cases.
On Wed, Dec 03, 2025 at 02:35:59PM -0800, Tony Cook wrote:
> To avoid a default case, maybe 'EPTRE', allows empty, and `EPTRNE`, require non-empty, though NE is kind of overloaded
How about EPTR0 and EPTR1? The former allows zero-length strings, the
latter requires at least one character.
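A sketch of how the EPTR0/EPTR1 proposal could map onto assertions. The macro names `EPTR0_OK`, `EPTR1_OK`, `ASSERT_EPTR0`, `ASSERT_EPTR1` and the function `first_byte_or_minus1()` are hypothetical; this is not what perl's embed.fnc processing generates, only an illustration of the two constraints with no "default" case.

```c
#include <assert.h>

/* Hypothetical predicate for EPTR0: zero-length strings allowed,
 * so sptr may equal eptr. */
#define EPTR0_OK(sptr, eptr) ((sptr) <= (eptr))

/* Hypothetical predicate for EPTR1: at least one byte required,
 * so eptr must lie strictly past sptr. */
#define EPTR1_OK(sptr, eptr) ((sptr) < (eptr))

/* Assertions a generator could emit for each modifier (hypothetical). */
#define ASSERT_EPTR0(sptr, eptr) assert(EPTR0_OK(sptr, eptr))
#define ASSERT_EPTR1(sptr, eptr) assert(EPTR1_OK(sptr, eptr))

/* Hypothetical function whose strend would be marked EPTR0: it must
 * accept an empty string, so it handles that case explicitly. */
int first_byte_or_minus1(const char *strbeg, const char *strend)
{
    ASSERT_EPTR0(strbeg, strend);
    return strbeg < strend ? (unsigned char)*strbeg : -1;
}
```

With explicit 0/1 suffixes, each declaration has to state which constraint it wants, so an API that should accept empty strings cannot silently inherit the stricter check.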
Considering perl and regexp notation, maybe … But I'd be happy with anything that doesn't have a "default" case.