Skip to content

Conversation

@iabyn
Copy link
Contributor

@iabyn iabyn commented Dec 3, 2025

GH #23903

In embed.fnc, commit v5.43.3-167-g45ea12db26 added SPTR, EPTR parameter modifiers to (amongst other API functions), Perl_pregexec().

These cause assert constraints to be added to the effect that SPTR < EPTR (since the latter is supposed to be a pointer to the byte after the last character in the string).

This falls down for an empty string since in this case pregexec() is called with strbeg == strend.

This was causing an assert failure in the test suite for Package-Stash-XS.

The reason it wasn't noticed before is because:

  1. pregexec() is a thin wrapper over regexec_flags();

  2. The perl core (e.g. pp_match()) calls regexec_flags() rather than
    pregexec();

  3. Package::Stash::XS has XS code which calls pregexec() directly rather
    than using CALLREGEXEC() (which would call regexec_flags());

  4. In embed.fnc, regexec_flags()'s strend parameter is declared as
    NN rather than EPTR, so it doesn't get the assert added.

So very little code was actually using pregexec().

This commit, for now, changes pregexec()'s strend parameter from EPTR to EPTRQ, which has the net effect of allowing zero-length strings to be passed, and thus fixes the CPAN issue.

But longer term, we need to decide: is the general logic for EPTR wrong? Should the assert be SPTR <= EPTR? And should EPTR be applied to regexec_flags()'s strend parameter too?

  • This set of changes does not require a perldelta entry, as it fixes a bug only introduced in previous dev release.

GH #23903

In embed.fnc, commit v5.43.3-167-g45ea12db26 added SPTR, EPTR parameter
modifiers to (amongst other API functions), Perl_pregexec().

These cause assert constraints to be added to the effect that SPTR <
EPTR (since the latter is supposed to be a pointer to the byte after the
last character in the string).

This falls down for an empty string since in this case pregexec() is
called with strbeg == strend.

This was causing an assert failure in the test suite for
Package-Stash-XS.

The reason it wasn't noticed before is because:

1) pregexec() is a thin wrapper over regexec_flags();

2) The perl core (e.g. pp_match()) calls regexec_flags() rather than
   pregexec();

3) Package::Stash::XS has XS code which calls pregexec() directly rather
   than using CALLREGEXEC() (which would call regexec_flags());

4) In embed.fnc, regexec_flags()'s strend parameter is declared as
   NN rather than EPTR, so it doesn't get the assert added.

So very little code was actually using pregexec().

This commit, for now, changes pregexec()'s strend parameter from EPTR to
EPTRQ, which has the net effect of allowing zero-length strings to be
passed, and thus fixes the CPAN issue.

But longer term, we need to decide: is the general logic for EPTR wrong?
Should the assert be SPTR <= EPTR? And should EPTR be applied to
regexec_flags()'s strend parameter too?
@tonycoz
Copy link
Contributor

tonycoz commented Dec 3, 2025

But longer term, we need to decide: is the general logic for EPTR wrong? Should the assert be SPTR <= EPTR?

Most of the EPTR cases in embed.fnc are for cases where an empty string makes no sense - the caller should be checking that a non-empty string is supplied.

I suspect the main problem is naming, my inclination is that APIs that take strings generally accept empty strings, so I can see that the apparent default case EPTR requiring non-empty being unexpected.

EPTRQ doesn't scream 'allows empty' to me either.

So I don't think it's a problem that we have a pointer decorator that requires non-empty strings, but perhaps the names could change.

To avoid a default case, maybe 'EPTRE', allows empty, and EPTRNE, require non-empty, though NE is kind of overloaded, I might be oversensitised to that right now.

And should EPTR be applied to regexec_flags()'s strend parameter too?

I think EPTRQ or whatever name replaces it should be applied.

I had a quick look through the EPTR cases and only identified one other incorrect decoration:

tony@venus:.../git/perl6$ ./perl -le 'print pack("C[")'
perl: pp_pack.c:572: S_group_end: Assertion `patptr < patend' failed.
Aborted
tony@venus:.../git/perl6$ perl -le 'print pack("C[")'
No group ending character ']' found in template at -e line 1.

I'll do a PR for that.

tonycoz added a commit to tonycoz/perl5 that referenced this pull request Dec 3, 2025
This was incorrectly asserting the supplied string had at least one
character, which could produce an assertion instead of a useful error
message for the user of pack().

Related to Perl#23980
@khwilliamson
Copy link
Contributor

khwilliamson commented Dec 4, 2025 via email

tonycoz added a commit that referenced this pull request Dec 4, 2025
This was incorrectly asserting the supplied string had at least one
character, which could produce an assertion instead of a useful error
message for the user of pack().

Related to #23980
@iabyn
Copy link
Contributor Author

iabyn commented Dec 4, 2025 via email

@iabyn iabyn merged commit 343b977 into blead Dec 4, 2025
68 checks passed
@iabyn iabyn deleted the davem/gh23903 branch December 4, 2025 18:42
@tonycoz
Copy link
Contributor

tonycoz commented Dec 4, 2025

The reason I chose strictly less than for the default case is that corresponds to statements all over the perl core

while (s < e)

It's comparatively rare to see <=

while (s < e) is perfectly valid for empty and longer strings - an empty string is just the trivial case where nothing is done.

How about EPTR0 and EPTR1 - the former allows zero-length strings, the latter 1+.

Considering perl and regexp notation maybe EPTR* and EPTR+, the * in context shouldn't be ambiguous.

But I'd be happy with anything that doesn't have "default" case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants