

@khwilliamson
Contributor

find_by_class() is used in pattern matching.

This is a subtle bug fix for when the input is malformed UTF-8. We say we don't support malformed input, but this commit is a step towards better protecting against that eventuality.

Prior to this commit, some patterns that use find_by_class() would match malformed input differently depending on whether utf8 warnings were enabled.

This is because, on malformed input, utf8_to_uvchr_buf() returns NUL if utf8 warnings are on and the REPLACEMENT CHARACTER if they are off. If the match criterion accepts one of these but not the other, the match result differs.

Now, malformed input never matches a class.
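To make the divergence concrete, here is a minimal, self-contained C sketch. The names `decode_sketch()`, `is_cntrl_sketch()`, `class_matches_old()` and `class_matches_new()` are hypothetical and invented for illustration; they are not Perl's find_by_class() or utf8_to_uvchr_buf(), and only model the return values described above. The sketch assumes a class (here, control characters) that accepts NUL but not U+FFFD, and its decoder handles only 1- and 2-byte sequences.

```c
#include <assert.h>
#include <stddef.h>

/* U+FFFD, the Unicode REPLACEMENT CHARACTER */
#define REPLACEMENT_CHARACTER 0xFFFDUL

/* Hypothetical stand-in for utf8_to_uvchr_buf(); this is NOT Perl's code.
 * It handles only 1- and 2-byte sequences (anything else counts as
 * malformed) and mimics the described behavior on malformed input:
 * it returns NUL when warnings are on and U+FFFD when they are off. */
static unsigned long decode_sketch(const unsigned char *s,
                                   const unsigned char *end,
                                   size_t *len, int warnings_on,
                                   int *malformed)
{
    assert(s < end);
    *malformed = 0;
    if (s[0] < 0x80) {                      /* ASCII */
        *len = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0 && end - s >= 2 && (s[1] & 0xC0) == 0x80) {
        unsigned long cp = ((unsigned long)(s[0] & 0x1F) << 6)
                         | (unsigned long)(s[1] & 0x3F);
        if (cp >= 0x80) {                   /* reject overlong encodings */
            *len = 2;
            return cp;
        }
    }
    *malformed = 1;
    *len = 1;                               /* skip one byte and resync */
    return warnings_on ? 0 : REPLACEMENT_CHARACTER;
}

/* Hypothetical class: C0 and C1 controls. Accepts NUL, rejects U+FFFD. */
static int is_cntrl_sketch(unsigned long cp)
{
    return cp < 0x20 || (cp >= 0x7F && cp <= 0x9F);
}

/* Old behavior: the decoded value is tested even when the input was
 * malformed, so the result depends on the warnings setting. */
static int class_matches_old(const unsigned char *s, const unsigned char *end,
                             int warnings_on)
{
    size_t len;
    int malformed;
    unsigned long cp = decode_sketch(s, end, &len, warnings_on, &malformed);
    return is_cntrl_sketch(cp);
}

/* New behavior: malformed input never matches, regardless of warnings. */
static int class_matches_new(const unsigned char *s, const unsigned char *end,
                             int warnings_on)
{
    size_t len;
    int malformed;
    unsigned long cp = decode_sketch(s, end, &len, warnings_on, &malformed);
    if (malformed)
        return 0;
    return is_cntrl_sketch(cp);
}
```

With the lone continuation byte `0x80` as input, `class_matches_old()` reports a match when warnings are on and no match when they are off, while `class_matches_new()` reports no match either way. Well-formed input is classified identically by both.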

  • This set of changes does not require a perldelta entry.

@khwilliamson
Contributor Author

This was superseded by #23060

@khwilliamson khwilliamson deleted the find_by_class branch March 18, 2025 03:01
