
Commit a1805b9

Merge branch 'Fix utf8 corner cases' into blead
There are around 20 different functions that take a UTF-8 sequence of bytes and try to find the ordinal code point it represents. It was becoming clear that the existing tests in our suite were inadequate: they were not finding glaring bugs. And UTF-8 handling is important; failures in it have been exploited by attackers in various products over the years.

I set out to improve the tests, and spent far too much time before realizing that adding band-aids to the current scheme was not going to work. So I undertook rewriting the tests. This turned out to be much harder and more time-consuming than I expected, and the new test file still isn't ready to go into blead. But along the way, I discovered that it was finding corner-case bugs that I would never have anticipated. This series of commits fixes those, while simplifying the code and reducing redundancy.

The new test file needs clean-up, and probably ways to make it faster, but it is finally far enough along that I believe it has caught most of the bugs out there. So I'm submitting these fixes now to get them into v5.42. The deadline for the test file is later in the development process.
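To illustrate the kind of corner cases such a decoder has to handle, here is a minimal sketch of a strict UTF-8 decoder in C. It is not Perl's implementation: the name `decode_utf8` and the `DECODE_ERROR` sentinel are hypothetical, and it assumes strict Unicode rules (surrogates and code points above U+10FFFF are rejected), whereas Perl's own functions accept a wider range and are steered by flags that this sketch does not model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel for this sketch; not part of any Perl API. */
#define DECODE_ERROR UINT32_MAX

/* Decode one code point from the bytes in [s, send).
 * On success, return the code point and store the number of bytes
 * consumed in *retlen (if retlen is non-NULL).
 * On failure, return DECODE_ERROR and store 0 in *retlen. */
uint32_t decode_utf8(const uint8_t *s, const uint8_t *send, size_t *retlen)
{
    if (retlen) *retlen = 0;
    if (s >= send) return DECODE_ERROR;          /* empty input */

    uint8_t b0 = s[0];
    size_t len;
    uint32_t cp, min;

    /* Classify the start byte: expected length, payload bits, and the
     * smallest code point that legitimately needs that length. */
    if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return DECODE_ERROR;     /* stray continuation or invalid start */

    if ((size_t)(send - s) < len) return DECODE_ERROR;   /* truncated */

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return DECODE_ERROR;  /* not 10xxxxxx */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < min) return DECODE_ERROR;                     /* overlong */
    if (cp >= 0xD800 && cp <= 0xDFFF) return DECODE_ERROR; /* surrogate */
    if (cp > 0x10FFFF) return DECODE_ERROR;                /* above Unicode */

    if (retlen) *retlen = len;
    return cp;
}
```

Under these assumptions, the bytes `E2 82 AC` decode to U+20AC with a length of 3, while an overlong `C0 80`, a truncated `E2 82`, and an empty buffer (`s == send`) all yield `DECODE_ERROR`. Each of those rejection paths is a separate case that a test suite needs to exercise.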
2 parents 6a4f62c + cab4c62 commit a1805b9

4 files changed, +569 -558 lines changed

ext/XS-APItest/t/utf8_warn_base.pl

Lines changed: 4 additions & 0 deletions
@@ -2020,8 +2020,12 @@ ($)
             @warnings_gotten = @returned_warnings;
         }
 
+        SKIP: {
+            skip "$0 doesn't handle _msgs functions AV returns", 1
+                if $utf8_func =~ /_msgs/;
         do_warnings_test(@expected_warnings)
           or diag "Call was: " . utf8n_display_call($eval_text);
+        }
         undef @warnings_gotten;
 
         # Check CHECK_ONLY results when the input is

inline.h

Lines changed: 0 additions & 1 deletion
@@ -3244,7 +3244,6 @@ PERL_STATIC_INLINE UV
 Perl_utf8_to_uvchr_buf(pTHX_ const U8 *s, const U8 *send, STRLEN *retlen)
 {
     PERL_ARGS_ASSERT_UTF8_TO_UVCHR_BUF;
-    assert(s < send);
 
     UV cp;
 