Commit 15454ad

utf8.c: Check specially for perl-extended UTF-8
More rigorous testing of the overlong malformation, with tests not yet committed, showed that Perl-extended UTF-8 needs to be handled specially. This commit is a first step towards that. Perl-extended UTF-8 means the sequence uses a start byte that no UTF-8 standard recognizes. Suppose such a sequence is overlong and reduces to a code point that standard UTF-8 can represent. The string still used non-standard UTF-8 to get there, so it should still be flagged when the input parameters to this function ask for that.
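To make the scenario concrete, here is a minimal C sketch. It assumes an ASCII platform, where the start bytes 0xFE and 0xFF are the ones no UTF-8 standard defines (RFC 2279 stopped at 0xFD). A 0xFE start byte introduces a 7-byte sequence whose six continuation bytes carry 36 payload bits, so `FE 80 80 80 80 81 81` is an overlong spelling of U+0041. The helpers `is_perl_extended_start` and `decode_fe_sequence` are hypothetical illustrations, not Perl's API, and the decoder only handles the 0xFE form.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (ASCII platforms only): 0xFE and 0xFF are assumed to be
 * the start bytes that only Perl's extended UTF-8 uses. */
static bool is_perl_extended_start(uint8_t b)
{
    return b >= 0xFE;
}

/* Hypothetical decoder for the 7-byte form that starts with 0xFE.  The start
 * byte contributes no payload bits; each of the 6 continuation bytes
 * (10xxxxxx) contributes 6, for 36 bits total.  Returns false if the input is
 * too short, does not start with 0xFE, or has a bad continuation byte. */
static bool decode_fe_sequence(const uint8_t *s, size_t len, uint64_t *cp)
{
    if (len < 7 || s[0] != 0xFE)
        return false;

    uint64_t value = 0;
    for (size_t i = 1; i < 7; i++) {
        if ((s[i] & 0xC0) != 0x80)      /* not a continuation byte */
            return false;
        value = (value << 6) | (s[i] & 0x3F);
    }
    *cp = value;
    return true;
}
```

Decoding `FE 80 80 80 80 81 81` yields 0x41, which standard UTF-8 writes as the single byte 0x41, yet the first byte alone still shows that the sequence relied on the Perl extension. That is the case the commit wants reported.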
1 parent 9efd3ac commit 15454ad

1 file changed: +10 lines, -0 lines

utf8.c

@@ -1601,6 +1601,15 @@ Perl__utf8n_to_uvchr_msgs_helper(const U8 *s,
          * the overflow handling code */
         && LIKELY(! (possible_problems & UTF8_GOT_OVERFLOW)))
     {
+        /* By examining just the first byte, we can see if this is using
+         * non-standard UTF-8. Even if it is an overlong that reduces to a
+         * small code point, it is still using this Perl invention, so mark it
+         * as such */
+        if (UNLIKELY(UTF8_IS_PERL_EXTENDED(s0))) {
+            possible_problems |= UTF8_GOT_SUPER;
+        }
+        else {
+        /* See if the input has malformations besides possibly overlong */
         if (UNLIKELY(possible_problems & ~UTF8_GOT_LONG)) {
 
             /* Here, there is a malformation other than overlong, we need to
@@ -1646,6 +1655,7 @@ Perl__utf8n_to_uvchr_msgs_helper(const U8 *s,
                 possible_problems |= UTF8_GOT_NONCHAR;
             }
         }
+        }
     }
 
   ready_to_handle_errors:
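The ordering this diff introduces can be modeled in isolation. The sketch below is hypothetical: the flag names `GOT_LONG` and `GOT_SUPER` and the function `classify` are stand-ins for Perl's `UTF8_GOT_*` bits and the body of `Perl__utf8n_to_uvchr_msgs_helper`, the `start >= 0xFE` test is an assumed ASCII-platform substitute for `UTF8_IS_PERL_EXTENDED`, and the `else` branch reduces the pre-existing checks to a single above-Unicode test. What it shows is the new order: the start byte is inspected first, and a Perl-extended start byte sets the "super" flag even when the decoded value is a small code point reached through an overlong.

```c
#include <stdint.h>

/* Hypothetical flag bits standing in for Perl's UTF8_GOT_* values. */
enum {
    GOT_LONG  = 1u << 0,   /* overlong encoding */
    GOT_SUPER = 1u << 1,   /* above Unicode, or Perl-extended UTF-8 */
};

/* Hypothetical model of the new control flow.  `start` is the sequence's
 * first byte, `uv` the decoded code point, `problems` the flags found so far.
 * Returns the updated flags. */
static unsigned classify(uint8_t start, uint64_t uv, unsigned problems)
{
    if (start >= 0xFE) {                 /* assumed ASCII-platform test */
        /* Perl-extended start byte: flag it even if uv is small. */
        problems |= GOT_SUPER;
    }
    else if (uv > 0x10FFFF) {
        /* Simplified stand-in for the pre-existing checks. */
        problems |= GOT_SUPER;
    }
    return problems;
}
```

With this ordering, an overlong that starts with 0xFE and decodes to U+0041 keeps its overlong flag and also gains the super flag, while an ordinary 4-byte overlong for the same code point is reported only as overlong.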
