Commit f254d77

utf8n_to_uvchr_msgs_helper(): Refactor expression
More rigorous testing of the overlong malformation, yet to be committed, showed that this didn't work as intended. The IS_UTF8_START_BYTE() excludes start bytes that always lead to overlong sequences. Fortunately the logic caused that to be mostly bypassed. But this commit fixes it all.
1 parent 3cf2df0 commit f254d77
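
The distinction the commit message draws between the two kinds of start-byte test can be sketched as below. This is a minimal illustration, assuming an ASCII platform (where `NATIVE_TO_I8()` is the identity) and assuming the strict test rejects 0xC0 and 0xC1 because they can only begin overlong sequences. The function names are hypothetical stand-ins, not the macros from Perl's headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a strict start-byte test: on ASCII platforms it
 * is assumed to reject 0xC0 and 0xC1, which can only begin overlongs. */
static bool is_strict_start(unsigned char b)
{
    return b >= 0xC2;
}

/* Hypothetical stand-in mirroring UTF8_IS_SYNTACTIC_START_BYTE from the
 * commit: any byte that has the form of a start byte, overlong or not. */
static bool is_syntactic_start(unsigned char b)
{
    return b >= 0xC0;
}

/* True for bytes that begin a sequence but are rejected by the strict test. */
static bool starts_only_overlongs(unsigned char b)
{
    return is_syntactic_start(b) && !is_strict_start(b);
}

/* Small self-check showing how the two tests disagree. */
static void demo_start_bytes(void)
{
    assert(!is_syntactic_start(0x41));   /* ASCII 'A': not a start byte   */
    assert(!is_syntactic_start(0x80));   /* continuation byte             */
    assert(starts_only_overlongs(0xC0)); /* e.g. C0 80, overlong U+0000   */
    assert(starts_only_overlongs(0xC1));
    assert(!starts_only_overlongs(0xC2)); /* both tests accept 0xC2       */
}
```

Under these assumptions, an overlong check guarded by the strict test cannot treat 0xC0 or 0xC1 as a start byte at all, which is the gap the commit message describes.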

1 file changed: +11 -7 lines changed

utf8.c

Lines changed: 11 additions & 7 deletions
@@ -1534,15 +1534,19 @@ Perl__utf8n_to_uvchr_msgs_helper(const U8 *s,
         possible_problems |= UTF8_GOT_OVERFLOW;
     }
 
+    /* Is the first byte of 's' a start byte in the UTF-8 encoding system, not
+     * excluding starting an overlong sequence? */
+#define UTF8_IS_SYNTACTIC_START_BYTE(s) (NATIVE_TO_I8(*s) >= 0xC0)
+
     /* Check for overlong. If no problems so far, 'uv' is the correct code
-     * point value. Simply see if it is expressible in fewer bytes. Otherwise
-     * we must look at the UTF-8 byte sequence itself to see if it is for an
-     * overlong */
+     * point value. Simply see if it is expressible in fewer bytes. But if
+     * there are other malformations, we may be still be able to tell if this
+     * is an overlong by looking at the UTF-8 byte sequence itself */
     if (    (   LIKELY(! possible_problems)
-             && UNLIKELY(expectlen > (STRLEN) OFFUNISKIP(uv)))
-        || (       UNLIKELY(possible_problems)
-            && (   UNLIKELY(! UTF8_IS_START(*s0))
-                || (UNLIKELY(0 < is_utf8_overlong(s0, s - s0))))))
+             && UNLIKELY(expectlen > OFFUNISKIP(uv)))
+        || (   UNLIKELY(possible_problems)
+            && UTF8_IS_SYNTACTIC_START_BYTE(s0)
+            && UNLIKELY(0 < is_utf8_overlong(s0, s - s0))))
     {
         possible_problems |= UTF8_GOT_LONG;
     }
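
To make the change in the condition concrete, the sketch below models the old and new expressions side by side, taken in isolation from the rest of the function. It assumes an ASCII platform. `min_utf8_len()` and `looks_overlong()` are simplified, hypothetical stand-ins for `OFFUNISKIP()` and `is_utf8_overlong()`: they cover only sequences of up to four bytes and return a plain boolean, which the real routines may not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for OFFUNISKIP(): fewest bytes needed to encode 'cp'.
 * Simplified to the range a four-byte sequence can hold. */
static size_t min_utf8_len(unsigned long cp)
{
    assert(cp <= 0x1FFFFFUL);
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Stand-in for is_utf8_overlong(): can the leading bytes alone prove that
 * the sequence is overlong?  Handles 2-, 3- and 4-byte forms only. */
static bool looks_overlong(const unsigned char *s, size_t len)
{
    assert(len > 0);
    if (s[0] == 0xC0 || s[0] == 0xC1) return true;
    if (len < 2 || (s[1] & 0xC0) != 0x80) return false;
    if (s[0] == 0xE0) return s[1] < 0xA0;
    if (s[0] == 0xF0) return s[1] < 0x90;
    return false;
}

/* Model of the expression before the commit (strict start test, OR-ed). */
static bool old_flags_long(bool problems, size_t expectlen, unsigned long uv,
                           const unsigned char *s0, size_t len)
{
    return (!problems && expectlen > min_utf8_len(uv))
        || ( problems && (!(s0[0] >= 0xC2) || looks_overlong(s0, len)));
}

/* Model of the expression after the commit (syntactic start test, AND-ed). */
static bool new_flags_long(bool problems, size_t expectlen, unsigned long uv,
                           const unsigned char *s0, size_t len)
{
    return (!problems && expectlen > min_utf8_len(uv))
        || ( problems && s0[0] >= 0xC0 && looks_overlong(s0, len));
}
```

In this model, a lone continuation byte such as 0x80 with another malformation already recorded is flagged as overlong by the old expression but not by the new one, while a truncated `E0 80` is flagged by both. The commit message notes that the surrounding logic mostly bypassed the faulty branch, so this shows only what the expression itself would do.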
