Commit 9d310cc

utf8_to_bytes: Refactor loop
The previous version did not always make sure that it wasn't reading beyond the end of the buffer: it examined s[1] without first checking that a second byte existed. Also, the first pass through the input string has already ruled out most problems, so we don't need the full generality of the macro UTF8_IS_DOWNGRADEABLE_START here; dropping it simplifies things.
1 parent 47b98c3 commit 9d310cc
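The commit message and the new comment in the diff rely on a first pass that has already restricted the input to invariant bytes, the C2 and C3 start bytes, and continuation bytes. That pass is not part of this diff, so what follows is only a minimal sketch of such a check. It assumes an ASCII (non-EBCDIC) platform, and the name hypothetical_first_pass_ok is hypothetical, not Perl's. Because it examines each byte in isolation, it still accepts a lone trailing start byte, which is exactly the case the second loop must catch with its bounds check.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned char U8;  /* stand-in for Perl's U8 typedef */

/* Hypothetical first pass, ASCII platforms only: accept bytes that are
 * invariant (0x00-0x7F), continuation bytes (0x80-0xBF), or the start bytes
 * 0xC2 and 0xC3, the only two-byte starts that encode code points 0x80-0xFF.
 * Each byte is checked in isolation, so ordering is NOT verified: a lone
 * trailing 0xC3 or a stray continuation byte still passes. */
static bool hypothetical_first_pass_ok(const U8 *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        U8 c = s[i];
        bool invariant           = c < 0x80;
        bool continuation        = (c & 0xC0) == 0x80;
        bool downgradeable_start = c == 0xC2 || c == 0xC3;

        if (!invariant && !continuation && !downgradeable_start)
            return false;
    }
    return true;
}
```

For example, "a" followed by C3 A9 passes, and so does "a" followed by a truncated C3, while the three-byte sequence E2 82 AC is rejected because E2 starts a character above 0xFF.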

1 file changed: 16 additions, 13 deletions

utf8.c

Lines changed: 16 additions & 13 deletions
```diff
@@ -2518,23 +2518,26 @@ Perl_utf8_to_bytes(pTHX_ U8 *s, STRLEN *lenp)
     U8 * d = s = first_variant;
 
     while (s < send) {
+        U8 c = *s++;
+        if (! UVCHR_IS_INVARIANT(c)) {
+
+            /* Then it is a multi-byte character. The first pass above
+             * determined that the string contains only invariants, the C2 and
+             * C3 start bytes, and continuation bytes. The condition above
+             * excluded this from being an invariant. To be well formed, it
+             * needs to be a start byte followed by a continuation byte. */
+            if (   UNLIKELY(  UTF8_IS_CONTINUATION(c))
+                || UNLIKELY(  s >= send)
+                || UNLIKELY(! UTF8_IS_CONTINUATION(*s)))
+            {
+                goto cant_convert;
+            }
 
-        if (UVCHR_IS_INVARIANT(*s)) {
-            *d++ = *s++;
-            continue;
-        }
-
-        /* Here it is two-byte encoded. */
-        if (   LIKELY(UTF8_IS_DOWNGRADEABLE_START(*s))
-            && LIKELY(UTF8_IS_CONTINUATION((s[1]))))
-        {
-            U8 first_byte = *s++;
-            *d++ = EIGHT_BIT_UTF8_TO_NATIVE(first_byte, *s);
+            c = EIGHT_BIT_UTF8_TO_NATIVE(c, *s);
             s++;
-            continue;
         }
 
-        goto cant_convert;
+        *d++ = c;
     }
 
     /* Success! */
```
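For readers without a Perl build at hand, here is a standalone sketch of the refactored loop. The three macros are simplified stand-ins assumed for ASCII platforms (Perl's real definitions also cover EBCDIC), sketch_utf8_to_bytes is a hypothetical name, and returning false takes the place of the real function's goto cant_convert. The point to notice is the order of the tests: s >= send is evaluated before *s is read, which is the bounds check the old code lacked.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned char U8;  /* stand-in for Perl's U8 typedef */

/* ASSUMED simplified stand-ins for the Perl macros used in the diff,
 * valid for ASCII platforms only. */
#define UVCHR_IS_INVARIANT(c)    ((U8)(c) < 0x80)
#define UTF8_IS_CONTINUATION(c)  (((U8)(c) & 0xC0) == 0x80)
#define EIGHT_BIT_UTF8_TO_NATIVE(hi, lo) \
    ((U8)((((U8)(hi) & 0x1F) << 6) | ((U8)(lo) & 0x3F)))

/* Hypothetical sketch: downgrade UTF-8 to bytes in place, following the
 * shape of the refactored loop. Assumes a first pass has already limited
 * the input to invariants, 0xC2/0xC3, and continuation bytes. On success,
 * *lenp is set to the new length. On malformed input it returns false and
 * leaves *lenp unchanged, but bytes already written to the front of the
 * buffer are not restored. */
static bool sketch_utf8_to_bytes(U8 *s, size_t *lenp)
{
    U8 *start = s;
    U8 *send  = s + *lenp;
    U8 *d     = s;

    while (s < send) {
        U8 c = *s++;
        if (! UVCHR_IS_INVARIANT(c)) {
            /* Must be a start byte followed by a continuation byte; the
             * s >= send test guards the read of *s that follows it. */
            if (   UTF8_IS_CONTINUATION(c)
                || s >= send
                || ! UTF8_IS_CONTINUATION(*s))
            {
                return false;
            }
            c = EIGHT_BIT_UTF8_TO_NATIVE(c, *s);
            s++;
        }
        *d++ = c;
    }

    *lenp = (size_t)(d - start);
    return true;
}
```

With the input bytes 'c' 'a' 'f' C3 A9, the loop writes four bytes, the last being E9 (é). With 'a' followed by a lone C3, it returns false as soon as s reaches send, without touching the byte past the end.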
