Commit f22cd66

utf8.c: Don't throw away work
Don't execute this loop if it would be pointless.
1 parent cc8bd1e commit f22cd66

1 file changed: +8 -7 lines changed

utf8.c

Lines changed: 8 additions & 7 deletions
@@ -1564,7 +1564,9 @@ Perl__utf8n_to_uvchr_msgs_helper(const U8 *s,
         }
         else {
             /* See if the input has malformations besides possibly overlong */
-            if (UNLIKELY(possible_problems & ~UTF8_GOT_LONG)) {
+            if (   UNLIKELY(possible_problems & ~UTF8_GOT_LONG)
+                && LIKELY(flags & ~(UTF8_DISALLOW_NONCHAR|UTF8_WARN_NONCHAR)))
+            {
 
                 /* Here, the input is malformed in some way besides possibly
                  * overlong, except it doesn't overflow. If you look at the
@@ -1576,6 +1578,10 @@ Perl__utf8n_to_uvchr_msgs_helper(const U8 *s,
                  * be enough information present to determine if what we have
                  * so far would, if filled out completely, be for one of these
                  * problematic code points we are being asked to check for.
+                 * But to determine if a code point is a non-character, we need
+                 * all bytes, so this effort would be wasted, hence the
+                 * conditional above excludes this step if those are the only
+                 * thing being checked for.
                  *
                  * The range of surrogates is
                  *     ASCII platforms   EBCDIC I8
@@ -1601,12 +1607,7 @@ Perl__utf8n_to_uvchr_msgs_helper(const U8 *s,
                  * This is done by pretending the input was filled out to its
                  * full length with occurrences of the smallest continuation
                  * byte. For surrogates we could just look at the bytes, but
-                 * this single algorithm works for both those and supers.
-                 *
-                 * To determine if a code point is a non-character, we need all
-                 * bytes, so this effort is wasted if the caller is looking for
-                 * just those, but that is unlikely; the two official Unicode
-                 * restrictions include the other two. */
+                 * this single algorithm works for both those and supers. */
                 for (unsigned i = curlen; i < expectlen; i++) {
                     uv = UTF8_ACCUMULATE(uv,
                                 I8_TO_NATIVE_UTF8(UTF_MIN_CONTINUATION_BYTE));
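
The comments in this diff describe padding a truncated sequence with the smallest continuation byte to decide whether any completion would be a surrogate or an above-Unicode code point. The following is a minimal sketch of that idea, not the code in utf8.c. It assumes an ASCII platform (continuation bytes 0x80..0xBF) and standard sequences of 2 to 4 bytes, whereas Perl's extended UTF-8 allows longer ones. The names `expected_len`, `min_completion`, `surely_surrogate`, `surely_above_unicode`, `worth_filling_out`, and the `CHECK_*` flags are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest continuation byte; assumes an ASCII platform, not EBCDIC. */
#define MIN_CONTINUATION 0x80u

/* Hypothetical flags standing in for the caller's requested checks. */
enum { CHECK_SURROGATE = 1, CHECK_SUPER = 2, CHECK_NONCHAR = 4 };

/* Mirrors the new guard in spirit: padding is only worth doing when
 * something other than the non-character check was requested. */
static int worth_filling_out(unsigned flags)
{
    return (flags & ~(unsigned)CHECK_NONCHAR) != 0;
}

/* Sequence length implied by a start byte, or 0 if the byte cannot
 * start a 2- to 4-byte sequence. */
static size_t expected_len(uint8_t start)
{
    if ((start & 0xE0u) == 0xC0u) return 2;
    if ((start & 0xF0u) == 0xE0u) return 3;
    if ((start & 0xF8u) == 0xF0u) return 4;
    return 0;
}

/* Lowest code point the first curlen bytes of s could encode once the
 * missing bytes are filled in with the smallest continuation byte. */
static uint32_t min_completion(const uint8_t *s, size_t curlen)
{
    size_t expectlen = expected_len(s[0]);
    assert(expectlen != 0 && curlen >= 1 && curlen <= expectlen);

    /* Payload bits of the start byte: 5, 4, or 3 for lengths 2, 3, 4. */
    uint32_t uv = s[0] & (0xFFu >> (expectlen + 1));

    for (size_t i = 1; i < expectlen; i++) {
        uint8_t byte = (i < curlen) ? s[i] : MIN_CONTINUATION;
        uv = (uv << 6) | (byte & 0x3Fu);   /* append 6 payload bits */
    }
    return uv;
}

/* Every completion exceeds U+10FFFF exactly when the smallest one does. */
static int surely_above_unicode(const uint8_t *s, size_t curlen)
{
    return min_completion(s, curlen) > 0x10FFFFu;
}

/* With at least two bytes present, the padded value lands in
 * U+D800..U+DFFF only if every completion does, because the range is
 * aligned to the 64 code points a final byte can select. */
static int surely_surrogate(const uint8_t *s, size_t curlen)
{
    uint32_t uv = min_completion(s, curlen);
    return curlen >= 2 && uv >= 0xD800u && uv <= 0xDFFFu;
}
```

Under these assumptions, the two bytes ED A0 are already settled as a surrogate, and F4 90 as above Unicode. By contrast, EF BF could still complete to the ordinary U+FFFD or to the non-character U+FFFE, so padding tells a caller that checks only for non-characters nothing, which is the case the new condition skips.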
