@@ -2007,8 +2007,27 @@ Perl_utf8_to_uv_msgs_helper_(const U8 * const s0,
  *
  * There are two main categories of potential problems.
  *
- * a) One type is by default not considered to be a problem. These
- *    are for when the input was syntactically valid
+ * a) One type is considered by default to be problematic. There are
+ *    three subclasses:
+ *    1) Some syntactic malformation meant that no code point could
+ *       be calculated for the input. An example is that the
+ *       sequence was incomplete, more bytes were called for than
+ *       the input contained. The function returns the Unicode
+ *       REPLACEMENT CHARACTER as the translation of these.
+ *    2) The sequence is legal Perl extended UTF-8, but is for a
+ *       code point too large to be represented on this platform.
+ *       The function returns the Unicode REPLACEMENT CHARACTER as
+ *       the translation of these.
+ *    3) The sequence represents a code point which can also be
+ *       represented by a shorter sequence. These have been
+ *       declared illegal by Unicode fiat because they were being
+ *       used as Trojan horses to successfully attack applications.
+ *       One undocumented flag causes these to be accepted, but
+ *       otherwise the function returns the Unicode REPLACEMENT
+ *       CHARACTER as the translation of these.
+ *
+ * b) The other type is by default not considered to be a problem.
+ *    These are for when the input was syntactically valid
  *    Perl-extended-UTF-8 for a code point that is representable on
  *    this platform, but that code point isn't considered by Unicode
  *    to be freely exchangeable between applications. To get here,
@@ -2027,25 +2046,6 @@ Perl_utf8_to_uv_msgs_helper_(const U8 * const s0,
  *    using the rules common to both types of problems, and detailed
  *    below.
  *
- * b) The other type is considered by default to be problematic.
- *    There are three subclasses:
- *    1) Some syntactic malformation meant that no code point could
- *       be calculated for the input. An example is that the
- *       sequence was incomplete, more bytes were called for than
- *       the input contained. The function returns the Unicode
- *       REPLACEMENT CHARACTER as the translation of these.
- *    2) The sequence is legal Perl extended UTF-8, but is for a
- *       code point too large to be represented on this platform.
- *       The function returns the Unicode REPLACEMENT CHARACTER as
- *       the translation of these.
- *    3) The sequence represents a code point which can also be
- *       represented by a shorter sequence. These have been
- *       declared illegal by Unicode fiat because they were being
- *       used as Trojan horses to successfully attack applications.
- *       One undocumented flag causes these to be accepted, but
- *       otherwise the function returns the Unicode REPLACEMENT
- *       CHARACTER as the translation of these.
- *
  * In all cases the corresponding bit in *errors is set. This is
  * in contrast to the other type of problem where the input
  * 'flags' affect if the bit is set or not.
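
The two categories described in the comment can be illustrated with a small standalone sketch. This is not Perl's implementation: `classify_utf8`, the `utf8_class` enum, and the `REPLACEMENT` macro are hypothetical names invented for this illustration, and the sketch handles only standard UTF-8 of up to four bytes, so subclass 2 (a code point too large for the platform) cannot arise in it. It assumes that, as in the comment, malformed and overlong input translates to U+FFFD, while a well-formed surrogate, non-character, or above-Unicode code point is still decoded and only reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu   /* Unicode REPLACEMENT CHARACTER */

/* Hypothetical classification, loosely mirroring the comment's categories. */
enum utf8_class {
    UTF8_OK,               /* well-formed and freely exchangeable */
    UTF8_MALFORMED,        /* no code point could be calculated */
    UTF8_OVERLONG,         /* a shorter sequence encodes the same code point */
    UTF8_NOT_EXCHANGEABLE  /* valid, but surrogate / non-character / above Unicode */
};

/* Decode one sequence starting at s (at most len bytes available).
 * On the "problematic by default" outcomes *cp is set to REPLACEMENT;
 * otherwise *cp receives the decoded code point. */
static enum utf8_class
classify_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_for_len[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    uint32_t v;

    assert(s != NULL && cp != NULL);
    *cp = REPLACEMENT;
    if (len == 0)
        return UTF8_MALFORMED;

    if (s[0] < 0x80)                { need = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; v = s[0] & 0x07; }
    else
        return UTF8_MALFORMED;      /* stray continuation or unsupported start byte */

    if (need > len)
        return UTF8_MALFORMED;      /* incomplete: more bytes called for than present */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return UTF8_MALFORMED;  /* expected a continuation byte */
        v = (v << 6) | (s[i] & 0x3F);
    }

    if (v < min_for_len[need])
        return UTF8_OVERLONG;

    *cp = v;                        /* syntactically valid: report the real value */
    if ((v >= 0xD800 && v <= 0xDFFF)      /* surrogates */
        || (v >= 0xFDD0 && v <= 0xFDEF)   /* contiguous non-character block */
        || (v & 0xFFFE) == 0xFFFE         /* U+xxFFFE and U+xxFFFF */
        || v > 0x10FFFF)                  /* above the Unicode range */
        return UTF8_NOT_EXCHANGEABLE;
    return UTF8_OK;
}
```

In this sketch a caller would treat `UTF8_MALFORMED` and `UTF8_OVERLONG` as errors unconditionally, and decide through its own policy (the role the input 'flags' play in the comment) whether `UTF8_NOT_EXCHANGEABLE` should be reported.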