utf8n_to_uvchr_msgs(): Fix wrong overflow warn category

khwilliamson · khwilliamson · commit 6c47c372584d · 2024-11-26T10:49:55.000-07:00
This function looks for various malformations in the input string that is
being converted from UTF-8 to its equivalent code point ordinal value.
When it finds an issue, it can either raise a warning (if appropriate) or
pass the needed information about the warning back to the caller when
the function parameters say to.  The data for each should be identical,
but prior to this commit, they weren't in one unlikely case.

This happened when the input UTF-8 sequence represents a code point
whose value doesn't fit in the platform's word size.  This is viewed
as a malformation, and, if the WARN_UTF8 category is enabled, a warning
using it is raised.  But if that category is disabled, another way to
look at this is as an attempt to use a code point that isn't legal
Unicode.  There is another warnings category for that, WARN_NON_UNICODE,
so a warning is raised using it if it is enabled.

Note that WARN_NON_UNICODE is a subcategory of WARN_UTF8, so the only
way to get to this situation is

    no warnings 'utf8'; use warnings 'non_unicode';

(those two statements could be separated by many lines)

Prior to this commit, if the caller asked for the warning information to
be passed to it instead of raising the warnings, WARN_UTF8 was always
returned and WARN_NON_UNICODE never was, making the two modes sometimes
inconsistent.

With this commit, WARN_NON_UNICODE is passed to the caller if (and only
if) a warning would otherwise have been generated using it.
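
The category selection before and after this commit can be modelled
roughly as below.  This is only an illustrative sketch, not the code in
utf8.c: the enum and both function names are hypothetical, and the three
booleans stand in for ckWARN_d(WARN_UTF8), ckWARN_d(WARN_NON_UNICODE),
and a non-NULL msgs parameter respectively.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the packed warning categories. */
enum warn_category { CAT_NONE = 0, CAT_UTF8, CAT_NON_UNICODE };

/* Model of the old selection: a non-NULL msgs short-circuits the first
 * test, so the caller is always handed the UTF8 category. */
static enum warn_category
select_before(bool utf8_on, bool non_unicode_on, bool have_msgs)
{
    if (have_msgs || utf8_on)        return CAT_UTF8;
    if (have_msgs || non_unicode_on) return CAT_NON_UNICODE;
    return CAT_NONE;
}

/* Model of the new selection: the enabled categories are consulted
 * first; msgs only supplies a fallback when neither is enabled. */
static enum warn_category
select_after(bool utf8_on, bool non_unicode_on, bool have_msgs)
{
    if (utf8_on)        return CAT_UTF8;
    if (non_unicode_on) return CAT_NON_UNICODE;
    if (have_msgs)      return CAT_UTF8;
    return CAT_NONE;
}
```

In this model, with 'utf8' disabled and 'non_unicode' enabled, the old
selection yields the UTF8 category whenever msgs is supplied, while the
new one yields NON_UNICODE whether or not msgs is supplied.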

This bug was found with tests that will be committed later.
diff --git a/utf8.c b/utf8.c
@@ -1727,12 +1727,16 @@ Perl__utf8n_to_uvchr_msgs_helper(const U8 *s,
                      * necessarily do so in the future.  We output (only) the
                      * most dire warning */
                     if (! (flags & UTF8_CHECK_ONLY)) {
-                        if (msgs || ckWARN_d(WARN_UTF8)) {
+                        if (ckWARN_d(WARN_UTF8)) {
                             pack_warn = packWARN(WARN_UTF8);
                         }
-                        else if (msgs || ckWARN_d(WARN_NON_UNICODE)) {
+                        else if (ckWARN_d(WARN_NON_UNICODE)) {
                             pack_warn = packWARN(WARN_NON_UNICODE);
                         }
+                        else if (msgs) {
+                            pack_warn = packWARN(WARN_UTF8);
+                        }
+
                         if (pack_warn) {
                             message = Perl_form(aTHX_ "%s: %s (overflows)",
                                             malformed_text,