Commit d7cccbb

Merge branch 'jk/utf-8-can-be-spelled-differently' into maint

Some platforms and users spell UTF-8 differently; when the system does not understand a user-supplied encoding name that is one of the common alternative spellings of UTF-8, retry with the most official spelling, "UTF-8".

* jk/utf-8-can-be-spelled-differently:
  utf8: accept alternate spellings of UTF-8
2 parents 307d68e + 5c680be commit d7cccbb

1 file changed: +18 -2 lines changed

utf8.c

@@ -507,9 +507,25 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
 
 	if (!in_encoding)
 		return NULL;
+
 	conv = iconv_open(out_encoding, in_encoding);
-	if (conv == (iconv_t) -1)
-		return NULL;
+	if (conv == (iconv_t) -1) {
+		/*
+		 * Some platforms do not have the variously spelled variants of
+		 * UTF-8, so let's fall back to trying the most official
+		 * spelling. We do so only as a fallback in case the platform
+		 * does understand the user's spelling, but not our official
+		 * one.
+		 */
+		if (is_encoding_utf8(in_encoding))
+			in_encoding = "UTF-8";
+		if (is_encoding_utf8(out_encoding))
+			out_encoding = "UTF-8";
+		conv = iconv_open(out_encoding, in_encoding);
+		if (conv == (iconv_t) -1)
+			return NULL;
+	}
+
 	out = reencode_string_iconv(in, strlen(in), conv);
 	iconv_close(conv);
 	return out;
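
The retry logic in this hunk can be tried outside of git with a small standalone sketch. The names `is_utf8_spelling` and `open_with_utf8_fallback` are hypothetical and not part of git. The helper only assumes that "utf-8" and "utf8" (in any case) count as UTF-8, so it may not match what git's `is_encoding_utf8()` accepts.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical stand-in for git's is_encoding_utf8(): treats "utf-8" and
 * "utf8" (compared case-insensitively) as spellings of UTF-8. This is an
 * assumption for illustration; the real helper may behave differently.
 */
static int is_utf8_spelling(const char *name)
{
	return name && (!strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8"));
}

/*
 * Try the caller's spellings first. Only if iconv_open() rejects them,
 * retry with the canonical "UTF-8" for whichever side names UTF-8.
 * Returns (iconv_t) -1 if both attempts fail.
 */
static iconv_t open_with_utf8_fallback(const char *to, const char *from)
{
	iconv_t cd;

	assert(to && from);
	cd = iconv_open(to, from);
	if (cd != (iconv_t) -1)
		return cd;

	if (is_utf8_spelling(from))
		from = "UTF-8";
	if (is_utf8_spelling(to))
		to = "UTF-8";
	return iconv_open(to, from);
}
```

As in the commit, the user's spelling is attempted first, so a platform that knows "utf8" but not "UTF-8" keeps working. A caller would check the result against `(iconv_t) -1` and release a valid descriptor with `iconv_close()`.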
