Commit cffb5af

perldelta for utf8_to_uv() family
1 parent ae865e7 commit cffb5af

2 files changed (+39 -3 lines)

pod/perldelta.pod

Lines changed: 36 additions & 0 deletions
@@ -406,6 +406,42 @@ well.
 
 =item *
 
+New API functions are introduced to convert strings encoded in UTF-8 to
+their ordinal code point equivalent. These are safe to use by default,
+and generally more convenient to use than the existing ones.
+
+L<perlapi/C<utf8_to_uv>> replaces L<perlapi/C<utf8_to_uvchr>> (which is
+retained for backwards compatibility), but you should convert to use the
+new form, as you are likely not using the old one safely.
+
+There are also two new functions, L<perlapi/C<strict_utf8_to_uv>> and
+L<perlapi/C<c9strict_utf8_to_uv>>, which do the same thing except when
+the input string represents a code point that Unicode doesn't accept as
+legal for interchange, using either the strict original definition
+(C<strict_utf8_to_uv>), or the looser one given by
+L<Unicode Corrigendum #9|https://www.unicode.org/versions/corrigendum9.html>
+(C<c9strict_utf8_to_uv>). When the input string represents one of the
+restricted code points, these functions return the Unicode
+C<REPLACEMENT CHARACTER> instead.
+
+Also, L<perlapi/C<extended_utf8_to_uv>> is a synonym for C<utf8_to_uv>, for use
+when you want to emphasize that the entire range of Perl extended UTF-8
+is acceptable.
+
+There are also replacement functions for the three more specialized
+conversion functions that you are unlikely to need. Again, the
+old forms are kept for backwards compatibility, but you should convert
+to use the new forms.
+
+L<perlapi/C<utf8_to_uv_flags>> replaces L<perlapi/C<utf8n_to_uvchr>>.
+
+L<perlapi/C<utf8_to_uv_errors>> replaces L<perlapi/C<utf8n_to_uvchr_error>>.
+
+L<perlapi/C<utf8_to_uv_msgs>> replaces
+L<perlapi/C<utf8n_to_uvchr_msgs>>.
+
+=item *
+
 Three new API functions are introduced to convert strings encoded in
 UTF-8 to native bytes format (if possible). These are easier to use
 than the existing ones, and they avoid unnecessary memory allocations.
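To make the calling pattern of the new conversion functions concrete, here is a self-contained C sketch of a decoder in the same spirit: it is bounded by an end pointer, reports success as a boolean, and hands back the code point and the number of bytes consumed through out-parameters. The name `sketch_utf8_to_uv` and its behavior are hypothetical and are not Perl's implementation. It handles only 1- to 4-byte sequences (a subset of Perl's extended UTF-8) and, as a simplification, advances a single byte past any malformation. Consult perlapi for the real prototype and semantics of `utf8_to_uv`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHARACTER 0xFFFDu

/* Hypothetical decoder, NOT Perl's API. Decodes one character from [s, e).
 * Only 1- to 4-byte sequences are handled. Like the permissive default,
 * surrogates and 4-byte code points above 0x10FFFF are accepted. */
static bool sketch_utf8_to_uv(const unsigned char *s, const unsigned char *e,
                              uint32_t *cp_p, size_t *advance_p)
{
    /* Smallest code point that legitimately needs each sequence length */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t cp;

    if (s >= e) {                                   /* nothing to decode */
        *cp_p = REPLACEMENT_CHARACTER;
        *advance_p = 0;
        return false;
    }

    if (s[0] < 0x80)      { len = 1; cp = s[0]; }
    else if (s[0] < 0xC0) goto malformed;           /* stray continuation byte */
    else if (s[0] < 0xE0) { len = 2; cp = s[0] & 0x1F; }
    else if (s[0] < 0xF0) { len = 3; cp = s[0] & 0x0F; }
    else if (s[0] < 0xF8) { len = 4; cp = s[0] & 0x07; }
    else goto malformed;                            /* longer forms not handled here */

    if ((size_t)(e - s) < len) goto malformed;      /* truncated: would read past e */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) goto malformed;  /* expected a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_for_len[len]) goto malformed;      /* overlong encoding */

    *cp_p = cp;
    *advance_p = len;
    return true;

  malformed:
    *cp_p = REPLACEMENT_CHARACTER;
    *advance_p = 1;                                 /* simplification: skip one byte */
    return false;
}
```

A caller loops while `s < e`, adds the returned advance to `s` after each call, and treats a `false` return as a malformation whose code point has been set to U+FFFD. Because the end pointer is a required argument, a truncated sequence at the end of the buffer is reported instead of being read past.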

utf8.c

Lines changed: 3 additions & 3 deletions
@@ -1065,20 +1065,20 @@ syntactically invalid UTF-8.
 
 =over 4
 
-=item C<strict_utf8_to_uv>
+=item * C<strict_utf8_to_uv>
 
 additionally rejects any UTF-8 that translates into a code point that isn't
 specified by Unicode to be freely exchangeable, namely the surrogate characters
 and non-character code points (besides non-Unicode code points, any above
 0x10FFFF). It does not raise a warning when rejecting.
 
-=item C<c9strict_utf8_to_uv>
+=item * C<c9strict_utf8_to_uv>
 
 instead uses the exchangeable definition given by Unicode's Corrigendum #9,
 which accepts non-character code points while still rejecting surrogates. It
 does not raise a warning when rejecting.
 
-=item C<extended_utf8_to_uv>
+=item * C<extended_utf8_to_uv>
 
 accepts all syntactically valid UTF-8, as extended by Perl to allow 64-bit code
 points to be encoded.
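The acceptance rules described in this hunk can be expressed as predicates over an already-decoded code point. The C sketch below is illustrative only: every helper name is hypothetical, it checks code points rather than UTF-8 bytes, and it follows the perldelta entry in substituting the REPLACEMENT CHARACTER for a rejected code point. Code points above 0x10FFFF are rejected by the strict variant as the text states; the sketch assumes the Corrigendum #9 variant rejects them as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not part of Perl's API. */
static bool is_surrogate(uint32_t cp)     { return cp >= 0xD800 && cp <= 0xDFFF; }
static bool is_above_unicode(uint32_t cp) { return cp > 0x10FFFF; }

/* Noncharacters: U+FDD0..U+FDEF, plus the last two code points
 * (xxFFFE and xxFFFF) of each of the 17 Unicode planes. */
static bool is_noncharacter(uint32_t cp)
{
    return (cp >= 0xFDD0 && cp <= 0xFDEF)
        || (cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE);
}

/* Strict original definition: reject surrogates, noncharacters,
 * and anything above the Unicode range. */
static bool strict_ok(uint32_t cp)
{
    return !is_surrogate(cp) && !is_noncharacter(cp) && !is_above_unicode(cp);
}

/* Corrigendum #9: noncharacters become acceptable; surrogates are still
 * rejected. Rejecting above-Unicode code points here is an assumption. */
static bool c9strict_ok(uint32_t cp)
{
    return !is_surrogate(cp) && !is_above_unicode(cp);
}

/* Apply a policy to a decoded code point, substituting the
 * REPLACEMENT CHARACTER (U+FFFD) when the policy rejects it. */
static uint32_t apply_policy(uint32_t cp, bool (*ok)(uint32_t))
{
    return ok(cp) ? cp : 0xFFFDu;
}
```

For example, U+FDD0 is a noncharacter, so `apply_policy(0xFDD0, strict_ok)` yields 0xFFFD while `apply_policy(0xFDD0, c9strict_ok)` returns it unchanged; a surrogate such as U+D800 is replaced under both policies.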
