Commit 3d2e1f3

perlapi: Give more detail about utf8n_to_uvchr
1 parent cf50778 commit 3d2e1f3

1 file changed: +8 -6 lines changed

utf8.c

Lines changed: 8 additions & 6 deletions
@@ -1246,12 +1246,14 @@ C<utf8n_to_uvchr> is more like an extension of C<utf8_to_uvchr_buf>, but
 with fewer quirks, and a different method of specifying the bytes in C<s> it is
 allowed to examine. It has a C<curlen> parameter instead of an C<e> parameter,
 so the furthest byte in C<s> it can look at is S<C<s + curlen - 1>>. Its
-return value is, like C<utf8_to_uvchr_buf>, ambiguous with respect to the NUL
-and REPLACEMENT characters, but the value of C<*retlen> can be relied on
-(except with the C<UTF8_CHECK_ONLY> flag described below) to know where the
-next possible character along C<s> starts, removing that quirk. Hence, you
-always should use C<*retlen> to determine where the next character in C<s>
-starts.
+failure return value is not dependent on if warnings are enabled or not. It is
+always 0 upon failure. But like C<utf8_to_uvchr_buf>, 0 could also be the
+return for a successful translation of an input C<NUL> character. Use the same
+method given above for disambiguating this. Unlike C<utf8_to_uvchr_buf>,
+C<*retlen> can be relied on (except with the C<UTF8_CHECK_ONLY> flag described
+below) to know where the next possible character along C<s> starts, removing
+that quirk. Hence, you always should use C<*retlen> to determine where the
+next character in C<s> starts.
 
 These functions have an additional parameter, C<flags>, besides the ones in
 C<utf8_to_uv> and C<utf8_to_uvchr_buf>, which can be used to broaden or
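
The calling pattern the revised text recommends can be sketched in plain C. This is a minimal illustration only: `toy_decode`, `toy_decode_failed`, and `toy_count_valid` are hypothetical stand-ins, not the Perl API. The toy decoder handles only 1- and 2-byte sequences and takes no `flags` argument. It also assumes that the disambiguation "given above" amounts to checking whether the first input byte is itself NUL. The sketch shows a 0 return being told apart from a real NUL, and the caller always advancing by `*retlen`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, NOT the Perl API: decodes one 1- or 2-byte UTF-8
 * sequence from s, examining at most curlen bytes. It returns 0 on failure
 * and always sets *retlen to the number of bytes the caller should skip. */
static uint32_t toy_decode(const unsigned char *s, size_t curlen, size_t *retlen)
{
    if (curlen == 0) {
        *retlen = 0;
        return 0;
    }
    if (s[0] < 0x80) {                      /* ASCII, including NUL */
        *retlen = 1;
        return s[0];
    }
    if (s[0] >= 0xC2 && s[0] <= 0xDF && curlen >= 2 && (s[1] & 0xC0) == 0x80) {
        *retlen = 2;                        /* well-formed 2-byte sequence */
        return ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
    }
    *retlen = 1;                            /* malformed or unsupported: skip a byte */
    return 0;
}

/* Assumed disambiguation: a 0 return is a failure unless the first input
 * byte really is NUL. */
static int toy_decode_failed(const unsigned char *s, size_t curlen, uint32_t cp)
{
    return cp == 0 && (curlen == 0 || s[0] != 0);
}

/* Counts the successfully decoded characters in s[0..len-1], always using
 * *retlen to find where the next character starts. */
static size_t toy_count_valid(const unsigned char *s, size_t len)
{
    size_t count = 0;
    while (len > 0) {
        size_t retlen;
        uint32_t cp = toy_decode(s, len, &retlen);
        assert(retlen >= 1 && retlen <= len);   /* guarantees forward progress */
        if (!toy_decode_failed(s, len, cp))
            count++;
        s += retlen;
        len -= retlen;
    }
    return count;
}
```

With the input bytes `'A'`, `0x00`, `0xC3 0xA9`, `0xFF`, the loop counts the NUL as a successful character and counts the stray `0xFF` as a failure, even though the decoder returns 0 for both.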
