perlapi: Give more detail about utf8n_to_uvchr #23556
Open: wants to merge 1 commit into blead
utf8.c: 14 changes (8 additions, 6 deletions)
@@ -1246,12 +1246,14 @@ C<utf8n_to_uvchr> is more like an extension of C<utf8_to_uvchr_buf>, but
 with fewer quirks, and a different method of specifying the bytes in C<s> it is
 allowed to examine. It has a C<curlen> parameter instead of an C<e> parameter,
 so the furthest byte in C<s> it can look at is S<C<s + curlen - 1>>. Its
-return value is, like C<utf8_to_uvchr_buf>, ambiguous with respect to the NUL
-and REPLACEMENT characters, but the value of C<*retlen> can be relied on
-(except with the C<UTF8_CHECK_ONLY> flag described below) to know where the
-next possible character along C<s> starts, removing that quirk. Hence, you
-always should use C<*retlen> to determine where the next character in C<s>
-starts.
+failure return value does not depend on whether warnings are enabled: it is
+always 0 upon failure. But, like C<utf8_to_uvchr_buf>, 0 is also the return
+for a successful translation of an input C<NUL> character. Use the same
+method given above to disambiguate these. Unlike C<utf8_to_uvchr_buf>,
+C<*retlen> can be relied on (except with the C<UTF8_CHECK_ONLY> flag described
+below) to know where the next possible character along C<s> starts, removing
+that quirk. Hence, you should always use C<*retlen> to determine where the
+next character in C<s> starts.
 
 These functions have an additional parameter, C<flags>, besides the ones in
 C<utf8_to_uv> and C<utf8_to_uvchr_buf>, which can be used to broaden or