@@ -279,19 +279,31 @@ X<-C>
 
 The B<-C> flag controls some of the Perl Unicode features.
 
+B<CAUTION:> As with the L<C<:utf8> PerlIO layer|PerlIO/:utf8>, none of
+the features enabled by this flag or the equivalent C<PERL_UNICODE>
+environment variable validate that input is valid UTF-8, nor guarantee
+to produce valid UTF-8. Instead it will assume input is provided in
+Perl's internal upgraded byte encoding, and provide output in this
+encoding, which is a superset of UTF-8 that can encode any character
+allowed in Perl strings. (On EBCDIC systems, it is a superset of
+UTF-EBCDIC instead.) This can result in broken Perl strings or output
+bytes which are not valid in UTF-8. This internal encoding will be
+referred to as C<utf8> below to differentiate it from a strict UTF-8
+encoding format.
+
 As of 5.8.1, the B<-C> can be followed either by a number or a list
 of option letters. The letters, their numeric values, and effects
 are as follows; listing the letters is equal to summing the numbers.
 
-    I     1   STDIN is assumed to be in UTF-8
-    O     2   STDOUT will be in UTF-8
-    E     4   STDERR will be in UTF-8
+    I     1   STDIN is assumed to be in utf8
+    O     2   STDOUT will be in utf8
+    E     4   STDERR will be in utf8
     S     7   I + O + E
-    i     8   UTF-8 is the default PerlIO layer for input streams
-    o    16   UTF-8 is the default PerlIO layer for output streams
+    i     8   :utf8 is the default PerlIO layer for input streams
+    o    16   :utf8 is the default PerlIO layer for output streams
     D    24   i + o
     A    32   the @ARGV elements are expected to be strings encoded
-              in UTF-8
+              in utf8
     L    64   normally the "IOEioA" are unconditional, the L makes
               them conditional on the locale environment variables
               (the LC_ALL, LC_CTYPE, and LANG, in the order of
@@ -307,22 +319,22 @@ perl.h gives W/128 as PERL_UNICODE_WIDESYSCALLS "/* for Sarathy */"
 perltodo mentions Unicode in %ENV and filenames. I guess that these will be
 options e and f (or F).
 
-For example, B<-COE> and B<-C6> will both turn on UTF-8-ness on both
+For example, B<-COE> and B<-C6> will both turn on utf8-ness on both
 STDOUT and STDERR. Repeating letters is just redundant, not cumulative
 nor toggling.
 
 The C<io> options mean that any subsequent open() (or similar I/O
 operations) in main program scope will have the C<:utf8> PerlIO layer
-implicitly applied to them, in other words, UTF-8 is expected from any
-input stream, and UTF-8 is produced to any output stream. This is just
+implicitly applied to them, in other words, utf8 is expected from any
+input stream, and utf8 is produced to any output stream. This is just
 the default set via L<C<${^OPEN}>|perlvar/${^OPEN}>,
 with explicit layers in open() and with binmode() one can
 manipulate streams as usual. This has no effect on code run in modules.
 
 B<-C> on its own (not followed by any number or option list), or the
 empty string C<""> for the L</PERL_UNICODE> environment variable, has the
 same effect as B<-CSDL>. In other words, the standard I/O handles and
-the default C<open()> layer are UTF-8-fied I<but> only if the locale
+the default C<open()> layer are utf8-fied I<but> only if the locale
 environment variables indicate a UTF-8 locale. This behaviour follows
 the I<implicit> (and problematic) UTF-8 behaviour of Perl 5.8.0.
 (See L<perl581delta/UTF-8 no longer default under UTF-8 locales>.)
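
The two points this change documents can be illustrated with a small, non-authoritative Perl sketch. It assumes an ASCII-based platform and a perl that ships the core Encode module. The helper names (c_letters_to_number, bytes_via_utf8_layer, is_strict_utf8) are hypothetical and exist only for this illustration; they are not part of perl. The first helper folds option letters into the number B<-C> accepts, using the values from the table in the diff. The other two write a code point through the lax C<:utf8> layer and then ask a strict UTF-8 decoder whether the resulting bytes are acceptable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Numeric values of the -C option letters, copied from the table in the diff.
my %C_VALUE = (
    I => 1,  O => 2,  E => 4,  S => 7,
    i => 8,  o => 16, D => 24,
    A => 32, L => 64,
);

# Hypothetical helper: fold a letter list such as "OE" into a -C number.
# Bitwise OR is used so that repeated or overlapping letters (e.g. "SI")
# stay redundant rather than cumulative, as the documentation describes.
sub c_letters_to_number {
    my ($letters) = @_;
    my $n = 0;
    for my $letter (split //, $letters) {
        die "unknown -C letter '$letter'\n" unless exists $C_VALUE{$letter};
        $n |= $C_VALUE{$letter};
    }
    return $n;
}

# Hypothetical helper: print one code point through the lax :utf8 layer
# into an in-memory file and return the raw bytes that were written.
sub bytes_via_utf8_layer {
    my ($code_point) = @_;
    no warnings 'utf8';    # silence warnings about non-Unicode code points
    open my $fh, '>:utf8', \my $buf or die "in-memory open failed: $!";
    print {$fh} chr $code_point;
    close $fh or die "close failed: $!";
    return $buf;
}

# Hypothetical helper: true if the bytes decode cleanly as strict UTF-8.
sub is_strict_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

printf "-COE is -C%d\n", c_letters_to_number('OE');    # prints "-COE is -C6"

# 0x110000 is one past the last Unicode code point (0x10FFFF), so it is
# representable in Perl's internal utf8 but not in strict UTF-8.
my $bytes = bytes_via_utf8_layer(0x110000);
printf "bytes written: %s\n",
    join ' ', map { sprintf('%02X', ord $_) } split //, $bytes;
printf "strict UTF-8?  %s\n", is_strict_utf8($bytes) ? 'yes' : 'no';
```

If strict validation is what a program needs, opening handles with an explicit C<:encoding(UTF-8)> layer rather than relying on B<-C> or C<:utf8> is the usual alternative; the sketch only demonstrates the gap the CAUTION paragraph describes.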