@@ -279,19 +279,30 @@ X<-C>
 
 The B<-C> flag controls some of the Perl Unicode features.
 
+B<CAUTION:> As with the L<C<:utf8> PerlIO layer|PerlIO/:utf8>, none of
+the features enabled by this flag or the equivalent C<PERL_UNICODE>
+environment variable validate that input is valid UTF-8, nor guarantee
+to produce valid UTF-8. Instead it will assume input is provided in
+Perl's internal upgraded byte encoding, and provide output in this
+encoding, which is a superset of UTF-8 that can encode any character
+allowed in Perl strings. This can result in broken Perl strings or
+output bytes which are not valid UTF-8. This internal encoding will be
+referred to as C<utf8> below to differentiate it from a strict UTF-8
+encoding format.
+
 As of 5.8.1, the B<-C> can be followed either by a number or a list
 of option letters. The letters, their numeric values, and effects
 are as follows; listing the letters is equal to summing the numbers.
 
-    I     1   STDIN is assumed to be in UTF-8
-    O     2   STDOUT will be in UTF-8
-    E     4   STDERR will be in UTF-8
+    I     1   STDIN is assumed to be in utf8
+    O     2   STDOUT will be in utf8
+    E     4   STDERR will be in utf8
     S     7   I + O + E
-    i     8   UTF-8 is the default PerlIO layer for input streams
-    o    16   UTF-8 is the default PerlIO layer for output streams
+    i     8   :utf8 is the default PerlIO layer for input streams
+    o    16   :utf8 is the default PerlIO layer for output streams
     D    24   i + o
     A    32   the @ARGV elements are expected to be strings encoded
-              in UTF-8
+              in utf8
     L    64   normally the "IOEioA" are unconditional, the L makes
               them conditional on the locale environment variables
               (the LC_ALL, LC_CTYPE, and LANG, in the order of
@@ -307,22 +318,22 @@ perl.h gives W/128 as PERL_UNICODE_WIDESYSCALLS "/* for Sarathy */"
 perltodo mentions Unicode in %ENV and filenames. I guess that these will be
 options e and f (or F).
 
-For example, B<-COE> and B<-C6> will both turn on UTF-8-ness on both
+For example, B<-COE> and B<-C6> will both turn on utf8-ness on both
 STDOUT and STDERR. Repeating letters is just redundant, not cumulative
 nor toggling.
 
 The C<io> options mean that any subsequent open() (or similar I/O
 operations) in main program scope will have the C<:utf8> PerlIO layer
-implicitly applied to them, in other words, UTF-8 is expected from any
-input stream, and UTF-8 is produced to any output stream. This is just
+implicitly applied to them, in other words, utf8 is expected from any
+input stream, and utf8 is produced to any output stream. This is just
 the default set via L<C<${^OPEN}>|perlvar/${^OPEN}>,
 with explicit layers in open() and with binmode() one can
 manipulate streams as usual. This has no effect on code run in modules.
 
 B<-C> on its own (not followed by any number or option list), or the
 empty string C<""> for the L</PERL_UNICODE> environment variable, has the
 same effect as B<-CSDL>. In other words, the standard I/O handles and
-the default C<open()> layer are UTF-8-fied I<but> only if the locale
+the default C<open()> layer are utf8-fied I<but> only if the locale
 environment variables indicate a UTF-8 locale. This behaviour follows
 the I<implicit> (and problematic) UTF-8 behaviour of Perl 5.8.0.
 (See L<perl581delta/UTF-8 no longer default under UTF-8 locales>.)
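
The patched text says that listing option letters is equal to summing their numbers, so B<-COE> and B<-C6> should select the same setting. A minimal shell sketch of one way to check that, assuming a C<perl> binary (5.8.1 or later) is on the PATH. It reads the C<${^UNICODE}> variable, which is documented in perlvar rather than in this diff; the variable names C<coe> and C<c6> are only illustrative.

```shell
# Assumption: `perl` (5.8.1 or later) is available on PATH.
# ${^UNICODE} (see perlvar) reflects the numeric -C setting in effect.
coe=$(perl -COE -e 'print ${^UNICODE}')   # letters: O (2) + E (4)
c6=$(perl -C6 -e 'print ${^UNICODE}')     # the same setting given as a number
echo "COE=$coe C6=$c6"
```

Both invocations should report the same value, the sum of the O and E entries in the table. This only inspects which flags are active; it does not show whether the data on those handles is valid UTF-8, which is the point of the CAUTION paragraph.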