Commit 10228bc

Add a section for locale names.
1 parent b78ca1b commit 10228bc

1 file changed: +63 -11 lines changed
Doc/library/locale.rst

Lines changed: 63 additions & 11 deletions
@@ -34,16 +34,15 @@ The :mod:`locale` module defines the following exception and functions:

 If *locale* is given and not ``None``, :func:`setlocale` modifies the locale
 setting for the *category*. The available categories are listed in the data
-description below. *locale* may be a string, or a pair,
-language code and encoding. If it is a pair, it is converted to a locale
-name using the locale aliasing engine. An empty string specifies the user's
+description below. *locale* may be a :ref:`string <locale_name>`, or a pair,
+language code and encoding. An empty string specifies the user's
 default settings. If the modification of the locale fails, the exception
 :exc:`Error` is raised. If successful, the new locale setting is returned.

-The format of the *locale* and the language code strings is platform
-dependent, but the forms ``language[_territory][.encoding][@modifier]``
-and ``language[_territory]`` respectively are typically accepted on all
-platforms.
+If *locale* is a pair, it is converted to a locale name using
+the locale aliasing engine.
+The language code has the same format as a :ref:`locale name <locale_name>`,
+but without encoding and ``@``-modifier.
 The language code and encoding can be ``None``.

 If *locale* is omitted or ``None``, the current setting for *category* is
@@ -351,8 +350,8 @@ The :mod:`locale` module defines the following exception and functions:
 ``'LANG'``. The GNU gettext search path contains ``'LC_ALL'``,
 ``'LC_CTYPE'``, ``'LANG'`` and ``'LANGUAGE'``, in that order.

-The format of the language code is platform depended, but on Posix
-platforms it usually looks like ``language[_territory]``.
+The language code has the same format as a :ref:`locale name <locale_name>`,
+but without encoding and ``@``-modifier.
 The language code and encoding may be ``None`` if their values cannot be
 determined.
 The "C" locale is represented as ``(None, None)``.
@@ -366,8 +365,8 @@ The :mod:`locale` module defines the following exception and functions:
 the language code and encoding. *category* may be one of the :const:`!LC_\*`
 values except :const:`LC_ALL`. It defaults to :const:`LC_CTYPE`.

-The format of the language code is platform dependent, but on Posix
-platforms it usually looks like ``language[_territory]``.
+The language code has the same format as a :ref:`locale name <locale_name>`,
+but without encoding and ``@``-modifier.
 The language code and encoding may be ``None`` if their values cannot be
 determined.
 The "C" locale is represented as ``(None, None)``.
@@ -625,6 +624,59 @@ whose high bit is set (i.e., non-ASCII bytes) are never converted or considered
 part of a character class such as letter or whitespace.


+.. _locale_name:
+
+Locale names
+------------
+
+The format of the locale name is platform dependent, and the set of supported
+locales can depend on the system configuration.
+
+On Posix platforms, it usually has the format
+
+.. productionlist:: locale_name
+: language ["_" territory] ["." charset] ["@" modifier]
+
+where *language* is a two- or three-letter language code from `ISO 639`_,
+*territory* is a two-letter country or region code from ISO 3166,
+*charset* is a locale encoding, and *modifier* is a script name,
+a language subtag, a sort order identifier, or other locale modifier
+(e.g. "latin", "valencia", "stroke" and "euro").
+
+On Windows, several formats are supported.
+A subset of `IETF BCP 47`_ tags:
+
+.. productionlist:: locale_name
+: language ["-" script] ["-" territory] ["." charset]
+: language ["-" script] "-" territory "-" modifier
+
+where *language* and *territory* has the same meaning as in Posix,
+*script* is a four-letter script code from `ISO 15924`_,
+and *modifier* is a language subtag, a sort order identifier
+or custom modifier (e.g. "valencia", "stroke" or "x-python").
+Both hyphen ("``-``") and underscore ("``_``") separators are supported.
+Only UTF-8 encoding is allowed for BCP 47 tags.
+
+Windows supports also locale names in the format
+
+.. productionlist:: locale_name
+: language ["_" territory] ["." charset]
+
+where *language* and *territory* are long names, such as "English" and
+"United States", and *charset* is either a code page number (e.g. "1252")
+or UTF-8.
+Only the underscore separator is supported in this format.
+
+The "C" locale is supported on all platforms.
+
+.. _ISO 639: https://www.iso.org/iso-639-language-code
+.. _IETF BCP 47: https://www.rfc-editor.org/info/bcp47
+.. _ISO 15924: https://www.unicode.org/iso15924/
+
+.. https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html#tag_08_02
+.. https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings
+
+
 .. _embedding-locale:

 For extension writers and programs that embed Python
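
Review note: the Posix grammar documented in the new "Locale names" section, ``language ["_" territory] ["." charset] ["@" modifier]``, can be illustrated with a small parser. The sketch below is only an illustration of that grammar. The helper ``split_posix_locale_name`` and its regular expression are hypothetical and are not part of the :mod:`locale` module, and real systems may accept names that this pattern rejects.

```python
import re

# Hypothetical pattern (an assumption, not taken from CPython) mirroring the
# documented Posix form: language ["_" territory] ["." charset] ["@" modifier]
_POSIX_LOCALE_NAME = re.compile(
    r"""
    (?P<language>[A-Za-z]{2,3})        # two- or three-letter language code
    (?:_(?P<territory>[A-Za-z]{2}))?   # optional two-letter territory
    (?:\.(?P<charset>[^@]+))?          # optional charset (locale encoding)
    (?:@(?P<modifier>.+))?             # optional modifier
    """,
    re.VERBOSE,
)


def split_posix_locale_name(name):
    """Split a Posix-style locale name into its four components.

    Returns a dict whose missing components are None, or None when the
    name does not follow the grammar at all (for example "C").
    """
    match = _POSIX_LOCALE_NAME.fullmatch(name)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    for example in ("ca_ES.UTF-8@valencia", "sr_RS@latin", "de", "C"):
        print(example, "->", split_posix_locale_name(example))
```

The "C" locale does not fit the grammar, so the helper returns ``None`` for it. Code that needs to support "C" would have to handle that name separately.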
