CLDR-19021 Misc items missing from spec (#5102)

macchiati · web-flow · commit 62631618025e · 2025-09-30T17:10:46.000-07:00
diff --git a/docs/ldml/tr35-general.md b/docs/ldml/tr35-general.md
@@ -167,36 +167,101 @@ For example, for the locale identifier zh_Hant_CN_co_pinyin_cu_USD, the display
 <type type="pinyin" key="collation">Pinyin Sort Order</type>
 ```
 
-### <a name="locale_display_name_algorithm" href="#locale_display_name_algorithm">Locale Display Name Algorithm</a>
+The `language` element also has an `alt="menu"` option, which allows related languages to be sorted together.
 
-A locale display name LDN is generated for a locale identifier L in the following way. First, convert the locale identifier to *canonical syntax* per **[Part 1, Canonical Unicode Locale Identifiers](tr35.md#Canonical_Unicode_Locale_Identifiers)**. That will put the subtags in a defined order, and replace aliases by their canonical counterparts. (That defined order is followed in the processing below.)
+```xml
+<language type="yue" alt="menu">Chinese, Cantonese</language> 
+<language type="zh" alt="menu">Chinese, Mandarin</language>
+```
+However, when a `localePattern` is used, the names can become complicated. For such cases there is an additional `menu` attribute with two values, `core` and `extension`. For example:
 
-Then follow each of the following steps for the subtags in L, building a base name LDN and a list of qualifying strings LQS.
+```xml
+<language type="ckb">Central Kurdish</language>
+<language type="ckb" menu="core">Kurdish</language>
+<language type="ckb" menu="extension">Central</language>
+…
+<language type="ku">Kurdish</language>
+<language type="ku" menu="core">Kurdish</language>
+<language type="ku" menu="extension">Kurmanji</language>
+…
+<language type="sdh">Southern Kurdish</language>
+<language type="sdh" menu="core">Kurdish</language>
+<language type="sdh" menu="extension">Southern</language>
+```
 
-Where there is a match for a subtag, disregard that subtag from L and add the element value to LDN or LQS as described below. If there is no match for a subtag, use the fallback pattern with the subtag instead.
+The `core` value can be used as the language name, with the `extension` value placed inside the `localePattern`, as in the following illustration of part of a menu:
 
-Once LDN and LQS are built, return the following based on the length of LQS.
+| Language |
+| ---- |
+| … |
+| Kashmiri |
+| Kurdish (Kurmanji, Latin) |
+| Kurdish (Central, Arabic) |
+| Kurdish (Southern, Arabic) |
+| Kyrgyz |
+| … |
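
The following Python sketch shows how the `core` and `extension` values could be combined with a script name to produce menu labels like those above. It is a minimal sketch under stated assumptions, not CLDR or ICU code: the dictionaries, the function `menu_name`, and the English pattern values `{0} ({1})` for `localePattern` and `{0}, {1}` for `localeSeparator` are assumptions for illustration.

```python
from typing import Optional

# Hypothetical in-memory subset of English display-name data, taken from the
# examples above; real CLDR data lives in XML files.
LANGUAGE_NAMES = {
    "ckb": {"default": "Central Kurdish", "core": "Kurdish", "extension": "Central"},
    "ku": {"default": "Kurdish", "core": "Kurdish", "extension": "Kurmanji"},
    "sdh": {"default": "Southern Kurdish", "core": "Kurdish", "extension": "Southern"},
    "ks": {"default": "Kashmiri"},
}
SCRIPT_NAMES = {"Arab": "Arabic", "Latn": "Latin"}
LOCALE_PATTERN = "{0} ({1})"   # assumed English <localePattern>
LOCALE_SEPARATOR = "{0}, {1}"  # assumed English <localeSeparator>


def menu_name(language: str, script: Optional[str] = None) -> str:
    """Build a menu label, preferring the core/extension names when both exist."""
    entry = LANGUAGE_NAMES[language]
    qualifiers = []
    if "core" in entry and "extension" in entry:
        name = entry["core"]
        qualifiers.append(entry["extension"])  # the extension goes at the head of the list
    else:
        name = entry["default"]
    if script is not None:
        qualifiers.append(SCRIPT_NAMES[script])
    if not qualifiers:
        return name
    # Join the qualifiers with the separator, then wrap them with the locale pattern.
    joined = qualifiers[0]
    for qualifier in qualifiers[1:]:
        joined = LOCALE_SEPARATOR.format(joined, qualifier)
    return LOCALE_PATTERN.format(name, joined)
```

With this data, `menu_name("ku", "Latn")` returns "Kurdish (Kurmanji, Latin)" and `menu_name("ks")` returns "Kashmiri".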
 
-<!-- HTML: no header -->
-<table><tbody>
-<tr><td>0</td><td>return LDN</td></tr>
-<tr><td>1</td><td>use the &lt;localePattern&gt; to compose the result LDN from LDN and LQS[0], and return it.</td></tr>
-<tr><td>&gt;1</td><td>use the &lt;localeSeparator&gt; element value to join the elements of the list into LDN2, then use the &lt;localePattern&gt; to compose the result LDN from LDN and LDN2, and return it.</td></tr>
-</tbody></table>
+### <a name="locale_display_name_algorithm" href="#locale_display_name_algorithm">Locale Display Name Algorithm</a>
+
+A locale display name LDN is generated for a locale identifier L in the following way:
+1. Convert the locale identifier to *canonical syntax* per **[Part 1, Canonical Unicode Locale Identifiers](tr35.md#Canonical_Unicode_Locale_Identifiers)**.
+That will put the subtags in a defined order, and replace aliases by their canonical counterparts. (That defined order is followed in the processing below.)
+2. Build a base name LDN from the language subtag, possibly combined with some of the other subtags, taking into account the parameters listed below.
+    * The language name uses the longest match, dropping all subtags that match. For example:
+        * With L = "nl_Cyrl_BE", if there is a `<language type="nl_BE">`Flemish`</language>`, the language name is set to "Flemish", and the subtag "BE" is ignored in step 3.
+        * With L = "ca_fonipa_valencia", if there is a `<language type="ca_valencia">`Valencian`</language>`, the language name is set to "Valencian", and the subtag "valencia" is ignored in step 3.
+3. Build a list of qualifying strings LQS.
+    1. For each remaining language identifier subtag (script, region, or variant):
+        1. Where there is a match for the subtag, remove that subtag from L and add the name of the subtag to LDN or LQS as described below.
+        2. If there is no match for the subtag, use the fallback pattern with the subtag instead.
+    2. For any remaining `-u` or `-t` key-value pairs, there are two options (based on the parameters; the first is the default):
+        1. `WholeKeyValue`: Add the formatted key-value, OR
+        2. `SeparateKeyValue`: Add a string created from the formatted key and the formatted value using `scope="core"`.
+4. Once LDN and LQS are built, return the following based on the length of LQS.
+
+| Length | Processing |
+| :---- | :---- |
+| 0 | return LDN |
+| 1 | use the \<localePattern\> to compose the result LDN from LDN and LQS\[0\], and return it. |
+| \>1 | use the \<localeSeparator\> element value to join the elements of the list into LDN2, then use the \<localePattern\> to compose the result LDN from LDN and LDN2, and return it. |
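
A rough Python sketch of the longest-match and composition steps follows. The dictionaries and the function `display_name` are hypothetical; the input is assumed to be an already canonicalized, split locale identifier; `-u`/`-t` key-value pairs and the processing parameters are omitted; and an unmatched subtag simply falls back to its code rather than going through a real fallback pattern.

```python
from itertools import combinations
from typing import List

# Hypothetical English data for illustration only.
LANGUAGE_NAMES = {"nl": "Dutch", "nl_BE": "Flemish", "ca": "Catalan", "ca_valencia": "Valencian"}
SUBTAG_NAMES = {"Cyrl": "Cyrillic", "BE": "Belgium", "fonipa": "IPA Phonetics"}
LOCALE_PATTERN = "{0} ({1})"   # assumed English <localePattern>
LOCALE_SEPARATOR = "{0}, {1}"  # assumed English <localeSeparator>


def display_name(subtags: List[str]) -> str:
    """subtags: a canonical locale ID already split, e.g. ["nl", "Cyrl", "BE"]."""
    language, rest = subtags[0], subtags[1:]
    name = LANGUAGE_NAMES.get(language, language)
    used: tuple = ()
    # Longest match: try the largest combinations of the other subtags first.
    # combinations() keeps the canonical order, so keys look like "nl_BE".
    for size in range(len(rest), 0, -1):
        for combo in combinations(rest, size):
            key = "_".join((language,) + combo)
            if key in LANGUAGE_NAMES:
                name, used = LANGUAGE_NAMES[key], combo
                break
        if used:
            break
    # Qualifying strings for the subtags not absorbed into the language name;
    # an unknown subtag falls back to its code (a simplification).
    qualifiers = [SUBTAG_NAMES.get(tag, tag) for tag in rest if tag not in used]
    # Compose the result according to the number of qualifiers.
    if not qualifiers:
        return name
    joined = qualifiers[0]
    for qualifier in qualifiers[1:]:
        joined = LOCALE_SEPARATOR.format(joined, qualifier)
    return LOCALE_PATTERN.format(name, joined)
```

For example, `display_name(["nl", "Cyrl", "BE"])` returns "Flemish (Cyrillic)" and `display_name(["ca", "fonipa", "valencia"])` returns "Valencian (IPA Phonetics)". Trying larger combinations first is one way to realize "longest match"; an implementation might order candidates differently.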
 
-The processing can be controlled via the following parameters.
+The processing can be controlled via the following parameters (the names of the parameters are only illustrative):
 
 *   `CombineLanguage`: boolean
     *   Example: the `CombineLanguage = true`, picking the bold value below.
-    *   `<language type="nl">Dutch</language>`
+    *   `<language type="nl">`Dutch`</language>`
     *   **`<language type="nl_BE">Flemish</language>`**
 *   `PreferAlt`: map from element to preferred alt value, picking the bold value below.
     *   Example: the `PreferAlt` contains `{"language"="short"}`:
-    *   `<language type="az">Azerbaijani</language>`
+    *   `<language type="az">`Azerbaijani`</language>`
     *   **`<language type="az" alt="short">Azeri</language>`**
+*   `CoreAndExtension`: if there are both `menu="core"` and `menu="extension"` values:
+    1.  Use the `menu="core"` variant for the name in question.
+    2.  Add the `menu="extension"` variant to the head of the LQS before it is formatted.
+*   `WholeKeyValue`: for `-u` or `-t` key-value pairs:
+    1.  Format with the combined key-value name, if available; otherwise format as with `SeparateKeyValue`.
+        *   For example, using `…_ca_buddhist`:
+        *   `<type key="calendar" type="buddhist">`Buddhist Calendar`</type>`
+        *   ⇒ "Buddhist Calendar"
+*   `SeparateKeyValue`: for `-u` or `-t` key-value pairs:
+    1.  Format with the separate key and value names, using `scope="core"` for the value, if available; otherwise format as with `WholeKeyValue`.
+        *   For example, using `…_ca_buddhist`:
+        *   `<key type="calendar">`Calendar`</key>` +
+        *   `<type key="calendar" type="buddhist" scope="core">`Buddhist`</type>` +
+        *   `<localeKeyTypePattern>`{0}: {1}`</localeKeyTypePattern>`
+        *   ⇒ "Calendar: Buddhist"
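
The fallback relationship between `WholeKeyValue` and `SeparateKeyValue` can be sketched in Python as follows. The dictionaries and function names are hypothetical, and the sketch assumes the `-u` key `ca` has already been mapped to the LDML key `calendar`. The final fallback to the raw codes is also an assumption, since neither option says what to do when no names are available.

```python
from typing import Optional

# Hypothetical English data from the examples above.
KEY_NAMES = {"calendar": "Calendar"}
TYPE_NAMES = {("calendar", "buddhist"): "Buddhist Calendar"}
CORE_TYPE_NAMES = {("calendar", "buddhist"): "Buddhist"}  # scope="core" values
LOCALE_KEY_TYPE_PATTERN = "{0}: {1}"  # English <localeKeyTypePattern>


def whole(key: str, value: str) -> Optional[str]:
    """Combined key-value name, if the data has one."""
    return TYPE_NAMES.get((key, value))


def separate(key: str, value: str) -> Optional[str]:
    """Key name plus scope="core" value name, joined by the key-type pattern."""
    if key in KEY_NAMES and (key, value) in CORE_TYPE_NAMES:
        return LOCALE_KEY_TYPE_PATTERN.format(KEY_NAMES[key], CORE_TYPE_NAMES[(key, value)])
    return None


def key_value_name(key: str, value: str, mode: str = "WholeKeyValue") -> str:
    if mode == "WholeKeyValue":
        result = whole(key, value) or separate(key, value)
    elif mode == "SeparateKeyValue":
        result = separate(key, value) or whole(key, value)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Hypothetical last resort when no names are available: show the raw codes.
    return result or LOCALE_KEY_TYPE_PATTERN.format(key, value)
```

Here `key_value_name("calendar", "buddhist")` gives "Buddhist Calendar", while passing `"SeparateKeyValue"` as the mode gives "Calendar: Buddhist".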
 
 In addition, the input locale display name could be minimized (see [Part 1: Likely Subtags](tr35.md#Likely_Subtags)) before generating the LDN. Selective minimization is often the best choice. For example, in a menu list it is often clearer to show the region if there are any regional variants. Thus the user would just see \["Spanish"\] for es if the latter is the only supported Spanish, but where es-MX is also listed, then see \["Spanish (Spain)", "Spanish (Mexico)"\].
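
Selective minimization can be sketched in Python as follows: the region is shown only when the list contains more than one regional variant of the same language. `LIKELY_REGION` is a hypothetical stand-in for real likely-subtags data, the names are hypothetical English data, and only `language` or `language-region` identifiers are handled.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-ins for likely-subtags data and English display names.
LIKELY_REGION = {"es": "ES", "fr": "FR"}
LANGUAGE_NAMES = {"es": "Spanish", "fr": "French"}
REGION_NAMES = {"ES": "Spain", "MX": "Mexico", "FR": "France"}
LOCALE_PATTERN = "{0} ({1})"  # assumed English <localePattern>


def split(locale_id: str) -> Tuple[str, str]:
    """Split "es" or "es-MX" into (language, region), filling in the likely region."""
    parts = locale_id.replace("-", "_").split("_")
    region = parts[1] if len(parts) > 1 else LIKELY_REGION[parts[0]]
    return parts[0], region


def menu_names(locale_ids: List[str]) -> List[str]:
    parsed = [split(locale_id) for locale_id in locale_ids]
    regions: Dict[str, Set[str]] = {}
    for language, region in parsed:
        regions.setdefault(language, set()).add(region)
    names = []
    for language, region in parsed:
        name = LANGUAGE_NAMES[language]
        # Show the region only when the menu has more than one regional variant.
        if len(regions[language]) > 1:
            name = LOCALE_PATTERN.format(name, REGION_NAMES[region])
        names.append(name)
    return names
```

So `menu_names(["es"])` yields `["Spanish"]`, while `menu_names(["es", "es-MX"])` yields `["Spanish (Spain)", "Spanish (Mexico)"]`.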
 
+The `scope="core"` type values are also useful in menus. For example, if a menu or pull-down offers a choice of calendars, it is cleaner to use the name of the key as the name of the menu (e.g., "Calendar"), and to use the `scope="core"` type values for the choices:
+
+| Calendar |
+| ---- |
+| Buddhist |
+| Chinese |
+| Gregorian |
+| Hijri |
+
 * * *
 
 **Processing types of locale identifier subtags**
diff --git a/docs/ldml/tr35-modifications.md b/docs/ldml/tr35-modifications.md
@@ -57,7 +57,9 @@ The LDML specification is divided into the following parts:
 
 **Changes in LDML Version 48 (Differences from Version 47)**
 
-### Locale Identifiers
+### Locale Identifiers and Names
+* [Display Name Elements](tr35-general.md#display-name-elements) Described the usage of the `menu` attribute values `core` and `extension` on the `language` element, and of `alt="menu"`.
+Also revamped the description of how to construct names for locale IDs, for clarity.
 * [Special Script Codes](tr35.md#special-script-codes) Added the `Hntl` compound script. (This is also reflected in the `<scriptData>` elements in supplementalData.xml.)
 * [Likely Subtags](tr35.md#likely-subtags) Changed the Canonicalize step to point to the section on canonicalization.
 * [Unicode Locale Identifier](tr35.md#unicode-locale-identifier) Changed the `attribute` component in the EBNF to be `uattribute` for consistency with `ufield`, etc.
@@ -84,8 +86,9 @@ There is also now a mechanism for finding the region code from short timezone id
 * [Plural rules syntax](tr35-numbers.md#plural-rules-syntax) Added substantial clarifications and new examples.
 The order of execution is also clearly specified.
 * [Compact Number Formats](tr35-numbers.md#compact-number-formats) Specified the mechanism for formatting compact numbers more precisely.
-* [Rule-Based Number Formatting]() The rules are also now represented by a new XML structure with a “flat” format,
-which is easier for clients to handle (the old format will be retained for one more release).
+* [Rule-Based Number Formatting](tr35-numbers.md#) Added a full specification.
+The rules have been converted to a “flat” format, which is easier for clients to handle (the old format will be retained for one more release).
+* [Rational Numbers](tr35-numbers.md#rational-numbers) Added support for formatting fractions like 5½.
 
 ### Units of Measurement
 * [Unit Syntax](tr35-general.md#unit-syntax) Simplified the EBNF `product_unit` and added an additional well-formedness constraint for mixed units.
diff --git a/docs/ldml/tr35-numbers.md b/docs/ldml/tr35-numbers.md
@@ -852,6 +852,62 @@ To specify a rounding increment in a pattern, include the increment in the patte
 
 Single quotes (**'**) enclose bits of the pattern that should be treated literally. Inside a quoted string, two single quotes ('') are replaced with a single one ('). For example: `'X '`#`' Q '` -> **X 1939 Q** (Literal strings `shaded`.)
 
+## Rational Numbers
+
+```dtd
+<!ELEMENT rationalFormats ( alias | ( rationalPattern*, integerAndRationalPattern*, rationalUsage*, special* ) ) >
+<!ATTLIST rationalFormats numberSystem CDATA #REQUIRED >
+
+<!ELEMENT rationalPattern ( #PCDATA ) >
+<!ATTLIST rationalPattern alt NMTOKENS #IMPLIED >
+<!ATTLIST rationalPattern draft (approved | contributed | provisional | unconfirmed) #IMPLIED >
+
+<!ELEMENT integerAndRationalPattern ( #PCDATA ) >
+<!ATTLIST integerAndRationalPattern alt NMTOKENS #IMPLIED >
+<!ATTLIST integerAndRationalPattern draft (approved | contributed | provisional | unconfirmed) #IMPLIED >
+
+<!ELEMENT rationalUsage ( #PCDATA ) >
+<!ATTLIST rationalUsage alt NMTOKENS #IMPLIED >
+<!ATTLIST rationalUsage draft (approved | contributed | provisional | unconfirmed) #IMPLIED >
+
+```
+For example:
+
+```xml
+<rationalFormats numberSystem="latn">
+			<rationalPattern>{0}⁄{1}</rationalPattern>
+			<integerAndRationalPattern>{0} {1}</integerAndRationalPattern>
+			<integerAndRationalPattern alt="superSub">{0}⁠{1}</integerAndRationalPattern>
+			<rationalUsage>sometimes</rationalUsage>
+</rationalFormats>
+```
+
+The rational number patterns specify the formatting of rational fractions in different languages.
+Rational fractions contain a numerator and denominator, such as ½, and may also have an integer, such as 5½.
+There are two different “combination patterns” for an integer plus a fraction, one with a space and one without,
+because some fonts and rendering systems don’t properly compose fractions, and a space is then needed to keep the integer visually separate from the fraction (as in 5 1/2 rather than 51/2).
+The choice of which to use depends on the rendering system and font support available, as described below.
+
+Here are the English values, as an example, with a short description of the purpose of each field:
+
+| Code | English Value | Description |
+| :---- | :---: | :---- |
+| `rationalPattern` | {0}⁄{1} | The format for a rational fraction with arbitrary numerator and denominator; the English pattern uses the Unicode character ‘⁄’ U+2044 FRACTION SLASH which causes composition of fractions such as 22⁄7, when supported properly by rendering systems and fonts. |
+| `integerAndRationalPattern` | {0} {1} | The format for combining an integer with a rational fraction that is composed using the `rationalPattern`; the English pattern uses U+202F NARROW NO-BREAK SPACE (NNBSP) to produce a _non-breaking thin space_. |
+| `integerAndRationalPattern-superSub` | {0}⁠{1} | The format for combining an integer with a rational fraction that is composed using the `rationalPattern`; the English pattern uses U+2060 WORD JOINER, a _zero-width no-break space_. |
+| `rationalUsage` | sometimes | An indication of the extent to which rational fractions are used in the locale; either `never` or `sometimes`. |
+
+The `integerAndRationalPattern-superSub` is intended for combining an integer with a fraction that is rendered in composed form, so that no space is needed. However, some fonts and rendering systems don’t properly handle the fraction slash, and the user would see something like **51/2** (fifty-one halves) when **5½** is desired\!
+Therefore, the plain `integerAndRationalPattern` is also available; it forces a visible space between the integer and the fraction (**5 ½**).
+(In some languages there may always be a space; in that case the patterns for `integerAndRationalPattern` and `integerAndRationalPattern-superSub` will be identical.)
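
Here is a minimal Python sketch of applying the English patterns from the table. Digits are produced with `str()`, so it assumes the `latn` numbering system and no locale-specific number formatting; the `composes_fractions` flag is a hypothetical stand-in for whatever knowledge the caller has about the rendering system and font.

```python
from typing import Optional

# English pattern values from the table above, with the special characters
# spelled out as escapes.
FRACTION_SLASH = "\u2044"
NARROW_NO_BREAK_SPACE = "\u202f"
WORD_JOINER = "\u2060"

RATIONAL_PATTERN = "{0}" + FRACTION_SLASH + "{1}"
INTEGER_AND_RATIONAL_PATTERN = "{0}" + NARROW_NO_BREAK_SPACE + "{1}"
INTEGER_AND_RATIONAL_PATTERN_SUPER_SUB = "{0}" + WORD_JOINER + "{1}"


def format_rational(numerator: int, denominator: int,
                    integer: Optional[int] = None,
                    composes_fractions: bool = False) -> str:
    fraction = RATIONAL_PATTERN.format(numerator, denominator)
    if integer is None:
        return fraction
    # Without a visible space, a renderer that doesn't compose fractions
    # may show 5 followed by 1/2 as "51/2", so the spaced pattern is the default.
    if composes_fractions:
        pattern = INTEGER_AND_RATIONAL_PATTERN_SUPER_SUB
    else:
        pattern = INTEGER_AND_RATIONAL_PATTERN
    return pattern.format(integer, fraction)
```

Defaulting to the spaced pattern reflects the caution above: the unspaced `superSub` variant is chosen only when the caller asserts that fractions will be composed.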
+
+In environments where the rendering system and font can't be trusted to handle U+2044 FRACTION SLASH properly, there are a few techniques for getting a better rendering than 22/7:
+- Use markup such as HTML `<sup>` and `<sub>` for the numerator and denominator.
+- Where markup is not available and the numbering system is `latn` (ASCII digits 0..9), there are two other choices:
+    - If the fraction happens to match one of the precomposed fractions available in Unicode, that character can be used (e.g., ½ ⅔ ⅗ ⅐ ⅝ ¾ …).
+    - The superscript digits (¹ ² ³ …) and subscript digits (₁ ₂ ₃ …) can be used with U+2044 FRACTION SLASH, such as ²²⁄₇.
+    - In both cases, some fonts don't have consistent support for these characters, and so the sizes and positioning may vary.
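
For the markup-free `latn` case, the two techniques can be combined as in this Python sketch, which prefers a precomposed fraction and otherwise uses superscript and subscript digits around U+2044 FRACTION SLASH. The helper name is hypothetical, the precomposed table covers only the examples listed above, and only non-negative integers are handled.

```python
from typing import Dict, Tuple

FRACTION_SLASH = "\u2044"

# Precomposed fractions from the list above (not exhaustive).
PRECOMPOSED: Dict[Tuple[int, int], str] = {
    (1, 2): "\u00bd",  # ½
    (2, 3): "\u2154",  # ⅔
    (3, 5): "\u2157",  # ⅗
    (1, 7): "\u2150",  # ⅐
    (5, 8): "\u215d",  # ⅝
    (3, 4): "\u00be",  # ¾
}

# Superscript digits are scattered across two Unicode blocks; subscripts are contiguous.
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789", "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079")
SUBSCRIPT_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x2080 + d) for d in range(10)))


def plain_text_fraction(numerator: int, denominator: int) -> str:
    """Render a non-negative latn fraction without markup."""
    precomposed = PRECOMPOSED.get((numerator, denominator))
    if precomposed is not None:
        return precomposed
    return (str(numerator).translate(SUPERSCRIPT_DIGITS)
            + FRACTION_SLASH
            + str(denominator).translate(SUBSCRIPT_DIGITS))
```

For example, `plain_text_fraction(1, 2)` returns "½" and `plain_text_fraction(22, 7)` returns "²²⁄₇"; as noted above, the appearance still depends on font support.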
+
 ## <a name="Currencies" href="#Currencies">Currencies</a>
 
 ```dtd
diff --git a/docs/site/downloads/cldr-48.md b/docs/site/downloads/cldr-48.md
@@ -91,6 +91,7 @@ See the [Modifications section](https://www.unicode.org/reports/tr35/dev/tr35-mo
 #### General
 - Languages that reached Basic in the last release have their names translated at Modern Coverage in this release.
 - Compound language names now have "core" and "extension" variants for more uniform formats in menus and lists.
+The description of how to format names for locale IDs has been extended and clarified.
    - For example, that allows the Kurdish variants to have a uniform format where more than Kurmanji is displayed.
        - Kashmiri
        - Kurdish (Kurmanji, Latin)
@@ -286,15 +287,27 @@ The following files are new in the release:
 
 - TBD
 
+----
+
 ## Migration
 
 - Number patterns that did not have a specific numberSystem (such as `latn` or `arab`) had been deprecated for many releases, and were finally removed.
 - Additionally, language and territory data in `languageData` and `territoryInfo` data received significant updates to improve accuracy and maintainability [CLDR-18087]
 - The likely language for Belarus changed to Russian [CLDR-14479]
+- The following unit identifiers changed for consistency.
+As with all such changes, aliases are available to permit parsing and formatting to work across versions.
+    - `permillion` changed to `part-per-1e6`; English values remain “parts per million”, “{0} part per million”, etc.
+    - `portion-per-1e9` changed to `part-per-1e9`; English values remain “parts per billion”, “{0} part per billion”, etc.
+    - `part` is used for constructing arbitrary concentrations such as “parts per 100,000”; English values are “parts”, “{0} part”, etc.
+- English and/or root names of many exemplar cities and some metazones changed.
+This was typically to move towards the official spelling in the country in question, such as retaining accents, or to add landscape terms such as “Island”.
+For example: El Aaiun → El Aaiún; Casey → Casey Station; Hovd Time → Khovd Time.
+- A few additional `availableFormats` and interval format patterns have been added, such as GyMEd and Hv, to fill some gaps.
+- The metazone for Hawaii has changed.
 - **TBD Additional items plus future guidance will be added before the spec-beta.**
 
-
 ### V49 advance warnings
+
 The following changes are planned for CLDR 49. Please plan accordingly to avoid disruption.
 - [CLDR-18303][] H24 will be deprecated. If it is encountered, it will have H23 behavior. There is no known intentional usage of H24. If you have a current need for H24 instead of H23, please comment on [CLDR-18303][].
 - The default week numbering changes to ISO instead being based on the calendar week starting in CLDR 48 [CLDR-18275]. The calendar week will be more clearly targeted at matching usage in displayed month calendars.