Commit 6263161

authored
CLDR-19021 Misc items missing from spec (#5102)
1 parent 3996a17 commit 6263161

4 files changed

+155
-18
lines changed

docs/ldml/tr35-general.md

Lines changed: 79 additions & 14 deletions
@@ -167,36 +167,101 @@ For example, for the locale identifier zh_Hant_CN_co_pinyin_cu_USD, the display
167167
<type type="pinyin" key="collation">Pinyin Sort Order</type>
168168
```
169169

170-
### <a name="locale_display_name_algorithm" href="#locale_display_name_algorithm">Locale Display Name Algorithm</a>
170+
The `language` element has the additional `alt="menu"` option, which allows related languages to be sorted together.
171171

172-
A locale display name LDN is generated for a locale identifier L in the following way. First, convert the locale identifier to *canonical syntax* per **[Part 1, Canonical Unicode Locale Identifiers](tr35.md#Canonical_Unicode_Locale_Identifiers)**. That will put the subtags in a defined order, and replace aliases by their canonical counterparts. (That defined order is followed in the processing below.)
172+
```xml
173+
<language type="yue" alt="menu">Chinese, Cantonese</language>
174+
<language type="zh" alt="menu">Chinese, Mandarin</language>
175+
```
176+
However, when `localePattern`s are used, the names start to get complicated. There is an additional `menu` attribute, with two values: `core` and `extension`. For example:
173177

174-
Then follow each of the following steps for the subtags in L, building a base name LDN and a list of qualifying strings LQS.
178+
```xml
179+
<language type="ckb">Central Kurdish</language>
180+
<language type="ckb" menu="core">Kurdish</language>
181+
<language type="ckb" menu="extension">Central</language>
182+
183+
<language type="ku">Kurdish</language>
184+
<language type="ku" menu="core">Kurdish</language>
185+
<language type="ku" menu="extension">Kurmanji</language>
186+
187+
<language type="sdh">Southern Kurdish</language>
188+
<language type="sdh" menu="core">Kurdish</language>
189+
<language type="sdh" menu="extension">Southern</language>
190+
```
175191

176-
Where there is a match for a subtag, disregard that subtag from L and add the element value to LDN or LQS as described below. If there is no match for a subtag, use the fallback pattern with the subtag instead.
192+
The core part can be used as the language name, with the extension going into the `localePattern`, such as in the following illustration of part of a menu:
177193

178-
Once LDN and LQS are built, return the following based on the length of LQS.
194+
| Language |
195+
| ---- |
196+
||
197+
| Kashmiri |
198+
| Kurdish (Kurmanji, Latin) |
199+
| Kurdish (Central, Arabic) |
200+
| Kurdish (Southern, Arabic) |
201+
| Kyrgyz |
202+
||
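
As a rough illustration of how the core and extension names above might be combined into menu entries, here is a minimal Python sketch. The lookup tables, the `menu_label` function, the script names, and the English-style patterns `{0} ({1})` and `{0}, {1}` are assumptions made for this example; they are not CLDR data structures or APIs.

```python
# Hypothetical in-memory copy of the <language> names shown above.
LANGUAGE_NAMES = {
    "ks": {"default": "Kashmiri"},
    "ckb": {"default": "Central Kurdish", "core": "Kurdish", "extension": "Central"},
    "ku": {"default": "Kurdish", "core": "Kurdish", "extension": "Kurmanji"},
    "sdh": {"default": "Southern Kurdish", "core": "Kurdish", "extension": "Southern"},
}
# Assumed script display names (illustrative only).
SCRIPT_NAMES = {"Latn": "Latin", "Arab": "Arabic"}

# Assumed English-style patterns; real values come from the locale data.
LOCALE_PATTERN = "{0} ({1})"
LOCALE_SEPARATOR = "{0}, {1}"


def menu_label(language, script=None):
    """Build a menu entry, preferring the core/extension split when both exist."""
    names = LANGUAGE_NAMES[language]
    qualifiers = []
    if "core" in names and "extension" in names:
        base = names["core"]
        # The extension goes to the head of the qualifier list.
        qualifiers.append(names["extension"])
    else:
        base = names["default"]
    if script is not None:
        qualifiers.append(SCRIPT_NAMES[script])
    if not qualifiers:
        return base
    joined = qualifiers[0]
    for qualifier in qualifiers[1:]:
        joined = LOCALE_SEPARATOR.format(joined, qualifier)
    return LOCALE_PATTERN.format(base, joined)


for lang, script in [("ks", None), ("ku", "Latn"), ("ckb", "Arab"), ("sdh", "Arab")]:
    print(menu_label(lang, script))
```

Because all three Kurdish entries share the same core name, they sort together in the menu while the extension distinguishes them.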
179203

180-
<!-- HTML: no header -->
181-
<table><tbody>
182-
<tr><td>0</td><td>return LDN</td></tr>
183-
<tr><td>1</td><td>use the &lt;localePattern&gt; to compose the result LDN from LDN and LQS[0], and return it.</td></tr>
184-
<tr><td>&gt;1</td><td>use the &lt;localeSeparator&gt; element value to join the elements of the list into LDN2, then use the &lt;localePattern&gt; to compose the result LDN from LDN and LDN2, and return it.</td></tr>
185-
</tbody></table>
204+
### <a name="locale_display_name_algorithm" href="#locale_display_name_algorithm">Locale Display Name Algorithm</a>
205+
206+
A locale display name LDN is generated for a locale identifier L in the following way.
207+
1. Convert the locale identifier to *canonical syntax* per **[Part 1, Canonical Unicode Locale Identifiers](tr35.md#Canonical_Unicode_Locale_Identifiers)**.
208+
That will put the subtags in a defined order, and replace aliases by their canonical counterparts. (That defined order is followed in the processing below.)
209+
2. Build a base name LDN from the language, possibly also some other subtags, taking into account the parameters listed below.
210+
* The language name uses the longest match, dropping all fields that match. For example:
211+
* With L = "nl_Cyrl_BE", if there is a `<language type="nl_BE">`Flemish`</language>`, the language name is set to "Flemish", and the "BE" is ignored in step 4.
212+
* With L = "ca_fonipa_valencia", if there is a `<language type="ca_valencia">`Valencian`</language>`, the language name is set to "Valencian", and the subtag "valencia" is ignored in step 4.
213+
4. Build a list of qualifying strings LQS.
214+
1. For each remaining language identifier subtag (script, region, or variant):
215+
1. Where there is a match for a subtag, disregard that subtag from L and add the name of the subtag to LDN or LQS as described below.
216+
2. If there is no match for a subtag, use the fallback pattern with the subtag instead.
217+
2. For any remaining `-u` or `-t` key-value pairs, there are two options (based on the parameters; the first is the default):
218+
1. `WholeKeyValue`: Add the formatted key-value, OR
219+
2. `SeparateKeyValue`: Add a string created from the formatted key and the formatted value using `scope="core"`
220+
5. Once LDN and LQS are built, return the following based on the length of LQS.
221+
222+
| Length | Processing |
223+
| :---- | :---- |
224+
| 0 | return LDN |
225+
| 1 | use the \<localePattern\> to compose the result LDN from LDN and LQS\[0\], and return it. |
226+
| \>1 | use the \<localeSeparator\> element value to join the elements of the list into LDN2, then use the \<localePattern\> to compose the result LDN from LDN and LDN2, and return it. |
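
The final composition step in the table above can be sketched as follows. This is a minimal Python illustration, not a reference implementation: the function name `compose_display_name` is hypothetical, and the default patterns `{0} ({1})` and `{0}, {1}` are assumed English-style values for `<localePattern>` and `<localeSeparator>`.

```python
def compose_display_name(ldn, lqs,
                         locale_pattern="{0} ({1})",
                         locale_separator="{0}, {1}"):
    """Combine a base name (LDN) with a list of qualifying strings (LQS).

    The pattern defaults are assumptions for this sketch; real values
    come from the locale's <localePattern> and <localeSeparator>.
    """
    if len(lqs) == 0:
        # Length 0: return LDN unchanged.
        return ldn
    # Length 1 uses LQS[0] directly; longer lists are joined pairwise
    # with the separator pattern to form LDN2.
    ldn2 = lqs[0]
    for qualifier in lqs[1:]:
        ldn2 = locale_separator.format(ldn2, qualifier)
    return locale_pattern.format(ldn, ldn2)


print(compose_display_name("Spanish", []))
print(compose_display_name("Spanish", ["Mexico"]))
print(compose_display_name("Kurdish", ["Kurmanji", "Latin"]))
```

The sketch relies on the fact that simple `{0}`/`{1}` placeholders happen to match Python's `str.format` syntax; patterns with quoting or other special syntax would need a dedicated parser.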
186227

187-
The processing can be controlled via the following parameters.
228+
The processing can be controlled via the following parameters (the names of the parameters are only illustrative):
188229

189230
* `CombineLanguage`: boolean
190231
* Example: with `CombineLanguage = true`, the bold value below is picked.
191-
* `<language type="nl">Dutch</language>`
232+
* `<language type="nl">`Dutch`</language>`
192233
* **`<language type="nl_BE">Flemish</language>`**
193234
* `PreferAlt`: map from element to preferred alt value, picking the bold value below.
194235
* Example: the `PreferAlt` contains `{"language"="short"}`:
195-
* `<language type="az">Azerbaijani</language>`
236+
* `<language type="az">`Azerbaijani`</language>`
196237
* **`<language type="az" alt="short">Azeri</language>`**
238+
* `CoreAndExtension`: if there is a `menu="core"` and a `menu="extension"` value:
239+
1. Use the `menu=core` variant for the name in question.
240+
2. Add the `menu=extension` variant to the head of the LQS before it is formatted.
241+
* `WholeKeyValue`: for `-u` or `-t` key-value pairs
242+
1. Format with combined key-value, if available; otherwise format with `SeparateKeyValue`
243+
* For example, using `…_ca_buddhist`
244+
* `<type key="calendar" type="buddhist">`Buddhist Calendar`</type>`
245+
* ⇒ "Buddhist Calendar"
246+
* `SeparateKeyValue`: for `-u` or `-t` key-value pairs
247+
1. Format with separate key and value using `scope="core"`, if available; otherwise format with `WholeKeyValue`
248+
* For example, using `…_ca_buddhist`
249+
* `<key type="calendar">`Calendar`</key>` +
250+
* `<type key="calendar" type="buddhist" scope="core">`Buddhist`</type>` +
251+
* `<localeKeyTypePattern>`{0}: {1}`</localeKeyTypePattern>`
252+
* ⇒ "Calendar: Buddhist"
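
The two key-value options, each falling back to the other, could be sketched roughly as below. The dictionaries stand in for the `<key>`, `<type>`, and `<localeKeyTypePattern>` data from the examples; the function name `format_key_value` and the table layout are hypothetical, and what happens when neither form is available is deliberately left open here (the sketch returns `None`).

```python
# Hypothetical stand-ins for the locale data used in the examples above.
KEY_NAMES = {"calendar": "Calendar"}
WHOLE_TYPE_NAMES = {("calendar", "buddhist"): "Buddhist Calendar"}
CORE_TYPE_NAMES = {("calendar", "buddhist"): "Buddhist"}  # scope="core"
LOCALE_KEY_TYPE_PATTERN = "{0}: {1}"


def format_key_value(key, value, mode="WholeKeyValue"):
    """Format one key-value pair, preferring the form selected by `mode`."""
    whole = WHOLE_TYPE_NAMES.get((key, value))

    separate = None
    key_name = KEY_NAMES.get(key)
    core_name = CORE_TYPE_NAMES.get((key, value))
    if key_name is not None and core_name is not None:
        separate = LOCALE_KEY_TYPE_PATTERN.format(key_name, core_name)

    if mode == "WholeKeyValue":
        preferred, fallback = whole, separate
    elif mode == "SeparateKeyValue":
        preferred, fallback = separate, whole
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Returns None when neither form is available (unspecified in this sketch).
    return preferred if preferred is not None else fallback


print(format_key_value("calendar", "buddhist"))
print(format_key_value("calendar", "buddhist", mode="SeparateKeyValue"))
```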
197253

198254
In addition, the input locale display name could be minimized (see [Part 1: Likely Subtags](tr35.md#Likely_Subtags)) before generating the LDN. Selective minimization is often the best choice. For example, in a menu list it is often clearer to show the region if there are any regional variants. Thus the user would just see \["Spanish"\] for es if the latter is the only supported Spanish, but where es-MX is also listed, then see \["Spanish (Spain)", "Spanish (Mexico)"\].
199255

256+
The key-type `scope="core"` is also useful in menus. For example, if a menu or pull-down is offering different choices of calendars, it is cleaner to use the key value for the name of the menu (eg, "Calendar"), and use the `scope="core"` values for the choices. Thus:
257+
258+
| Calendar |
259+
| ---- |
260+
| Buddhist |
261+
| Chinese |
262+
| Gregorian |
263+
| Hijri |
264+
200265
* * *
201266

202267
**Processing types of locale identifier subtags**

docs/ldml/tr35-modifications.md

Lines changed: 6 additions & 3 deletions
@@ -57,7 +57,9 @@ The LDML specification is divided into the following parts:
5757

5858
**Changes in LDML Version 48 (Differences from Version 47)**
5959

60-
### Locale Identifiers
60+
### Locale Identifiers and Names
61+
* [Display Name Elements](tr35-general.md#display-name-elements) Described the usage of the `language` element `menu` values `core` and `extension`, and `alt="menu"`.
62+
Also revamped the description of how to construct names for locale IDs, for clarity.
6163
* [Special Script Codes](tr35.md#special-script-codes) Added the `Hntl` compound script. (This is also reflected in the `<scriptData>` elements in supplementalData.xml.)
6264
* [Likely Subtags](tr35.md#likely-subtags) Changed the Canonicalize step to point to the section on canonicalization.
6365
* [Unicode Locale Identifier](tr35.md#unicode-locale-identifier) Changed the `attribute` component in the EBNF to be `uattribute` for consistency with `ufield`, etc.
@@ -84,8 +86,9 @@ There is also now a mechanism for finding the region code from short timezone id
8486
* [Plural rules syntax](tr35-numbers.md#plural-rules-syntax) Added substantial clarifications and new examples.
8587
The order of execution is also clearly specified.
8688
* [Compact Number Formats](tr35-numbers.md#compact-number-formats) Specified the mechanism for formatting compact numbers more precisely.
87-
* [Rule-Based Number Formatting]() The rules are also now represented by a new XML structure with a “flat” format,
88-
which is easier for clients to handle (the old format will be retained for one more release).
89+
* [Rule-Based Number Formatting](tr35-numbers.md#) Added a full specification.
90+
The rules have been converted to a “flat” format, which is easier for clients to handle (the old format will be retained for one more release).
91+
* [Rational Numbers](tr35-numbers.md#rational-numbers) Added support for formatting fractions like 5½.
8992

9093
### Units of Measurement
9194
* [Unit Syntax](tr35-general.md#unit-syntax) Simplified the EBNF `product_unit` and added an additional well-formedness constraint for mixed units.

docs/ldml/tr35-numbers.md

Lines changed: 56 additions & 0 deletions
@@ -852,6 +852,62 @@ To specify a rounding increment in a pattern, include the increment in the patte
852852

853853
Single quotes (**'**) enclose bits of the pattern that should be treated literally. Inside a quoted string, two single quotes ('') are replaced with a single one ('). For example: `'X '`#`' Q '` -> **X 1939 Q** (Literal strings `shaded`.)
854854

855+
## Rational Numbers
856+
857+
```xml
858+
<!ELEMENT rationalFormats ( alias | ( rationalPattern*, integerAndRationalPattern*, rationalUsage*, special* ) ) >
859+
<!ATTLIST rationalFormats numberSystem CDATA #REQUIRED >
860+
861+
<!ELEMENT rationalPattern ( #PCDATA ) >
862+
<!ATTLIST rationalPattern alt NMTOKENS #IMPLIED >
863+
<!ATTLIST rationalPattern draft (approved | contributed | provisional | unconfirmed) #IMPLIED >
864+
865+
<!ELEMENT integerAndRationalPattern ( #PCDATA ) >
866+
<!ATTLIST integerAndRationalPattern alt NMTOKENS #IMPLIED >
867+
<!ATTLIST integerAndRationalPattern draft (approved | contributed | provisional | unconfirmed) #IMPLIED >
868+
869+
<!ELEMENT rationalUsage ( #PCDATA ) >
870+
<!ATTLIST rationalUsage alt NMTOKENS #IMPLIED >
871+
<!ATTLIST rationalUsage draft (approved | contributed | provisional | unconfirmed) #IMPLIED >
872+
873+
```
874+
For example:
875+
876+
```xml
877+
<rationalFormats numberSystem="latn">
878+
<rationalPattern>{0}⁄{1}</rationalPattern>
879+
<integerAndRationalPattern>{0} {1}</integerAndRationalPattern>
880+
<integerAndRationalPattern alt="superSub">{0}⁠{1}</integerAndRationalPattern>
881+
<rationalUsage>sometimes</rationalUsage>
882+
</rationalFormats>
883+
```
884+
885+
The rational number patterns specify the formatting of rational fractions in different languages.
886+
Rational fractions contain a numerator and denominator, such as ½, and may also have an integer, such as 5½.
887+
There are two different “combination patterns”, one with a visible space and one without;
888+
both are needed because fonts and rendering systems sometimes don’t properly support fractions (for example, displaying 5 1/2).
889+
The choice of which to use depends on the rendering system and font support available, as described below.
890+
891+
Here are the English values as an example, with a short description of the purpose of each field:
892+
893+
| Code | Default Value | Description |
894+
| :---- | :---: | :---- |
895+
| `rationalPattern` | {0}⁄{1} | The format for a rational fraction with arbitrary numerator and denominator; the English pattern uses the Unicode character ‘⁄’ U+2044 FRACTION SLASH which causes composition of fractions such as 22⁄7, when supported properly by rendering systems and fonts. |
896+
| `integerAndRationalPattern` | {0} {1} | The format for combining an integer with a rational fraction that is composed using the `Rational` pattern; the English pattern uses U+202F NARROW NO-BREAK SPACE (NNBSP) to produce a _non-breaking thin space_. |
897+
| `integerAndRationalPattern-superSub` | {0}⁠{1} | The format for combining an integer with a rational fraction that is composed using the `Rational` pattern; the English pattern uses U+2060 WORD JOINER, a _zero-width no-break space_. |
898+
| `rationalUsage` | sometimes | An indication of the extent to which rational fractions are used in the locale; either `never` or `sometimes`. |
899+
900+
The `integerAndRationalPattern-superSub` is used for an integer with fraction. However, some fonts and rendering systems don’t properly handle the fraction slash, and the user would see something like **51/2** (fifty-one halves) when **5½** is desired\!
901+
Therefore, the `integerAndRationalPattern` is available also, which forces a visible space between the integer and fraction (**5 ½**).
902+
(In some languages, there may always be a space: in that case the patterns for `integerAndRationalPattern` and `integerAndRationalPattern-superSub` will be identical.)
903+
904+
In environments where the rendering system and font can't be trusted to handle U+2044 FRACTION SLASH properly, there are a few techniques available to have a better rendering than 22/7:
905+
- Use markup such as HTML `<sup>` and `<sub>` for the numerator and denominator.
906+
- Where markup is not available and the numbering system is `latn` (ASCII digits 0..9), there are two other choices:
907+
- If the fraction happens to match the precomposed fractions available in Unicode, those can be used (eg, ½ ⅔ ⅗ ⅐ ⅝ ¾ …)
908+
- The Latin superscript digits (¹ ² ³ …) and subscript digits (₁ ₂ ₃ …) can be used with the U+2044 FRACTION SLASH, such as ²²⁄₇.
909+
- In both cases, some fonts don't have consistent support for these characters, and so the sizes and positioning may vary.
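
The plain-text fallbacks listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the `trusted_rendering` flag, and the small `PRECOMPOSED` table are hypothetical, and the pattern strings follow the English values described in the table (U+2044 FRACTION SLASH, U+202F NARROW NO-BREAK SPACE, U+2060 WORD JOINER).

```python
# English-style patterns as described in the table above (assumed here).
RATIONAL_PATTERN = "{0}\u2044{1}"               # FRACTION SLASH
INTEGER_AND_RATIONAL = "{0}\u202f{1}"           # NARROW NO-BREAK SPACE
INTEGER_AND_RATIONAL_SUPERSUB = "{0}\u2060{1}"  # WORD JOINER

# A few precomposed Unicode fractions (illustrative subset only).
PRECOMPOSED = {(1, 2): "½", (2, 3): "⅔", (3, 4): "¾", (3, 5): "⅗", (5, 8): "⅝"}

SUPERSCRIPT = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPT = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def format_fraction(numerator, denominator):
    """Format a fraction for the `latn` numbering system without markup."""
    precomposed = PRECOMPOSED.get((numerator, denominator))
    if precomposed is not None:
        return precomposed
    # Otherwise use superscript/subscript digits around the fraction slash.
    return RATIONAL_PATTERN.format(
        str(numerator).translate(SUPERSCRIPT),
        str(denominator).translate(SUBSCRIPT),
    )


def format_mixed(integer, numerator, denominator, trusted_rendering=False):
    """Combine an integer and a fraction; force a visible space by default."""
    pattern = (INTEGER_AND_RATIONAL_SUPERSUB if trusted_rendering
               else INTEGER_AND_RATIONAL)
    return pattern.format(integer, format_fraction(numerator, denominator))


print(format_fraction(22, 7))
print(format_mixed(5, 1, 2))
```

Defaulting to the pattern with a visible space is a conservative choice for this sketch; an implementation that knows its font and renderer handle fractions well could pass `trusted_rendering=True`.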
910+
855911
## <a name="Currencies" href="#Currencies">Currencies</a>
856912

857913
```dtd

docs/site/downloads/cldr-48.md

Lines changed: 14 additions & 1 deletion
@@ -91,6 +91,7 @@ See the [Modifications section](https://www.unicode.org/reports/tr35/dev/tr35-mo
9191
#### General
9292
- Languages that reached Basic in the last release have their names translated at Modern Coverage in this release.
9393
- Compound language names now have "core" and "extension" variants for more uniform formats in menus and lists.
94+
The description of how to format names for locale IDs has been extended and clarified.
9495
- For example, that allows the Kurdish variants to have a uniform format where more than Kurmanji is displayed.
9596
- Kashmiri
9697
- Kurdish (Kurmanji, Latin)
@@ -286,15 +287,27 @@ The following files are new in the release:
286287

287288
- TBD
288289

290+
----
291+
289292
## Migration
290293

291294
- Number patterns that did not have a specific numberSystem (such as `latn` or `arab`) had been deprecated for many releases, and were finally removed.
292295
- Additionally, language and territory data in `languageData` and `territoryInfo` received significant updates to improve accuracy and maintainability [CLDR-18087]
293296
- The likely language for Belarus changed to Russian [CLDR-14479]
297+
- The unit identifiers for the following changed for consistency.
298+
As with all such changes, aliases are available to permit parsing and formatting to work across versions.
299+
- `permillion` changed to `part-per-1e6`; English values remain “parts per million”, “{0} part per million”, etc.
300+
- `portion-per-1e9` changed to `part-per-1e9`; English values remain “parts per billion”, “{0} part per billion”, etc.
301+
- `part` used for constructing arbitrary concentrations such as “parts per 100,000”; English values “parts”, “{0} part”, etc.
302+
- English and/or root names of many exemplar cities and some metazones changed.
303+
This was typically to move towards the official spelling in the country in question, such as retaining accents, or to add landscape terms such as “Island”.
304+
For example: El Aaiun → El Aaiún; Casey → Casey Station; Hovd Time → Khovd Time.
305+
- A few additional availableFormat and interval format patterns have been added, such as GyMEd and Hv, to fill some gaps.
306+
- The metazone for Hawaii has changed.
294307
- **TBD Additional items plus future guidance will be added before the spec-beta.**
295308

296-
297309
### V49 advance warnings
310+
298311
The following changes are planned for CLDR 49. Please plan accordingly to avoid disruption.
299312
- [CLDR-18303][] H24 will be deprecated. If it is encountered, it will have H23 behavior. There is no known intentional usage of H24. If you have a current need for H24 instead of H23, please comment on [CLDR-18303][].
300313
- The default week numbering changes to ISO instead of being based on the calendar week starting in CLDR 48 [CLDR-18275]. The calendar week will be more clearly targeted at matching usage in displayed month calendars.
