Commit 0c58aff

CLDR-17223 Add nestedBracketReplacement for use in display names (#5240)
1 parent f4eb204 commit 0c58aff

9 files changed, +52 -7 lines changed

common/dtd/ldml.dtd

Lines changed: 11 additions & 1 deletion
@@ -454,7 +454,7 @@ CLDR data files are interpreted according to the LDML specification (http://unic
454454

455455
<!-- ######################################################### -->
456456

457-
<!ELEMENT characters ( alias | ( exemplarCharacters*, ellipsis*, moreInformation*, stopwords*, indexLabels*, mapping*, parseLenients*, special* ) ) >
457+
<!ELEMENT characters ( alias | ( exemplarCharacters*, ellipsis*, nestedBracketReplacement*, moreInformation*, stopwords*, indexLabels*, mapping*, parseLenients*, special* ) ) >
458458
<!ATTLIST characters draft (approved | contributed | provisional | unconfirmed | true | false) #IMPLIED >
459459
<!--@METADATA-->
460460
<!--@DEPRECATED-->
@@ -493,6 +493,16 @@ CLDR data files are interpreted according to the LDML specification (http://unic
493493
<!ATTLIST ellipsis references CDATA #IMPLIED >
494494
<!--@METADATA-->
495495

496+
<!ELEMENT nestedBracketReplacement ( #PCDATA ) >
497+
<!ATTLIST nestedBracketReplacement bracket CDATA #REQUIRED >
498+
<!--@MATCH:any-->
499+
<!ATTLIST nestedBracketReplacement alt NMTOKENS #IMPLIED >
500+
<!--@MATCH:literal/variant-->
501+
<!ATTLIST nestedBracketReplacement draft (approved | contributed | provisional | unconfirmed) #IMPLIED >
502+
<!--@METADATA-->
503+
<!ATTLIST nestedBracketReplacement references CDATA #IMPLIED >
504+
<!--@METADATA-->
505+
496506
<!ELEMENT moreInformation ( #PCDATA ) >
497507
<!ATTLIST moreInformation alt NMTOKENS #IMPLIED >
498508
<!--@MATCH:literal/variant-->

common/main/root.xml

Lines changed: 4 additions & 0 deletions
@@ -53,6 +53,10 @@ Warnings: All cp values have U+FE0F characters removed. See /annotationsDerived/
5353
<ellipsis type="word-final">{0} …</ellipsis>
5454
<ellipsis type="word-initial">… {0}</ellipsis>
5555
<ellipsis type="word-medial">{0} … {1}</ellipsis>
56+
<nestedBracketReplacement bracket="(">[</nestedBracketReplacement>
57+
<nestedBracketReplacement bracket=")">]</nestedBracketReplacement>
58+
<nestedBracketReplacement bracket="（">[</nestedBracketReplacement>
59+
<nestedBracketReplacement bracket="）">]</nestedBracketReplacement>
5660
<moreInformation>?</moreInformation>
5761
<parseLenients scope="date" level="lenient">
5862
<parseLenient sample="-">[\- ‑ . /]</parseLenient>
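As a reading aid for the root.xml change above, the following is a minimal Python sketch of how a client might load the new elements into a lookup table. The function name `load_nested_bracket_replacements` and the inline fragment are illustrative assumptions and are not part of CLDR tooling; the fragment copies only the ASCII parenthesis entries from the diff.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the root.xml lines in this commit.
# Only the ASCII pair is included to keep the sketch self-contained.
LDML_FRAGMENT = """
<ldml>
  <characters>
    <nestedBracketReplacement bracket="(">[</nestedBracketReplacement>
    <nestedBracketReplacement bracket=")">]</nestedBracketReplacement>
  </characters>
</ldml>
"""


def load_nested_bracket_replacements(xml_text):
    """Return {bracket: replacement} for each nestedBracketReplacement element."""
    root = ET.fromstring(xml_text)
    table = {}
    for elem in root.iterfind("./characters/nestedBracketReplacement"):
        bracket = elem.get("bracket")
        if not bracket:
            # The DTD declares `bracket` as #REQUIRED; skip malformed entries.
            continue
        table[bracket] = elem.text or ""
    return table


replacements = load_nested_bracket_replacements(LDML_FRAGMENT)
print(replacements)
```

A real client would read the resolved locale data rather than an inline string, so that locale-specific overrides of the root values are picked up.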

common/supplemental/coverageLevels.xml

Lines changed: 1 addition & 0 deletions
@@ -1027,6 +1027,7 @@ For terms of use, see http://www.unicode.org/copyright.html
10271027

10281028
<!-- Moved up as part of change to moderate -->
10291029
<coverageLevel value="moderate" match="characters/ellipsis[@type='%ellipsisTypes']"/>
1030+
<coverageLevel value="moderate" match="characters/nestedBracketReplacement[@bracket='%A']"/>
10301031
<coverageLevel value="moderate" match="characters/moreInformation"/>
10311032
<coverageLevel value="moderate" match="characters/parseLenients[@scope='%anyAttribute'][@level='%anyAttribute']/parseLenient[@sample='%anyAttribute']"/>
10321033

docs/ldml/tr35-general.md

Lines changed: 28 additions & 6 deletions
@@ -266,13 +266,13 @@ The key-type `scope="core"` is also useful in menus. For example, if a menu or p
266266

267267
**Processing types of locale identifier subtags**
268268

269-
When the display name contains "(" or ")" characters (or full-width equivalents), replace them by "\[", "\]" (or full-width equivalents) before adding.
269+
When both the subtag display name and the \<localePattern\> contain bracket characters, replace the brackets in the subtag display name with their nested bracket equivalents according to the [Nested Bracket Replacement](#Character_Nested_Bracket_Replacement) data.
270270

271271
1. **Language.** Match the L subtags against the type values in the `<language>` elements. Pick the element with the most subtags matching. If there is more than one such element, pick the one that has subtypes matching earlier. If there are two such elements, pick the one that is alphabetically less. If there is no match, then further convert L to *canonical form* per **[Part 1, Canonical Unicode Locale Identifiers](tr35.md#Canonical_Unicode_Locale_Identifiers)** and try the preceding steps again. Set LBN to the selected value. Disregard any of the matching subtags in the following processing.
272272
* If CombineLanguage is false, only choose matches with the language subtag matching.
273273
2. **Script, Region, Variants.** Where any of these subtags are in L, append the matching element value to LQS.
274-
3. **T extensions.** Get the value of the `key="h0" type="hybrid"` element, if there is one; otherwise the value of the `<key type="t">` element. Next get the locale display name of the tlang. Join the pair using `<localePattern>` and append to the LQS. Then format and add display names to LQS for any of the remaining tkey-tvalue pairs as described below.
275-
4. **U extensions.** If there is an attribute value A, process the key-value pair <"u", A> as below and append to LQS. Then format and add display names for each of the remaining key-type pairs as described below.
274+
3. **U extensions.** If there is an attribute value A, process the key-value pair <"u", A> as below and append to LQS. Then format and add display names for each of the remaining key-type pairs as described below.
275+
4. **T extensions.** Get the value of the `key="h0" type="hybrid"` element, if there is one; otherwise the value of the `<key type="t">` element. Next get the locale display name of the tlang. Do not use `<localePattern>`; instead, append the subtag display names directly to the LQS. Then format and add display names to LQS for any of the remaining tkey-tvalue pairs as described below.
276276
5. **Other extensions.** There are currently no such extensions defined. Until such time as there are formats defined for them, append each of the extensions’ subtags to LQS.
277277
6. **Private Use extensions.** Get the value
278278

@@ -298,9 +298,9 @@ When the display name contains "(" or ")" characters (or full-width equivalents)
298298
| es-Cyrl-MX | Spanish (Cyrillic, Mexico) |
299299
| en-Latn-GB-fonipa-scouse | English (Latin, United Kingdom, IPA Phonetics, Scouse) |
300300
| en-u-nu-thai-ca-islamic-civil | English (Calendar: islamic-civil, Thai Digits) |
301-
| hi-u-nu-latn-t-en-h0-hybrid | Hindi (Hybrid: English, Western Digits) |
302-
| en-u-nu-deva-t-de | English (Transform: German, Devanagari Digits) |
303-
| fr-z-zz-zzz-v-vv-vvv-u-uu-uuu-t-ru-Cyrl-s-ss-sss-a-aa-aaa-x-u-x | French (Transform: Russian \[Cyrillic\], uu: uuu, a: aa-aaa, s: ss-sss, v: vv-vvv, x: u-x, z: zz-zzz) |
301+
| hi-u-nu-latn-t-en-h0-hybrid | Hindi (Western Digits, Hybrid: English) |
302+
| en-u-nu-deva-t-de-mm-fonipa | English (Devanagari Digits, Transform: German, Myanmar \[Burma\], IPA Phonetics) |
303+
| fr-z-zz-zzz-v-vv-vvv-u-uu-uuu-t-ru-Cyrl-s-ss-sss-a-aa-aaa-x-u-x | French (uu: uuu, Transform: Russian, Cyrillic, a: aa-aaa, s: ss-sss, v: vv-vvv, x: u-x, z: zz-zzz) |
304304

305305

306306

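The bracket rule and the updated table rows in this hunk can be illustrated with a short Python sketch. It assumes a `localePattern` of `{0} ({1})` and a separator of `, `, and it applies the replacement only to the qualifier names that end up inside the pattern's brackets; these values, the function names, and that scoping choice are assumptions made for illustration, not something this commit specifies.

```python
# Mirrors the ASCII entries added to root.xml in this commit.
REPLACEMENTS = {"(": "[", ")": "]"}


def replace_brackets(name, replacements=REPLACEMENTS):
    """Swap every bracket in `name` for its nested-bracket replacement."""
    return "".join(replacements.get(ch, ch) for ch in name)


def format_locale_name(language_name, qualifier_names,
                       locale_pattern="{0} ({1})", separator=", ",
                       replacements=REPLACEMENTS):
    """Combine a language name with its qualifier names (hypothetical helper)."""
    if not qualifier_names:
        return language_name
    # Only rewrite qualifier brackets when the pattern itself uses brackets;
    # names without brackets pass through replace_brackets unchanged.
    if any(bracket in locale_pattern for bracket in replacements):
        qualifier_names = [replace_brackets(q, replacements)
                           for q in qualifier_names]
    joined = separator.join(qualifier_names)
    return locale_pattern.replace("{0}", language_name).replace("{1}", joined)


example = format_locale_name(
    "English",
    ["Devanagari Digits", "Transform: German", "Myanmar (Burma)", "IPA Phonetics"],
)
print(example)
```

With these assumed inputs the result has the same shape as the `en-u-nu-deva-t-de-mm-fonipa` row: the region name's parentheses become square brackets, and the transform language names are appended flat rather than wrapped in a second pattern.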
@@ -700,6 +700,28 @@ There are alternatives for cases where the breaks are on a word boundary, where
700700
<ellipsis type="word-initial">… {0}</ellipsis>
701701
```
702702

703+
### <a name="Character_Nested_Bracket_Replacement" href="#Character_Nested_Bracket_Replacement">Nested Bracket Replacement</a>
704+
705+
```xml
706+
<!ELEMENT nestedBracketReplacement ( #PCDATA ) >
707+
<!ATTLIST nestedBracketReplacement bracket CDATA #REQUIRED >
708+
```
709+
710+
Example:
711+
712+
```xml
713+
<nestedBracketReplacement bracket="(">[</nestedBracketReplacement>
714+
<nestedBracketReplacement bracket=")">]</nestedBracketReplacement>
715+
<nestedBracketReplacement bracket="（">[</nestedBracketReplacement>
716+
<nestedBracketReplacement bracket="）">]</nestedBracketReplacement>
717+
```
718+
719+
The `nestedBracketReplacement` element indicates a character to be used when two sets of brackets (parentheses) are nested. This currently supports only one level of nesting.
720+
721+
Clients should replace the inner bracket pair by substituting the bracket string with the value string. For example, in the string "a ( b ( c ) )", the two inner brackets should be replaced according to the replacements data, resulting in "a ( b [ c ] )".
722+
723+
In cases where it is necessary to determine whether the brackets are nested, clients can use the `Bidi_Paired_Bracket_Type` property.
724+
703725
### <a name="Character_More_Info" href="#Character_More_Info">More Information</a>
704726

705727
The moreInformation string is one that can be displayed in an interface to indicate that more information is available. For example:
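The replacement described in the new Nested Bracket Replacement section can be sketched in Python as a single pass that tracks nesting depth. This is one possible reading of the text, not a reference implementation: the name `replace_nested_brackets` is hypothetical, and the sketch hard-codes ASCII parentheses instead of consulting the `Bidi_Paired_Bracket_Type` property, which Python's standard `unicodedata` module does not expose.

```python
# Assumed data: the ASCII entries from root.xml in this commit.
REPLACEMENTS = {"(": "[", ")": "]"}
OPENERS = {"("}
CLOSERS = {")"}


def replace_nested_brackets(text, replacements=REPLACEMENTS):
    """Replace brackets found at nesting depth 2 with their replacement strings.

    Only one level of nesting is handled, matching the section's stated
    limit; brackets at depth 3 or deeper are left unchanged.
    """
    depth = 0
    out = []
    for ch in text:
        if ch in OPENERS:
            depth += 1
            out.append(replacements.get(ch, ch) if depth == 2 else ch)
        elif ch in CLOSERS:
            out.append(replacements.get(ch, ch) if depth == 2 else ch)
            depth = max(depth - 1, 0)  # tolerate unbalanced closers
        else:
            out.append(ch)
    return "".join(out)


result = replace_nested_brackets("a ( b ( c ) )")
print(result)
```

For the section's own example string, the outer pair is kept and the inner pair is rewritten, giving "a ( b [ c ] )".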

docs/ldml/tr35-modifications.md

Lines changed: 3 additions & 0 deletions
@@ -34,6 +34,9 @@ This is a partial document, describing only the changes to the LDML since the pr
3434

3535
**Changes in LDML Version 49 (Differences from Version 48.1)**
3636

37+
* New section [Nested Bracket Replacement](tr35-general.html#Character_Nested_Bracket_Replacement)
38+
* [Locale Display Name Algorithm](tr35-general.html#locale_display_name_algorithm) updated to use the nested bracket replacement data and avoid nested parentheses by flattening `-t-` (transform) language names
39+
3740
### MessageFormat
3841

3942
* The `:currency` and `:percent` functions are now Stable, with the same implementations as previously.

tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/PathHeader.txt

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@
5353
//ldml/delimiters/quotation%E ; Core Data ; Alphabetic Information ; Quotation Marks ; $1
5454
//ldml/delimiters/alternateQuotation%E ; Core Data ; Alphabetic Information ; Quotation Marks ; embedded-$1
5555
//ldml/characters/moreInformation ; Core Data ; Alphabetic Information ; Symbols ; More Information
56+
//ldml/characters/nestedBracketReplacement[@bracket="%A"] ; Core Data ; Alphabetic Information ; Symbols ; Nested Bracket Replacement: $1
5657

5758
//ldml/numbers/(default)NumberingSystem ; Core Data ; Numbering Systems ; Numbering System ; $1
5859
//ldml/numbers/otherNumberingSystems/(%E) ; Core Data ; Numbering Systems ; Numbering System ; $1

tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestExampleGenerator.java

Lines changed: 2 additions & 0 deletions
@@ -146,6 +146,7 @@ public void testCurrency() {
146146
ImmutableSet.of(
147147
"//ldml/layout/orientation/characterOrder",
148148
"//ldml/layout/orientation/lineOrder",
149+
"//ldml/characters/nestedBracketReplacement[@bracket=\"([^\"]*+)\"]",
149150
"//ldml/characters/moreInformation",
150151
"//ldml/numbers/symbols[@numberSystem=\"([^\"]*+)\"]/infinity",
151152
"//ldml/numbers/symbols[@numberSystem=\"([^\"]*+)\"]/list",
@@ -2197,6 +2198,7 @@ public void testLightSpeed() {
21972198
{
21982199
SKIP,
21992200
"//ldml/characters/moreInformation"
2201+
+ "//ldml/characters/nestedBracketReplacement[@bracket=\"*\"]"
22002202
+ "//ldml/dates/fields/field[@type=\"*\"]/relative[@type=\"*\"]"
22012203
+ "//ldml/dates/timeZoneNames/gmtZeroFormat"
22022204
+ "//ldml/dates/timeZoneNames/gmtUnknownFormat"

tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestPaths.java

Lines changed: 1 addition & 0 deletions
@@ -739,6 +739,7 @@ public void testForUndefined() {
739739
"pathMatch", "", // no ids
740740
"languageMatch", "", // no ids
741741
"rgPath", "", // no ids
742+
"nestedBracketReplacement", "", // no ids
742743
"mapTimezones", "", // ids checked elsewhere
743744
"mapZone", "" // ids checked elsewhere
744745
);

tools/cldr-code/src/test/resources/org/unicode/cldr/unittest/TestCoverageLevel.txt

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
1010
^//ldml/references ; 100
1111
^//ldml/characters/ellipsis\[@type=".*"] ; 60
1212
^//ldml/characters/moreInformation ; 60
13+
^//ldml/characters/nestedBracketReplacement\[@bracket=".*"] ; 60
1314
^//ldml/characters/stopwords/stopwordList\[@type=".*"] ; 60
1415
^//ldml/dates/timeZoneNames/fallbackFormat ; 30
1516
^//ldml/dates/timeZoneNames/gmtZeroFormat ; 30
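To show how entries of the form `regex ; level` in TestCoverageLevel.txt might be applied, here is a small Python sketch. How the actual CLDR unit test consumes this file is not shown in the commit, so the parsing and first-match lookup below are assumptions, and `parse_rules` and `expected_level` are hypothetical names. The value 60 appears to correspond to the "moderate" level assigned in coverageLevels.xml in this same commit.

```python
import re

# Three lines copied from the TestCoverageLevel.txt hunk above.
RULES_TEXT = r'''
^//ldml/characters/ellipsis\[@type=".*"] ; 60
^//ldml/characters/moreInformation ; 60
^//ldml/characters/nestedBracketReplacement\[@bracket=".*"] ; 60
'''


def parse_rules(text):
    """Parse 'regex ; level' lines into (compiled_pattern, level) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last ';' so the regex part is kept whole.
        pattern, level = line.rsplit(";", 1)
        rules.append((re.compile(pattern.strip()), int(level)))
    return rules


def expected_level(path, rules):
    """Return the level of the first rule matching `path`, or None."""
    for pattern, level in rules:
        if pattern.match(path):
            return level
    return None


RULES = parse_rules(RULES_TEXT)
level = expected_level('//ldml/characters/nestedBracketReplacement[@bracket="("]', RULES)
print(level)
```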
