Commit fb79ac4

Merge branch 'main' into testdata-gen-datetime-classical-skel-with-sem
2 parents 1db9d5d + 03f081c commit fb79ac4

8 files changed: +175 -82 lines changed

.github/workflows/gh-pages.yml

Lines changed: 16 additions & 0 deletions
@@ -61,7 +61,23 @@ jobs:
         # Warn, don't fail yet
         run: npx markdownlint-cli *.md {specs,docs}/*.md || (echo Warning, please fix these ; true)
       - name: Note any changes
+        # if the archiver had to update anything, or if anchors changed
         run: git status ; git diff
+      - name: Restore lychee cache
+        uses: actions/cache@v3
+        with:
+          path: .lycheecache
+          key: cache-lychee-${{ github.sha }}
+          restore-keys: cache-lychee-
+      - name: Run lychee
+        uses: lycheeverse/lychee-action@v1
+        with:
+          args: "-n --cache --max-cache-age 10d docs/rfc docs/ldml"
+          fail: false
+          format: markdown
+          output: linkcheck.md
+      - name: Link Checker Summary
+        run: cat linkcheck.md >> $GITHUB_STEP_SUMMARY
       - uses: ruby/setup-ruby@v1
         with:
           ruby-version: 3.2

common/dtd/ldmlSupplemental.dtd

Lines changed: 1 addition & 0 deletions
@@ -71,6 +71,7 @@ CLDR data files are interpreted according to the LDML specification (http://unic
 <!--@DEPRECATED-->
 
 <!ELEMENT currency ( alternate* ) >
+<!--@ORDERED-->
 <!ATTLIST currency before NMTOKEN #IMPLIED >
 <!-- use from and to instead. -->
 <!--@VALUE-->

common/dtd/ldmlSupplemental.xsd

Lines changed: 1 addition & 0 deletions
@@ -204,6 +204,7 @@ Note: DTD @-annotations are not currently converted to .xsd. For full CLDR file
 
 
 
+
 <xs:element name="currency">
 <xs:complexType>
 <xs:sequence>

docs/ldml/tr35.md

Lines changed: 42 additions & 42 deletions
Large diffs are not rendered by default.

docs/site/downloads/cldr-47.md

Lines changed: 43 additions & 40 deletions
@@ -17,6 +17,11 @@ CLDR data is used by all [major software systems](/index#who-uses-cldr)
 (including all mobile phones) for their software internationalization and localization,
 adapting software to the conventions of different languages.
 
+CLDR 47 focused on MessageFormat 2.0 and tooling for an expansion of DDL support.
+It was a closed cycle: locale data changes were limited to bug fixes and the addition of new locales, mostly regional variants.
+
+### Changes
+
 The most significant changes in this release are:
 
 - New locales:
@@ -25,18 +30,20 @@ The most significant changes in this release are:
 - Updated time zone data to tzdata 2025a
 - [RBNF](#number-spellout-data-changes) (Number Spellout Data Improvements) for multiple languages
 - Assorted transforms improvements
-- Updated language matching for Afrikaans to English (en) from Dutch (nl) [CLDR-18198](https://unicode-org.atlassian.net/browse/CLDR-18198)
-- Ordered scripts in decending order of usage per locale [CLDR-18155](https://unicode-org.atlassian.net/browse/CLDR-18155)
-- Fixed invalid codes [CLDR-18129](https://unicode-org.atlassian.net/browse/CLDR-18129)
-- Updated population data
+- Updated and revised population data
+- Incorporates all changes from CLDR v46.1.
+- [CLDR v46.1](https://cldr.unicode.org/downloads/cldr-46#461-changes) was a special release, which many users of CLDR (including ICU) have not updated to.
+So the listed changes are relative to [CLDR v46.0](https://cldr.unicode.org/downloads/cldr-46). v46.1 included the following:
+- Message Format 2.0 (Final Candidate)
+- More explicit well-formedness and validity constraints for unit of measurement identifiers
+- Addition of derived emoji annotations that were missing: emoji with skin tones facing right
+- Fixes to make the ja, ko, yue, zh datetimeSkeletons useful for generating the standard patterns
+- Improved date/time test data
 
 For more details, see below.
 
 ### Locale Coverage Status
 
-CLDR 47 was a closed cycle which means that locale data changes were limited to addition of new locales, and bug fixes.
-This means that coverage levels for existing locales did not change in this release.
-
 #### Current Levels
 
 Count | Level | Usage | Examples
@@ -49,7 +56,9 @@ Count | Level | Usage | Examples
 
 For a full listing, see [Coverage Levels](https://unicode.org/cldr/charts/dev/supplemental/locale_coverage.html)
 
-## [Specification Changes](https://www.unicode.org/reports/tr35/proposed.html)
+## Specification Changes
+
+**NOTE: the specification changes will be completed by the specification beta: only a few of them are listed here, and the Modifications section is not yet complete.**
 
 The following are the most significant changes to the specification (LDML).
 
@@ -59,10 +68,14 @@ There are many more changes that are important to implementations, such as chang
 See the [Modifications section](https://www.unicode.org/reports/tr35/proposed.html#Modifications) of the specification for details.
 
 ## Data Changes
+**TBD: Flesh out overview items**
+- Updated language matching for Afrikaans to English (en) from Dutch (nl) [CLDR-18198](https://unicode-org.atlassian.net/browse/CLDR-18198)
+- Ordered scripts in `<languageData>` in descending order of usage per locale [CLDR-18155](https://unicode-org.atlassian.net/browse/CLDR-18155)
+- Fixed certain invalid codes [CLDR-18129](https://unicode-org.atlassian.net/browse/CLDR-18129)
 
 ### DTD Changes
 
-- TBD
+Most of the DTD changes were in 46.1. One additional change was to order currency values in **TBD get ticket number**
 
 For a full listing, see [Delta DTDs](https://unicode.org/cldr/charts/dev/supplemental/dtd_deltas.html).
 
@@ -75,20 +88,13 @@ For a full listing, see [Delta DTDs](https://unicode.org/cldr/charts/dev/supplem
 
 For a full listing, see [¤¤BCP47 Delta](https://unicode.org/cldr/charts/dev/delta/bcp47.html) and [¤¤Supplemental Delta](https://unicode.org/cldr/charts/dev/delta/supplemental-data.html)
 
-### [Locale Changes](https://unicode.org/cldr/charts/dev/delta/index.html)
+### Locale Changes
 
 - Cleanups for current pattern variants `alt="alphaNextToNumber"` and `alt="noCurrency"`: These were introduced in CLDR 42
 (per [CLDR-14336](https://unicode-org.atlassian.net/browse/CLDR-14336)) to provide a cleaner way of adjusting currency
 patterns when an alphabetic currency symbol is used, or when a currency-style pattern is desired without a currency symbol
-(as for use in a table). Some further adjustments were needed ([CLDR-17879](https://unicode-org.atlassian.net/browse/CLDR-17879)):
-- Adjust coverage so that these variants are at moderate (not comprehensive) coverage for standard/accounting currency formats with
-`numberSystem="latn"`, and so that `alt="alphaNextToNumber"` is at modern (not comprehensive) for oither relevant number systems in
-in a locale. Coverage was already correct for other combinations of these attributes with various numberSystems.
-- Adjust PathHeader so compact currency for relevant non-Latn number systems in a locale will appear in Survey Tool.
-- In root, add an `alt="alphaNextToNumber"` variant for the standard/accounting currency patterns.
-- Ensure that in the most commonly-used locales. for all relevant number systems in the locale, the standard/accounting currency
-patterns have both `alt="alphaNextToNumber"` and `alt="noCurrency"` variants (inherting as necessary), and the compact currency
-formats have the `alt="alphaNextToNumber"` variants.
+(as for use in a table). Gaps in the data coverage showed up, because the translators weren't shown the right values.
+Fixes were made in [CLDR-17879](https://unicode-org.atlassian.net/browse/CLDR-17879).
 - As noted below in [Migration](#migration), number `<symbols>` elements and format elements (`<currencyFormats>`, `<decimalFormats>`, `<percentFormats>`, `<scientificFormats>`)
 should all have a `numberSystem` attribute, and such elements without a `numberSystem` attribute will be deprecated in CLDR 48. To
 prepare for this, in CLDR 47, all such elements were either removed (if redundant) or correct by adding a `numberSystem` attribute.
@@ -103,6 +109,7 @@ For a full listing, see [Delta Data](https://unicode.org/cldr/charts/dev/delta/i
 ### Collation Data Changes
 
 - Two old `zh` collation variants are removed: big5han and gb2312.
+They are no longer typically used, and only cover a fraction of the CJK ideographs.
 ([CLDR-16062](https://unicode-org.atlassian.net/browse/CLDR-16062))
 
 ### Number Spellout Data Changes
@@ -121,15 +128,15 @@ For a full listing, see [Delta Data](https://unicode.org/cldr/charts/dev/delta/i
 
 ### Segmentation Data Changes
 
-- The word break tailorings for `fi` and `sv` are removed to align with recent discussions in the UTC
+- The word break tailorings for `fi` and `sv` are removed to align with recent changes to the root collation
 and recent changes to ICU behavior. ([CLDR-18272](https://unicode-org.atlassian.net/browse/CLDR-18272))
 
 ### Transform Data Changes
 
-- A new `Hant-Latn` transform is added, and `Hans-Latn` is added as an alias for the existing `Hani-Latn`
-transform. When the Unihan data `kMandarin` field has two values, the first is preferred for a `CN`/`Hans`
-context, and is used by the `Hani-Latn`/`Hans-Latn` transform; the second is preferred for a `TW`/`Hant`
-context, and is now used by the new `Hant-Latn` transform.
+- A new `Hant-Latn` transform is added, and `Hans-Latn` is added as an alias for the existing `Hani-Latn` transform.
+When the Unihan data `kMandarin` field has two values,
+the first is preferred for a `CN`/`Hans` context, and is used by the `Hani-Latn`/`Hans-Latn` transform;
+the second is preferred for a `TW`/`Hant` context, and is now used by the new `Hant-Latn` transform.
 ([CLDR-18080](https://unicode-org.atlassian.net/browse/CLDR-18080))
 
 ### JSON Data Changes
@@ -166,33 +173,29 @@ In 46.0, but not in 47.0:
 
 ### Tooling Changes
 
-- Assorted SurveyTool improvements including:
+There were various SurveyTool improvements targeting expansion of DDL support and error detection, such as the following:
 - Added a CLA check
--
-- Improved validity checks for codes [CLDR-18129](https://unicode-org.atlassian.net/browse/CLDR-18129)
-- Improved ability to detect invalid URLs in the site and spec
+- Improved validity checks for codes [CLDR-18129](https://unicode-org.atlassian.net/browse/CLDR-18129)
+- Improved ability to detect invalid URLs in the site and spec
 
 ### Keyboard Changes
 
 - TBD
 
 ## Migration
 
-- Number `<symbols>` elements and format elements (`<currencyFormats>`, `<decimalFormats>`, `<percentFormats>`, `<scientificFormats>`)
-should all have a `numberSystem` attribute. In CLDR v48 such elements without a `numberSystem` attribute will be deprecated, and the
-corresponding entries in root will be removed; these were only intended as a long-ago migration aid. See the relevant sections of the
-LDML specification: [Number Symbols](https://www.unicode.org/reports/tr35/dev/tr35-numbers.html#Number_Symbols) and
-[Number Formats](https://www.unicode.org/reports/tr35/dev/tr35-numbers.html#number-formats).
-- Any locales that are missing Core data by the end of the CLDR 48 cycle will be removed [CLDR-16004](https://unicode-org.atlassian.net/browse/CLDR-16004)
-- The default week numbering will change to ISO instead being based on the calendar week starting in CLDR 48 [CLDR-18275](https://unicode-org.atlassian.net/browse/CLDR-18275).
+- Removal of number data without `numberSystem` attributes.
+- Number `<symbols>` elements and format elements (`<currencyFormats>`, `<decimalFormats>`, `<percentFormats>`, `<scientificFormats>`)
+should all have a `numberSystem` attribute. In CLDR v48 such elements without a `numberSystem` attribute will be deprecated, and the
+corresponding entries in root will be removed; these were only intended as a long-ago migration aid. See the relevant sections of the
+LDML specification: [Number Symbols](https://www.unicode.org/reports/tr35/dev/tr35-numbers.html#Number_Symbols) and
+[Number Formats](https://www.unicode.org/reports/tr35/dev/tr35-numbers.html#number-formats).
+- V48 advance warnings
+- Any locales that are missing Core data by the end of the CLDR 48 cycle will be removed [CLDR-16004](https://unicode-org.atlassian.net/browse/CLDR-16004)
+- The default week numbering will change to ISO instead being based on the calendar week starting in CLDR 48 [CLDR-18275](https://unicode-org.atlassian.net/browse/CLDR-18275).
 
 ## Known Issues
 
-1. [CLDR-17095] The region-based firstDay value (see weekData) is currently used for several different purposes. In the future, some of these functions will be separated out:
-- The day that should be shown as the first day of the week in a calendar view.
-- The first day of the week (day 1) for weekday numbering.
-- The first day of the week for week-of-year calendar calculations.
-
 ## Acknowledgments
 
 Many people have made significant contributions to CLDR and LDML;

tools/cldr-code/src/main/java/org/unicode/cldr/json/Ldml2JsonConverter.java

Lines changed: 38 additions & 0 deletions
@@ -966,6 +966,7 @@ private int convertCldrItems(
                     }
                 }
             }
+            postprocessAfterAdd(out, item);
         }
 
         resolveSortingItems(out, nodesForLastItem, sortingItems);
@@ -1045,6 +1046,43 @@ private int convertCldrItems(
         return totalItemsInFile;
     }
 
+    /**
+     * Provide an opportunity to fix up the JsonObject before write, after items were added.
+     *
+     * @param out the JsonObject which already reflects 'item'
+     * @param item the original CLDR item
+     */
+    private void postprocessAfterAdd(JsonObject out, CldrItem item) {
+        if (item.getUntransformedPath().contains("timeZoneNames/zone")) {
+            // add _type values into the time zone tree
+            try {
+                JsonObject sub = out;
+                for (final CldrNode n : item.getNodesInPath()) {
+                    if (n.getNodeKeyName().equals("cldr")) {
+                        continue; // skip the top 'cldr' node
+                    }
+                    if (!n.getName().equals("zone") && n.getParent().equals("zone")) {
+                        // child of zone, but not a zone - add the type.
+                        sub.addProperty("_type", "zone");
+                        break;
+                    } else {
+                        JsonElement je = sub.get(n.getNodeKeyName());
+                        if (je == null) {
+                            // then add it! Because we run before the sorting,
+                            // we can run where the parent isn't added yet.
+                            je = new JsonObject();
+                            sub.add(n.getNodeKeyName(), je);
+                        }
+                        sub = je.getAsJsonObject(); // traverse into the JSON DOM..
+                    }
+                }
+            } catch (ParseException e) {
+                System.err.println("Error adding _type in tree for " + item.getUntransformedPath());
+                e.printStackTrace();
+            }
+        }
+    }
+
     /**
      * Fixup an XPathParts with a specific transform element
      *
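
The traversal in `postprocessAfterAdd` can be illustrated without Gson or the CLDR classes. The following is a minimal sketch only: `ZoneTypeSketch`, its `Node` class, and the sample path are hypothetical stand-ins (not part of the commit), and nested `Map`s take the place of `JsonObject`. It descends along the path, creates any missing intermediate object, and tags the innermost zone object with `_type` once it reaches a non-zone child of a zone. Unlike the committed method, it does not skip a top-level `cldr` node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ZoneTypeSketch {
    /** Hypothetical stand-in for a path node: element name, parent element name, JSON key. */
    static final class Node {
        final String name;
        final String parent;
        final String key;

        Node(String name, String parent, String key) {
            this.name = name;
            this.parent = parent;
            this.key = key;
        }
    }

    /** Descends along the path, creating missing objects, and tags the innermost zone object. */
    @SuppressWarnings("unchecked")
    static void addZoneType(Map<String, Object> out, List<Node> path) {
        Map<String, Object> sub = out;
        for (Node n : path) {
            if (!n.name.equals("zone") && "zone".equals(n.parent)) {
                // Child of a zone, but not itself a zone: tag the enclosing object and stop.
                sub.put("_type", "zone");
                break;
            }
            // Create the child object if it is absent, then step into it.
            sub = (Map<String, Object>)
                    sub.computeIfAbsent(n.key, k -> new LinkedHashMap<String, Object>());
        }
    }

    /** Builds an assumed path for America/Los_Angeles and returns the resulting tree. */
    static Map<String, Object> demo() {
        List<Node> path = new ArrayList<>();
        path.add(new Node("timeZoneNames", "dates", "timeZoneNames"));
        path.add(new Node("zone", "timeZoneNames", "zone"));
        path.add(new Node("zone", "zone", "America"));
        path.add(new Node("zone", "zone", "Los_Angeles"));
        path.add(new Node("exemplarCity", "zone", "exemplarCity"));
        Map<String, Object> out = new LinkedHashMap<>();
        addZoneType(out, path);
        return out;
    }

    public static void main(String[] args) {
        // prints {timeZoneNames={zone={America={Los_Angeles={_type=zone}}}}}
        System.out.println(demo());
    }
}
```

Creating the missing parents on the way down matters here for the reason given in the committed comments: the hook runs before sorting, so a parent object may not have been added yet.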

tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestDtdData.java

Lines changed: 7 additions & 0 deletions
@@ -585,7 +585,14 @@ public void TestNewDtdData() {
             new HashSet<>(
                     Arrays.asList("name", "reorder", "row", "settings", "transform")));
 
+    /**
+     * This function has the purpose of validating that the DTD doesn't change without updating this
+     * test. "old" means "expected" (and so throughout this test)
+     */
     public static boolean isOrderedOld(String element, DtdType type) {
+        // currency is ordered in ldmlSupplemental, but not in ldml, so handle it here.
+        if (type == DtdType.supplementalData && element.equals("currency")) return true;
+
         switch (type) {
             case keyboardTest3:
                 return orderedKeyboardTestElements.contains(element);

tools/cldr-code/src/test/java/org/unicode/cldr/util/TestCLDRFile.java

Lines changed: 27 additions & 0 deletions
@@ -1,6 +1,7 @@
 package org.unicode.cldr.util;
 
 import static org.junit.jupiter.api.Assertions.assertAll;
+import static org.junit.jupiter.api.Assertions.assertArrayEquals;
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertFalse;
 import static org.junit.jupiter.api.Assertions.assertNotEquals;
@@ -13,6 +14,7 @@
 import java.nio.file.Path;
 import java.util.Comparator;
 import java.util.Iterator;
+import java.util.LinkedList;
 import java.util.List;
 import java.util.Set;
 import org.junit.jupiter.api.BeforeAll;
@@ -396,4 +398,29 @@ public void TestTypeNameToCode() {
         assertEquals(NameType.KEY_TYPE, NameType.typeNameToCode("key|type"));
         assertEquals(NameType.SUBDIVISION, NameType.typeNameToCode("subdivision"));
     }
+
+    @Test
+    public void TestDTDOrder() {
+        CLDRFile file = factory.make("supplementalData", false);
+
+        // This is to simulate what is in the LDML2JsonConverter
+        final Comparator<String> comparator =
+                DtdData.getInstance(file.getDtdType()).getDtdComparator(null);
+        final List<String> curr = new LinkedList<>();
+        for (Iterator<String> it =
+                        file.iterator(
+                                "//supplementalData/currencyData/region[@iso3166=\"BY\"]",
+                                comparator);
+                it.hasNext(); ) {
+            final String xpath = it.next();
+            final XPathParts xpp = XPathParts.getFrozenInstance(xpath);
+            final String iso4217 = xpp.getAttributeValue(-1, "iso4217");
+            curr.add(iso4217);
+        }
+        final String expect[] = {"BYN", "BYR", "BYB", "RUR", "SUR"};
+        assertArrayEquals(
+                expect,
+                curr.toArray(new String[0]),
+                "Expected currencies in XML order (will break if BY's currency changes)");
+    }
 }
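
Why the `@ORDERED` annotation matters for `TestDTDOrder` can be shown with a small standalone sketch. It is an illustration under stated assumptions, not the CLDR comparator: `OrderedElementSketch` and `emitOrder` are hypothetical names, and the sketch assumes, as a simplification, that siblings of an unordered element are sorted by attribute value while siblings of an ordered element keep their XML order. The currency codes are the ones the test expects for region `BY`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrderedElementSketch {
    /**
     * Returns the codes in the order a writer would emit them: XML order when the element is
     * treated as ordered, otherwise sorted by value (a simplifying assumption).
     */
    static List<String> emitOrder(List<String> xmlOrder, boolean ordered) {
        List<String> result = new ArrayList<>(xmlOrder);
        if (!ordered) {
            Collections.sort(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> by = Arrays.asList("BYN", "BYR", "BYB", "RUR", "SUR");
        System.out.println(emitOrder(by, true)); // prints [BYN, BYR, BYB, RUR, SUR]
        System.out.println(emitOrder(by, false)); // prints [BYB, BYN, BYR, RUR, SUR]
    }
}
```

Under that assumption, only the ordered case reproduces the sequence asserted in the test, which is the behavior the new `<!--@ORDERED-->` annotation on `currency` is meant to guarantee.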
