docs/ldml/tr35-dates.md
Lines changed: 47 additions & 25 deletions
@@ -1068,44 +1068,66 @@ As in other cases, **narrow** may be ambiguous out of context.
 <!ATTLIST era aliases NMTOKENS #IMPLIED >
 ```
 
-The `<calendarData>` element now provides only locale-independent data about calendar behaviors via its `<calendar>` subelements, which for each calendar can specify the astronomical basis of the calendar (solar, lunar, etc.) and the date ranges for its eras.
-
-Era start or end dates are specified in terms of the equivalent proleptic Gregorian date (in "y-M-d" format). Eras may be open-ended, with unspecified start or end dates. For example, here are the eras for the Gregorian calendar:
+The `<calendarData>` element provides locale-independent data about calendar behaviors via its `<calendar>` subelements,
+which for each calendar can specify the astronomical basis of the calendar (solar, lunar, etc.) and the date ranges for its eras.
+        <era type="0" end="0-12-31" code="bce" aliases="bc"/> <!-- Before Common Era, Before Christ -->
+        <era type="1" start="1-01-01" code="ce" aliases="ad"/> <!-- Common Era, Anno Domini -->
+    </eras>
+</calendar>
 ```
 
-For a sequence of eras with specified start dates, the end of each era need not be explicitly specified (it is assumed to match the start of the subsequent era). For example, here are the first few eras for the Japanese calendar:
+If a `<calendar>` contains an `<inheritEras/>` element, all eras from the specified calendar should be inserted in order into the sequence of eras for the current calendar, as described below.
+For example, the following means that the two eras from calendar "gregorian" should be inserted into the era list for "japanese" for calculations and formatting.
 
 ```xml
-<era type="0" start="645-6-19" />
-<era type="1" start="650-2-15" />
-<era type="2" start="672-1-1" />
-…
+<calendar type="japanese">
+    <inheritEras calendar="gregorian" />
+    <eras>
+        <era type="232" start="1868-10-23" code="meiji"/>
+        <era type="233" start="1912-07-30" code="taisho"/>
+        <era type="234" start="1926-12-25" code="showa"/>
+        <era type="235" start="1989-01-08" code="heisei"/>
+        <era type="236" start="2019-05-01" code="reiwa"/>
+    </eras>
+</calendar>
 ```
 
-Some eras have additional `code` and `aliases` attributes that define invariant strings for identifying the eras. The `code` is a single globally unique identifier, and `aliases` are space-separated identifiers unique within the calendar. The code and aliases follow the following rules:
+Each `era` element has a `code` attribute and an optional `aliases` attribute that define invariant strings for identifying the eras. These are more mnemonic than the `type` identifiers (see below).
+The `code` is unique within the calendar, and the `aliases` are space-separated identifiers, each also unique within the calendar.
+
+The `start` date is specified in terms of the equivalent _proleptic_ Gregorian date in the format "yyyy-MM-dd", such as 1842-01-01.
+An omitted start date behaves as if start=-∞.
+
+The order for the eras is given by the following algorithm:
+- Include all eras from the inheritEras calendar, if there is one.
+- An omitted start date behaves as if start=-∞
+- All elements are ordered by their start dates.
+- No two elements can have the same start date (otherwise the data is invalid).
 
-1. Every calendar has either an era with a `code` that is the same as the BCP-47 name of that calendar or an `inheritEras` element pointing to another calendar with such an era. This era should be used for anchoring the "extended year" in the calendar (`u` in the date format pattern).
-2. Eras that count backwards (larger numbers for older years) are suffixed with `-inverse`.
-3. If the same era code is used in multiple calendars, then the calculations for year, month, and day in that era must be the same in all calendars in which it is used. For example, the `ethioaa` era is used in two calendar systems.
+Note that the order of the eras is _not_ necessarily the order in the XML file, nor is it based on the numeric value of the `type`s.
 
-If a `<calendar>` contains an `<inheritEras/>` element, all eras from the specified calendar should be inserted in order into the sequence of eras for the current calendar and follow the same start and end date rules. For example:
+For a given _proleptic_ Gregorian date D and calendar C, the era code for D is in the `era` element in C with the greatest start date ≤ the given date.
+It is also the _first_ `era` element with start date ≤ the given date in C, given the above ordering for `era` elements.
+
+The `type` has an integer value.
+The type values do not have to start at 0, nor do they need to be in chronological order.
+They are used to access the era names in locale files.
+For example:
 
 ```xml
-<calendar type="japanese">
-    <inheritEras calendar="gregorian" />
-    <eras>
-        <era type="0" start="645-6-19"/>
-        <era type="1" start="650-2-15"/>
-        <!-- ... -->
-    </eras>
-</calendar>
-```
+<era type="232">Meiji</era>
+<era type="233">Taishō</era>
+<era type="234">Shōwa</era>
+<era type="235">Heisei</era>
+<era type="236">Reiwa</era>
 
-This means that the two eras from calendar "gregorian" should be inserted into the era list for "japanese" for calculations and formatting.
+The `end` attribute is unused, and is slated for deprecation in the future.
 
 **Note:** The `territories` attribute in the `calendar` element is deprecated. It was formerly used to indicate calendar preference by territory, but this is now given by the _[Calendar Preference Data](#Calendar_Preference_Data)_ below.
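The era rules added to tr35-dates.md (merge the `inheritEras` calendar, treat an omitted start as -∞, sort by start date, reject duplicate start dates, then pick the era with the greatest start date ≤ D) can be sketched in Python. This is a minimal, non-authoritative sketch: `Era`, `parse_ymd`, `ordered_eras`, and `era_for_date` are hypothetical names, dates are plain `(year, month, day)` tuples so that proleptic year 0 and negative years work, the sort is assumed to be ascending by start date, and the era data is copied from the Gregorian and Japanese examples in this diff.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical in-memory form of an <era> element from supplemental <calendarData>.
@dataclass(frozen=True)
class Era:
    type: int                    # index used to look up localized era names
    code: str                    # invariant identifier, e.g. "meiji"
    start: Optional[str] = None  # proleptic Gregorian "y-M-d"; None means start=-infinity

NEG_INF = (float("-inf"),)       # compares below every (year, month, day) tuple

def parse_ymd(text: str) -> Tuple[int, int, int]:
    """Parse "1868-10-23", "1-01-01", or a negative year such as "-500-01-01"."""
    m = re.fullmatch(r"(-?\d+)-(\d{1,2})-(\d{1,2})", text)
    if m is None:
        raise ValueError(f"bad date: {text!r}")
    year, month, day = (int(g) for g in m.groups())
    return (year, month, day)

def start_key(era: Era) -> tuple:
    return NEG_INF if era.start is None else parse_ymd(era.start)

def ordered_eras(own: Iterable[Era], inherited: Iterable[Era] = ()) -> List[Era]:
    """Merge inherited eras and sort by start date (not XML order or `type`); reject ties."""
    eras = sorted([*inherited, *own], key=start_key)
    keys = [start_key(e) for e in eras]
    if len(set(keys)) != len(keys):
        raise ValueError("two eras have the same start date")
    return eras

def era_for_date(eras: List[Era], ymd: Tuple[int, int, int]) -> Optional[Era]:
    """Return the era with the greatest start date <= ymd (eras must already be sorted)."""
    found = None
    for era in eras:
        if start_key(era) <= ymd:
            found = era  # latest qualifying start seen so far
        else:
            break        # every later era starts even later
    return found

GREGORIAN = [Era(0, "bce"), Era(1, "ce", "1-01-01")]  # "bce" has only an end date in the XML
JAPANESE = [
    Era(232, "meiji", "1868-10-23"),
    Era(233, "taisho", "1912-07-30"),
    Era(234, "showa", "1926-12-25"),
    Era(235, "heisei", "1989-01-08"),
    Era(236, "reiwa", "2019-05-01"),
]

japanese = ordered_eras(JAPANESE, inherited=GREGORIAN)
print([e.code for e in japanese])                  # bce, ce, meiji, taisho, showa, heisei, reiwa
print(era_for_date(japanese, (1989, 1, 7)).code)   # showa
print(era_for_date(japanese, (1989, 1, 8)).code)   # heisei
print(era_for_date(japanese, (1800, 1, 1)).code)   # ce (inherited from gregorian)
```

Scanning the ascending list and keeping the last qualifying era is the same as taking the greatest start date ≤ D; a binary search over the sorted start keys (for example with Python's `bisect` module) would give the same answer in logarithmic time.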
docs/ldml/tr35-modifications.md
Lines changed: 4 additions & 0 deletions
@@ -59,6 +59,9 @@ The LDML specification is divided into the following parts:
 
 ### Locale Identifiers
 * [Special Script Codes](tr35.md#special-script-codes) Added the `Hntl` compound script. (This is also reflected in the `<scriptData>` elements in supplementalData.xml.)
+* [Likely Subtags](tr35.md#likely-subtags) Changed the Canonicalize step to point to the section on canonicalization.
+* [Unicode Locale Identifier](tr35.md#unicode-locale-identifier) Changed the `attribute` component in the EBNF to be `uattribute` for consistency with `ufield`, etc.
+and to reduce confusion with XML attributes.
 
 ### Misc.
 * [Character Elements](tr35-general.md#character-elements) Added new exemplar types.
@@ -75,6 +78,7 @@ and updated the guidelines for using the different `dateTimeFormat` types.
 * [Time Zone Format Terminology](tr35-dates.md#time-zone-format-terminology) Added the **Localized GMT format** (replacing the **Specific location format**).
 This affects the behavior of the `z` timezone format symbol.
 There is also now a mechanism for finding the region code from short timezone identifier, which is used for the _non-location formats (generic or specific)_
+* [Calendar Data](tr35-dates.md#calendar-data) Specified more precisely the meaning of the `era` attributes in supplemental data, and how to determine the transition point in time between eras.
 
 ### Numbers
 * [Plural rules syntax](tr35-numbers.md#plural-rules-syntax) Added substantial clarifications and new examples.
 | <a name="ufield" href="#ufield">`ufield`</a><br/>(Also known as `keyword`) |`= ukey (sep uvalue)? ;`|
 | <a name="ukey" href="#ukey">`ukey`</a><br/>(Also known as `key`) |`= alphanum alpha ;`|[`validity`](#Key_Type_Definitions)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/maint/maint-47/common/bcp47) <br/>(Note that this is narrower than in [[RFC6067](https://www.ietf.org/rfc/rfc6067.txt)], so that it is disjoint with `tkey`.) |
 | <a name="uvalue" href="#uvalue">`uvalue`</a><br/>(Also known as `type`) |`= alphanum{3,8}`<br/>` (sep alphanum{3,8})* ;`|[`validity`](#Key_Type_Definitions)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/maint/maint-47/common/bcp47)|
-|`attribute`|`= alphanum{3,8} ;`|
+|<a name="uattribute" href="#uattribute">`uattribute`</a><br/>(Also known as `attribute`)|`= alphanum{3,8} ;`|
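The `ufield`, `ukey`, `uvalue`, and `uattribute` productions in the table rows above translate directly into regular expressions. A minimal sketch, assuming the character classes `alphanum`, `alpha`, and `sep` are defined elsewhere in TR35 as written in the first comment (they are not part of this diff); the constant names and the `matches` helper are hypothetical.

```python
import re

# Assumed character classes from elsewhere in TR35 (not part of this diff):
#   alphanum = [0-9 A-Z a-z], alpha = [A-Z a-z], sep = [-_]
ALPHANUM = "[0-9A-Za-z]"
ALPHA = "[A-Za-z]"
SEP = "[-_]"

# Transcriptions of the EBNF rows ({{3,8}} is a literal {3,8} inside an f-string).
UKEY = f"{ALPHANUM}{ALPHA}"                               # ukey = alphanum alpha
UVALUE = f"{ALPHANUM}{{3,8}}(?:{SEP}{ALPHANUM}{{3,8}})*"  # uvalue = alphanum{3,8} (sep alphanum{3,8})*
UATTRIBUTE = f"{ALPHANUM}{{3,8}}"                         # uattribute = alphanum{3,8}
UFIELD = f"{UKEY}(?:{SEP}{UVALUE})?"                      # ufield = ukey (sep uvalue)?

def matches(rule: str, text: str) -> bool:
    """True if the whole of `text` matches the production `rule`."""
    return re.fullmatch(rule, text) is not None

print(matches(UFIELD, "ca-japanese"))  # True
print(matches(UKEY, "m0"))             # False: the second character must be a letter
print(matches(UATTRIBUTE, "ab"))       # False: at least 3 characters are required
```

The `m0` case shows why the narrower `ukey` rule keeps `ukey` disjoint from `tkey`: a transform-extension key ending in a digit can never match `alphanum alpha`.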
@@ -2507,18 +2507,14 @@ A subtag is called _empty_ if it is a missing script or region subtag, or it is
 This operation is performed in the following way.
 
 1. **Canonicalize.**
-    1. Make sure the input locale is in canonical form: uses the right separator, and has the right casing.
-    2. Replace any deprecated subtags with their canonical values using the `<alias>` data in supplemental metadata. Use the first value in the replacement list, if it exists.
-       Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is
-       one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".
-        * There are certain exceptions to this: some implementations still use three obsolete language subtags: iw, in, and yi.
-          The likely subtags data currently supports those implementations by providing elements that handle them,
-          with the deprecated code on both sides: `<likelySubtag from="iw" to="iw_Hebr_IL"/>`
-          Such implementations may refrain from replacing those deprecated tags.
-    3. If the tag is a legacy language tag (marked as “Type: grandfathered” in BCP 47; see `<variable id="$grandfathered" type="choice">` in the supplemental data), then return it.
-    4. Remove the script code 'Zzzz' and the region code 'ZZ' if they occur.
-    5. Get the components of the cleaned-up source tag _(language<sub>s</sub>, script<sub>s</sub>,_ and _region<sub>s</sub>_), plus any variants and extensions.
-    6. If the language is not 'und' and the other two components are not empty, return the language tag composed of _language<sub>s</sub>\_script<sub>s</sub>\_region<sub>s</sub>_ + variants + extensions.
+    1. Canonicalize the locale ID, according to [LocaleID Canonicalization](#annex-c-localeid-canonicalization).
+        * Some implementations still use three obsolete language subtags: iw, in, and yi.
+          The likely subtags data currently supports those implementations by providing elements that handle them, with the deprecated code on both sides:
+          `<likelySubtag from="iw" to="iw_Hebr_IL"/>`.
+          Such implementations may refrain from replacing those deprecated tags while canonicalizing.
+    2. Remove the script code 'Zzzz' and the region code 'ZZ' if they occur.
+    3. Get the components of the cleaned-up source tag _(language<sub>s</sub>, script<sub>s</sub>,_ and _region<sub>s</sub>_), plus any variants and extensions.
+    4. If the language is not 'und' and the other two components are not empty, return the language tag composed of _language<sub>s</sub>\_script<sub>s</sub>\_region<sub>s</sub>_ + variants + extensions.
 2. **Lookup.** Look up each of the following in order, and stop on the first match:
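Steps 2 to 4 of the revised Canonicalize step in tr35.md can be sketched as follows. This is an illustration only, not the specified algorithm in full: step 1 (canonicalization per Annex C) is assumed to have already been applied, extensions are not modeled, and `LocaleParts`, `strip_unknown`, and `early_return` are hypothetical names. A `None` result means the tag is not yet fully specified and processing continues with the Lookup step.

```python
from typing import NamedTuple, Optional, Tuple

class LocaleParts(NamedTuple):
    """Hypothetical holder for a locale ID that has already been canonicalized (step 1)."""
    language: str
    script: Optional[str] = None  # None means the subtag is empty
    region: Optional[str] = None
    variants: Tuple[str, ...] = ()

def strip_unknown(parts: LocaleParts) -> LocaleParts:
    """Step 2: remove the script code 'Zzzz' and the region code 'ZZ'."""
    return parts._replace(
        script=None if parts.script == "Zzzz" else parts.script,
        region=None if parts.region == "ZZ" else parts.region,
    )

def early_return(parts: LocaleParts) -> Optional[str]:
    """Steps 2-4: return the full tag if nothing is missing, else None (continue to Lookup)."""
    p = strip_unknown(parts)  # step 3 then reads language, script, region, variants from p
    if p.language != "und" and p.script and p.region:
        return "_".join([p.language, p.script, p.region, *p.variants])
    return None

print(early_return(LocaleParts("sr", "Cyrl", "RS")))   # sr_Cyrl_RS
print(early_return(LocaleParts("sr", "Zzzz", "RS")))   # None: the script becomes empty
print(early_return(LocaleParts("und", "Latn", "US")))  # None: the language is 'und'
```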