Commit 62d34bd

CLDR-18108 Give all languages a primary script: trivial cases
This change adds "primary" scripts to many languages in language_script.tsv. It does not change likely subtags; rather, it future-proofs our data by recognizing a single primary script, avoiding cases where ambiguity served customers the wrong script. I also added scripts for languages in country_language_population.tsv that were missing them.
1 parent b895643 commit 62d34bd

4 files changed (+90, -63 lines changed)

common/supplemental/supplementalData.xml

Lines changed: 14 additions & 0 deletions
@@ -1315,7 +1315,9 @@ XXX Code for transations where no currency is involved
 <language type="ann" scripts="Latn"/>
 <language type="anp" scripts="Deva"/>
 <language type="aoz" scripts="Latn"/>
+<language type="apc" scripts="Arab"/>
 <language type="apc" territories="IL JO LB PS SY TR" alt="secondary"/>
+<language type="apd" scripts="Arab"/>
 <language type="apd" territories="SD" alt="secondary"/>
 <language type="ar" scripts="Arab" territories="AE BH DJ DZ EG EH ER IL IQ JO KM KW LB LY MA MR OM PS QA SA SD SO SY TD TN YE"/>
 <language type="ar" scripts="Syrc" territories="IR SS" alt="secondary"/>
@@ -1397,6 +1399,7 @@ XXX Code for transations where no currency is involved
 <language type="bjj" territories="IN" alt="secondary"/>
 <language type="bjn" scripts="Latn"/>
 <language type="bjn" territories="ID" alt="secondary"/>
+<language type="bjt" scripts="Latn"/>
 <language type="bjt" territories="SN" alt="secondary"/>
 <language type="bkm" scripts="Latn"/>
 <language type="bku" scripts="Latn"/>
@@ -1422,6 +1425,7 @@ XXX Code for transations where no currency is involved
 <language type="brx" scripts="Deva"/>
 <language type="brx" territories="IN" alt="secondary"/>
 <language type="bs" scripts="Cyrl Latn" territories="BA"/>
+<language type="bsc" scripts="Latn"/>
 <language type="bsc" territories="SN" alt="secondary"/>
 <language type="bss" scripts="Latn"/>
 <language type="bto" scripts="Latn"/>
@@ -1445,6 +1449,7 @@ XXX Code for transations where no currency is involved
 <language type="cay" scripts="Latn"/>
 <language type="cch" scripts="Latn"/>
 <language type="ccp" scripts="Beng Cakm"/>
+<language type="ccr" scripts="Latn" alt="secondary"/>
 <language type="ce" scripts="Cyrl"/>
 <language type="ce" territories="RU" alt="secondary"/>
 <language type="ceb" scripts="Latn"/>
@@ -1712,6 +1717,7 @@ XXX Code for transations where no currency is involved
 <language type="ilo" territories="PH" alt="secondary"/>
 <language type="inh" scripts="Cyrl"/>
 <language type="inh" scripts="Arab Latn" territories="RU" alt="secondary"/>
+<language type="io" scripts="Latn" alt="secondary"/>
 <language type="is" scripts="Latn" territories="IS"/>
 <language type="it" scripts="Latn" territories="CH IT SM VA"/>
 <language type="it" territories="DE FR HR MT US" alt="secondary"/>
@@ -1721,6 +1727,7 @@ XXX Code for transations where no currency is involved
 <language type="ja" scripts="Jpan" territories="JP"/>
 <language type="jam" scripts="Latn"/>
 <language type="jam" territories="JM" alt="secondary"/>
+<language type="jbo" scripts="Latn" alt="secondary"/>
 <language type="jgo" scripts="Latn"/>
 <language type="jmc" scripts="Latn"/>
 <language type="jml" scripts="Deva"/>
@@ -1749,6 +1756,7 @@ XXX Code for transations where no currency is involved
 <language type="kdt" scripts="Thai"/>
 <language type="kea" scripts="Latn"/>
 <language type="kea" territories="CV" alt="secondary"/>
+<language type="ken" scripts="Latn"/>
 <language type="kfo" scripts="Latn"/>
 <language type="kfr" scripts="Deva"/>
 <language type="kfr" territories="IN" alt="secondary"/>
@@ -1786,6 +1794,7 @@ XXX Code for transations where no currency is involved
 <language type="kmb" territories="AO" alt="secondary"/>
 <language type="kn" scripts="Knda"/>
 <language type="kn" territories="IN" alt="secondary"/>
+<language type="knf" scripts="Latn"/>
 <language type="knf" territories="SN" alt="secondary"/>
 <language type="knn" scripts="Deva"/>
 <language type="knn" territories="IN" alt="secondary"/>
@@ -1845,6 +1854,7 @@ XXX Code for transations where no currency is involved
 <language type="lbe" territories="RU" alt="secondary"/>
 <language type="lbw" scripts="Latn"/>
 <language type="lcp" scripts="Thai"/>
+<language type="len" scripts="Latn" alt="secondary"/>
 <language type="lep" scripts="Lepc"/>
 <language type="lez" scripts="Cyrl"/>
 <language type="lez" scripts="Aghb" territories="RU" alt="secondary"/>
@@ -1925,6 +1935,7 @@ XXX Code for transations where no currency is involved
 <language type="mfa" territories="TH" alt="secondary"/>
 <language type="mfe" scripts="Latn"/>
 <language type="mfe" territories="MU" alt="secondary"/>
+<language type="mfv" scripts="Latn"/>
 <language type="mfv" territories="SN" alt="secondary"/>
 <language type="mg" scripts="Latn" territories="MG"/>
 <language type="mgh" scripts="Latn"/>
@@ -2087,6 +2098,7 @@ XXX Code for transations where no currency is involved
 <language type="pnt" scripts="Cyrl Latn" alt="secondary"/>
 <language type="pon" scripts="Latn"/>
 <language type="pon" territories="FM" alt="secondary"/>
+<language type="ppl" scripts="Latn"/>
 <language type="pqm" scripts="Latn"/>
 <language type="prd" scripts="Arab"/>
 <language type="prg" scripts="Latn" alt="secondary"/>
@@ -2207,6 +2219,7 @@ XXX Code for transations where no currency is involved
 <language type="sms" scripts="Latn"/>
 <language type="sms" territories="FI" alt="secondary"/>
 <language type="sn" scripts="Latn" territories="ZW"/>
+<language type="snf" scripts="Latn"/>
 <language type="snf" territories="SN" alt="secondary"/>
 <language type="snk" scripts="Latn"/>
 <language type="snk" territories="ML" alt="secondary"/>
@@ -2295,6 +2308,7 @@ XXX Code for transations where no currency is involved
 <language type="tmh" territories="NE" alt="secondary"/>
 <language type="tn" scripts="Latn" territories="BW"/>
 <language type="tn" territories="ZA" alt="secondary"/>
+<language type="tnr" scripts="Latn"/>
 <language type="tnr" territories="SN" alt="secondary"/>
 <language type="to" scripts="Latn" territories="TO"/>
 <language type="tog" scripts="Latn"/>
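
The pattern this commit applies to supplementalData.xml (adding a scripts-only entry next to a language that previously had only an `alt="secondary"` entry) can be checked mechanically. The following is a minimal Python sketch, not part of CLDR tooling. It assumes that a `<language>` element without `alt="secondary"` is the primary entry and that its `scripts` attribute lists the primary scripts. The function name and the sample fragment are illustrative; the sample borrows entries from the hunks above and shows `bjt` in its pre-commit state.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; entries are borrowed from the diff above,
# with "bjt" shown as it was before this commit (no primary entry).
SAMPLE = """<languageData>
  <language type="apc" scripts="Arab"/>
  <language type="apc" territories="IL JO LB PS SY TR" alt="secondary"/>
  <language type="bjt" territories="SN" alt="secondary"/>
  <language type="bs" scripts="Cyrl Latn" territories="BA"/>
  <language type="io" scripts="Latn" alt="secondary"/>
</languageData>"""


def languages_without_primary_script(xml_text):
    """Return sorted language codes that have no primary entry with scripts.

    Assumption: an element without alt="secondary" is the primary entry.
    """
    root = ET.fromstring(xml_text)
    seen = set()
    with_primary = set()
    for element in root.iter("language"):
        lang = element.get("type")
        seen.add(lang)
        is_primary = element.get("alt") != "secondary"
        if is_primary and element.get("scripts"):
            with_primary.add(lang)
    return sorted(seen - with_primary)


print(languages_without_primary_script(SAMPLE))  # ['bjt', 'io']
```

Under this reading, languages whose only scripts entry is marked `alt="secondary"` (such as `io` in the sample) are still reported, which may or may not be intended for a given language.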

docs/site/development/updating-codes/update-language-script-info/language-script-description.md

Lines changed: 7 additions & 10 deletions
@@ -4,19 +4,16 @@ title: Language Script Description
 
 # Language Script Description
 
-The language\_script spreadsheet should list all of the language / script combinations that are in common modern use. The countries are not important, since their function has been overtaken by the country\_language\_population spreadsheet.
+The [`language\_script.tsv`](https://github.com/unicode-org/cldr/blob/main/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/language_script.tsv) data file should list all of the language / script combinations that are in common use. Usage by country is indicated in the [`country\_language\_population.tsv`](https://github.com/unicode-org/cldr/blob/main/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/country_language_population.tsv) spreadsheet.
 
-1. If the language and script are both modern, and the script is a major way to write the language in some country, then we should see that line marked as **primary**.
-2. Otherwise it should be marked **secondary**.
-
-Every language that is in official use in any country according to country\_language\_population  should have at least one primary script in the language\_script spreadsheet.
+1. Every language needs at least 1 script considered the **primary** script.
+1. This data is used to determine [the most Likely language and region](likelysubtags-and-default-content) so there needs to be at least 1 primary value.
+2. [Changed in v47] Include a primary script for historical languages (eg. Ancient Greek, Coptic). The primary script should reflect where the majority of the written corpus originates from.
+2. Languages written by significant populations with different scritps in different countries can have multiple **primary** scripts. The [likely subtags](https://www.unicode.org/cldr/charts/latest/supplemental/likely_subtags.html) patterns will use population counts to disambiguate the default script for each locale.
+3. Other scripts used for a language should be marked **secondary**.
 
 If a language has multiple primary scripts, then it should not appear without the script tag in the country\_language\_population.tsv. For example, we should not see "az", but rather "az\_Cyrl", "az\_Latn", and so on. For each country where the language is used, we should see figures on the script\-specific values. The values may overlap, that is, we may see az\_Cyrl at 60% and az\_Latn at 55%. However, the combination with the predominantly used script **must** have a larger figure than the others.
 
 This is also reflected in CLDR main: languages with multiple scripts will have that reflected in their structure (eg sr\-Cyrl\-RS), with aliases for the language\-region combinations.
 
-Files in https://github.com/unicode-org/cldr/tree/main/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data
-
-1. country\_language\_population.tsv
-2. language\_script.tsv
-
+In order to re-generate the XML data use ConvertLanguageData as written about in [the article about updating the language scripts](.../update-language-script-info.md).
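
The rule stated in the description above, that a language with multiple primary scripts should not appear without a script tag in country_language_population.tsv, can be sketched as a small check. This is a hypothetical Python illustration rather than anything in the CLDR tools. The function name, the in-memory row shape `(territory, code, percent)`, the placeholder territory `ZZ`, and the sample figures (echoing the 60% and 55% example in the text) are all assumptions made for the example.

```python
def bare_codes_for_multiscript(primary_scripts, population_rows):
    """Flag rows that use a bare language code for a multi-script language.

    primary_scripts: dict mapping language code -> set of primary scripts.
    population_rows: iterable of (territory, code, percent) tuples, where
    code is "lang" or "lang_Script" (a hypothetical, simplified row shape).
    """
    problems = []
    for territory, code, _percent in population_rows:
        lang, _, script = code.partition("_")
        has_many_primaries = len(primary_scripts.get(lang, ())) > 1
        if not script and has_many_primaries:
            problems.append((territory, code))
    return problems


# Illustrative data only; not taken from the real TSV files.
primary_scripts = {"az": {"Cyrl", "Latn"}, "is": {"Latn"}}
rows = [
    ("ZZ", "az", 90.0),       # bare code for a multi-script language
    ("ZZ", "az_Cyrl", 60.0),  # script-specific values may overlap
    ("ZZ", "az_Latn", 55.0),
    ("IS", "is", 100.0),      # single primary script, bare code is fine
]

print(bare_codes_for_multiscript(primary_scripts, rows))  # [('ZZ', 'az')]
```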
