Commit a19aa17

UCA 16.0 delta 10
From Ken:

For this delta, focus on all the remaining script-specific punctuation and symbols.

1. Dandas and double dandas

Tulu-Tigalari and Kirat Rai both have script-specific dandas. Intercalate them into unidata.txt roughly in code point order. This puts Kirat Rai after Mro in the list of dandas, and it puts Tulu-Tigalari after Khojki in the list of dandas.

The Balinese inverted carik siki and inverted carik pareren are also a variety of danda and double danda. Those can be interlaced with the non-inverted danda and double danda for Balinese.

Regenerate allkeys.txt, and verify that these six dandas weight as expected.

2. Miscellaneous other punctuation

1B7F BALINESE PANTI BAWAK is a variant of 1B5A BALINESE PANTI. Just intercalate after 1B5A.

10D6E GARAY HYPHEN is another script-specific hyphen. Add to the list of script-specific hyphens in the dashes and hyphens subsection of punctuation.

113D7..113D8, the two Tulu-Tigalari pushpikas, can just be intercalated in the miscellaneous punctuation section of unidata.txt, after the Khojki abbreviation sign and before the Newa punctuation.

16D6D KIRAT RAI SIGN YUPI is another Indic abbreviation sign. Add in the script-specific miscellaneous punctuation section of unidata.txt, in code point order.

11BE1 SUNUWAR SIGN PVO is an auspicious mark, similar in some ways to a siddham mark or a Devanagari bhale. These may have particular pronunciations, but are treated as punctuation marks. Just add to unidata.txt in the script-specific section of miscellaneous punctuation in code point order. That will put it right after 119E2 NANDINAGARI SIGN SIDDHAM.

1E5FF OL ONAL ABBREVIATION SIGN. Likewise just add in the script-specific miscellaneous punctuation section in code point order.

Regenerate allkeys.txt, and verify that these seven punctuation marks weight as expected and are indicated as variables.

3. Miscellaneous other symbols

Garay plus sign (10D8E) and minus sign (10D8F). In the absence of any better information about these, just intercalate in the math symbols section of unidata.txt after 002B PLUS SIGN and 2212 MINUS SIGN, respectively.

Regenerate allkeys.txt and verify that these two symbols weight as expected and are indicated as variables.

4. Garay reduplication mark

10D6F GARAY REDUPLICATION MARK has its properties misconstrued. It is explained in the proposal (L2/22-048) under section 4, Punctuation. However, most examples of iteration or reduplication marks are designated as gc=Lm, Extender=True in the UCD. This keeps the reduplication or iteration mark within the context of the word for segmentation purposes, which is usually the desired outcome. A few similar marks have ended up as gc=Po. But the Garay proposal specifies the mark as gc=So, which is clearly wrong.

I am updating UnicodeData.txt for 16.0 to change this character from gc=So to gc=Lm, and then updating the underlying library for the sifter accordingly.

With the updated interpretation of properties, 10D6F can be intercalated in the extenders section of unidata.txt. I put it between AAF4 MEETEI MAYEK WORD REPETITION MARK and 16B42 PAHAWH HMONG SIGN VOS NRUA.

Regenerate allkeys.txt and verify that 10D6F shows up with a primary weight and is grouped with the extenders.

16 more down, 4 to go.

Archive this delta 10: unidata-16.0.0d10.txt (1602884 bytes, 10/08/2023)
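The "verify ... indicated as variables" steps above could be spot-checked with a small script along the following lines. This is only a sketch, not part of the sifter: it assumes the usual allkeys.txt entry layout (`code points ; [collation elements] # comment`, with `*` marking a variable collation element and `.` a non-variable one), the helper names `parse_allkeys` and `is_variable` are made up for illustration, and the weights in `SAMPLE` are invented placeholders rather than values from the regenerated file.

```python
import re

# One allkeys.txt-style entry: hex code point(s), ';', then one or more
# bracketed collation elements such as [*0229.0020.0002].
ENTRY = re.compile(r"^([0-9A-F ]+?)\s*;\s*((?:\[[.*][0-9A-F.]+\])+)")
ELEMENT = re.compile(r"\[([.*])([0-9A-F]+)\.")


def parse_allkeys(lines):
    """Map each code point sequence to a list of (is_variable, primary) pairs."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line or line.startswith("@"):  # skip blanks and @directives
            continue
        match = ENTRY.match(line)
        if match is None:
            continue
        cps = tuple(int(cp, 16) for cp in match.group(1).split())
        table[cps] = [
            (flag == "*", int(primary, 16))
            for flag, primary in ELEMENT.findall(match.group(2))
        ]
    return table


def is_variable(table, cp):
    """True if the single code point has only variable collation elements."""
    elements = table.get((cp,))
    return bool(elements) and all(variable for variable, _ in elements)


# Hypothetical sample lines: the weights are placeholders, NOT real DUCET values.
SAMPLE = [
    "@version 16.0.0",
    "10D6E ; [*0229.0020.0002] # GARAY HYPHEN",
    "10D8E ; [*0660.0020.0002] # GARAY PLUS SIGN",
    "10D6F ; [.1F4A.0020.0002] # GARAY REDUPLICATION MARK",
]

table = parse_allkeys(SAMPLE)
for cp in (0x10D6E, 0x10D8E, 0x10D6F):
    print(f"{cp:04X} variable={is_variable(table, cp)}")
```

Against a real allkeys.txt, the punctuation and symbols from sections 1 to 3 would be expected to report as variable, while 10D6F would be expected to report as non-variable with a non-zero primary weight.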
1 parent 820b711 commit a19aa17

2 files changed: +32301 -32265 lines changed


c/uca/sifter/unidata.txt

Lines changed: 22 additions & 2 deletions
@@ -9,8 +9,8 @@
 # Default Unicode Collation Element Table (DUCET) for
 # the Unicode Collation Algorithm.
 #
-# Version 16.0.0 draft 9 (Unicode Version: 16.0.0)
-# based on Unicode data file UnicodeData-16.0.0d7.txt
+# Version 16.0.0 draft 10 (Unicode Version: 16.0.0)
+# based on Unicode data file UnicodeData-16.0.0d8.txt
 # Ordering for Unicode 16.0
 #
 # Fields:
@@ -1971,6 +1971,7 @@ FE63;SMALL HYPHEN-MINUS;Pd;<small> 002D;;;;;
 1B60;BALINESE PAMENENG;Po;;;line-breaking hyphen;;;
 1806;MONGOLIAN TODO SOFT HYPHEN;Pd;;;;;;
 1807;MONGOLIAN SIBE SYLLABLE BOUNDARY MARKER;Po;;;;;;
+10D6E;GARAY HYPHEN;Pd;;;;;;
 
 2010;HYPHEN;Pd;;;;;;
 
@@ -2209,7 +2210,9 @@ A92F;KAYAH LI SIGN SHYA;Po;;;;;;
 1AAA;TAI THAM SIGN SATKAAN;Po;;;;;;
 1AAB;TAI THAM SIGN SATKAANKUU;Po;;;;;;
 1B5E;BALINESE CARIK SIKI;Po;;;danda;;;
+1B4E;BALINESE INVERTED CARIK SIKI;Po;;;;;;
 1B5F;BALINESE CARIK PAREREN;Po;;;double danda;;;
+1B4F;BALINESE INVERTED CARIK PAREREN;Po;;;;;;
 A9C8;JAVANESE PADA LINGSA;Po;;;;;;
 A9C9;JAVANESE PADA LUNGSI;Po;;;;;;
 AA5D;CHAM PUNCTUATION DANDA;Po;;;;;;
@@ -2229,6 +2232,8 @@ ABEB;MEETEI MAYEK CHEIKHEI;Po;;;;;;
 111C6;SHARADA DOUBLE DANDA;Po;;;;;;
 11238;KHOJKI DANDA;Po;;;;;;
 11239;KHOJKI DOUBLE DANDA;Po;;;;;;
+113D4;TULU-TIGALARI DANDA;Po;;;;;;
+113D5;TULU-TIGALARI DOUBLE DANDA;Po;;;;;;
 1144B;NEWA DANDA;Po;;;;;;
 1144C;NEWA DOUBLE DANDA;Po;;;;;;
 115C2;SIDDHAM DANDA;Po;;;;;;
@@ -2244,6 +2249,8 @@ ABEB;MEETEI MAYEK CHEIKHEI;Po;;;;;;
 11F44;KAWI DOUBLE DANDA;Po;;;;;;
 16A6E;MRO DANDA;Po;;;;;;
 16A6F;MRO DOUBLE DANDA;Po;;;;;;
+16D6E;KIRAT RAI DANDA;Po;;;;;;
+16D6F;KIRAT RAI DOUBLE DANDA;Po;;;;;;
 1C7E;OL CHIKI PUNCTUATION MUCAAD;Po;;;;;;
 1C7F;OL CHIKI PUNCTUATION DOUBLE MUCAAD;Po;;;;;;
 
@@ -2256,6 +2263,7 @@ ABEB;MEETEI MAYEK CHEIKHEI;Po;;;;;;
 1A1E;BUGINESE PALLAWA;Po;;;;;;
 1A1F;BUGINESE END OF SECTION;Po;;;;;;
 1B5A;BALINESE PANTI;Po;;;;;;
+1B7F;BALINESE PANTI BAWAK;Po;;;;;;
 1B5B;BALINESE PAMADA;Po;;;;;;
 1B7D;BALINESE PANTI LANTANG;Po;;;;;;
 1B7E;BALINESE PAMADA LANTANG;Po;;;;;;
@@ -2931,6 +2939,8 @@ AA5C;CHAM PUNCTUATION SPIRAL;Po;;;;;;
 1123B;KHOJKI SECTION MARK;Po;;;;;;
 1123C;KHOJKI DOUBLE SECTION MARK;Po;;;;;;
 1123D;KHOJKI ABBREVIATION SIGN;Po;;;;;;
+113D7;TULU-TIGALARI SIGN OM PUSHPIKA;Po;;;;;;
+113D8;TULU-TIGALARI SIGN SHRII PUSHPIKA;Po;;;;;;
 1144D;NEWA COMMA;Po;;;;;;
 1145A;NEWA DOUBLE COMMA;Po;;;;;;
 1144E;NEWA GAP FILLER;Po;;;;;;
@@ -2964,6 +2974,7 @@ AA5C;CHAM PUNCTUATION SPIRAL;Po;;;;;;
 1183B;DOGRA ABBREVIATION SIGN;Po;;;;;;
 11945;DIVES AKURU GAP FILLER;Po;;;;;;
 119E2;NANDINAGARI SIGN SIDDHAM;Po;;;;;;
+11BE1;SUNUWAR SIGN PVO;Po;;;;;;
 11FFF;TAMIL PUNCTUATION END OF TEXT;Po;;;;;;
 
 16B37;PAHAWH HMONG SIGN VOS THOM;Po;;;;;;
@@ -2973,6 +2984,8 @@ AA5C;CHAM PUNCTUATION SPIRAL;Po;;;;;;
 16B3B;PAHAWH HMONG SIGN VOS FEEM;Po;;;;;;
 16B44;PAHAWH HMONG SIGN XAUS;Po;;;;;;
 
+16D6D;KIRAT RAI SIGN YUPI;Po;;;;;;
+
 16E99;MEDEFAIDRIN SYMBOL AIVA;Po;;;;;;
 16E9A;MEDEFAIDRIN EXCLAMATION OH;Po;;;;;;
 
@@ -2982,6 +2995,8 @@ AA5C;CHAM PUNCTUATION SPIRAL;Po;;;;;;
 1DA8A;SIGNWRITING COLON;Po;;;;;;
 1DA8B;SIGNWRITING PARENTHESIS;Po;;;;;;
 
+1E5FF;OL ONAL ABBREVIATION SIGN;Po;;;;;;
+
 # *********************************************************
 
 # Section: Symbols
@@ -3686,6 +3701,8 @@ FE62;SMALL PLUS SIGN;Sm;<small> 002B;;;;;
 208A;SUBSCRIPT PLUS SIGN;Sm;<sub> 002B;;;;;
 FB29;HEBREW LETTER ALTERNATIVE PLUS SIGN;Sm;<font> 002B;;;;;
 
+10D8E;GARAY PLUS SIGN;Sm;;;;;;
+
 00B1;PLUS-MINUS SIGN;Sm;;;;;;
 00F7;DIVISION SIGN;Sm;;;;;;
 00D7;MULTIPLICATION SIGN;Sm;;;;;;
@@ -3724,6 +3741,8 @@ FF5E;FULLWIDTH TILDE;Sm;<wide> 007E;;;;;
 208B;SUBSCRIPT MINUS;Sm;<sub> 2212;;;;;
 2052;COMMERCIAL MINUS SIGN;Sm;;;;;;
 
+10D8F;GARAY MINUS SIGN;Sm;;;;;;
+
 2213;MINUS-OR-PLUS SIGN;Sm;;;;;;
 2214;DOT PLUS;Sm;;;;;;
 2215;DIVISION SLASH;Sm;;;;;;
@@ -12270,6 +12289,7 @@ AA70;MYANMAR MODIFIER LETTER KHAMTI REDUPLICATION;Lm;;;;;;
 AADD;TAI VIET SYMBOL SAM;Lm;;;;;;
 AAF3;MEETEI MAYEK SYLLABLE REPETITION MARK;Lm;;;;;;
 AAF4;MEETEI MAYEK WORD REPETITION MARK;Lm;;;;;;
+10D6F;GARAY REDUPLICATION MARK;Lm;;;;;;
 16B42;PAHAWH HMONG SIGN VOS NRUA;Lm;;;;;;
 16B43;PAHAWH HMONG SIGN IB YAM;Lm;;;;;;
 1E13C;NYIAKENG PUACHUE HMONG SIGN XW XW;Lm;;;;;;
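
The placement of an intercalated entry, such as 10D6F between AAF4 and 16B42 in the extenders section, could be checked mechanically. The sketch below is illustrative only: it assumes the first three semicolon-separated fields are code point, name, and general category (inferred from the lines in this diff, not from the sifter's documentation), and `parse_unidata` and `neighbors` are hypothetical helper names.

```python
def parse_unidata(lines):
    """Return (code_point, name, gc) tuples in file order.

    Assumes ';'-separated records whose first three fields are the code
    point, the character name, and the general category; lines starting
    with '#' and blank lines are skipped.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(";")
        records.append((int(fields[0], 16), fields[1], fields[2]))
    return records


def neighbors(records, cp):
    """Return the code points immediately before and after cp (or None)."""
    index = next(i for i, rec in enumerate(records) if rec[0] == cp)
    before = records[index - 1][0] if index > 0 else None
    after = records[index + 1][0] if index + 1 < len(records) else None
    return before, after


# The post-change lines of the extenders hunk shown above.
EXTENDERS = [
    "AADD;TAI VIET SYMBOL SAM;Lm;;;;;;",
    "AAF3;MEETEI MAYEK SYLLABLE REPETITION MARK;Lm;;;;;;",
    "AAF4;MEETEI MAYEK WORD REPETITION MARK;Lm;;;;;;",
    "10D6F;GARAY REDUPLICATION MARK;Lm;;;;;;",
    "16B42;PAHAWH HMONG SIGN VOS NRUA;Lm;;;;;;",
    "16B43;PAHAWH HMONG SIGN IB YAM;Lm;;;;;;",
]

records = parse_unidata(EXTENDERS)
gc_by_cp = {cp: gc for cp, _, gc in records}
before, after = neighbors(records, 0x10D6F)
print(f"10D6F gc={gc_by_cp[0x10D6F]} between {before:04X} and {after:04X}")
```

The same two helpers could be pointed at the danda or miscellaneous punctuation hunks to confirm that each new entry sits next to the neighbors named in the commit message.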
