Commit 5eaa3a8

UCA 16.0 delta 7
From Ken:

1. Todhri

Given all the complications of Kirat Rai to start off the day, I'm rewarding myself before lunch by dealing with an easy case: Todhri. This is just a straight unicameral alphabet, with no complications other than two letters that have canonical equivalent sequences.

Move the relevant entries for Todhri (105C0..105F3), in code point order, into unidata.txt, right after Vithkuqi.

Apply the CONTRACTION pragma to the two decomposed vowels, 105C9 and 105E4.

Generate allkeys.txt and verify that Todhri weights are as expected, including the two contractions.

2. Sunuwar

This is another simple one: a unicameral alphabet with no marks, and with the desired collation order the same as the code point order.

Move the relevant entries for Sunuwar (11BC0..11BE0), in code point order, into unidata.txt, right after Tangsa (and ahead of the Kirat Rai I just added). Leave the one punctuation sign to deal with later.

Generate allkeys.txt and verify that Sunuwar weights are as expected.

3. Gurung Khema

Gurung Khema is a bit more complicated. This one is an abugida, and it has decomposition and contraction issues for the vowel signs.

First move all the relevant entries for Gurung Khema (16100..1612F), in code point order, into unidata.txt, right after Sunuwar.

The 8 multi-part vowels with decompositions, 16121..16128, need to have the CONTRACTION pragma, as the intent is for the vowels to all get primary weights.

3 of the multi-part vowels, 16126..16128, have full decompositions into sequences of three parts. Because of this, as for Kirat Rai discussed above, those three need to have the full decompositions added in their entries as secondary decompositions. The entries affected are:

16126;GURUNG KHEMA VOWEL SIGN O;Mn;16121 1611F, 1611E 1611E 1611F;;;;;
16127;GURUNG KHEMA VOWEL SIGN OO;Mn;16122 1611F, 1611E 16129 1611F;;;;;
16128;GURUNG KHEMA VOWEL SIGN AU;Mn;16121 16120, 1611E 1611E 16120;;;;;

A replication note for anyone trying to build allkeys.txt with the sifter in unicodetools:

Before the sifter will work correctly for weighting of abugidas, the Alphabetic property has to be updated for the repertoire in question. In particular, all gc=Mn or gc=Mc vowel signs, consonant signs, and length marks in abugidas need to be set explicitly to Other_Alphabetic in PropList.txt first (and the relevant derivations be run based on that). Otherwise, during the sift process, the sifter won't see these as alphabetic and branch down the path for primary weights, but rather will identify them as otherwise unaccounted for combining marks, and attempt to give them secondary weights.

Anusvaras and visargas also should be set to Other_Alphabetic, but those are already bled off in unidata.txt by being given explicit decompositions to generic marks.

Another piece of the puzzle is that nuktas and viramas (including killers) should be given the Diacritic property in PropList.txt, but these are more marginal for sifter behavior. Most nuktas are now bled off with explicit decompositions, and the viramas are almost all picked up in the sifter via their ccc=9 values. This could become a problem in the future if SAH insists on ccc=0 for some newly encoded viramas, at which point the sifter code may need an update to catch any combining mark viramas (or conjoiners and killers) with ccc=0. The example we have for 16.0, in Kirat Rai, is not a problem, because that is gc=Lm, ccc=0, so the sifter gets its Alphabetic status from gc=Lm and assigns it a primary weight.

Generate allkeys.txt and verify that Gurung Khema weights are as expected, with special attention to the vowel contractions.

4. Tulu-Tigalari

First move all the relevant entries for Tulu-Tigalari (11380..113D0), in code point order, into unidata.txt, right after Grantha.

Put in the CONTRACTION pragma for the 3 two-part dependent vowel signs, 113C5, 113C7, 113C8.

Do the same for each of the 4 two-part independent vowel signs, 11383, 11385, 1138E, 11391. Those aren't contiguous in code point order, so use a separate instance of the pragma for each, to make sure they don't pick up entries that they shouldn't.

Now, checking against the detailed specification of the collation order in the proposal (L2/22-031), invert the order of 113B3 LLA and 113B4 RRA, so the collation order is RRA < LLA. That seems to be a deliberate choice in the proposal.

Next move 113D1 TULU-TIGALARI REPHA into unidata.txt, immediately after the RA (113AC). The repha is a separately encoded form of ra.

Note that for the Tulu-Tigalari vowels, there are deliberate encoding gaps for short e and short o. Those might be added to the encoding later on, in which case they would intercalate neatly in the gaps, and would fit in the same places in the primary collation order.

There is one anomaly in the specification of collation in that it specifies the primary order for vowel sign o, even though that is not encoded. There is also a typo indicating: vowel sign vocalic ll << vowel sign ee. That should be a primary distinction, like all the rest. I have ignored the typo.

The au length mark (113C9) only occurs as the second part of some two-part vowels, and would generally not be weighted alone in most text, because it is bled off by the contractions that form the weights for the atomically encoded two-part vowels. It makes more sense to give it a primary order *after* the viramas, so I have reversed the position in unidata.txt, as compared to the specification in the proposal. See the treatment for Grantha, which has similar components.

The pluta (113D3) is not in L2/22-031. It was added later, based on Srinidhi and Sridatta's L2/22-260. L2/22-260 is silent about its ordering. It is a letter that serves as a different kind of vowel lengthener. I'm giving it a primary order after the au length mark. Again, see the comparable treatment of the same component in Grantha.

The gemination mark (113D2) is also not in L2/22-031, but comes from L2/22-260, which is silent about its ordering. However, the comparison is made there to Gurmukhi addak (0A71), Khojki sign shadda (11237), and Soyombo gemination mark (11A98). For DUCET, 0A71 is given a Gurmukhi-specific secondary weight. The Khojki shadda is simply equated to the Arabic shadda. The Soyombo gemination mark is given a Soyombo-specific secondary weight. On balance, it seems best to just add a new secondary weight for the Tulu-Tigalari gemination mark. I defer that to later, along with any other new secondary weight additions required. Remember that Garay is also introducing a gc=Mn gemination mark, so I have to figure out how to deal with that one, too. So later.

Regenerate allkeys.txt, and verify that the Tulu-Tigalari weights are as expected, with special attention to the various vowel contractions and to the few other characters that receive primary weight not in code point order, as noted above.

5. Ol Onal

Ol Onal is another easy one. It is a simple alphabet. The proposal (L2/22-151R) specifies that the collation order is simply the same as the encoding order. However, the discussion of the use of the two combining marks, MU (nasalization, a dot above) and IKIR (lengthening, a dot below), suggests to me that it makes more sense to give them secondary weights. That is, rather than what is specified in the proposal:

A < A+MU < A+IKIR < A+IKIR+MU = A+MU+IKIR

what probably makes more sense for ordering is:

A << A+MU << A+IKIR << A+IKIR+MU = A+MU+IKIR

which would be accomplished better with secondary weights for MU and IKIR.

The proposal compares these two marks to the corresponding marks in Ol Chiki, which are spacing and given gc=Lm, and the corresponding marks in Nag Mundari, which are non-spacing diacritics. For best consistency, I think we should follow the pattern of Nag Mundari, which gives the two non-spacing marks secondary weights, rather than the Ol Chiki pattern, where the spacing modifier letters get primary weights.

In any case, since the preferred solution involves assigning new secondary weights, I defer the MU and IKIR to a later draft when I deal with those.

So for now, for the rest of the alphabet, I move all the letters (1E5D0..1E5ED) and the HODDOND (1E5F0), in code point order, into unidata.txt right after Ol Chiki.

Regenerate allkeys.txt, and verify that the Ol Onal weights are as expected.

I'll save Garay for the next delta.

233 more down, 771 to go.

Archive this delta 7: unidata-16.0.0d7.txt (1569698 bytes, 10/08/2023)
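
The Ol Onal discussion in the message turns on the difference between a primary (<) and a secondary (<<) distinction. The following is a toy sketch of that difference in Python, not a description of how the sifter or DUCET assigns weights. Every weight value and every name in it (PRIMARY, MARK_AS_PRIMARY, MARK_AS_SECONDARY, sort_key) is hypothetical and chosen only for illustration.

```python
# Toy illustration of primary (<) versus secondary (<<) distinctions.
# All weights below are invented for this sketch; they are NOT DUCET values.

PRIMARY = {"A": 1, "B": 2}                  # hypothetical letter weights
MARK_AS_PRIMARY = {"MU": 10, "IKIR": 11}    # marks weighted like letters
MARK_AS_SECONDARY = {"MU": 1, "IKIR": 2}    # marks weighted like diacritics


def sort_key(tokens, marks_primary):
    """Build a two-level key: all primary weights, then all secondary weights."""
    primary, secondary = [], []
    for tok in tokens:
        if tok in PRIMARY:
            primary.append(PRIMARY[tok])
            secondary.append(0)
        elif marks_primary:
            primary.append(MARK_AS_PRIMARY[tok])
            secondary.append(0)
        else:
            # A secondary-weighted mark contributes nothing at the primary level.
            secondary.append(MARK_AS_SECONDARY[tok])
    return (primary, secondary)


words = [["A", "B"], ["A", "MU"], ["A"]]

# Marks as primaries: A+MU sorts after every A+letter word.
as_primary = sorted(words, key=lambda w: sort_key(w, marks_primary=True))

# Marks as secondaries: A+MU stays next to A, ahead of A+B.
as_secondary = sorted(words, key=lambda w: sort_key(w, marks_primary=False))

print(as_primary)
print(as_secondary)
```

Under these assumed weights, the marked form A+MU lands after A+B when the mark is primary, and directly after bare A when the mark is secondary. That grouping is the effect the message argues for when it prefers the Nag Mundari pattern.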
1 parent f286608 commit 5eaa3a8

2 files changed, +14734 -14159 lines changed

c/uca/sifter/unidata.txt

Lines changed: 323 additions & 1 deletion
@@ -9,7 +9,7 @@
 # Default Unicode Collation Element Table (DUCET) for
 # the Unicode Collation Algorithm.
 #
-# Version 16.0.0 draft 6 (Unicode Version: 16.0.0)
+# Version 16.0.0 draft 7 (Unicode Version: 16.0.0)
 # based on Unicode data file UnicodeData-16.0.0d7.txt
 # Ordering for Unicode 16.0
 #
@@ -22260,6 +22260,119 @@ DEFAULT
 11357;GRANTHA AU LENGTH MARK;Mc;;;;;;
 1135D;GRANTHA SIGN PLUTA;Lo;;;;;;
 
+# Tulu-Tigalari script starts here
+
+11380;TULU-TIGALARI LETTER A;Lo;;;;;;
+11381;TULU-TIGALARI LETTER AA;Lo;;;;;;
+11382;TULU-TIGALARI LETTER I;Lo;;;;;;
+
+# Tulu-Tigalari two-part vowels collate as units, not
+# by their decompositions.
+
+CONTRACTION
+
+11383;TULU-TIGALARI LETTER II;Lo;11382 113C9;;;;;
+
+DEFAULT
+
+11384;TULU-TIGALARI LETTER U;Lo;;;;;;
+
+CONTRACTION
+
+11385;TULU-TIGALARI LETTER UU;Lo;11384 113BB;;;;;
+
+DEFAULT
+
+11386;TULU-TIGALARI LETTER VOCALIC R;Lo;;;;;;
+11387;TULU-TIGALARI LETTER VOCALIC RR;Lo;;;;;;
+11388;TULU-TIGALARI LETTER VOCALIC L;Lo;;;;;;
+11389;TULU-TIGALARI LETTER VOCALIC LL;Lo;;;;;;
+1138B;TULU-TIGALARI LETTER EE;Lo;;;;;;
+
+CONTRACTION
+
+1138E;TULU-TIGALARI LETTER AI;Lo;1138B 113C2;;;;;
+
+DEFAULT
+
+11390;TULU-TIGALARI LETTER OO;Lo;;;;;;
+
+CONTRACTION
+
+11391;TULU-TIGALARI LETTER AU;Lo;11390 113C9;;;;;
+
+DEFAULT
+
+11392;TULU-TIGALARI LETTER KA;Lo;;;;;;
+11393;TULU-TIGALARI LETTER KHA;Lo;;;;;;
+11394;TULU-TIGALARI LETTER GA;Lo;;;;;;
+11395;TULU-TIGALARI LETTER GHA;Lo;;;;;;
+11396;TULU-TIGALARI LETTER NGA;Lo;;;;;;
+11397;TULU-TIGALARI LETTER CA;Lo;;;;;;
+11398;TULU-TIGALARI LETTER CHA;Lo;;;;;;
+11399;TULU-TIGALARI LETTER JA;Lo;;;;;;
+1139A;TULU-TIGALARI LETTER JHA;Lo;;;;;;
+1139B;TULU-TIGALARI LETTER NYA;Lo;;;;;;
+1139C;TULU-TIGALARI LETTER TTA;Lo;;;;;;
+1139D;TULU-TIGALARI LETTER TTHA;Lo;;;;;;
+1139E;TULU-TIGALARI LETTER DDA;Lo;;;;;;
+1139F;TULU-TIGALARI LETTER DDHA;Lo;;;;;;
+113A0;TULU-TIGALARI LETTER NNA;Lo;;;;;;
+113A1;TULU-TIGALARI LETTER TA;Lo;;;;;;
+113A2;TULU-TIGALARI LETTER THA;Lo;;;;;;
+113A3;TULU-TIGALARI LETTER DA;Lo;;;;;;
+113A4;TULU-TIGALARI LETTER DHA;Lo;;;;;;
+113A5;TULU-TIGALARI LETTER NA;Lo;;;;;;
+113A6;TULU-TIGALARI LETTER PA;Lo;;;;;;
+113A7;TULU-TIGALARI LETTER PHA;Lo;;;;;;
+113A8;TULU-TIGALARI LETTER BA;Lo;;;;;;
+113A9;TULU-TIGALARI LETTER BHA;Lo;;;;;;
+113AA;TULU-TIGALARI LETTER MA;Lo;;;;;;
+113AB;TULU-TIGALARI LETTER YA;Lo;;;;;;
+113AC;TULU-TIGALARI LETTER RA;Lo;;;;;;
+113D1;TULU-TIGALARI REPHA;Lo;;;;;;
+113AD;TULU-TIGALARI LETTER LA;Lo;;;;;;
+113AE;TULU-TIGALARI LETTER VA;Lo;;;;;;
+113AF;TULU-TIGALARI LETTER SHA;Lo;;;;;;
+113B0;TULU-TIGALARI LETTER SSA;Lo;;;;;;
+113B1;TULU-TIGALARI LETTER SA;Lo;;;;;;
+113B2;TULU-TIGALARI LETTER HA;Lo;;;;;;
+# Note that the traditional order of RRA and LLA is
+# typically RRA < LLA < LLLA, as shown here. That
+# seems to be a deliberate choice specified in the
+# proposal, even though the code point order is LLA < RRA.
+113B4;TULU-TIGALARI LETTER RRA;Lo;;;;;;
+113B3;TULU-TIGALARI LETTER LLA;Lo;;;;;;
+113B5;TULU-TIGALARI LETTER LLLA;Lo;;;;;;
+113B7;TULU-TIGALARI SIGN AVAGRAHA;Lo;;;;;;
+113B8;TULU-TIGALARI VOWEL SIGN AA;Mc;;;;;;
+113B9;TULU-TIGALARI VOWEL SIGN I;Mc;;;;;;
+113BA;TULU-TIGALARI VOWEL SIGN II;Mc;;;;;;
+113BB;TULU-TIGALARI VOWEL SIGN U;Mn;;;;;;
+113BC;TULU-TIGALARI VOWEL SIGN UU;Mn;;;;;;
+113BD;TULU-TIGALARI VOWEL SIGN VOCALIC R;Mn;;;;;;
+113BE;TULU-TIGALARI VOWEL SIGN VOCALIC RR;Mn;;;;;;
+113BF;TULU-TIGALARI VOWEL SIGN VOCALIC L;Mn;;;;;;
+113C0;TULU-TIGALARI VOWEL SIGN VOCALIC LL;Mn;;;;;;
+113C2;TULU-TIGALARI VOWEL SIGN EE;Mc;;;;;;
+
+# Tulu-Tigalari two-part vowels collate as units, not
+# by their decompositions.
+
+CONTRACTION
+
+113C5;TULU-TIGALARI VOWEL SIGN AI;Mc;113C2 113C2;;;;;
+113C7;TULU-TIGALARI VOWEL SIGN OO;Mc;113C2 113B8;;;;;
+113C8;TULU-TIGALARI VOWEL SIGN AU;Mc;113C2 113C9;;;;;
+
+DEFAULT
+
+113CE;TULU-TIGALARI SIGN VIRAMA;Mn;;;;;;
+113CF;TULU-TIGALARI SIGN LOOPED VIRAMA;Mc;;;;;;
+113D0;TULU-TIGALARI CONJOINER;Mn;;;;;;
+113C9;TULU-TIGALARI AU LENGTH MARK;Mc;;;;;;
+113D3;TULU-TIGALARI SIGN PLUTA;Lo;;;;;;
+
 # Newa script starts here
 
 11449;NEWA OM;Lo;;;;;;
@@ -25668,6 +25781,41 @@ A9C0;JAVANESE PANGKON;Mc;;;;;;
 1C7C;OL CHIKI PHAARKAA;Lm;;;;;;
 1C7D;OL CHIKI AHAD;Lm;;;;;;
 
+# Ol Onal script begins here
+
+1E5D0;OL ONAL LETTER O;Lo;;;;;;
+1E5D1;OL ONAL LETTER OM;Lo;;;;;;
+1E5D2;OL ONAL LETTER ONG;Lo;;;;;;
+1E5D3;OL ONAL LETTER ORR;Lo;;;;;;
+1E5D4;OL ONAL LETTER OO;Lo;;;;;;
+1E5D5;OL ONAL LETTER OY;Lo;;;;;;
+1E5D6;OL ONAL LETTER A;Lo;;;;;;
+1E5D7;OL ONAL LETTER AD;Lo;;;;;;
+1E5D8;OL ONAL LETTER AB;Lo;;;;;;
+1E5D9;OL ONAL LETTER AH;Lo;;;;;;
+1E5DA;OL ONAL LETTER AL;Lo;;;;;;
+1E5DB;OL ONAL LETTER AW;Lo;;;;;;
+1E5DC;OL ONAL LETTER I;Lo;;;;;;
+1E5DD;OL ONAL LETTER IT;Lo;;;;;;
+1E5DE;OL ONAL LETTER IP;Lo;;;;;;
+1E5DF;OL ONAL LETTER ITT;Lo;;;;;;
+1E5E0;OL ONAL LETTER ID;Lo;;;;;;
+1E5E1;OL ONAL LETTER IN;Lo;;;;;;
+1E5E2;OL ONAL LETTER U;Lo;;;;;;
+1E5E3;OL ONAL LETTER UK;Lo;;;;;;
+1E5E4;OL ONAL LETTER UDD;Lo;;;;;;
+1E5E5;OL ONAL LETTER UJ;Lo;;;;;;
+1E5E6;OL ONAL LETTER UNY;Lo;;;;;;
+1E5E7;OL ONAL LETTER UR;Lo;;;;;;
+1E5E8;OL ONAL LETTER E;Lo;;;;;;
+1E5E9;OL ONAL LETTER ES;Lo;;;;;;
+1E5EA;OL ONAL LETTER EH;Lo;;;;;;
+1E5EB;OL ONAL LETTER EC;Lo;;;;;;
+1E5EC;OL ONAL LETTER ENN;Lo;;;;;;
+1E5ED;OL ONAL LETTER EG;Lo;;;;;;
+1E5F0;OL ONAL SIGN HODDOND;Lo;;;;;;
+
+
 # Cherokee script begins here
 
 AB70;CHEROKEE SMALL LETTER A;Ll;;;;13A0;;13A0
@@ -33631,6 +33779,78 @@ A4F7;LISU LETTER OE;Lo;;;;;;
 105BC;VITHKUQI SMALL LETTER ZE;Ll;;;;10595;;10595
 10595;VITHKUQI CAPITAL LETTER ZE;Lu;;;;;105BC;
 
+# Todhri script begins here
+
+105C0;TODHRI LETTER A;Lo;;;;;;
+105C1;TODHRI LETTER AS;Lo;;;;;;
+105C2;TODHRI LETTER BA;Lo;;;;;;
+105C3;TODHRI LETTER MBA;Lo;;;;;;
+105C4;TODHRI LETTER CA;Lo;;;;;;
+105C5;TODHRI LETTER CHA;Lo;;;;;;
+105C6;TODHRI LETTER DA;Lo;;;;;;
+105C7;TODHRI LETTER NDA;Lo;;;;;;
+105C8;TODHRI LETTER DHA;Lo;;;;;;
+
+# Two Todhri letters are encoded atomically
+# but have canonical decompositions to another
+# letter plus a dot above. For collation, provide
+# the weighting for the relevant contractions.
+
+CONTRACTION
+
+105C9;TODHRI LETTER EI;Lo;105D2 0307;;;;;
+
+DEFAULT
+
+105CA;TODHRI LETTER E;Lo;;;;;;
+105CB;TODHRI LETTER FA;Lo;;;;;;
+105CC;TODHRI LETTER GA;Lo;;;;;;
+105CD;TODHRI LETTER NGA;Lo;;;;;;
+105CE;TODHRI LETTER GJA;Lo;;;;;;
+105CF;TODHRI LETTER NGJA;Lo;;;;;;
+105D0;TODHRI LETTER HA;Lo;;;;;;
+105D1;TODHRI LETTER HJA;Lo;;;;;;
+105D2;TODHRI LETTER I;Lo;;;;;;
+105D3;TODHRI LETTER JA;Lo;;;;;;
+105D4;TODHRI LETTER KA;Lo;;;;;;
+105D5;TODHRI LETTER LA;Lo;;;;;;
+105D6;TODHRI LETTER LLA;Lo;;;;;;
+105D7;TODHRI LETTER MA;Lo;;;;;;
+105D8;TODHRI LETTER NA;Lo;;;;;;
+105D9;TODHRI LETTER NJAN;Lo;;;;;;
+105DA;TODHRI LETTER O;Lo;;;;;;
+105DB;TODHRI LETTER PA;Lo;;;;;;
+105DC;TODHRI LETTER QA;Lo;;;;;;
+105DD;TODHRI LETTER RA;Lo;;;;;;
+105DE;TODHRI LETTER RRA;Lo;;;;;;
+105DF;TODHRI LETTER SA;Lo;;;;;;
+105E0;TODHRI LETTER SHA;Lo;;;;;;
+105E1;TODHRI LETTER SHTA;Lo;;;;;;
+105E2;TODHRI LETTER TA;Lo;;;;;;
+105E3;TODHRI LETTER THA;Lo;;;;;;
+
+CONTRACTION
+
+105E4;TODHRI LETTER U;Lo;105DA 0307;;;;;
+
+DEFAULT
+
+105E5;TODHRI LETTER VA;Lo;;;;;;
+105E6;TODHRI LETTER XA;Lo;;;;;;
+105E7;TODHRI LETTER NXA;Lo;;;;;;
+105E8;TODHRI LETTER XHA;Lo;;;;;;
+105E9;TODHRI LETTER NXHA;Lo;;;;;;
+105EA;TODHRI LETTER Y;Lo;;;;;;
+105EB;TODHRI LETTER JY;Lo;;;;;;
+105EC;TODHRI LETTER ZA;Lo;;;;;;
+105ED;TODHRI LETTER ZHA;Lo;;;;;;
+105EE;TODHRI LETTER GHA;Lo;;;;;;
+105EF;TODHRI LETTER STA;Lo;;;;;;
+105F0;TODHRI LETTER SKAN;Lo;;;;;;
+105F1;TODHRI LETTER KHA;Lo;;;;;;
+105F2;TODHRI LETTER PSA;Lo;;;;;;
+105F3;TODHRI LETTER OO;Lo;;;;;;
+
 # Sora Sompeng script begins here
 
 110D0;SORA SOMPENG LETTER SAH;Lo;;;;;;
@@ -33775,6 +33995,108 @@ A4F7;LISU LETTER OE;Lo;;;;;;
 16ABD;TANGSA LETTER CHA;Lo;;;;;;
 16ABE;TANGSA LETTER ZA;Lo;;;;;;
 
+# Sunuwar script starts here
+
+11BC0;SUNUWAR LETTER DEVI;Lo;;;;;;
+11BC1;SUNUWAR LETTER TASLA;Lo;;;;;;
+11BC2;SUNUWAR LETTER EKO;Lo;;;;;;
+11BC3;SUNUWAR LETTER IMAR;Lo;;;;;;
+11BC4;SUNUWAR LETTER REU;Lo;;;;;;
+11BC5;SUNUWAR LETTER UTTHI;Lo;;;;;;
+11BC6;SUNUWAR LETTER KIK;Lo;;;;;;
+11BC7;SUNUWAR LETTER MA;Lo;;;;;;
+11BC8;SUNUWAR LETTER APPHO;Lo;;;;;;
+11BC9;SUNUWAR LETTER PIP;Lo;;;;;;
+11BCA;SUNUWAR LETTER GIL;Lo;;;;;;
+11BCB;SUNUWAR LETTER HAMSO;Lo;;;;;;
+11BCC;SUNUWAR LETTER CARMI;Lo;;;;;;
+11BCD;SUNUWAR LETTER NAH;Lo;;;;;;
+11BCE;SUNUWAR LETTER BUR;Lo;;;;;;
+11BCF;SUNUWAR LETTER JYAH;Lo;;;;;;
+11BD0;SUNUWAR LETTER LOACHA;Lo;;;;;;
+11BD1;SUNUWAR LETTER OTTHI;Lo;;;;;;
+11BD2;SUNUWAR LETTER SHYELE;Lo;;;;;;
+11BD3;SUNUWAR LETTER VARCA;Lo;;;;;;
+11BD4;SUNUWAR LETTER YAT;Lo;;;;;;
+11BD5;SUNUWAR LETTER AVA;Lo;;;;;;
+11BD6;SUNUWAR LETTER AAL;Lo;;;;;;
+11BD7;SUNUWAR LETTER DONGA;Lo;;;;;;
+11BD8;SUNUWAR LETTER THARI;Lo;;;;;;
+11BD9;SUNUWAR LETTER PHAR;Lo;;;;;;
+11BDA;SUNUWAR LETTER NGAR;Lo;;;;;;
+11BDB;SUNUWAR LETTER KHA;Lo;;;;;;
+11BDC;SUNUWAR LETTER SHYER;Lo;;;;;;
+11BDD;SUNUWAR LETTER CHELAP;Lo;;;;;;
+11BDE;SUNUWAR LETTER TENTU;Lo;;;;;;
+11BDF;SUNUWAR LETTER THELE;Lo;;;;;;
+11BE0;SUNUWAR LETTER KLOKO;Lo;;;;;;
+
+# Gurung Khema script starts here
+
+16100;GURUNG KHEMA LETTER A;Lo;;;;;;
+16101;GURUNG KHEMA LETTER KA;Lo;;;;;;
+16102;GURUNG KHEMA LETTER KHA;Lo;;;;;;
+16103;GURUNG KHEMA LETTER GA;Lo;;;;;;
+16104;GURUNG KHEMA LETTER GHA;Lo;;;;;;
+16105;GURUNG KHEMA LETTER NGA;Lo;;;;;;
+16106;GURUNG KHEMA LETTER CA;Lo;;;;;;
+16107;GURUNG KHEMA LETTER CHA;Lo;;;;;;
+16108;GURUNG KHEMA LETTER JA;Lo;;;;;;
+16109;GURUNG KHEMA LETTER JHA;Lo;;;;;;
+1610A;GURUNG KHEMA LETTER HA;Lo;;;;;;
+1610B;GURUNG KHEMA LETTER TTA;Lo;;;;;;
+1610C;GURUNG KHEMA LETTER TTHA;Lo;;;;;;
+1610D;GURUNG KHEMA LETTER DDA;Lo;;;;;;
+1610E;GURUNG KHEMA LETTER DDHA;Lo;;;;;;
+1610F;GURUNG KHEMA LETTER VA;Lo;;;;;;
+16110;GURUNG KHEMA LETTER TA;Lo;;;;;;
+16111;GURUNG KHEMA LETTER THA;Lo;;;;;;
+16112;GURUNG KHEMA LETTER DA;Lo;;;;;;
+16113;GURUNG KHEMA LETTER DHA;Lo;;;;;;
+16114;GURUNG KHEMA LETTER NA;Lo;;;;;;
+16115;GURUNG KHEMA LETTER PA;Lo;;;;;;
+16116;GURUNG KHEMA LETTER PHA;Lo;;;;;;
+16117;GURUNG KHEMA LETTER BA;Lo;;;;;;
+16118;GURUNG KHEMA LETTER BHA;Lo;;;;;;
+16119;GURUNG KHEMA LETTER MA;Lo;;;;;;
+1611A;GURUNG KHEMA LETTER YA;Lo;;;;;;
+1611B;GURUNG KHEMA LETTER RA;Lo;;;;;;
+1611C;GURUNG KHEMA LETTER LA;Lo;;;;;;
+1611D;GURUNG KHEMA LETTER SA;Lo;;;;;;
+1611E;GURUNG KHEMA VOWEL SIGN AA;Mn;;;;;;
+1611F;GURUNG KHEMA VOWEL SIGN I;Mn;;;;;;
+16120;GURUNG KHEMA VOWEL SIGN II;Mn;;;;;;
+
+# Gurung Khema two-part and three-part vowels collate as units, not
+# by their decompositions. Unlike most cases of multi-part
+# vowels, Gurung Khema vowels are formed of non-spacing
+# pieces that stack vertically.
+
+CONTRACTION
+
+16121;GURUNG KHEMA VOWEL SIGN U;Mn;1611E 1611E;;;;;
+16122;GURUNG KHEMA VOWEL SIGN UU;Mn;1611E 16129;;;;;
+16123;GURUNG KHEMA VOWEL SIGN E;Mn;1611E 1611F;;;;;
+16124;GURUNG KHEMA VOWEL SIGN EE;Mn;16129 1611F;;;;;
+16125;GURUNG KHEMA VOWEL SIGN AI;Mn;1611E 16120;;;;;
+# The three-part vowels need to have secondary
+# decompositions added, for canonical closure.
+# 16126;GURUNG KHEMA VOWEL SIGN O;Mn;16121 1611F;;;;;
+# 16127;GURUNG KHEMA VOWEL SIGN OO;Mn;16122 1611F;;;;;
+# 16128;GURUNG KHEMA VOWEL SIGN AU;Mn;16121 16120;;;;;
+16126;GURUNG KHEMA VOWEL SIGN O;Mn;16121 1611F, 1611E 1611E 1611F;;;;;
+16127;GURUNG KHEMA VOWEL SIGN OO;Mn;16122 1611F, 1611E 16129 1611F;;;;;
+16128;GURUNG KHEMA VOWEL SIGN AU;Mn;16121 16120, 1611E 1611E 16120;;;;;
+
+DEFAULT
+
+16129;GURUNG KHEMA VOWEL LENGTH MARK;Mn;;;;;;
+1612A;GURUNG KHEMA CONSONANT SIGN MEDIAL YA;Mc;;;;;;
+1612B;GURUNG KHEMA CONSONANT SIGN MEDIAL VA;Mc;;;;;;
+1612C;GURUNG KHEMA CONSONANT SIGN MEDIAL HA;Mc;;;;;;
+1612E;GURUNG KHEMA CONSONANT SIGN MEDIAL RA;Mn;;;;;;
+1612F;GURUNG KHEMA SIGN THOLHOMA;Mn;;;;;;
+
 # Kirat Rai script starts here
 
 16D40;KIRAT RAI SIGN ANUSVARA;Lm;;;;;;
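
The Gurung Khema three-part vowel entries in this diff carry a secondary decomposition after a comma, which is meant to be the full expansion of the first one. The following Python sketch shows one way to sanity-check that by hand. It is not the sifter's own logic: the field layout is inferred only from the entries shown here, and the function names (parse_entries, full_expansion, closure_problems) are hypothetical.

```python
# Sketch only: a simplified reader for unidata.txt-style entries.
# The field layout is inferred from the entries in this diff; the real
# sifter's parsing rules may differ.

def parse_entries(text):
    """Return {code point: (name, [decompositions], in_contraction_block)}."""
    entries = {}
    contraction = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "CONTRACTION":
            contraction = True
            continue
        if line == "DEFAULT":
            contraction = False
            continue
        fields = line.split(";")
        code, name, decomp = fields[0], fields[1], fields[3]
        # A comma separates the first decomposition from a secondary one.
        decomps = [d.split() for d in decomp.split(",") if d.strip()]
        entries[code] = (name, decomps, contraction)
    return entries


def full_expansion(code, entries):
    """Recursively expand a code point through its first decomposition."""
    decomps = entries.get(code, ("", [], False))[1]
    if not decomps:
        return [code]
    result = []
    for part in decomps[0]:
        result.extend(full_expansion(part, entries))
    return result


def closure_problems(entries):
    """List code points whose secondary decomposition is not the full expansion."""
    problems = []
    for code, (_name, decomps, _flag) in entries.items():
        if len(decomps) > 1 and decomps[1] != full_expansion(code, entries):
            problems.append(code)
    return problems


SAMPLE = """\
1611E;GURUNG KHEMA VOWEL SIGN AA;Mn;;;;;;
1611F;GURUNG KHEMA VOWEL SIGN I;Mn;;;;;;
CONTRACTION
16121;GURUNG KHEMA VOWEL SIGN U;Mn;1611E 1611E;;;;;
16126;GURUNG KHEMA VOWEL SIGN O;Mn;16121 1611F, 1611E 1611E 1611F;;;;;
DEFAULT
16129;GURUNG KHEMA VOWEL LENGTH MARK;Mn;;;;;;
"""

entries = parse_entries(SAMPLE)
print(closure_problems(entries))
```

For the sample above, 16126 expands through 16121 to 1611E 1611E 1611F, which matches its secondary decomposition, so no problems are reported. Code points absent from the input (such as 0307 in the Todhri entries) are treated as already fully decomposed, which is a simplification.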