
Commit 1b1899d

UCA 16.0 delta 8
From Ken:

1. Garay

I've saved Garay for a separate delta, because it is the only new bicameral script in the bunch, and that introduces intercalation complications for it in unidata.txt. It also has a sukun, a gemination mark, and a reduplication mark. The proposal implies it needs a syllabic ordering, rather than a simpler ordering. It also has a couple of variant letters, which need special handling.

Garay goes into unidata.txt after Medefaidrin, another West African bicameral script, and before Adlam, another West African bicameral script.

The first step is to move all the capital and small letters into unidata.txt. Then I rearrange them into case pairs, in the same manner as for Medefaidrin and Adlam. Note that this rearrangement is not strictly necessary to get weight assignments correct for the case pairs, but it is better to keep maintaining new bicameral scripts in the same way as for existing ones already in unidata.txt. This consistency helps in understanding what is going on.

Next, deal with OLD KA (10D64/10D84) and OLD NA (10D65/10D85). These are claimed to be just variant forms of KA and NA, respectively, and are claimed explicitly to sort equal to them. I merge them into the case pair bundles for KA and NA, with a <sort> decomposition to make them sort similarly to KA and NA, but with a case-distinguished tertiary weight difference.

It is completely unclear how to weight the vowels. There is an extensive discussion of collation in L2/22-048, but the main upshot of that seems to be that collation is conceived of in terms of a syllabic grid, rather than in terms of the actual string used to represent each node of the syllabic grid. I default to putting the vowels in code point order ahead of all the consonants. Any attempt to implement the actual syllabic ordering would require an extensive tailoring. Just putting the vowels first, in roughly the order specified for vowels in syllables, would seem to suffice for the default ordering in DUCET.

The gemination mark (10D6A) will get a script-specific secondary, so I defer that for now.

Regenerate allkeys.txt, and verify that the Garay weights are as expected, including all the case pairs and the equivalents for the two pairs of variant letters.

51 more down, 720 to go.

Archive this delta 8: unidata-16.0.0d8.txt (1572176 bytes, 10/08/2023)
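
The verification step described in the message (case pairs and variant letters sharing weights in the regenerated allkeys.txt) could be spot-checked with a small script. The following is only a sketch, not the tooling actually used for this delta: it assumes the usual allkeys.txt line shape of `CODE ; [.PPPP.SSSS.TTTT] # NAME`, and the weight values in `SAMPLE` are invented placeholders, not the real Garay weights. The names `parse_allkeys` and `same_primary` are hypothetical helpers.

```python
import re

# One collation element, e.g. [.F001.0020.0002] -> (primary, secondary, tertiary).
ELEMENT = re.compile(r"\[[.*]([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\]")

# Placeholder lines: the code points are from the commit, the weights are invented.
SAMPLE = [
    "@version 16.0.0",
    "10D73 ; [.F001.0020.0002] # GARAY SMALL LETTER KA",
    "10D84 ; [.F001.0020.0004] # GARAY SMALL LETTER OLD KA",
    "10D53 ; [.F001.0020.0008] # GARAY CAPITAL LETTER KA",
    "10D64 ; [.F001.0020.000A] # GARAY CAPITAL LETTER OLD KA",
    "10D81 ; [.F002.0020.0002] # GARAY SMALL LETTER NA",
]


def parse_allkeys(lines):
    """Map each code point string to its list of (primary, secondary, tertiary)."""
    table = {}
    for line in lines:
        body = line.split("#", 1)[0]  # drop the trailing comment
        if body.startswith("@") or ";" not in body:
            continue  # skip directives, comments, and blank lines
        code, elements = body.split(";", 1)
        table[code.strip()] = ELEMENT.findall(elements)
    return table


def same_primary(table, *codes):
    """True if every listed code point has the same sequence of primary weights."""
    primaries = {tuple(p for p, _, _ in table[c]) for c in codes}
    return len(primaries) == 1


if __name__ == "__main__":
    table = parse_allkeys(SAMPLE)
    print(same_primary(table, "10D73", "10D53", "10D84", "10D64"))
```

Run against the real allkeys.txt, the same check would be repeated for the NA bundle (10D81, 10D61, 10D85, 10D65) and for each ordinary case pair.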
1 parent 5eaa3a8 commit 1b1899d


2 files changed: +8289 -8160 lines changed


c/uca/sifter/unidata.txt

Lines changed: 79 additions & 1 deletion
@@ -9,7 +9,7 @@
 # Default Unicode Collation Element Table (DUCET) for
 # the Unicode Collation Algorithm.
 #
-# Version 16.0.0 draft 7 (Unicode Version: 16.0.0)
+# Version 16.0.0 draft 8 (Unicode Version: 16.0.0)
 # based on Unicode data file UnicodeData-16.0.0d7.txt
 # Ordering for Unicode 16.0
 #
@@ -28538,6 +28538,84 @@ A6EF;BAMUM LETTER KOGHOM;Nl;;0;;;;
 16E5F;MEDEFAIDRIN CAPITAL LETTER Y;Lu;;;;;16E7F;
 16E7F;MEDEFAIDRIN SMALL LETTER Y;Ll;;;;16E5F;;16E5F
 
+# Garay script begins here
+
+10D4A;GARAY VOWEL SIGN A;Lo;;;;;;
+10D4B;GARAY VOWEL SIGN I;Lo;;;;;;
+10D4C;GARAY VOWEL SIGN O;Lo;;;;;;
+10D4D;GARAY VOWEL SIGN EE;Lo;;;;;;
+10D4E;GARAY VOWEL LENGTH MARK;Lo;;;;;;
+10D4F;GARAY SUKUN;Lo;;;;;;
+10D69;GARAY VOWEL SIGN E;Mn;;;;;;
+
+10D70;GARAY SMALL LETTER A;Ll;;;;10D50;;10D50
+10D50;GARAY CAPITAL LETTER A;Lu;;;;;10D70;
+
+10D71;GARAY SMALL LETTER CA;Ll;;;;10D51;;10D51
+10D51;GARAY CAPITAL LETTER CA;Lu;;;;;10D71;
+
+10D72;GARAY SMALL LETTER MA;Ll;;;;10D52;;10D52
+10D52;GARAY CAPITAL LETTER MA;Lu;;;;;10D72;
+
+# The variant old forms of KA are artificially equated to KA.
+
+10D73;GARAY SMALL LETTER KA;Ll;;;;10D53;;10D53
+10D84;GARAY SMALL LETTER OLD KA;Ll;<sort> 10D73;;;10D64;;10D64
+10D53;GARAY CAPITAL LETTER KA;Lu;;;;;10D73;
+10D64;GARAY CAPITAL LETTER OLD KA;Lu;<sort> 10D53;;;;10D84;
+
+10D74;GARAY SMALL LETTER BA;Ll;;;;10D54;;10D54
+10D54;GARAY CAPITAL LETTER BA;Lu;;;;;10D74;
+
+10D75;GARAY SMALL LETTER JA;Ll;;;;10D55;;10D55
+10D55;GARAY CAPITAL LETTER JA;Lu;;;;;10D75;
+
+10D76;GARAY SMALL LETTER SA;Ll;;;;10D56;;10D56
+10D56;GARAY CAPITAL LETTER SA;Lu;;;;;10D76;
+
+10D77;GARAY SMALL LETTER WA;Ll;;;;10D57;;10D57
+10D57;GARAY CAPITAL LETTER WA;Lu;;;;;10D77;
+
+10D78;GARAY SMALL LETTER LA;Ll;;;;10D58;;10D58
+10D58;GARAY CAPITAL LETTER LA;Lu;;;;;10D78;
+
+10D79;GARAY SMALL LETTER GA;Ll;;;;10D59;;10D59
+10D59;GARAY CAPITAL LETTER GA;Lu;;;;;10D79;
+
+10D7A;GARAY SMALL LETTER DA;Ll;;;;10D5A;;10D5A
+10D5A;GARAY CAPITAL LETTER DA;Lu;;;;;10D7A;
+
+10D7B;GARAY SMALL LETTER XA;Ll;;;;10D5B;;10D5B
+10D5B;GARAY CAPITAL LETTER XA;Lu;;;;;10D7B;
+
+10D7C;GARAY SMALL LETTER YA;Ll;;;;10D5C;;10D5C
+10D5C;GARAY CAPITAL LETTER YA;Lu;;;;;10D7C;
+
+10D7D;GARAY SMALL LETTER TA;Ll;;;;10D5D;;10D5D
+10D5D;GARAY CAPITAL LETTER TA;Lu;;;;;10D7D;
+
+10D7E;GARAY SMALL LETTER RA;Ll;;;;10D5E;;10D5E
+10D5E;GARAY CAPITAL LETTER RA;Lu;;;;;10D7E;
+
+10D7F;GARAY SMALL LETTER NYA;Ll;;;;10D5F;;10D5F
+10D5F;GARAY CAPITAL LETTER NYA;Lu;;;;;10D7F;
+
+10D80;GARAY SMALL LETTER FA;Ll;;;;10D60;;10D60
+10D60;GARAY CAPITAL LETTER FA;Lu;;;;;10D80;
+
+# The variant old forms of NA are artificially equated to NA.
+
+10D81;GARAY SMALL LETTER NA;Ll;;;;10D61;;10D61
+10D85;GARAY SMALL LETTER OLD NA;Ll;<sort> 10D81;;;10D65;;10D65
+10D61;GARAY CAPITAL LETTER NA;Lu;;;;;10D81;
+10D65;GARAY CAPITAL LETTER OLD NA;Lu;<sort> 10D61;;;;10D85;
+
+10D82;GARAY SMALL LETTER PA;Ll;;;;10D62;;10D62
+10D62;GARAY CAPITAL LETTER PA;Lu;;;;;10D82;
+
+10D83;GARAY SMALL LETTER HA;Ll;;;;10D63;;10D63
+10D63;GARAY CAPITAL LETTER HA;Lu;;;;;10D83;
+
 # Adlam script begins here
 
 1E922;ADLAM SMALL LETTER ALIF;Ll;;;;1E900;;1E900
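
The case-pair bundles added in this hunk can be sanity-checked mechanically before regenerating allkeys.txt. The sketch below is illustrative only and is not part of the sifter. It assumes the field layout that can be inferred from the entries shown (index 2 is the General Category, 3 the decomposition, 6 the uppercase mapping, 7 the lowercase mapping). `parse_entry`, `sort_target`, and `check_case_pairs` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    code: str
    category: str
    decomp: str
    upper: str
    lower: str


# The KA bundle exactly as it appears in the hunk above.
KA_BUNDLE = [
    "# The variant old forms of KA are artificially equated to KA.",
    "",
    "10D73;GARAY SMALL LETTER KA;Ll;;;;10D53;;10D53",
    "10D84;GARAY SMALL LETTER OLD KA;Ll;<sort> 10D73;;;10D64;;10D64",
    "10D53;GARAY CAPITAL LETTER KA;Lu;;;;;10D73;",
    "10D64;GARAY CAPITAL LETTER OLD KA;Lu;<sort> 10D53;;;;10D84;",
]


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one semicolon-delimited entry; comments and blank lines give None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    f = line.split(";")
    # Assumed layout: 0=code, 2=category, 3=decomposition, 6=upper, 7=lower.
    return Entry(f[0], f[2], f[3], f[6], f[7])


def sort_target(entry: Entry) -> str:
    """Return the code point named by a <sort> decomposition, or '' if none."""
    return entry.decomp.split()[1] if entry.decomp.startswith("<sort>") else ""


def check_case_pairs(lines):
    """Report asymmetric case mappings and <sort> targets that are not a case pair."""
    entries = {e.code: e for e in map(parse_entry, lines) if e is not None}
    problems = []
    for small in entries.values():
        if small.category != "Ll" or not small.upper:
            continue
        cap = entries.get(small.upper)
        if cap is None or cap.lower != small.code:
            problems.append(f"{small.code}: case mapping is not symmetric")
            continue
        target = sort_target(small)
        if target:
            base = entries.get(target)
            if base is None or base.upper != sort_target(cap):
                problems.append(f"{small.code}: <sort> targets are not a case pair")
    return problems


if __name__ == "__main__":
    print(check_case_pairs(KA_BUNDLE))
```

An empty list means every small letter maps to a capital that maps back to it, and that each variant's `<sort>` target is the matching case of its partner's target.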
