Commit 7851aac

Sifter updates for 17.0
From Ken on feb18: The changes to the source code are minor -- just the usual updates for new repertoire and date, plus some tweaks for the split for Tangut implicit weighting. I had to explicitly touch cttfooter.txt for the Tangut changes and for the addition of Extension J, as well as the range for Extension C.

From Ken on feb19: These were to fix the output for the CTT to include a time stamp on the generation and to include the CTT table name *inside* the generated file. The output now is just named "ctt.txt".

From Ken on apr24: I've made the small updates to the sifter code and to the CTT template text for the changes in CJK ranges.
1 parent d2aa67b commit 7851aac

3 files changed: +155 -67 lines changed

c/uca/sifter/cttfooter.txt

Lines changed: 18 additions & 8 deletions
@@ -13,40 +13,50 @@
 % Weights for unified Han characters follow the Unified Repertoire and
 % Ordering, which is a language-neutral, traditional radical-stroke order.
 
-% The original URO and Extensions A through I, plus the 12 unified Han characters
+% The original URO and Extensions A through J, plus the 12 unified Han characters
 % in the CJK compatibility area are weighted implicitly as defined here.
 
 % WEIGHT_BASE = 0xFB40 for original URO and 12 unified Han from CJK compat area.
 % cp >= 0x04E00 && cp <= 0x09FFF % URO
 % WEIGHT_BASE = 0xFB80 for Extension A through Extension I Han characters.
 % cp >= 0x03400 && cp <= 0x04DBF % Ext. A
 % cp >= 0x20000 && cp <= 0x2A6DF % Ext. B
-% cp >= 0x2A700 && cp <= 0x2B739 % Ext. C
+% cp >= 0x2A700 && cp <= 0x2B73F % Ext. C
 % cp >= 0x2B740 && cp <= 0x2B81D % Ext. D
-% cp >= 0x2B820 && cp <= 0x2CEA1 % Ext. E
+% cp >= 0x2B820 && cp <= 0x2CEAD % Ext. E
 % cp >= 0x2CEB0 && cp <= 0x2EBE0 % Ext. F
 % cp >= 0x2EBF0 && cp <= 0x2EE5D % Ext. I
 % cp >= 0x30000 && cp <= 0x3134A % Ext. G
 % cp >= 0x31350 && cp <= 0x323AF % Ext. H
+% cp >= 0x323B0 && cp <= 0x33479 % Ext. J
 % For a given Han character at code point cp:
 % base1 = WEIGHT_BASE + ( cp >> 15 )
 % base2 = ( cp & 0x7FFF ) | 0x8000
 % Then weight the character as: <U{cp}> "<R{base1}><T{base2}>";<BASE>;<MIN>;<SFFFF>
 
-% Tangut ideographic and component characters are weighted implicitly as defined here.
+% Tangut ideographic characters are weighted implicitly as defined here.
 
 % WEIGHT_BASE = 0xFB00
-% cp >= 0x17000 && cp <= 0x187F7 % Tangut ideographs
+% cp >= 0x17000 && cp <= 0x187FF % Tangut ideographs
+% cp >= 0x18D00 && cp <= 0x18D1E % Tangut ideograph supplement
+% For a given Tangut character at code point cp:
+% base1 = WEIGHT_BASE
+% base2 = ( cp - 0x17000 ) | 0x8000
+% Then weight the character as: <U{cp}> "<R{base1}><T{base2}>";<BASE>;<MIN>;<SFFFF>
+
+% Tangut component characters are weighted implicitly as defined here.
+
+% WEIGHT_BASE = 0xFB01
 % cp >= 0x18800 && cp <= 0x18AFF % Tangut components
-% cp >= 0x18D00 && cp <= 0x18D08 % Tangut ideograph supplement
+% cp >= 0x18D80 && cp <= 0x18DFF % Tangut component supplement
 % For a given Tangut character at code point cp:
 % base1 = WEIGHT_BASE
 % base2 = ( cp - 0x17000 ) | 0x8000
 % Then weight the character as: <U{cp}> "<R{base1}><T{base2}>";<BASE>;<MIN>;<SFFFF>
 
 % Nushu ideographic characters are weighted implicitly as defined here.
 
-% WEIGHT_BASE = 0xFB01
+% WEIGHT_BASE = 0xFB02
 % cp >= 0x1B170 && cp <= 0x1B2FB % Nushu
 % For a given Nushu character at code point cp:
 % base1 = WEIGHT_BASE
@@ -55,7 +65,7 @@
 
 % Khitan Small Script ideographic characters are weighted implicitly as defined here.
 
-% WEIGHT_BASE = 0xFB02
+% WEIGHT_BASE = 0xFB03
 % cp >= 0x18B00 && cp <= 0x18CD5 % Khitan Small Script
 % For a given Khitan Small Script character at code point cp:
 % base1 = WEIGHT_BASE
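
The Han rule in the footer comments (`base1 = WEIGHT_BASE + ( cp >> 15 )`, `base2 = ( cp & 0x7FFF ) | 0x8000`) can be checked with a short Python sketch. This is only an illustration, not the sifter's actual code: the function name `han_implicit_weights` and the table layout are hypothetical, the ranges are the post-commit values from the diff, and the 12 unified Han characters in the CJK compatibility area are left out because the footer does not enumerate them.

```python
# Hypothetical helper illustrating the Han implicit-weight rule from cttfooter.txt.
# Ranges are copied from the post-commit footer comments.
URO = (0x04E00, 0x09FFF)

HAN_EXTENSIONS = [
    (0x03400, 0x04DBF),  # Ext. A
    (0x20000, 0x2A6DF),  # Ext. B
    (0x2A700, 0x2B73F),  # Ext. C
    (0x2B740, 0x2B81D),  # Ext. D
    (0x2B820, 0x2CEAD),  # Ext. E
    (0x2CEB0, 0x2EBE0),  # Ext. F
    (0x2EBF0, 0x2EE5D),  # Ext. I
    (0x30000, 0x3134A),  # Ext. G
    (0x31350, 0x323AF),  # Ext. H
    (0x323B0, 0x33479),  # Ext. J
]


def han_implicit_weights(cp):
    """Return (base1, base2) for a Han code point, or None if cp is not covered.

    The 12 compat-area unified Han characters are NOT handled here
    (assumption: they would use WEIGHT_BASE 0xFB40, per the footer text).
    """
    if URO[0] <= cp <= URO[1]:
        weight_base = 0xFB40
    elif any(lo <= cp <= hi for lo, hi in HAN_EXTENSIONS):
        weight_base = 0xFB80
    else:
        return None
    base1 = weight_base + (cp >> 15)   # high bits of cp select the lead weight
    base2 = (cp & 0x7FFF) | 0x8000     # low 15 bits, with the top bit forced on
    return base1, base2


if __name__ == "__main__":
    for cp in (0x4E00, 0x9FFF, 0x20000, 0x323B0):
        base1, base2 = han_implicit_weights(cp)
        print(f"U+{cp:04X}: base1=0x{base1:04X} base2=0x{base2:04X}")
```

Because `cp >> 15` grows with the code point, characters in the new Extension J range get a lead weight of `0xFB86`, the same lead weight as Extensions G and H, and are ordered within it by `base2`.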

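
The Tangut split described in the commit message can be sketched the same way. After this change the footer gives ideographs `WEIGHT_BASE = 0xFB00` and components `WEIGHT_BASE = 0xFB01`, while both keep `base2 = ( cp - 0x17000 ) | 0x8000`. The function name `tangut_implicit_weights` and the `TANGUT_CLASSES` table are hypothetical names used only for this illustration.

```python
# Hypothetical helper illustrating the post-commit Tangut rule from cttfooter.txt.
TANGUT_CLASSES = [
    # (WEIGHT_BASE, ranges) copied from the updated footer comments
    (0xFB00, [(0x17000, 0x187FF),    # Tangut ideographs
              (0x18D00, 0x18D1E)]),  # Tangut ideograph supplement
    (0xFB01, [(0x18800, 0x18AFF),    # Tangut components
              (0x18D80, 0x18DFF)]),  # Tangut component supplement
]


def tangut_implicit_weights(cp):
    """Return (base1, base2) for a Tangut code point, or None if not covered."""
    for weight_base, ranges in TANGUT_CLASSES:
        if any(lo <= cp <= hi for lo, hi in ranges):
            base1 = weight_base                # no cp-dependent offset for Tangut
            base2 = (cp - 0x17000) | 0x8000    # offset from the start of the Tangut block
            return base1, base2
    return None


if __name__ == "__main__":
    for cp in (0x17000, 0x18D00, 0x18800, 0x18D80):
        base1, base2 = tangut_implicit_weights(cp)
        print(f"U+{cp:05X}: base1=0x{base1:04X} base2=0x{base2:04X}")
```

Comparing the resulting pairs shows the effect of the split: every ideograph, including the supplement starting at `0x18D00`, now has a smaller `base1` than any component. Under the old single `0xFB00` base, ordering depended on `base2` alone, so the ideograph supplement would have landed after the components at `0x18800`.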