Commit 3d978f4

eggrobin and babelstone authored
Jurchen (#889)
* UnicodeData.txt lines according to L2/24-139
* UnicodeData.txt lines according to L2/24-140
* lb=ID according to L2/24-139 and L2/24-140
* Jurc
* All Jurc for now per unicode-org/sah#256 = L2/24-166 §6
* ShortBlockNames
* Blocks
* Ideographic
* ea=W
* vo=U though the proposals forgot that
* Regenerate UCD
* GenerateEnums
* A new Ideographic script.
* 0-padding
* Regenerate UCD
* JurchenSources.txt from Andrew

Co-authored-by: babelstone <[email protected]>

* Add JurchenSources to IndexUnicodeProperties and ExtraProperty(Value)Aliases
* GenerateEnums
* A test
* The other two properties too
* also test the radicals.
* Rename the Jurchen properties
* GenerateEnums
* Pick up the updated JurchenSources.
* UTC-184-C7 renaming
* Regenerate UCD
* Normative sources
* Regenerate UCD
* Update the names in the tests
* GenerateEnums

---------

Co-authored-by: babelstone <[email protected]>
1 parent 0ec4052 commit 3d978f4

29 files changed: +3897 -38 lines changed

unicodetools/data/ucd/dev/Blocks.txt

Lines changed: 2 additions & 0 deletions
@@ -317,6 +317,8 @@ FFF0..FFFF; Specials
 18B00..18CFF; Khitan Small Script
 18D00..18D7F; Tangut Supplement
 18D80..18DFF; Tangut Components Supplement
+18E00..1919F; Jurchen
+191A0..191DF; Jurchen Radicals
 1AFF0..1AFFF; Kana Extended-B
 1B000..1B0FF; Kana Supplement
 1B100..1B12F; Kana Extended-A
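
The two new block ranges can be cross-checked against the assigned ranges that appear in the derived files of this commit (18E00..19191 and 191A0..191D2). The following is a minimal sketch, not part of unicodetools; `parse_block` and the `assigned` table are hypothetical names introduced only for illustration.

```python
def parse_block(line: str) -> tuple[range, str]:
    """Parse a Blocks.txt line such as '18E00..1919F; Jurchen'."""
    span, name = (part.strip() for part in line.split(";", 1))
    first, last = (int(cp, 16) for cp in span.split(".."))
    return range(first, last + 1), name


# Block lines added by this commit.
blocks = [
    parse_block("18E00..1919F; Jurchen"),
    parse_block("191A0..191DF; Jurchen Radicals"),
]

# Assigned ranges, copied from the DerivedAge.txt hunk of the same commit.
assigned = {
    "Jurchen": range(0x18E00, 0x19191 + 1),
    "Jurchen Radicals": range(0x191A0, 0x191D2 + 1),
}

report = {}
for block, name in blocks:
    chars = assigned[name]
    # Every assigned code point must lie inside its block.
    assert chars.start in block and chars[-1] in block
    # Record block size, assigned count, and unassigned remainder.
    report[name] = (len(block), len(chars), len(block) - len(chars))

for name, (size, used, free) in report.items():
    print(f"{name}: {size} code points, {used} assigned, {free} unassigned")
```

Under these inputs the Jurchen block has room for 928 code points with 914 assigned, and Jurchen Radicals has room for 64 with 51 assigned.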

unicodetools/data/ucd/dev/DerivedAge.txt

Lines changed: 4 additions & 2 deletions
@@ -1,5 +1,5 @@
 # DerivedAge-18.0.0.txt
-# Date: 2025-11-27, 17:49:12 GMT
+# Date: 2025-11-27, 18:35:12 GMT
 # © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -2142,6 +2142,8 @@ FDC8..FDCE ; 17.0 # [7] ARABIC LIGATURE RAHIMAHU ALLAAH TAAALAA..ARABIC LIG
 16DA0..16DA9 ; 18.0 # [10] CHISOI DIGIT ZERO..CHISOI DIGIT NINE
 18CD6..18CDA ; 18.0 # [5] KHITAN SMALL SCRIPT CHARACTER-18CD6..KHITAN SMALL SCRIPT CHARACTER-18CDA
 18D1F..18D20 ; 18.0 # [2] TANGUT IDEOGRAPH-18D1F..TANGUT IDEOGRAPH-18D20
+18E00..19191 ; 18.0 # [914] JURCHEN CHARACTER-18E00..JURCHEN CHARACTER-19191
+191A0..191D2 ; 18.0 # [51] JURCHEN RADICAL-01..JURCHEN RADICAL-51
 1DF1F..1DF24 ; 18.0 # [6] LATIN SMALL LETTER D-ETH DIGRAPH..LATIN SMALL LETTER T-THETA DIGRAPH
 1DF2B..1DF56 ; 18.0 # [44] LATIN SMALL LETTER DEZH DIGRAPH WITH CURL..LATIN LETTER GLOTTAL STOP WITH DOUBLE STROKE
 1DFD0..1DFFF ; 18.0 # [48] LATIN SUBSCRIPT SMALL LETTER GAMMA..MODIFIER LETTER SMALL T WITH HOOK AND RETROFLEX HOOK
@@ -2150,6 +2152,6 @@ FDC8..FDCE ; 17.0 # [7] ARABIC LIGATURE RAHIMAHU ALLAAH TAAALAA..ARABIC LIG
 2B81E ; 18.0 # CJK UNIFIED IDEOGRAPH-2B81E
 3D000..3FC3F ; 18.0 # [11328] SEAL CHARACTER-3D000..SEAL CHARACTER-3FC3F
 
-# Total code points: 11860
+# Total code points: 12825
 
 # EOF
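
The bracketed counts and the updated total in DerivedAge.txt can be re-derived from the range endpoints. The sketch below assumes only the line layout visible in the hunks above (`first..last ; value # comment`); `range_size` is a hypothetical helper, not a function from unicodetools.

```python
def range_size(line: str) -> int:
    """Return how many code points a UCD data line covers."""
    # Drop the trailing comment, then keep the first semicolon-separated field.
    field = line.split("#", 1)[0].split(";", 1)[0].strip()
    first, _, last = field.partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start  # single code points have no ".."
    return end - start + 1


# The two lines added to DerivedAge.txt by this commit.
added = [
    "18E00..19191 ; 18.0 # [914] JURCHEN CHARACTER-18E00..JURCHEN CHARACTER-19191",
    "191A0..191D2 ; 18.0 # [51] JURCHEN RADICAL-01..JURCHEN RADICAL-51",
]

sizes = [range_size(line) for line in added]
old_total = 11860  # "Total code points" before the commit
new_total = old_total + sum(sizes)
print(sizes, new_total)
```

The same difference of 965 code points separates the old and new totals of every property section touched in DerivedCoreProperties.txt (for example, 159244 to 160209 for Alphabetic).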

unicodetools/data/ucd/dev/DerivedCoreProperties.txt

Lines changed: 19 additions & 7 deletions
@@ -1,5 +1,5 @@
 # DerivedCoreProperties-18.0.0.txt
-# Date: 2025-11-27, 17:49:36 GMT
+# Date: 2025-11-27, 18:35:35 GMT
 # © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -1347,6 +1347,8 @@ FFDA..FFDC ; Alphabetic # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANG
 17000..18CDA ; Alphabetic # Lo [7387] TANGUT IDEOGRAPH-17000..KHITAN SMALL SCRIPT CHARACTER-18CDA
 18CFF..18D20 ; Alphabetic # Lo [34] KHITAN SMALL SCRIPT CHARACTER-18CFF..TANGUT IDEOGRAPH-18D20
 18D80..18DF2 ; Alphabetic # Lo [115] TANGUT COMPONENT-769..TANGUT COMPONENT-883
+18E00..19191 ; Alphabetic # Lo [914] JURCHEN CHARACTER-18E00..JURCHEN CHARACTER-19191
+191A0..191D2 ; Alphabetic # Lo [51] JURCHEN RADICAL-01..JURCHEN RADICAL-51
 1AFF0..1AFF3 ; Alphabetic # Lm [4] KATAKANA LETTER MINNAN TONE-2..KATAKANA LETTER MINNAN TONE-5
 1AFF5..1AFFB ; Alphabetic # Lm [7] KATAKANA LETTER MINNAN TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-5
 1AFFD..1AFFE ; Alphabetic # Lm [2] KATAKANA LETTER MINNAN NASALIZED TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-8
@@ -1476,7 +1478,7 @@ FFDA..FFDC ; Alphabetic # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANG
 31350..33479 ; Alphabetic # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 3D000..3FC3F ; Alphabetic # Lo [11328] SEAL CHARACTER-3D000..SEAL CHARACTER-3FC3F
 
-# Total code points: 159244
+# Total code points: 160209
 
 # ================================================

@@ -6984,6 +6986,8 @@ FFDA..FFDC ; ID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL
 17000..18CDA ; ID_Start # Lo [7387] TANGUT IDEOGRAPH-17000..KHITAN SMALL SCRIPT CHARACTER-18CDA
 18CFF..18D20 ; ID_Start # Lo [34] KHITAN SMALL SCRIPT CHARACTER-18CFF..TANGUT IDEOGRAPH-18D20
 18D80..18DF2 ; ID_Start # Lo [115] TANGUT COMPONENT-769..TANGUT COMPONENT-883
+18E00..19191 ; ID_Start # Lo [914] JURCHEN CHARACTER-18E00..JURCHEN CHARACTER-19191
+191A0..191D2 ; ID_Start # Lo [51] JURCHEN RADICAL-01..JURCHEN RADICAL-51
 1AFF0..1AFF3 ; ID_Start # Lm [4] KATAKANA LETTER MINNAN TONE-2..KATAKANA LETTER MINNAN TONE-5
 1AFF5..1AFFB ; ID_Start # Lm [7] KATAKANA LETTER MINNAN TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-5
 1AFFD..1AFFE ; ID_Start # Lm [2] KATAKANA LETTER MINNAN NASALIZED TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-8
@@ -7098,7 +7102,7 @@ FFDA..FFDC ; ID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL
 31350..33479 ; ID_Start # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 3D000..3FC3F ; ID_Start # Lo [11328] SEAL CHARACTER-3D000..SEAL CHARACTER-3FC3F
 
-# Total code points: 157736
+# Total code points: 158701
 
 # ================================================

@@ -8386,6 +8390,8 @@ FFDA..FFDC ; ID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HAN
 17000..18CDA ; ID_Continue # Lo [7387] TANGUT IDEOGRAPH-17000..KHITAN SMALL SCRIPT CHARACTER-18CDA
 18CFF..18D20 ; ID_Continue # Lo [34] KHITAN SMALL SCRIPT CHARACTER-18CFF..TANGUT IDEOGRAPH-18D20
 18D80..18DF2 ; ID_Continue # Lo [115] TANGUT COMPONENT-769..TANGUT COMPONENT-883
+18E00..19191 ; ID_Continue # Lo [914] JURCHEN CHARACTER-18E00..JURCHEN CHARACTER-19191
+191A0..191D2 ; ID_Continue # Lo [51] JURCHEN RADICAL-01..JURCHEN RADICAL-51
 1AFF0..1AFF3 ; ID_Continue # Lm [4] KATAKANA LETTER MINNAN TONE-2..KATAKANA LETTER MINNAN TONE-5
 1AFF5..1AFFB ; ID_Continue # Lm [7] KATAKANA LETTER MINNAN TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-5
 1AFFD..1AFFE ; ID_Continue # Lm [2] KATAKANA LETTER MINNAN NASALIZED TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-8
@@ -8542,7 +8548,7 @@ FFDA..FFDC ; ID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HAN
 3D000..3FC3F ; ID_Continue # Lo [11328] SEAL CHARACTER-3D000..SEAL CHARACTER-3FC3F
 E0100..E01EF ; ID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 
-# Total code points: 161082
+# Total code points: 162047
 
 # ================================================

@@ -9227,6 +9233,8 @@ FFDA..FFDC ; XID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGU
 17000..18CDA ; XID_Start # Lo [7387] TANGUT IDEOGRAPH-17000..KHITAN SMALL SCRIPT CHARACTER-18CDA
 18CFF..18D20 ; XID_Start # Lo [34] KHITAN SMALL SCRIPT CHARACTER-18CFF..TANGUT IDEOGRAPH-18D20
 18D80..18DF2 ; XID_Start # Lo [115] TANGUT COMPONENT-769..TANGUT COMPONENT-883
+18E00..19191 ; XID_Start # Lo [914] JURCHEN CHARACTER-18E00..JURCHEN CHARACTER-19191
+191A0..191D2 ; XID_Start # Lo [51] JURCHEN RADICAL-01..JURCHEN RADICAL-51
 1AFF0..1AFF3 ; XID_Start # Lm [4] KATAKANA LETTER MINNAN TONE-2..KATAKANA LETTER MINNAN TONE-5
 1AFF5..1AFFB ; XID_Start # Lm [7] KATAKANA LETTER MINNAN TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-5
 1AFFD..1AFFE ; XID_Start # Lm [2] KATAKANA LETTER MINNAN NASALIZED TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-8
@@ -9341,7 +9349,7 @@ FFDA..FFDC ; XID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGU
 31350..33479 ; XID_Start # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 3D000..3FC3F ; XID_Start # Lo [11328] SEAL CHARACTER-3D000..SEAL CHARACTER-3FC3F
 
-# Total code points: 157713
+# Total code points: 158678
 
 # ================================================

@@ -10630,6 +10638,8 @@ FFDA..FFDC ; XID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HA
 17000..18CDA ; XID_Continue # Lo [7387] TANGUT IDEOGRAPH-17000..KHITAN SMALL SCRIPT CHARACTER-18CDA
 18CFF..18D20 ; XID_Continue # Lo [34] KHITAN SMALL SCRIPT CHARACTER-18CFF..TANGUT IDEOGRAPH-18D20
 18D80..18DF2 ; XID_Continue # Lo [115] TANGUT COMPONENT-769..TANGUT COMPONENT-883
+18E00..19191 ; XID_Continue # Lo [914] JURCHEN CHARACTER-18E00..JURCHEN CHARACTER-19191
+191A0..191D2 ; XID_Continue # Lo [51] JURCHEN RADICAL-01..JURCHEN RADICAL-51
 1AFF0..1AFF3 ; XID_Continue # Lm [4] KATAKANA LETTER MINNAN TONE-2..KATAKANA LETTER MINNAN TONE-5
 1AFF5..1AFFB ; XID_Continue # Lm [7] KATAKANA LETTER MINNAN TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-5
 1AFFD..1AFFE ; XID_Continue # Lm [2] KATAKANA LETTER MINNAN NASALIZED TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-8
@@ -10786,7 +10796,7 @@ FFDA..FFDC ; XID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HA
 3D000..3FC3F ; XID_Continue # Lo [11328] SEAL CHARACTER-3D000..SEAL CHARACTER-3FC3F
 E0100..E01EF ; XID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 
-# Total code points: 161063
+# Total code points: 162028
 
 # ================================================

@@ -12871,6 +12881,8 @@ FFFC..FFFD ; Grapheme_Base # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEME
 17000..18CDA ; Grapheme_Base # Lo [7387] TANGUT IDEOGRAPH-17000..KHITAN SMALL SCRIPT CHARACTER-18CDA
 18CFF..18D20 ; Grapheme_Base # Lo [34] KHITAN SMALL SCRIPT CHARACTER-18CFF..TANGUT IDEOGRAPH-18D20
 18D80..18DF2 ; Grapheme_Base # Lo [115] TANGUT COMPONENT-769..TANGUT COMPONENT-883
+18E00..19191 ; Grapheme_Base # Lo [914] JURCHEN CHARACTER-18E00..JURCHEN CHARACTER-19191
+191A0..191D2 ; Grapheme_Base # Lo [51] JURCHEN RADICAL-01..JURCHEN RADICAL-51
 1AFF0..1AFF3 ; Grapheme_Base # Lm [4] KATAKANA LETTER MINNAN TONE-2..KATAKANA LETTER MINNAN TONE-5
 1AFF5..1AFFB ; Grapheme_Base # Lm [7] KATAKANA LETTER MINNAN TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-5
 1AFFD..1AFFE ; Grapheme_Base # Lm [2] KATAKANA LETTER MINNAN NASALIZED TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-8
@@ -13086,7 +13098,7 @@ FFFC..FFFD ; Grapheme_Base # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEME
 31350..33479 ; Grapheme_Base # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 3D000..3FC3F ; Grapheme_Base # Lo [11328] SEAL CHARACTER-3D000..SEAL CHARACTER-3FC3F
 
-# Total code points: 169342
+# Total code points: 170307
 
 # ================================================

unicodetools/data/ucd/dev/EastAsianWidth.txt

Lines changed: 3 additions & 1 deletion
@@ -1,5 +1,5 @@
 # EastAsianWidth-18.0.0.txt
-# Date: 2025-11-27, 17:49:44 GMT
+# Date: 2025-11-27, 18:35:42 GMT
 # © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -2394,6 +2394,8 @@ FFFD ; A # So REPLACEMENT CHARACTER
 18CFF ; W # Lo KHITAN SMALL SCRIPT CHARACTER-18CFF
 18D00..18D20 ; W # Lo [33] TANGUT IDEOGRAPH-18D00..TANGUT IDEOGRAPH-18D20
 18D80..18DF2 ; W # Lo [115] TANGUT COMPONENT-769..TANGUT COMPONENT-883
+18E00..19191 ; W # Lo [914] JURCHEN CHARACTER-18E00..JURCHEN CHARACTER-19191
+191A0..191D2 ; W # Lo [51] JURCHEN RADICAL-01..JURCHEN RADICAL-51
 1AFF0..1AFF3 ; W # Lm [4] KATAKANA LETTER MINNAN TONE-2..KATAKANA LETTER MINNAN TONE-5
 1AFF5..1AFFB ; W # Lm [7] KATAKANA LETTER MINNAN TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-5
 1AFFD..1AFFE ; W # Lm [2] KATAKANA LETTER MINNAN NASALIZED TONE-7..KATAKANA LETTER MINNAN NASALIZED TONE-8
