Commit b0c8783

CJK extension J (#883)
* Java changes for CJK Extension J
* UnicodeData.txt lines according to L2/24-165
* Script=Han
* Blocks.txt and ShortBlockNames.txt for Extension J
* Ideographic, Unified_Ideograph
* Do not refer to versions from the future (also fix a typo)
* Regenerate UCD
* GenerateEnums
* Regenerate UCD again
* drop 2, for Unicode Version 0x11
* Regenerate UCD
* Remove stray 3347B
* Regenerate UCD
* More remnants
* Regenerate UCD
1 parent 603f000 commit b0c8783

22 files changed: 104 additions and 71 deletions

unicodetools/data/ucd/dev/Blocks.txt

Lines changed: 1 addition & 0 deletions
@@ -367,6 +367,7 @@ FFF0..FFFF; Specials
 2F800..2FA1F; CJK Compatibility Ideographs Supplement
 30000..3134F; CJK Unified Ideographs Extension G
 31350..323AF; CJK Unified Ideographs Extension H
+323B0..3347F; CJK Unified Ideographs Extension J
 E0000..E007F; Tags
 E0100..E01EF; Variation Selectors Supplement
 F0000..FFFFF; Supplementary Private Use Area-A
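
The hunk above adds one `start..end; name` line to Blocks.txt. As a rough illustration of how a consumer might read such lines and look up the block of a code point, here is a minimal Python sketch. The names `parse_blocks`, `block_of`, and `SAMPLE` are hypothetical and are not part of unicodetools, which implements this in Java.

```python
import bisect

# A few lines in the Blocks.txt format shown in the hunk above (sample only).
SAMPLE = """\
30000..3134F; CJK Unified Ideographs Extension G
31350..323AF; CJK Unified Ideographs Extension H
323B0..3347F; CJK Unified Ideographs Extension J
E0000..E007F; Tags
"""

def parse_blocks(text):
    """Return a sorted list of (start, end, name) tuples."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        rng, name = line.split(";", 1)
        start, end = rng.split("..")
        blocks.append((int(start, 16), int(end, 16), name.strip()))
    return sorted(blocks)

def block_of(blocks, cp):
    """Binary-search for the block containing cp; 'No_Block' if none."""
    starts = [b[0] for b in blocks]
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and blocks[i][0] <= cp <= blocks[i][1]:
        return blocks[i][2]
    return "No_Block"

blocks = parse_blocks(SAMPLE)
print(block_of(blocks, 0x323B0))
```

Note that the block (323B0..3347F) is slightly larger than the assigned range (323B0..33479), so a block lookup succeeds for the six trailing code points even though they are unassigned.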

unicodetools/data/ucd/dev/DerivedAge.txt

Lines changed: 12 additions & 2 deletions
@@ -1,5 +1,5 @@
-# DerivedAge-16.0.0.txt
-# Date: 2024-04-30, 21:48:12 GMT
+# DerivedAge-17.0.0.txt
+# Date: 2024-11-14, 15:19:38 GMT
 # © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -2059,4 +2059,14 @@ A7DA..A7DC ; 16.0 # [3] LATIN CAPITAL LETTER LAMBDA..LATIN CAPITAL LETTER L
 
 # Total code points: 5185
 
+# ================================================
+
+# Age=V17_0
+
+# Newly assigned in Unicode 17.0.0 (September, 2025)
+
+323B0..33479 ; 17.0 # [4298] CJK UNIFIED IDEOGRAPH-323B0..CJK UNIFIED IDEOGRAPH-33479
+
+# Total code points: 4298
+
 # EOF
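
The bracketed count in the new DerivedAge line can be checked from the range endpoints alone, since the ranges are inclusive. The following is a small sketch of that arithmetic; `range_size` is a hypothetical helper, not a unicodetools function.

```python
def range_size(spec):
    """Count code points in an inclusive 'XXXX..YYYY' (or single 'XXXX') range."""
    first, _, last = spec.partition("..")
    lo = int(first, 16)
    hi = int(last, 16) if last else lo
    return hi - lo + 1

ext_j_assigned = range_size("323B0..33479")  # assigned code points
ext_j_block = range_size("323B0..3347F")     # whole block from Blocks.txt
print(ext_j_assigned, ext_j_block)
```

The assigned count agrees with the `[4298]` annotation and the `# Total code points: 4298` line in the hunk.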

unicodetools/data/ucd/dev/DerivedCoreProperties.txt

Lines changed: 14 additions & 14 deletions
@@ -1,5 +1,5 @@
-# DerivedCoreProperties-16.0.0.txt
-# Date: 2024-05-31, 18:09:32 GMT
+# DerivedCoreProperties-17.0.0.txt
+# Date: 2024-11-14, 15:19:55 GMT
 # © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -1439,9 +1439,9 @@ FFDA..FFDC ; Alphabetic # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANG
 2EBF0..2EE5D ; Alphabetic # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; Alphabetic # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; Alphabetic # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
-31350..323AF ; Alphabetic # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
+31350..33479 ; Alphabetic # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 
-# Total code points: 142759
+# Total code points: 147057
 
 # ================================================
 
@@ -6960,9 +6960,9 @@ FFDA..FFDC ; ID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL
 2EBF0..2EE5D ; ID_Start # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; ID_Start # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; ID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
-31350..323AF ; ID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
+31350..33479 ; ID_Start # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 
-# Total code points: 141269
+# Total code points: 145567
 
 # ================================================
 
@@ -8367,10 +8367,10 @@ FFDA..FFDC ; ID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HAN
 2EBF0..2EE5D ; ID_Continue # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; ID_Continue # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; ID_Continue # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
-31350..323AF ; ID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
+31350..33479 ; ID_Continue # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 E0100..E01EF ; ID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 
-# Total code points: 144541
+# Total code points: 148839
 
 # ================================================
 
@@ -9146,9 +9146,9 @@ FFDA..FFDC ; XID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGU
 2EBF0..2EE5D ; XID_Start # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; XID_Start # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; XID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
-31350..323AF ; XID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
+31350..33479 ; XID_Start # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 
-# Total code points: 141246
+# Total code points: 145544
 
 # ================================================
 
@@ -10554,10 +10554,10 @@ FFDA..FFDC ; XID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HA
 2EBF0..2EE5D ; XID_Continue # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; XID_Continue # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; XID_Continue # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
-31350..323AF ; XID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
+31350..33479 ; XID_Continue # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 E0100..E01EF ; XID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 
-# Total code points: 144522
+# Total code points: 148820
 
 # ================================================
 
@@ -12810,9 +12810,9 @@ FFFC..FFFD ; Grapheme_Base # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEME
 2EBF0..2EE5D ; Grapheme_Base # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; Grapheme_Base # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; Grapheme_Base # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
-31350..323AF ; Grapheme_Base # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
+31350..33479 ; Grapheme_Base # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 
-# Total code points: 152730
+# Total code points: 157028
 
 # ================================================
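
In each of these hunks the Extension H line is rewritten rather than a new line being added. Extension H ends at 323AF and Extension J starts at 323B0, so the two ranges are contiguous, and they appear as a single range 31350..33479 with 4192 + 4298 = 8490 code points. A minimal sketch of that coalescing step follows; `merge_ranges` is a hypothetical helper, and it assumes the ranges already share the same property value and General_Category (here `Lo`), which is what allows them to be listed on one line.

```python
def merge_ranges(ranges):
    """Coalesce inclusive (start, end) ranges that overlap or touch."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Extends or abuts the previous range: grow it in place.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

ext_g = (0x30000, 0x3134A)
ext_h = (0x31350, 0x323AF)
ext_j = (0x323B0, 0x33479)

merged = merge_ranges([ext_g, ext_h, ext_j])
sizes = [end - start + 1 for start, end in merged]
print([(f"{s:X}", f"{e:X}") for s, e in merged], sizes)
```

Extension G stays separate because 3134B..3134F is unassigned, which leaves a gap before 31350.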

unicodetools/data/ucd/dev/EastAsianWidth.txt

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
-# EastAsianWidth-16.0.0.txt
-# Date: 2024-04-30, 21:48:20 GMT
+# EastAsianWidth-17.0.0.txt
+# Date: 2024-11-14, 15:19:59 GMT
 # © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -2675,8 +2675,8 @@ FFFD ; A # So REPLACEMENT CHARACTER
 2FA20..2FFFD ; W # Cn [1502] <reserved-2FA20>..<reserved-2FFFD>
 30000..3134A ; W # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 3134B..3134F ; W # Cn [5] <reserved-3134B>..<reserved-3134F>
-31350..323AF ; W # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
-323B0..3FFFD ; W # Cn [56398] <reserved-323B0>..<reserved-3FFFD>
+31350..33479 ; W # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
+3347A..3FFFD ; W # Cn [52100] <reserved-3347A>..<reserved-3FFFD>
 E0001 ; N # Cf LANGUAGE TAG
 E0020..E007F ; N # Cf [96] TAG SPACE..CANCEL TAG
 E0100..E01EF ; N # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
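
Files that also list reserved (`Cn`) code points change in two places: the assigned range grows, and the reserved range that followed it shrinks by the same amount (56398 - 4298 = 52100). The sketch below illustrates that bookkeeping with a hypothetical `carve` helper; it is an illustration of the arithmetic, not the generator's actual logic.

```python
def carve(reserved, assigned):
    """Return the parts of inclusive range `reserved` not covered by `assigned`."""
    (r_lo, r_hi), (a_lo, a_hi) = reserved, assigned
    pieces = []
    if a_lo > r_lo:
        pieces.append((r_lo, min(a_lo - 1, r_hi)))   # part before the assignment
    if a_hi < r_hi:
        pieces.append((max(a_hi + 1, r_lo), r_hi))   # part after the assignment
    return pieces

old_reserved = (0x323B0, 0x3FFFD)   # <reserved-323B0>..<reserved-3FFFD>
ext_j = (0x323B0, 0x33479)          # newly assigned Extension J code points

leftover = carve(old_reserved, ext_j)
leftover_size = sum(end - start + 1 for start, end in leftover)
print([(f"{s:X}", f"{e:X}") for s, e in leftover], leftover_size)
```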

unicodetools/data/ucd/dev/LineBreak.txt

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
-# LineBreak-16.0.0.txt
-# Date: 2024-07-29, 16:26:55 GMT
+# LineBreak-17.0.0.txt
+# Date: 2024-11-14, 15:20:00 GMT
 # © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -3659,8 +3659,8 @@ FFFD ; AI # So REPLACEMENT CHARACTER
 2FA20..2FFFD ; ID # Cn [1502] <reserved-2FA20>..<reserved-2FFFD>
 30000..3134A ; ID # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 3134B..3134F ; ID # Cn [5] <reserved-3134B>..<reserved-3134F>
-31350..323AF ; ID # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
-323B0..3FFFD ; ID # Cn [56398] <reserved-323B0>..<reserved-3FFFD>
+31350..33479 ; ID # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
+3347A..3FFFD ; ID # Cn [52100] <reserved-3347A>..<reserved-3FFFD>
 E0001 ; CM # Cf LANGUAGE TAG
 E0020..E007F ; CM # Cf [96] TAG SPACE..CANCEL TAG
 E0100..E01EF ; CM # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

unicodetools/data/ucd/dev/PropList.txt

Lines changed: 6 additions & 6 deletions
@@ -1,5 +1,5 @@
-# PropList-16.0.0.txt
-# Date: 2024-05-31, 18:09:48 GMT
+# PropList-17.0.0.txt
+# Date: 2024-11-14, 15:24:22 GMT
 # © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -883,9 +883,9 @@ FA70..FAD9 ; Ideographic # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COM
 2EBF0..2EE5D ; Ideographic # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; Ideographic # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; Ideographic # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
-31350..323AF ; Ideographic # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
+31350..33479 ; Ideographic # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 
-# Total code points: 106477
+# Total code points: 110775
 
 # ================================================
 
@@ -1365,9 +1365,9 @@ FA27..FA29 ; Unified_Ideograph # Lo [3] CJK COMPATIBILITY IDEOGRAPH-FA27..C
 2CEB0..2EBE0 ; Unified_Ideograph # Lo [7473] CJK UNIFIED IDEOGRAPH-2CEB0..CJK UNIFIED IDEOGRAPH-2EBE0
 2EBF0..2EE5D ; Unified_Ideograph # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 30000..3134A ; Unified_Ideograph # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
-31350..323AF ; Unified_Ideograph # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
+31350..33479 ; Unified_Ideograph # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 
-# Total code points: 97680
+# Total code points: 101978
 
 # ================================================

unicodetools/data/ucd/dev/PropertyValueAliases.txt

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,5 @@
 # PropertyValueAliases-17.0.0.txt
-# Date: 2024-09-11, 23:38:17 GMT
+# Date: 2024-10-16, 13:48:47 GMT
 # © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -212,6 +212,7 @@ blk; CJK_Ext_F ; CJK_Unified_Ideographs_Extension_F
 blk; CJK_Ext_G ; CJK_Unified_Ideographs_Extension_G
 blk; CJK_Ext_H ; CJK_Unified_Ideographs_Extension_H
 blk; CJK_Ext_I ; CJK_Unified_Ideographs_Extension_I
+blk; CJK_Ext_J ; CJK_Unified_Ideographs_Extension_J
 blk; CJK_Radicals_Sup ; CJK_Radicals_Supplement
 blk; CJK_Strokes ; CJK_Strokes
 blk; CJK_Symbols ; CJK_Symbols_And_Punctuation
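
The new alias line ties the short name `CJK_Ext_J` to the long name `CJK_Unified_Ideographs_Extension_J`, which corresponds to the Blocks.txt spelling with spaces. As a rough sketch of how such a line might be read and compared against the Blocks.txt name, the code below uses a simplified form of loose matching (ignore case, spaces, underscores, and hyphens). The helper names are hypothetical, and the simplification omits other details of the Unicode loose matching rules.

```python
def loose_key(name):
    """Simplified loose matching: drop case, spaces, underscores, and hyphens."""
    return "".join(ch for ch in name.lower() if ch not in " _-")

def parse_alias_line(line):
    """Split 'prop; short ; long [; more...]' into (property, [aliases])."""
    fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
    return fields[0], fields[1:]

prop, aliases = parse_alias_line(
    "blk; CJK_Ext_J ; CJK_Unified_Ideographs_Extension_J"
)
block_name = "CJK Unified Ideographs Extension J"  # as spelled in Blocks.txt
matches = any(loose_key(a) == loose_key(block_name) for a in aliases)
print(prop, aliases, matches)
```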

unicodetools/data/ucd/dev/Scripts.txt

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
-# Scripts-16.0.0.txt
-# Date: 2024-04-30, 21:48:40 GMT
+# Scripts-17.0.0.txt
+# Date: 2024-11-14, 15:24:34 GMT
 # © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -1602,9 +1602,9 @@ FA70..FAD9 ; Han # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILI
 2EBF0..2EE5D ; Han # Lo [622] CJK UNIFIED IDEOGRAPH-2EBF0..CJK UNIFIED IDEOGRAPH-2EE5D
 2F800..2FA1D ; Han # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
 30000..3134A ; Han # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
-31350..323AF ; Han # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
+31350..33479 ; Han # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
 
-# Total code points: 99030
+# Total code points: 103328
 
 # ================================================

unicodetools/data/ucd/dev/UnicodeData.txt

Lines changed: 2 additions & 0 deletions
@@ -39773,6 +39773,8 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;;
 3134A;<CJK Ideograph Extension G, Last>;Lo;0;L;;;;;N;;;;;
 31350;<CJK Ideograph Extension H, First>;Lo;0;L;;;;;N;;;;;
 323AF;<CJK Ideograph Extension H, Last>;Lo;0;L;;;;;N;;;;;
+323B0;<CJK Ideograph Extension J, First>;Lo;0;L;;;;;N;;;;;
+33479;<CJK Ideograph Extension J, Last>;Lo;0;L;;;;;N;;;;;
 E0001;LANGUAGE TAG;Cf;0;BN;;;;;N;;;;;
 E0020;TAG SPACE;Cf;0;BN;;;;;N;;;;;
 E0021;TAG EXCLAMATION MARK;Cf;0;BN;;;;;N;;;;;
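
UnicodeData.txt does not list each ideograph. The two added lines mark the first and last code points of the range with `<..., First>` and `<..., Last>` names, and every code point in between shares the same field values. A minimal sketch of folding such pairs into ranges is shown below; `read_ranges` and `SAMPLE` are hypothetical names used only for illustration.

```python
# Sample lines in the UnicodeData.txt format shown in the hunk above.
SAMPLE = """\
31350;<CJK Ideograph Extension H, First>;Lo;0;L;;;;;N;;;;;
323AF;<CJK Ideograph Extension H, Last>;Lo;0;L;;;;;N;;;;;
323B0;<CJK Ideograph Extension J, First>;Lo;0;L;;;;;N;;;;;
33479;<CJK Ideograph Extension J, Last>;Lo;0;L;;;;;N;;;;;
E0001;LANGUAGE TAG;Cf;0;BN;;;;;N;;;;;
"""

def read_ranges(text):
    """Yield (start, end, name, gc), folding First/Last pairs into one range."""
    pending = None  # (start code point, range label) of an open First line
    for line in text.splitlines():
        if not line:
            continue
        fields = line.split(";")
        cp, name, gc = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            pending = (cp, name[1:-len(", First>")])
        elif name.endswith(", Last>"):
            label = name[1:-len(", Last>")]
            if pending is None or pending[1] != label:
                raise ValueError(f"unmatched Last line at {cp:X}")
            yield (pending[0], cp, label, gc)
            pending = None
        else:
            yield (cp, cp, name, gc)

ranges = list(read_ranges(SAMPLE))
for start, end, name, gc in ranges:
    print(f"{start:X}..{end:X} {gc} {name} ({end - start + 1})")
```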

unicodetools/data/ucd/dev/VerticalOrientation.txt

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
-# VerticalOrientation-16.0.0.txt
-# Date: 2024-04-30, 21:48:42 GMT
+# VerticalOrientation-17.0.0.txt
+# Date: 2024-11-14, 15:20:20 GMT
 # © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -2493,8 +2493,8 @@ FFFC..FFFD ; U # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEMENT CHARA
 2FA20..2FFFD ; U # Cn [1502] <reserved-2FA20>..<reserved-2FFFD>
 30000..3134A ; U # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 3134B..3134F ; U # Cn [5] <reserved-3134B>..<reserved-3134F>
-31350..323AF ; U # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
-323B0..3FFFD ; U # Cn [56398] <reserved-323B0>..<reserved-3FFFD>
+31350..33479 ; U # Lo [8490] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-33479
+3347A..3FFFD ; U # Cn [52100] <reserved-3347A>..<reserved-3FFFD>
 E0001 ; R # Cf LANGUAGE TAG
 E0020..E007F ; R # Cf [96] TAG SPACE..CANCEL TAG
 E0100..E01EF ; R # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
