
Commit 9c9ef12

Unicode 15 initial data files (#171)
- Unihan data
- core UCD files
- short & long block names
- script codes Kawi+Nagm
- generated data files
1 parent 5cb43cb commit 9c9ef12


56 files changed: +9618 -626 lines changed

docs/build.md

Lines changed: 2 additions & 1 deletion
@@ -285,7 +285,8 @@ Update `searchPath` in `org.unicode.text.utility.Utility.java`.
 If there are new CJK characters
 (if there are changes to entries in UnicodeData.txt that are for `<CJK Ideograph ..., First>` etc.),
 `UCD.java` and `UCD_Types.java` need to be updated to handle these ranges.
-See [PR #47](https://github.com/unicode-org/unicodetools/pull/47) for an example.
+See [PR #171](https://github.com/unicode-org/unicodetools/pull/171)
+and [PR #47](https://github.com/unicode-org/unicodetools/pull/47) for examples.
 
 For CJK, you'll first need to compute the composite version, as `(major << 16) | (minor << 8) |` update.
 E.g. Unicode 14 is 0xe0000.
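
The composite version mentioned in the last two context lines can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `composite_version` is hypothetical and is not taken from `UCD.java` or `UCD_Types.java`, and the 0..255 bound on `minor` and `update` is an assumption that follows from the 8-bit shifts.

```python
def composite_version(major: int, minor: int = 0, update: int = 0) -> int:
    """Pack a version as (major << 16) | (minor << 8) | update."""
    # Assumption: minor and update each fit in one byte, so the fields cannot collide.
    for name, part in (("minor", minor), ("update", update)):
        if not 0 <= part <= 0xFF:
            raise ValueError(f"{name} must be in 0..255, got {part}")
    if major < 0:
        raise ValueError(f"major must be non-negative, got {major}")
    return (major << 16) | (minor << 8) | update


print(hex(composite_version(14)))  # 0xe0000, the Unicode 14 example from build.md
print(hex(composite_version(15)))  # 0xf0000
```

With this packing, comparing two versions reduces to comparing two integers.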

unicodetools/data/ucd/dev/Blocks.txt

Lines changed: 9 additions & 2 deletions
@@ -1,5 +1,5 @@
-# Blocks-14.0.0.txt
-# Date: 2021-01-22, 23:29:00 GMT [KW]
+# Blocks-15.0.0.txt
+# Date: 2021-12-01, 23:59:00 GMT [KW]
 # © 2021 Unicode®, Inc.
 # For terms of use, see http://www.unicode.org/terms_of_use.html
 #
@@ -241,6 +241,7 @@ FFF0..FFFF; Specials
 10D00..10D3F; Hanifi Rohingya
 10E60..10E7F; Rumi Numeral Symbols
 10E80..10EBF; Yezidi
+10EC0..10EFF; Arabic Extended-C
 10F00..10F2F; Old Sogdian
 10F30..10F6F; Sogdian
 10F70..10FAF; Old Uyghur
@@ -272,11 +273,13 @@ FFF0..FFFF; Specials
 11A50..11AAF; Soyombo
 11AB0..11ABF; Unified Canadian Aboriginal Syllabics Extended-A
 11AC0..11AFF; Pau Cin Hau
+11B00..11B5F; Devanagari Extended-A
 11C00..11C6F; Bhaiksuki
 11C70..11CBF; Marchen
 11D00..11D5F; Masaram Gondi
 11D60..11DAF; Gunjala Gondi
 11EE0..11EFF; Makasar
+11F00..11F5F; Kawi
 11FB0..11FBF; Lisu Supplement
 11FC0..11FFF; Tamil Supplement
 12000..123FF; Cuneiform
@@ -309,16 +312,19 @@ FFF0..FFFF; Specials
 1D000..1D0FF; Byzantine Musical Symbols
 1D100..1D1FF; Musical Symbols
 1D200..1D24F; Ancient Greek Musical Notation
+1D2C0..1D2DF; Kaktovik Numerals
 1D2E0..1D2FF; Mayan Numerals
 1D300..1D35F; Tai Xuan Jing Symbols
 1D360..1D37F; Counting Rod Numerals
 1D400..1D7FF; Mathematical Alphanumeric Symbols
 1D800..1DAAF; Sutton SignWriting
 1DF00..1DFFF; Latin Extended-G
 1E000..1E02F; Glagolitic Supplement
+1E030..1E08F; Cyrillic Extended-D
 1E100..1E14F; Nyiakeng Puachue Hmong
 1E290..1E2BF; Toto
 1E2C0..1E2FF; Wancho
+1E4D0..1E4FF; Nag Mundari
 1E7E0..1E7FF; Ethiopic Extended-B
 1E800..1E8DF; Mende Kikakui
 1E900..1E95F; Adlam
@@ -348,6 +354,7 @@ FFF0..FFFF; Specials
 2CEB0..2EBEF; CJK Unified Ideographs Extension F
 2F800..2FA1F; CJK Compatibility Ideographs Supplement
 30000..3134F; CJK Unified Ideographs Extension G
+31350..323AF; CJK Unified Ideographs Extension H
 E0000..E007F; Tags
 E0100..E01EF; Variation Selectors Supplement
 F0000..FFFFF; Supplementary Private Use Area-A
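
When reviewing new `Blocks.txt` entries like the seven added here, a quick sanity check is that every range starts on a multiple of 16, ends on a code point whose low nibble is F (the convention stated in the header of `Blocks.txt`), and does not overlap another block. The sketch below is illustrative only: `parse_blocks` and `check_blocks` are hypothetical helper names, not part of unicodetools.

```python
import re

# One data line looks like "10EC0..10EFF; Arabic Extended-C".
BLOCK_LINE = re.compile(r"^([0-9A-F]{4,6})\.\.([0-9A-F]{4,6});\s*(.+?)\s*$")


def parse_blocks(lines):
    """Return (start, end, name) tuples, skipping comments and blank lines."""
    blocks = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        m = BLOCK_LINE.match(line)
        if m is None:
            raise ValueError(f"unrecognized line: {line!r}")
        blocks.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return blocks


def check_blocks(blocks):
    """Report misaligned or overlapping ranges as human-readable strings."""
    problems = []
    prev_end = -1
    for start, end, name in sorted(blocks):
        if start % 16 != 0 or end % 16 != 15:
            problems.append(f"{name}: not aligned to 16 code points")
        if start <= prev_end:
            problems.append(f"{name}: overlaps the previous block")
        prev_end = max(prev_end, end)
    return problems


# The block lines added in this commit.
NEW_BLOCKS = """\
10EC0..10EFF; Arabic Extended-C
11B00..11B5F; Devanagari Extended-A
11F00..11F5F; Kawi
1D2C0..1D2DF; Kaktovik Numerals
1E030..1E08F; Cyrillic Extended-D
1E4D0..1E4FF; Nag Mundari
31350..323AF; CJK Unified Ideographs Extension H
""".splitlines()

blocks = parse_blocks(NEW_BLOCKS)
print(len(blocks), check_blocks(blocks))
```

Running the same two functions over the whole file, rather than only the added lines, would also catch a new block colliding with an existing one.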

unicodetools/data/ucd/dev/DerivedAge.txt

Lines changed: 35 additions & 1 deletion
@@ -1,5 +1,5 @@
 # DerivedAge-15.0.0.txt
-# Date: 2021-11-24, 21:43:27 GMT
+# Date: 2021-12-02, 19:37:30 GMT
 # © 2021 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see http://www.unicode.org/terms_of_use.html
@@ -1949,4 +1949,38 @@ FDFE..FDFF ; 14.0 # [2] ARABIC LIGATURE SUBHAANAHU WA TAAALAA..ARABIC LIGAT
 
 # Total code points: 838
 
+# ================================================
+
+# Age=V15_0
+
+# Newly assigned in Unicode 15.0.0 (September, 2022)
+
+0CF3 ; 15.0 # KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT
+0ECE ; 15.0 # LAO YAMAKKAN
+10EFD..10EFF ; 15.0 # [3] ARABIC SMALL LOW WORD SAKTA..ARABIC SMALL LOW WORD MADDA
+1123F..11241 ; 15.0 # [3] KHOJKI LETTER QA..KHOJKI VOWEL SIGN VOCALIC R
+11B00..11B09 ; 15.0 # [10] DEVANAGARI HEAD MARK..DEVANAGARI SIGN MINDU
+11F00..11F10 ; 15.0 # [17] KAWI SIGN CANDRABINDU..KAWI LETTER O
+11F12..11F3A ; 15.0 # [41] KAWI LETTER KA..KAWI VOWEL SIGN VOCALIC R
+11F3E..11F59 ; 15.0 # [28] KAWI VOWEL SIGN E..KAWI DIGIT NINE
+1B132 ; 15.0 # HIRAGANA LETTER SMALL KO
+1B155 ; 15.0 # KATAKANA LETTER SMALL KO
+1D2C0..1D2D3 ; 15.0 # [20] KAKTOVIK NUMERAL ZERO..KAKTOVIK NUMERAL NINETEEN
+1DF25..1DF2A ; 15.0 # [6] LATIN SMALL LETTER D WITH RAISED LEFT HOOK..LATIN SMALL LETTER T WITH RAISED LEFT HOOK
+1E030..1E06C ; 15.0 # [61] MODIFIER LETTER CYRILLIC SMALL A..MODIFIER LETTER CYRILLIC SMALL YERU WITH BACK YER
+1E4D0..1E4F9 ; 15.0 # [42] NAG MUNDARI LETTER O..NAG MUNDARI DIGIT NINE
+1F6DC ; 15.0 # WIRELESS
+1F7D9 ; 15.0 # NINE POINTED WHITE STAR
+1FA75..1FA77 ; 15.0 # [3] LIGHT BLUE HEART..PINK HEART
+1FA87..1FA88 ; 15.0 # [2] MARACAS..FLUTE
+1FAAD..1FAAF ; 15.0 # [3] FOLDING HAND FAN..KHANDA
+1FABB..1FABF ; 15.0 # [5] HYACINTH..GOOSE
+1FACE..1FACF ; 15.0 # [2] MOOSE FACE..DONKEY
+1FADA..1FADB ; 15.0 # [2] GINGER..PEAPOD
+1FAE8 ; 15.0 # SHAKING FACE
+1FAF7..1FAF8 ; 15.0 # [2] LEFTWARDS PUSHING HAND..RIGHTWARDS PUSHING HAND
+31350..323AF ; 15.0 # [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
+
+# Total code points: 4449
+
 # EOF
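
The bracketed counts and the `# Total code points` line in the new 15.0 section can be cross-checked mechanically. The following is a minimal sketch, assuming the `start..end ; age # [count] names` layout shown in the hunk; `count_age` is a hypothetical helper, not the generator used by unicodetools. Summing the 25 entries listed above gives 4449, which agrees with the total in the file.

```python
import re

# Matches "0CF3 ; 15.0 # NAME" and "11F00..11F10 ; 15.0 # [17] FIRST..LAST".
ENTRY = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\d+\.\d+)\s*#\s*(?:\[(\d+)\])?"
)


def count_age(lines, age):
    """Sum the code points assigned to `age`, validating each [count] annotation."""
    total = 0
    for line in lines:
        m = ENTRY.match(line.strip())
        if m is None or m.group(3) != age:
            continue  # comment, blank line, or an entry for another age
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        size = end - start + 1
        if m.group(4) is not None and int(m.group(4)) != size:
            raise ValueError(f"count mismatch in {line!r}: expected [{size}]")
        total += size
    return total


SAMPLE = [
    "# Age=V15_0",
    "0CF3 ; 15.0 # KANNADA SIGN COMBINING ANUSVARA ABOVE RIGHT",
    "11F00..11F10 ; 15.0 # [17] KAWI SIGN CANDRABINDU..KAWI LETTER O",
    "31350..323AF ; 15.0 # [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF",
]
print(count_age(SAMPLE, "15.0"))  # 1 + 17 + 4192 = 4210
```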
