Commit 5f85cbf

Fix references to Core Spec in ArabicShaping and DoNotEmit (#1200)
Also:
- Remove old Review Notes from DoNotEmit
- Move Sharada data up now that it has its own "Do Not Use" table
- Minor editorial changes
1 parent 21d5669 commit 5f85cbf

2 files changed: +27 -31 lines changed

unicodetools/data/ucd/dev/ArabicShaping.txt

Lines changed: 9 additions & 9 deletions
@@ -1,5 +1,5 @@
 # ArabicShaping-17.0.0.txt
-# Date: 2025-03-21, 02:39:00 GMT
+# Date: 2025-08-14
 # © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -12,22 +12,22 @@
 # shaping, repeating in machine readable form the information
 # exemplified in various tables of The Unicode Standard core specification.
 #
-# This file also defines Joining_Type values for
-# Mongolian, Phags-pa, Psalter Pahlavi, Sogdian, Old Uyghur, Chorasmian,
-# and Adlam positional shaping,
-# and Joining_Type and Joining_Group values for Hanifi Rohingya positional shaping,
-# which are not listed in tables in the core specification.
+# This file also defines Joining_Type values for Mongolian, Phags-pa,
+# Psalter Pahlavi, Sogdian, Old Uyghur, Chorasmian, and Adlam positional
+# shaping, and Joining_Type and Joining_Group values for Hanifi Rohingya
+# positional shaping, which are not listed in tables in the core
+# specification.
 #
 # Script           Section  Table(s)
 #
-# Arabic           9.2      9-3, 9-4, 9-5, 9-7, 9-8, 9-9, 9-10, 9-11
+# Arabic           9.2      9-3, 9-4, 9-5, 9-7, 9-8, 9-9, 9-10, 9-11, 9-13
 # Syriac           9.3      9-15, 9-16, 9-17, 9-18, 9-19
-# Mandaic          9.5      9-21, 9-22
+# Mandaic          9.5      9-22, 9-23
 # Manichaean       10.5     10-4, 10-5, 10-6, 10-7
 # Psalter Pahlavi  10.6     --
 # Chorasmian       10.8     --
 # Mongolian        13.5     --
-# Phags-pa         14.4     --
+# Phags-pa         14.4     14-7
 # Sogdian          14.10    --
 # Old Uyghur       14.11    --
 # Hanifi Rohingya  16.14    --

unicodetools/data/ucd/dev/DoNotEmit.txt

Lines changed: 18 additions & 22 deletions
@@ -1,5 +1,5 @@
 # DoNotEmit-17.0.0.txt
-# Date: 2025-08-04
+# Date: 2025-08-14
 # © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -139,8 +139,6 @@
 0905 0957; 0977; Indic_Vowel_Letter # DEVANAGARI LETTER A, DEVANAGARI VOWEL SIGN UUE; DEVANAGARI LETTER UUE

 # Devanagari, from Table 12-2
-# Review Note: Some experts have recommended removing these, while
-# others prefer keeping them. They may also be procedurally generated.
 0916 094D 093E; 0916; Indic_Atomic_Consonant # DEVANAGARI LETTER KHA, DEVANAGARI SIGN VIRAMA, DEVANAGARI VOWEL SIGN AA; DEVANAGARI LETTER KHA
 0916 094D 200D 093E; 0916; Indic_Atomic_Consonant # DEVANAGARI LETTER KHA, DEVANAGARI SIGN VIRAMA, ZERO WIDTH JOINER, DEVANAGARI VOWEL SIGN AA; DEVANAGARI LETTER KHA
 0917 094D 093E; 0917; Indic_Atomic_Consonant # DEVANAGARI LETTER GA, DEVANAGARI SIGN VIRAMA, DEVANAGARI VOWEL SIGN AA; DEVANAGARI LETTER GA
@@ -219,8 +217,6 @@
 097F 094D 200D 093E; 097F; Indic_Atomic_Consonant # DEVANAGARI LETTER BBA, DEVANAGARI SIGN VIRAMA, ZERO WIDTH JOINER, DEVANAGARI VOWEL SIGN AA; DEVANAGARI LETTER BBA

 # Devanagari, from Table 12-3
-# Review Note: Some experts have recommended removing these, while
-# others prefer keeping them. They may also be procedurally generated.
 # Note: This list may be incomplete.
 0915 094D 091A 094D 093E; 0915 094D 091A; Indic_Consonant_Conjunct # DEVANAGARI LETTER KA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER CA, DEVANAGARI SIGN VIRAMA, DEVANAGARI VOWEL SIGN AA; DEVANAGARI LETTER KA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER CA
 0915 094D 091A 094D 200D 093E; 0915 094D 091A; Indic_Consonant_Conjunct # DEVANAGARI LETTER KA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER CA, DEVANAGARI SIGN VIRAMA, ZERO WIDTH JOINER, DEVANAGARI VOWEL SIGN AA; DEVANAGARI LETTER KA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER CA
@@ -231,7 +227,7 @@
 0928 094D 0924 094D 093E; 0928 094D 0924; Indic_Consonant_Conjunct # DEVANAGARI LETTER NA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER TA, DEVANAGARI SIGN VIRAMA, DEVANAGARI VOWEL SIGN AA; DEVANAGARI LETTER NA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER TA
 0928 094D 0924 094D 200D 093E; 0928 094D 0924; Indic_Consonant_Conjunct # DEVANAGARI LETTER NA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER TA, DEVANAGARI SIGN VIRAMA, ZERO WIDTH JOINER, DEVANAGARI VOWEL SIGN AA; DEVANAGARI LETTER NA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER TA

-# Bengali, from Table 12-11
+# Bengali (Bangla), from Table 12-11
 0985 09BE; 0986; Indic_Vowel_Letter # BENGALI LETTER A, BENGALI VOWEL SIGN AA; BENGALI LETTER AA
 098B 09C3; 09E0; Indic_Vowel_Letter # BENGALI LETTER VOCALIC R, BENGALI VOWEL SIGN VOCALIC R; BENGALI LETTER VOCALIC RR
 098C 09E2; 09E1; Indic_Vowel_Letter # BENGALI LETTER VOCALIC L, BENGALI VOWEL SIGN VOCALIC L; BENGALI LETTER VOCALIC LL
@@ -259,27 +255,27 @@
 0A85 0ABE 0AC8; 0A94; Indic_Vowel_Letter # GUJARATI LETTER A, GUJARATI VOWEL SIGN AA, GUJARATI VOWEL SIGN AI; GUJARATI LETTER AU
 0AC5 0ABE; 0AC9; Indic_Vowel_Letter # GUJARATI VOWEL SIGN CANDRA E, GUJARATI VOWEL SIGN AA; GUJARATI VOWEL SIGN CANDRA O

-# Oriya, from Table 12-22
+# Oriya (Odia), from Table 12-22
 0B05 0B3E; 0B06; Indic_Vowel_Letter # ORIYA LETTER A, ORIYA VOWEL SIGN AA; ORIYA LETTER AA
 0B0F 0B57; 0B10; Indic_Vowel_Letter # ORIYA LETTER E, ORIYA AU LENGTH MARK; ORIYA LETTER AI
 0B13 0B57; 0B14; Indic_Vowel_Letter # ORIYA LETTER O, ORIYA AU LENGTH MARK; ORIYA LETTER AU

-# Tamil, from Table 12-26
+# Tamil, from Table 12-27
 0B85 0BC2; 0B86; Indic_Vowel_Letter # TAMIL LETTER A, TAMIL VOWEL SIGN UU; TAMIL LETTER AA

-# Telugu, from Table 12-30
+# Telugu, from Table 12-31
 0C12 0C55; 0C13; Indic_Vowel_Letter # TELUGU LETTER O, TELUGU LENGTH MARK; TELUGU LETTER OO
 0C12 0C4C; 0C14; Indic_Vowel_Letter # TELUGU LETTER O, TELUGU VOWEL SIGN AU; TELUGU LETTER AU
 0C3F 0C55; 0C40; Indic_Vowel_Letter # TELUGU VOWEL SIGN I, TELUGU LENGTH MARK; TELUGU VOWEL SIGN II
 0C46 0C55; 0C47; Indic_Vowel_Letter # TELUGU VOWEL SIGN E, TELUGU LENGTH MARK; TELUGU VOWEL SIGN EE
 0C4A 0C55; 0C4B; Indic_Vowel_Letter # TELUGU VOWEL SIGN O, TELUGU LENGTH MARK; TELUGU VOWEL SIGN OO

-# Kannada, from Table 12-31
+# Kannada, from Table 12-32
 0C89 0CBE; 0C8A; Indic_Vowel_Letter # KANNADA LETTER U, KANNADA VOWEL SIGN AA; KANNADA LETTER UU
 0C92 0CCC; 0C94; Indic_Vowel_Letter # KANNADA LETTER O, KANNADA VOWEL SIGN AU; KANNADA LETTER AU
 0C8B 0CBE; 0CE0; Indic_Vowel_Letter # KANNADA LETTER VOCALIC R, KANNADA VOWEL SIGN AA; KANNADA LETTER VOCALIC RR

-# Malayalam, from Table 12-32
+# Malayalam, from Table 12-34
 0D07 0D57; 0D08; Indic_Vowel_Letter # MALAYALAM LETTER I, MALAYALAM AU LENGTH MARK; MALAYALAM LETTER II
 0D09 0D57; 0D0A; Indic_Vowel_Letter # MALAYALAM LETTER U, MALAYALAM AU LENGTH MARK; MALAYALAM LETTER UU
 0D0E 0D46; 0D10; Indic_Vowel_Letter # MALAYALAM LETTER E, MALAYALAM VOWEL SIGN E; MALAYALAM LETTER AI
@@ -302,13 +298,17 @@
 1100B 1103E; 1100C; Indic_Vowel_Letter # BRAHMI LETTER VOCALIC R, BRAHMI VOWEL SIGN VOCALIC R; BRAHMI LETTER VOCALIC RR
 1100F 11042; 11010; Indic_Vowel_Letter # BRAHMI LETTER E, BRAHMI VOWEL SIGN E; BRAHMI LETTER AI

-# Takri, from Table 15-1
+# Sharada, from Table 15-1
+1118D 111BC; 1118E; Indic_Vowel_Letter # SHARADA LETTER E, SHARADA VOWEL SIGN E; SHARADA LETTER AI
+111C4; 1118F 11180; Discouraged # SHARADA OM; SHARADA LETTER O, SHARADA SIGN CANDRABINDU
+
+# Takri, from Table 15-2
 11680 116AD; 11681; Indic_Vowel_Letter # TAKRI LETTER A, TAKRI VOWEL SIGN AA; TAKRI LETTER AA
 11686 116B2; 11687; Indic_Vowel_Letter # TAKRI LETTER E, TAKRI VOWEL SIGN E; TAKRI LETTER AI
 11680 116B4; 11688; Indic_Vowel_Letter # TAKRI LETTER A, TAKRI VOWEL SIGN O; TAKRI LETTER O
 11680 116B5; 11689; Indic_Vowel_Letter # TAKRI LETTER A, TAKRI VOWEL SIGN AU; TAKRI LETTER AU

-# Khojki, from Table 15-3
+# Khojki, from Table 15-4
 11200 1122C; 11201; Indic_Vowel_Letter # KHOJKI LETTER A, KHOJKI VOWEL SIGN AA; KHOJKI LETTER AA
 11240 1122E; 11202; Indic_Vowel_Letter # KHOJKI LETTER SHORT I, KHOJKI VOWEL SIGN II; KHOJKI LETTER I
 11206 1122C; 11203; Indic_Vowel_Letter # KHOJKI LETTER O, KHOJKI VOWEL SIGN AA; KHOJKI LETTER U
@@ -318,21 +318,21 @@
 1122C 11230; 11232; Indic_Vowel_Letter # KHOJKI VOWEL SIGN AA, KHOJKI VOWEL SIGN E; KHOJKI VOWEL SIGN O
 1122C 11231; 11233; Indic_Vowel_Letter # KHOJKI VOWEL SIGN AA, KHOJKI VOWEL SIGN AI; KHOJKI VOWEL SIGN AU

-# Khudawadi, from Table 15-4
+# Khudawadi, from Table 15-5
 112B0 112E0; 112B1; Indic_Vowel_Letter # KHUDAWADI LETTER A, KHUDAWADI VOWEL SIGN AA; KHUDAWADI LETTER AA
 112B0 112E5; 112B6; Indic_Vowel_Letter # KHUDAWADI LETTER A, KHUDAWADI VOWEL SIGN E; KHUDAWADI LETTER E
 112B0 112E6; 112B7; Indic_Vowel_Letter # KHUDAWADI LETTER A, KHUDAWADI VOWEL SIGN AI; KHUDAWADI LETTER AI
 112B0 112E7; 112B8; Indic_Vowel_Letter # KHUDAWADI LETTER A, KHUDAWADI VOWEL SIGN O; KHUDAWADI LETTER O
 112B0 112E8; 112B9; Indic_Vowel_Letter # KHUDAWADI LETTER A, KHUDAWADI VOWEL SIGN AU; KHUDAWADI LETTER AU

-# Tirhuta, from Table 15-6
+# Tirhuta, from Table 15-7
 11481 114B0; 11482; Indic_Vowel_Letter # TIRHUTA LETTER A, TIRHUTA VOWEL SIGN AA; TIRHUTA LETTER AA
 114AA 114B5; 11489; Indic_Vowel_Letter # TIRHUTA LETTER LA, TIRHUTA VOWEL SIGN VOCALIC R; TIRHUTA LETTER VOCALIC L
 114AA 114B6; 1148A; Indic_Vowel_Letter # TIRHUTA LETTER LA, TIRHUTA VOWEL SIGN VOCALIC RR; TIRHUTA LETTER VOCALIC LL
 1148B 114BA; 1148C; Indic_Vowel_Letter # TIRHUTA LETTER E, TIRHUTA VOWEL SIGN SHORT E; TIRHUTA LETTER AI
 1148D 114BA; 1148E; Indic_Vowel_Letter # TIRHUTA LETTER O, TIRHUTA VOWEL SIGN SHORT E; TIRHUTA LETTER AU

-# Modi, from Table 15-7
+# Modi, from Table 15-8
 11600 11639; 1160A; Indic_Vowel_Letter # MODI LETTER A, MODI VOWEL SIGN E; MODI LETTER E
 11600 1163A; 1160B; Indic_Vowel_Letter # MODI LETTER A, MODI VOWEL SIGN AI; MODI LETTER AI
 11601 11639; 1160C; Indic_Vowel_Letter # MODI LETTER AA, MODI VOWEL SIGN E; MODI LETTER O
@@ -456,7 +456,7 @@
 0953; 0300; Discouraged # DEVANAGARI GRAVE ACCENT; COMBINING GRAVE ACCENT
 0954; 0301; Discouraged # DEVANAGARI ACUTE ACCENT; COMBINING ACUTE ACCENT

-# Bengali, from the "Bengali (Bangla)" section of the core specification
+# Bengali (Bangla), from the "Bengali (Bangla)" section of the core specification
 09A4 09CD 200D; 09CE; Bengali_Khanda_Ta # BENGALI LETTER TA, BENGALI SIGN VIRAMA, ZERO WIDTH JOINER; BENGALI LETTER KHANDA TA

 # Gujarati, from the NamesList
@@ -465,7 +465,7 @@
 # Tamil ligature shri
 0BB8 0BCD 0BB0 0BC0; 0BB6 0BCD 0BB0 0BC0; Tamil_Shrii # TAMIL LETTER SA, TAMIL SIGN VIRAMA, TAMIL LETTER RA, TAMIL VOWEL SIGN II; TAMIL LETTER SHA, TAMIL SIGN VIRAMA, TAMIL LETTER RA, TAMIL VOWEL SIGN II

-# Malayalam Chillus, from Table 12-40
+# Malayalam Chillus, from Table 12-42
 0D23 0D4D 200D; 0D7A; Malayalam_Chillu # MALAYALAM LETTER NNA, MALAYALAM SIGN VIRAMA, ZERO WIDTH JOINER; MALAYALAM LETTER CHILLU NN
 0D28 0D4D 200D; 0D7B; Malayalam_Chillu # MALAYALAM LETTER NA, MALAYALAM SIGN VIRAMA, ZERO WIDTH JOINER; MALAYALAM LETTER CHILLU N
 0D30 0D4D 200D; 0D7C; Malayalam_Chillu # MALAYALAM LETTER RA, MALAYALAM SIGN VIRAMA, ZERO WIDTH JOINER; MALAYALAM LETTER CHILLU RR
@@ -482,8 +482,4 @@
 17D8; 17D4 179B 17D4; Discouraged # KHMER SIGN BEYYAL; KHMER SIGN KHAN, KHMER LETTER LO, KHMER SIGN KHAN
 17E8 17D3; 19E0; Discouraged # KHMER DIGIT EIGHT, KHMER SIGN BATHAMASAT; KHMER SYMBOL PATHAMASAT

-# Sharada, from the NamesList, and glyph shape of U+1118E
-1118D 111BC; 1118E; Indic_Vowel_Letter # SHARADA LETTER E, SHARADA VOWEL SIGN E; SHARADA LETTER AI
-111C4; 1118F 11180; Discouraged # SHARADA OM; SHARADA LETTER O, SHARADA SIGN CANDRABINDU
-
 # EOF
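
For readers checking the moved Sharada entries, here is a minimal Python sketch of how a DoNotEmit.txt data line might be read. It assumes only what is visible in this diff: three semicolon-separated fields (two space-separated hexadecimal code point sequences and a type label), followed by a `#` comment. The names `DoNotEmitEntry`, `parse_line`, `sequence`, `replacement`, and `kind` are hypothetical, and reading the second field as the replacement for the first is an assumption based on the file name, not something this commit states.

```python
from typing import NamedTuple, Optional, Tuple


class DoNotEmitEntry(NamedTuple):
    """One data line; field names are illustrative, not from the UCD."""
    sequence: Tuple[int, ...]     # first field (assumed: the sequence to avoid)
    replacement: Tuple[int, ...]  # second field (assumed: what to use instead)
    kind: str                     # third field, e.g. "Indic_Vowel_Letter"


def parse_code_points(field: str) -> Tuple[int, ...]:
    """Convert space-separated hex code points into a tuple of integers."""
    return tuple(int(cp, 16) for cp in field.split())


def parse_line(line: str) -> Optional[DoNotEmitEntry]:
    """Parse one line; return None for blank and comment-only lines."""
    # Strip the trailing comment first: the comment itself contains
    # semicolons, so splitting on ';' before this step would miscount fields.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    return DoNotEmitEntry(
        parse_code_points(fields[0]),
        parse_code_points(fields[1]),
        fields[2],
    )


if __name__ == "__main__":
    sample = ("111C4; 1118F 11180; Discouraged "
              "# SHARADA OM; SHARADA LETTER O, SHARADA SIGN CANDRABINDU")
    entry = parse_line(sample)
    print(entry)
```

Section headers such as `# Sharada, from Table 15-1` are ordinary comment lines, so this sketch returns `None` for them. That would be consistent with the table renumbering in this commit changing only comments rather than data fields.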
