Commit bca50a4

Line breaking changes from UTC-181 (#1046)
* UTC-181-A44 In LineBreak.txt and derived files, change the Line_Break assignment of U+034F COMBINING GRAPHEME JOINER from Line_Break=GL (Glue) to Line_Break=CM (Combining_Mark). For Unicode Version 17.0. [Ref. Section 6.3 of document L2/24-224]
* Regenerate UCD
* UTC-181-A142 In UCD files LineBreakTest.txt and LineBreakTest.html, add realistic tests exercising the changes to the behaviour of rules LB20a and LB21. For Unicode Version 17.0. See L2/24-224 item 6.1.
* LB20a does not work in SP CM HY HL 😿
* Regenerate UCD
* UTC-181-A138 In UCD file PropertyValueAliases.txt, add a new Line_Break property value Unambiguous_Hyphen (short alias: HH). For Unicode Version 17.0. See L2/24-224 item 6.1.
* Regenerate UCD
* GenerateEnums
* UTC-181-A139 In UCD file LineBreak.txt and derived files, assign Line_Break=Unambiguous_Hyphen to the eleven characters that have General_Category=Pd and Line_Break=Break_After in Unicode Version 16.0. For Unicode Version 17.0. See L2/24-224 item 6.1.
* UTC-181-A141 In UCD files LineBreakTest.txt and LineBreakTest.html, update rules LB12a, LB20a, LB21, and LB21a as described in L2/24-224 item 6.1. For Unicode Version 17.0.
* Regenerate UCD
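
The selection described in UTC-181-A139 (General_Category=Pd and Line_Break=Break_After) can be sketched as below. This is only an illustration, not the unicodetools implementation: the helper names `parse` and `pd_break_after` are hypothetical, the sample holds only pre-change lines visible in this commit's LineBreak.txt diff, and it reads the General_Category from the informational comment field, whereas a real tool would presumably take it from the UCD property data.

```python
import re

# Pre-change entries copied from this commit's LineBreak.txt diff, plus a few
# neighbouring lines that must NOT be selected (wrong lb value or wrong gc).
SAMPLE = """\
058A ; BA # Pd ARMENIAN HYPHEN
05BE ; BA # Pd HEBREW PUNCTUATION MAQAF
1400 ; BA # Pd CANADIAN SYLLABICS HYPHEN
2010 ; BA # Pd HYPHEN
2011 ; GL # Pd NON-BREAKING HYPHEN
2012..2013 ; BA # Pd [2] FIGURE DASH..EN DASH
2014 ; B2 # Pd EM DASH
2E0E..2E15 ; BA # Po [8] EDITORIAL CORONIS..UPWARDS ANCORA
2E17 ; BA # Pd DOUBLE OBLIQUE HYPHEN
2E1A ; AL # Pd HYPHEN WITH DIAERESIS
2E40 ; BA # Pd DOUBLE HYPHEN
2E5D ; BA # Pd OBLIQUE HYPHEN
10D6E ; BA # Pd GARAY HYPHEN
10EAD ; BA # Pd YEZIDI HYPHENATION MARK
"""

# code point or range ; Line_Break value # General_Category (from the comment)
LINE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)\s*#\s*(\S+)")


def parse(text):
    """Yield (first, last, lb, gc) for each data line (hypothetical helper)."""
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            first = int(m.group(1), 16)
            last = int(m.group(2) or m.group(1), 16)
            yield first, last, m.group(3), m.group(4)


def pd_break_after(text):
    """Return every code point with lb=BA and gc=Pd, expanding ranges."""
    return [
        cp
        for first, last, lb, gc in parse(text)
        if lb == "BA" and gc == "Pd"
        for cp in range(first, last + 1)
    ]


if __name__ == "__main__":
    for cp in pd_break_after(SAMPLE):
        print(f"U+{cp:04X}")
```

On this sample the filter keeps the code points that the diff reassigns to HH and skips U+2011 (GL), U+2014 (B2), and U+2E1A (AL), which share General_Category=Pd but not Line_Break=BA.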
1 parent ed8b6dc commit bca50a4

10 files changed: +2035 -1880 lines changed

unicodetools/data/ucd/dev/LineBreak.txt

Lines changed: 12 additions & 14 deletions
@@ -1,5 +1,5 @@
 # LineBreak-17.0.0.txt
-# Date: 2025-01-27, 18:09:16 GMT
+# Date: 2025-02-14, 15:13:07 GMT
 # © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -157,9 +157,7 @@
 02ED ; AL # Sk MODIFIER LETTER UNASPIRATED
 02EE ; AL # Lm MODIFIER LETTER DOUBLE APOSTROPHE
 02EF..02FF ; AL # Sk [17] MODIFIER LETTER LOW DOWN ARROWHEAD..MODIFIER LETTER LOW LEFT ARROW
-0300..034E ; CM # Mn [79] COMBINING GRAVE ACCENT..COMBINING UPWARDS ARROW BELOW
-034F ; GL # Mn COMBINING GRAPHEME JOINER
-0350..035B ; CM # Mn [12] COMBINING RIGHT ARROWHEAD ABOVE..COMBINING ZIGZAG ABOVE
+0300..035B ; CM # Mn [92] COMBINING GRAVE ACCENT..COMBINING ZIGZAG ABOVE
 035C..0362 ; GL # Mn [7] COMBINING DOUBLE BREVE BELOW..COMBINING DOUBLE RIGHTWARDS ARROW BELOW
 0363..036F ; CM # Mn [13] COMBINING LATIN SMALL LETTER A..COMBINING LATIN SMALL LETTER X
 0370..0373 ; AL # L& [4] GREEK CAPITAL LETTER HETA..GREEK SMALL LETTER ARCHAIC SAMPI
@@ -190,11 +188,11 @@
 055A..055F ; AL # Po [6] ARMENIAN APOSTROPHE..ARMENIAN ABBREVIATION MARK
 0560..0588 ; AL # Ll [41] ARMENIAN SMALL LETTER TURNED AYB..ARMENIAN SMALL LETTER YI WITH STROKE
 0589 ; IS # Po ARMENIAN FULL STOP
-058A ; BA # Pd ARMENIAN HYPHEN
+058A ; HH # Pd ARMENIAN HYPHEN
 058D..058E ; AL # So [2] RIGHT-FACING ARMENIAN ETERNITY SIGN..LEFT-FACING ARMENIAN ETERNITY SIGN
 058F ; PR # Sc ARMENIAN DRAM SIGN
 0591..05BD ; CM # Mn [45] HEBREW ACCENT ETNAHTA..HEBREW POINT METEG
-05BE ; BA # Pd HEBREW PUNCTUATION MAQAF
+05BE ; HH # Pd HEBREW PUNCTUATION MAQAF
 05BF ; CM # Mn HEBREW POINT RAFE
 05C0 ; AL # Po HEBREW PUNCTUATION PASEQ
 05C1..05C2 ; CM # Mn [2] HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT
@@ -667,7 +665,7 @@
 1390..1399 ; AL # So [10] ETHIOPIC TONAL MARK YIZET..ETHIOPIC TONAL MARK KURT
 13A0..13F5 ; AL # Lu [86] CHEROKEE LETTER A..CHEROKEE LETTER MV
 13F8..13FD ; AL # Ll [6] CHEROKEE SMALL LETTER YE..CHEROKEE SMALL LETTER MV
-1400 ; BA # Pd CANADIAN SYLLABICS HYPHEN
+1400 ; HH # Pd CANADIAN SYLLABICS HYPHEN
 1401..166C ; AL # Lo [620] CANADIAN SYLLABICS E..CANADIAN SYLLABICS CARRIER TTSA
 166D ; AL # So CANADIAN SYLLABICS CHI SIGN
 166E ; AL # Po CANADIAN SYLLABICS FULL STOP
@@ -899,9 +897,9 @@
 200C ; CM # Cf ZERO WIDTH NON-JOINER
 200D ; ZWJ# Cf ZERO WIDTH JOINER
 200E..200F ; CM # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
-2010 ; BA # Pd HYPHEN
+2010 ; HH # Pd HYPHEN
 2011 ; GL # Pd NON-BREAKING HYPHEN
-2012..2013 ; BA # Pd [2] FIGURE DASH..EN DASH
+2012..2013 ; HH # Pd [2] FIGURE DASH..EN DASH
 2014 ; B2 # Pd EM DASH
 2015 ; AI # Pd HORIZONTAL BAR
 2016 ; AI # Po DOUBLE VERTICAL LINE
@@ -1365,7 +1363,7 @@
 2E0D ; QU # Pf RIGHT RAISED OMISSION BRACKET
 2E0E..2E15 ; BA # Po [8] EDITORIAL CORONIS..UPWARDS ANCORA
 2E16 ; AL # Po DOTTED RIGHT-POINTING ANGLE
-2E17 ; BA # Pd DOUBLE OBLIQUE HYPHEN
+2E17 ; HH # Pd DOUBLE OBLIQUE HYPHEN
 2E18 ; OP # Po INVERTED INTERROBANG
 2E19 ; BA # Po PALM BRANCH
 2E1A ; AL # Pd HYPHEN WITH DIAERESIS
@@ -1393,7 +1391,7 @@
 2E3A..2E3B ; B2 # Pd [2] TWO-EM DASH..THREE-EM DASH
 2E3C..2E3E ; BA # Po [3] STENOGRAPHIC FULL STOP..WIGGLY VERTICAL LINE
 2E3F ; AL # Po CAPITULUM
-2E40 ; BA # Pd DOUBLE HYPHEN
+2E40 ; HH # Pd DOUBLE HYPHEN
 2E41 ; BA # Po REVERSED COMMA
 2E42 ; OP # Ps DOUBLE LOW-REVERSED-9 QUOTATION MARK
 2E43..2E4A ; BA # Po [8] DASH WITH LEFT UPTURN..DOTTED SOLIDUS
@@ -1412,7 +1410,7 @@
 2E5A ; CP # Pe TOP HALF RIGHT PARENTHESIS
 2E5B ; OP # Ps BOTTOM HALF LEFT PARENTHESIS
 2E5C ; CP # Pe BOTTOM HALF RIGHT PARENTHESIS
-2E5D ; BA # Pd OBLIQUE HYPHEN
+2E5D ; HH # Pd OBLIQUE HYPHEN
 2E80..2E99 ; ID # So [26] CJK RADICAL REPEAT..CJK RADICAL RAP
 2E9B..2EF3 ; ID # So [89] CJK RADICAL CHOKE..CJK RADICAL C-SIMPLIFIED TURTLE
 2F00..2FD5 ; ID # So [214] KANGXI RADICAL ONE..KANGXI RADICAL FLUTE
@@ -2812,14 +2810,14 @@ FFFD ; AI # So REPLACEMENT CHARACTER
 10D4F ; AL # Lo GARAY SUKUN
 10D50..10D65 ; AL # Lu [22] GARAY CAPITAL LETTER A..GARAY CAPITAL LETTER OLD NA
 10D69..10D6D ; CM # Mn [5] GARAY VOWEL SIGN E..GARAY CONSONANT NASALIZATION MARK
-10D6E ; BA # Pd GARAY HYPHEN
+10D6E ; HH # Pd GARAY HYPHEN
 10D6F ; AL # Lm GARAY REDUPLICATION MARK
 10D70..10D85 ; AL # Ll [22] GARAY SMALL LETTER A..GARAY SMALL LETTER OLD NA
 10D8E..10D8F ; AL # Sm [2] GARAY PLUS SIGN..GARAY MINUS SIGN
 10E60..10E7E ; AL # No [31] RUMI DIGIT ONE..RUMI FRACTION TWO THIRDS
 10E80..10EA9 ; AL # Lo [42] YEZIDI LETTER ELIF..YEZIDI LETTER ET
 10EAB..10EAC ; CM # Mn [2] YEZIDI COMBINING HAMZA MARK..YEZIDI COMBINING MADDA MARK
-10EAD ; BA # Pd YEZIDI HYPHENATION MARK
+10EAD ; HH # Pd YEZIDI HYPHENATION MARK
 10EB0..10EB1 ; AL # Lo [2] YEZIDI LETTER LAM WITH DOT ABOVE..YEZIDI LETTER YOT WITH CIRCUMFLEX ABOVE
 10EC2..10EC4 ; AL # Lo [3] ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS VERTICALLY BELOW
 10EC5 ; AL # Lm ARABIC SMALL YEH BARREE WITH TWO DOTS BELOW
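
The hunk at old lines 157-165 shows three entries collapsing into `0300..035B ; CM # Mn [92]` once U+034F changes from GL to CM. A minimal sketch of that kind of range coalescing follows. The function `coalesce` is a hypothetical helper, not the generator used by unicodetools, and it assumes ranges are keyed on the Line_Break value alone. The actual file also starts a new line when the General_Category changes, as the neighbouring 02ED/02EE entries show.

```python
def coalesce(ranges):
    """Merge adjacent (first, last, value) ranges that share a value.

    Hypothetical helper for illustration only.
    """
    merged = []
    for first, last, value in sorted(ranges):
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == first:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], last, value)
        else:
            merged.append((first, last, value))
    return merged


# Entries from the diff, before and after U+034F moves from GL to CM.
BEFORE = [
    (0x0300, 0x034E, "CM"),
    (0x034F, 0x034F, "GL"),
    (0x0350, 0x035B, "CM"),
    (0x035C, 0x0362, "GL"),
]
AFTER = [
    (0x0300, 0x034E, "CM"),
    (0x034F, 0x034F, "CM"),
    (0x0350, 0x035B, "CM"),
    (0x035C, 0x0362, "GL"),
]

if __name__ == "__main__":
    for first, last, value in coalesce(AFTER):
        print(f"{first:04X}..{last:04X} ; {value} # [{last - first + 1}]")
```

The merged CM range spans 79 + 1 + 12 = 92 code points, matching the `[92]` count on the added line.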

unicodetools/data/ucd/dev/PropertyValueAliases.txt

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,5 @@
 # PropertyValueAliases-17.0.0.txt
-# Date: 2025-01-27, 18:09:29 GMT
+# Date: 2025-02-14, 15:50:28 GMT
 # © 2025 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -1142,6 +1142,7 @@ lb ; EX ; Exclamation
 lb ; GL ; Glue
 lb ; H2 ; H2
 lb ; H3 ; H3
+lb ; HH ; Unambiguous_Hyphen
 lb ; HL ; Hebrew_Letter
 lb ; HY ; Hyphen
 lb ; ID ; Ideographic
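
For consumers of PropertyValueAliases.txt, the new entry means any table mapping short Line_Break aliases to long names gains an `HH` key. A minimal sketch of reading such lines, assuming the `property ; short ; long` layout shown in the hunk above; the function name `aliases` is hypothetical and the sample contains only the lines from this hunk:

```python
# Post-change lines copied from the PropertyValueAliases.txt hunk in this commit.
SAMPLE = """\
lb ; GL ; Glue
lb ; H2 ; H2
lb ; H3 ; H3
lb ; HH ; Unambiguous_Hyphen
lb ; HL ; Hebrew_Letter
lb ; HY ; Hyphen
lb ; ID ; Ideographic
"""


def aliases(text, prop):
    """Map short alias -> long name for one property (hypothetical helper)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 3 and fields[0] == prop:
            table[fields[1]] = fields[2]
    return table


if __name__ == "__main__":
    print(aliases(SAMPLE, "lb")["HH"])
```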

unicodetools/data/ucd/dev/auxiliary/LineBreakTest.html

Lines changed: 316 additions & 189 deletions
Large diffs are not rendered by default.
