Commit fb2804d

Merge remote-tracking branch 'la-vache/main' into 5-modifier-click-letters
2 parents 6b9773d + ef8d616 commit fb2804d

40 files changed: +1966 -519 lines changed

.github/workflows/pipeline.yml

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ jobs:
         uses: actions/checkout@v3
         with:
           sparse-checkout: py/pipeline-workflow
-      - name: Check L2 document
+      - name: Check L2 document and WG references
         run: |
           python3 py/pipeline-workflow/check-l2-document.py
   utc-decision:

docs/help/changes.md

Lines changed: 16 additions & 20 deletions
@@ -3,42 +3,38 @@
 The Unicode Utilities have been modified to support both properties from the
 released version of Unicode (via ICU) and from the new Unicode beta.
 
-To get the beta version of the property, insert β *after* the property name.
+To get the beta version of the property, insert `Uβ:` *before* the property name.
+The explicit version number for the β can be used;
+the resulting property is then only valid when that specific β is current.
 Examples:
 
-| `\p{Word_Break=ALetter}` | Released version of Unicode |
-| `\p{Word_Breakβ=ALetter}` | Beta version of Unicode |
+| Query | Result |
+|---|---|
+| `\p{Word_Break=ALetter}` | Released version of Unicode. |
+| `\p{Uβ:Word_Break=ALetter}` | Beta version of Unicode; error outside of beta review. |
+| `\p{U16β:Word_Break=ALetter}` | Beta version of Unicode 16.0; error during the beta review of any other version. |
 
 
 For example, to see additions to that property value in the beta version, use:
 
 <center>
 
-[`\p{Word_Breakβ=ALetter}-\\p{Word_Break=ALetter}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BWord_Break%CE%B2%3DALetter%7D-%5Cp%7BWord_Break%3DALetter%7D&g=&i=)
+[`\p{Uβ:Word_Break=ALetter}-\p{Word_Break=ALetter}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BU%CE%B2%3AWord_Break%3DALetter%7D-%5Cp%7BWord_Break%3DALetter%7D&g=&i=)
 
 </center>
 
 
 ## Caveats
 
-The support is not complete done, and there are some known problems.
-
-1. Some properties are not supported in beta versions. See
-<https://util.unicode.org/UnicodeJsps/properties.jsp>
-for the list.
-2. When characters are listed, the new blocks and subheads don't show up.
-3. If you use a property that has a β version but no ICU version, you get no
-error: just an empty listing.
-4. The beta properties don't yet have the "shorthands" for cases like \\p{Lu}.
-So make sure the property is listed, eg \\p{gcβ=Lu}
-1. Example:
-[`\p{gcβ=Lu}-\\p{gc=Lu}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7Bgc%CE%B2%3DLu%7D-%5Cp%7Bgc%3DLu%7D&g=&i=)
-5. Tools for segmentation, etc. use the release properties; there isn't a way
+The support is not completely done, and there are some known problems.
+
+1. The General_Category groupings such as \\p{Uβ:L} are not correctly implemented.
+Only actual values, such as \\p{Uβ:Lu} etc., work.
+2. Tools for segmentation, etc. use the release properties; there isn't a way
 to have them use the beta properties.
-6. There are probably others...
+3. There are probably others...
 
 If you find a problem, please file a ticket at
-<https://cldr.unicode.org/index/bug-reports>: make sure to start the summary with
-"Unicode Utilities: "
+https://github.com/unicode-org/unicodetools/issues.
 
 [Back to Unicode Utilities Help Home](index)
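
The added help text links to a `list-unicodeset.jsp` query that subtracts the released property set from the `Uβ:` one. As a rough illustration of how such a link can be assembled, here is a minimal Python sketch. The function names are hypothetical and not part of the Unicode Utilities; the base URL and the `a`, `g`, `i` parameters are copied from the link in the diff.

```python
from urllib.parse import quote

# Base URL taken from the link in docs/help/changes.md.
LIST_UNICODESET = "https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp"


def beta_additions_expression(prop: str, value: str) -> str:
    """Set expression: beta set minus released set (hypothetical helper)."""
    beta = "\\p{Uβ:" + prop + "=" + value + "}"
    released = "\\p{" + prop + "=" + value + "}"
    return beta + "-" + released


def beta_additions_url(prop: str, value: str) -> str:
    """Percent-encode the expression as UTF-8 and append the empty g and i parameters."""
    encoded = quote(beta_additions_expression(prop, value), safe="")
    return LIST_UNICODESET + "?a=" + encoded + "&g=&i="


if __name__ == "__main__":
    print(beta_additions_url("Word_Break", "ALetter"))
```

For `Word_Break` and `ALetter` this reproduces the URL shown in the added line of the diff.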

docs/pipeline.md

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ PR preparation:
 - [ ] If from SAH — Link SAH issue
 - [ ] If from ESC or CJK — Mention ESC or CJK in the PR description
 - [ ] When for a UTC decision — Cite in the format UTC-\d\d\d-[MC]\d+ or with a link.
+- [ ] Link RMG issue
 - [ ] Whenever there is a Proposal document — Cite L2 number in the format L2/yy-nnn
 - [ ] data-for-new — Set label
 - [ ] pipeline-* — Set label to **pipeline-recommended-to-UTC** if the characters are not yet in the pipeline, and **pipeline-provisionally-assigned**, or **pipeline-`<version>`** depending on their status in [the Pipeline](https://unicode.org/alloc/Pipeline.html#future).
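
The checklist names two citation formats, `UTC-\d\d\d-[MC]\d+` and `L2/yy-nnn`. The sketch below shows how a PR description could be scanned for them. It is only an illustration: the UTC pattern is copied from the checklist, while the L2 pattern is an assumed reading of `yy-nnn` as two digits, a hyphen, and three digits. The helper name and the sample text are invented.

```python
import re

# Copied from the checklist item for UTC decisions.
UTC_DECISION = re.compile(r"UTC-\d\d\d-[MC]\d+")
# Assumption: "L2/yy-nnn" means two digits, a hyphen, three digits.
L2_DOCUMENT = re.compile(r"L2/\d\d-\d\d\d")


def find_citations(pr_body: str) -> dict:
    """Return every UTC decision and L2 document cited in the text."""
    return {
        "utc_decisions": UTC_DECISION.findall(pr_body),
        "l2_documents": L2_DOCUMENT.findall(pr_body),
    }


if __name__ == "__main__":
    # Made-up PR description, for illustration only.
    print(find_citations("Implements UTC-180-C12, see L2/24-123."))
```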

py/pipeline-workflow/check-l2-document.py

Lines changed: 5 additions & 0 deletions
@@ -18,4 +18,9 @@
           "PRs for character additions must include a link to the SAH issue, or "
           "the mention ESC or CJK.")
     errors += 1
+if not re.search(r"(unicode-org/utc-release-management(#|/issues/)\d)", pr_body):
+    print("::error title=Need RMG reference::"
+          "PRs for character additions must include a link to the corresponding "
+          "RMG issue.")
+    errors += 1
 exit(errors)
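
To see which PR descriptions the added check accepts, the pattern can be tried in isolation. The sketch below copies the regular expression from the diff. The wrapper function and the sample strings are hypothetical and only show the two accepted forms: a `#` shorthand and an `/issues/` URL.

```python
import re

# Pattern copied verbatim from the added lines in check-l2-document.py.
RMG_REFERENCE = re.compile(r"(unicode-org/utc-release-management(#|/issues/)\d)")


def has_rmg_reference(pr_body: str) -> bool:
    """True if the text links an RMG issue in either accepted form."""
    return RMG_REFERENCE.search(pr_body) is not None


if __name__ == "__main__":
    # Made-up PR bodies, for illustration only.
    samples = [
        "RMG: unicode-org/utc-release-management#12",
        "https://github.com/unicode-org/utc-release-management/issues/34",
        "https://github.com/unicode-org/utc-release-management/pull/5",
        "See RMG issue #12",
    ]
    for body in samples:
        print(has_rmg_reference(body), body)
```

Note that the pattern requires the repository name, so a bare `#12` does not satisfy the check.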

unicodetools/data/emoji/dev/internal/emoji-ordering-rules.txt

Lines changed: 441 additions & 0 deletions
Large diffs are not rendered by default.

unicodetools/data/ucd/dev/DerivedAge.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # DerivedAge-16.0.0.txt
-# Date: 2024-07-25, 16:09:29 GMT
+# Date: 2024-07-25, 16:15:32 GMT
 # © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html

unicodetools/data/ucd/dev/DerivedCoreProperties.txt

Lines changed: 18 additions & 12 deletions
@@ -1,5 +1,5 @@
 # DerivedCoreProperties-16.0.0.txt
-# Date: 2024-07-25, 16:10:14 GMT
+# Date: 2024-07-25, 16:16:16 GMT
 # © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -10699,8 +10699,11 @@ E01F0..E0FFF ; Default_Ignorable_Code_Point # Cn [3600] <reserved-E01F0>..<rese
 0C81 ; Grapheme_Extend # Mn KANNADA SIGN CANDRABINDU
 0CBC ; Grapheme_Extend # Mn KANNADA SIGN NUKTA
 0CBF ; Grapheme_Extend # Mn KANNADA VOWEL SIGN I
+0CC0 ; Grapheme_Extend # Mc KANNADA VOWEL SIGN II
 0CC2 ; Grapheme_Extend # Mc KANNADA VOWEL SIGN UU
 0CC6 ; Grapheme_Extend # Mn KANNADA VOWEL SIGN E
+0CC7..0CC8 ; Grapheme_Extend # Mc [2] KANNADA VOWEL SIGN EE..KANNADA VOWEL SIGN AI
+0CCA..0CCB ; Grapheme_Extend # Mc [2] KANNADA VOWEL SIGN O..KANNADA VOWEL SIGN OO
 0CCC..0CCD ; Grapheme_Extend # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
 0CD5..0CD6 ; Grapheme_Extend # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
 0CE2..0CE3 ; Grapheme_Extend # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
@@ -10780,9 +10783,11 @@ E01F0..E0FFF ; Default_Ignorable_Code_Point # Cn [3600] <reserved-E01F0>..<rese
 1B34 ; Grapheme_Extend # Mn BALINESE SIGN REREKAN
 1B35 ; Grapheme_Extend # Mc BALINESE VOWEL SIGN TEDUNG
 1B36..1B3A ; Grapheme_Extend # Mn [5] BALINESE VOWEL SIGN ULU..BALINESE VOWEL SIGN RA REPA
+1B3B ; Grapheme_Extend # Mc BALINESE VOWEL SIGN RA REPA TEDUNG
 1B3C ; Grapheme_Extend # Mn BALINESE VOWEL SIGN LA LENGA
+1B3D ; Grapheme_Extend # Mc BALINESE VOWEL SIGN LA LENGA TEDUNG
 1B42 ; Grapheme_Extend # Mn BALINESE VOWEL SIGN PEPET
-1B44 ; Grapheme_Extend # Mc BALINESE ADEG ADEG
+1B43..1B44 ; Grapheme_Extend # Mc [2] BALINESE VOWEL SIGN PEPET TEDUNG..BALINESE ADEG ADEG
 1B6B..1B73 ; Grapheme_Extend # Mn [9] BALINESE MUSICAL SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG
 1B80..1B81 ; Grapheme_Extend # Mn [2] SUNDANESE SIGN PANYECEK..SUNDANESE SIGN PANGLAYAR
 1BA2..1BA5 ; Grapheme_Extend # Mn [4] SUNDANESE CONSONANT SIGN PANYAKRA..SUNDANESE VOWEL SIGN PANYUKU
@@ -11024,7 +11029,7 @@ FF9E..FF9F ; Grapheme_Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK.
 E0020..E007F ; Grapheme_Extend # Cf [96] TAG SPACE..CANCEL TAG
 E0100..E01EF ; Grapheme_Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 
-# Total code points: 2185
+# Total code points: 2193
 
 # ================================================
 
@@ -11316,10 +11321,8 @@ E0100..E01EF ; Grapheme_Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELE
 0CB5..0CB9 ; Grapheme_Base # Lo [5] KANNADA LETTER VA..KANNADA LETTER HA
 0CBD ; Grapheme_Base # Lo KANNADA SIGN AVAGRAHA
 0CBE ; Grapheme_Base # Mc KANNADA VOWEL SIGN AA
-0CC0..0CC1 ; Grapheme_Base # Mc [2] KANNADA VOWEL SIGN II..KANNADA VOWEL SIGN U
+0CC1 ; Grapheme_Base # Mc KANNADA VOWEL SIGN U
 0CC3..0CC4 ; Grapheme_Base # Mc [2] KANNADA VOWEL SIGN VOCALIC R..KANNADA VOWEL SIGN VOCALIC RR
-0CC7..0CC8 ; Grapheme_Base # Mc [2] KANNADA VOWEL SIGN EE..KANNADA VOWEL SIGN AI
-0CCA..0CCB ; Grapheme_Base # Mc [2] KANNADA VOWEL SIGN O..KANNADA VOWEL SIGN OO
 0CDD..0CDE ; Grapheme_Base # Lo [2] KANNADA LETTER NAKAARA POLLU..KANNADA LETTER FA
 0CE0..0CE1 ; Grapheme_Base # Lo [2] KANNADA LETTER VOCALIC RR..KANNADA LETTER VOCALIC LL
 0CE6..0CEF ; Grapheme_Base # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
@@ -11526,9 +11529,7 @@ E0100..E01EF ; Grapheme_Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELE
 1AA8..1AAD ; Grapheme_Base # Po [6] TAI THAM SIGN KAAN..TAI THAM SIGN CAANG
 1B04 ; Grapheme_Base # Mc BALINESE SIGN BISAH
 1B05..1B33 ; Grapheme_Base # Lo [47] BALINESE LETTER AKARA..BALINESE LETTER HA
-1B3B ; Grapheme_Base # Mc BALINESE VOWEL SIGN RA REPA TEDUNG
-1B3D..1B41 ; Grapheme_Base # Mc [5] BALINESE VOWEL SIGN LA LENGA TEDUNG..BALINESE VOWEL SIGN TALING REPA TEDUNG
-1B43 ; Grapheme_Base # Mc BALINESE VOWEL SIGN PEPET TEDUNG
+1B3E..1B41 ; Grapheme_Base # Mc [4] BALINESE VOWEL SIGN TALING..BALINESE VOWEL SIGN TALING REPA TEDUNG
 1B45..1B4C ; Grapheme_Base # Lo [8] BALINESE LETTER KAF SASAK..BALINESE LETTER ARCHAIC JNYA
 1B4E..1B4F ; Grapheme_Base # Po [2] BALINESE INVERTED CARIK SIKI..BALINESE INVERTED CARIK PAREREN
 1B50..1B59 ; Grapheme_Base # Nd [10] BALINESE DIGIT ZERO..BALINESE DIGIT NINE
@@ -12811,7 +12812,7 @@ FFFC..FFFD ; Grapheme_Base # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEME
 30000..3134A ; Grapheme_Base # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
 31350..323AF ; Grapheme_Base # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
 
-# Total code points: 152743
+# Total code points: 152735
 
 # ================================================
 
@@ -13026,8 +13027,11 @@ ABED ; Grapheme_Link # Mn MEETEI MAYEK APUN IYEK
 0C81 ; InCB; Extend # Mn KANNADA SIGN CANDRABINDU
 0CBC ; InCB; Extend # Mn KANNADA SIGN NUKTA
 0CBF ; InCB; Extend # Mn KANNADA VOWEL SIGN I
+0CC0 ; InCB; Extend # Mc KANNADA VOWEL SIGN II
 0CC2 ; InCB; Extend # Mc KANNADA VOWEL SIGN UU
 0CC6 ; InCB; Extend # Mn KANNADA VOWEL SIGN E
+0CC7..0CC8 ; InCB; Extend # Mc [2] KANNADA VOWEL SIGN EE..KANNADA VOWEL SIGN AI
+0CCA..0CCB ; InCB; Extend # Mc [2] KANNADA VOWEL SIGN O..KANNADA VOWEL SIGN OO
 0CCC..0CCD ; InCB; Extend # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
 0CD5..0CD6 ; InCB; Extend # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
 0CE2..0CE3 ; InCB; Extend # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
@@ -13106,9 +13110,11 @@ ABED ; Grapheme_Link # Mn MEETEI MAYEK APUN IYEK
 1B34 ; InCB; Extend # Mn BALINESE SIGN REREKAN
 1B35 ; InCB; Extend # Mc BALINESE VOWEL SIGN TEDUNG
 1B36..1B3A ; InCB; Extend # Mn [5] BALINESE VOWEL SIGN ULU..BALINESE VOWEL SIGN RA REPA
+1B3B ; InCB; Extend # Mc BALINESE VOWEL SIGN RA REPA TEDUNG
 1B3C ; InCB; Extend # Mn BALINESE VOWEL SIGN LA LENGA
+1B3D ; InCB; Extend # Mc BALINESE VOWEL SIGN LA LENGA TEDUNG
 1B42 ; InCB; Extend # Mn BALINESE VOWEL SIGN PEPET
-1B44 ; InCB; Extend # Mc BALINESE ADEG ADEG
+1B43..1B44 ; InCB; Extend # Mc [2] BALINESE VOWEL SIGN PEPET TEDUNG..BALINESE ADEG ADEG
 1B6B..1B73 ; InCB; Extend # Mn [9] BALINESE MUSICAL SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG
 1B80..1B81 ; InCB; Extend # Mn [2] SUNDANESE SIGN PANYECEK..SUNDANESE SIGN PANGLAYAR
 1BA2..1BA5 ; InCB; Extend # Mn [4] SUNDANESE CONSONANT SIGN PANYAKRA..SUNDANESE VOWEL SIGN PANYUKU
@@ -13351,6 +13357,6 @@ FF9E..FF9F ; InCB; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HA
 E0020..E007F ; InCB; Extend # Cf [96] TAG SPACE..CANCEL TAG
 E0100..E01EF ; InCB; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
 
-# Total code points: 2184
+# Total code points: 2192
 
 # EOF
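
The "Total code points" comments in this file move by 8 in each affected section (Grapheme_Extend 2185 to 2193, Grapheme_Base 152743 to 152735, InCB=Extend 2184 to 2192). A quick way to sanity-check that against the changed data lines is to sum the sizes of the added and removed ranges. The following is a small sketch with hypothetical helper names. The diff lines are abbreviated copies of the changed Grapheme_Extend and Grapheme_Base lines shown in the hunks of this diff.

```python
def range_size(field: str) -> int:
    """Number of code points in 'XXXX' or 'XXXX..YYYY' (hexadecimal)."""
    start, _, end = field.strip().partition("..")
    first = int(start, 16)
    last = int(end, 16) if end else first
    return last - first + 1


def net_change(diff_lines) -> int:
    """Code points added minus code points removed across diff lines."""
    total = 0
    for line in diff_lines:
        sign = {"+": 1, "-": -1}.get(line[:1])
        body = line[1:].strip()
        if sign is None or not body or body.startswith("#"):
            continue  # context lines, blank lines, and comments do not count
        total += sign * range_size(body.split(";", 1)[0])
    return total


# Changed Grapheme_Extend lines from the hunks of this diff (comments dropped).
GRAPHEME_EXTEND = [
    "+0CC0 ; Grapheme_Extend",
    "+0CC7..0CC8 ; Grapheme_Extend",
    "+0CCA..0CCB ; Grapheme_Extend",
    "+1B3B ; Grapheme_Extend",
    "+1B3D ; Grapheme_Extend",
    "-1B44 ; Grapheme_Extend",
    "+1B43..1B44 ; Grapheme_Extend",
]

# Changed Grapheme_Base lines from the hunks of this diff (comments dropped).
GRAPHEME_BASE = [
    "-0CC0..0CC1 ; Grapheme_Base",
    "+0CC1 ; Grapheme_Base",
    "-0CC7..0CC8 ; Grapheme_Base",
    "-0CCA..0CCB ; Grapheme_Base",
    "-1B3B ; Grapheme_Base",
    "-1B3D..1B41 ; Grapheme_Base",
    "-1B43 ; Grapheme_Base",
    "+1B3E..1B41 ; Grapheme_Base",
]

if __name__ == "__main__":
    print(net_change(GRAPHEME_EXTEND), net_change(GRAPHEME_BASE))
```

The two results, +8 and -8, agree with the updated totals: the eight Kannada and Balinese vowel signs leave Grapheme_Base and join Grapheme_Extend.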

unicodetools/data/ucd/dev/DerivedNormalizationProps.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # DerivedNormalizationProps-16.0.0.txt
-# Date: 2024-07-25, 16:10:21 GMT
+# Date: 2024-07-25, 16:16:24 GMT
 # © 2024 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use and license, see https://www.unicode.org/terms_of_use.html
