From Ken:
For this delta, focus on all the remaining script-specific punctuation and symbols.
1. Dandas and double dandas
Tulu-Tigalari and Kirat Rai both have script-specific dandas. Intercalate them
into unidata.txt roughly in code point order. This puts Kirat Rai
after Mro in the list of dandas, and it puts Tulu-Tigalari after Khojki
in the list of dandas. The Balinese inverted carik siki and inverted carik
pareren are also a variety of danda and double danda. Those can be interlaced
with the non-inverted danda and double danda for Balinese.
Regenerate allkeys.txt, and verify that these six dandas weight as expected.
2. Miscellaneous other punctuation
1B7F BALINESE PANTI BAWAK is a variant of 1B5A BALINESE PANTI. Just intercalate
after 1B5A.
10D6E GARAY HYPHEN is another script-specific hyphen. Add to the list of
script-specific hyphens in the dashes and hyphens subsection of punctuation.
113D7..113D8, the two Tulu-Tigalari pushpikas, can just be intercalated in
the miscellaneous punctuation section of unidata.txt, after the Khojki
abbreviation sign and before the Newa punctuation.
16D6D KIRAT RAI SIGN YUPI is another Indic abbreviation sign. Add in
the script-specific miscellaneous punctuation section of unidata.txt,
in code point order.
11BE1 SUNUWAR SIGN PVO is an auspicious mark, similar in some ways to
a siddham mark or a Devanagari bhale. These may have particular
pronunciations, but are treated as punctuation marks. Just add to
unidata.txt in the script-specific section of miscellaneous punctuation
in code point order. That will put it right after 119E2 NANDINAGARI
SIGN SIDDHAM.
1E5FF OL ONAL ABBREVIATION SIGN. Likewise just add in the script-specific
miscellaneous punctuation section in code point order.
Regenerate allkeys.txt, and verify that these seven punctuation marks
weight as expected and are indicated as variables.
3. Miscellaneous other symbols
Garay plus sign (10D8E) and minus sign (10D8F). In the absence of any
better information about these, just intercalate in the math symbols
section of unidata.txt after 002B PLUS SIGN and 2212 MINUS SIGN,
respectively.
Regenerate allkeys.txt and verify that these two symbols weight
as expected and are indicated as variables.
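
The "indicated as variables" checks in sections 2 and 3 can be scripted. The following is only a minimal sketch, not part of the sifter: it assumes the usual allkeys.txt line layout (code point, semicolon, one or more collation elements in brackets, with a leading `*` marking a variable element), and the function names `parse_allkeys` and `not_variable` are hypothetical.

```python
import re

# One collation element, e.g. [*0268.0020.0002]; a leading '*' marks it as variable.
CE = re.compile(r"\[([.*])([0-9A-F]+)\.([0-9A-F]+)\.([0-9A-F]+)\]")


def parse_allkeys(lines):
    """Map each single-code-point entry to a list of (is_variable, primary) tuples."""
    table = {}
    for line in lines:
        entry = line.split("#", 1)[0].strip()      # drop the trailing comment
        if not entry or entry.startswith("@") or ";" not in entry:
            continue                               # skip blank lines and @ directives
        cps, elements = entry.split(";", 1)
        cps = cps.split()
        if len(cps) != 1:
            continue                               # skip contractions
        table[int(cps[0], 16)] = [
            (m.group(1) == "*", int(m.group(2), 16)) for m in CE.finditer(elements)
        ]
    return table


def not_variable(table, code_points):
    """Return the code points that are missing or not fully marked as variable."""
    bad = []
    for cp in code_points:
        elements = table.get(cp)
        if not elements or not all(is_var for is_var, _ in elements):
            bad.append(cp)
    return bad
```

Calling `not_variable(parse_allkeys(open("allkeys.txt", encoding="utf-8")), [0x1B7F, 0x10D6E, 0x113D7, 0x113D8, 0x16D6D, 0x11BE1, 0x1E5FF, 0x10D8E, 0x10D8F])` should return an empty list if all nine marks came out as variables.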
4. Garay reduplication mark
10D6F GARAY REDUPLICATION MARK has its properties misconstrued. It is
explained in the proposal (L2/22-048) under section 4, Punctuation.
However, most examples of iteration or reduplication marks are
designated as gc=Lm, Extender=True in the UCD. This keeps the
reduplication or iteration mark within the context of the word for
segmentation purposes, which is usually the desired outcome. A few
similar marks have ended up as gc=Po. But the Garay proposal
specifies the mark as gc=So, which is clearly wrong.
I am updating UnicodeData.txt for 16.0 to change this character from gc=So to
gc=Lm, and then updating the underlying library for the sifter
accordingly.
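
A quick way to confirm that the edited UnicodeData.txt carries the new value is to read the General_Category field (the third semicolon-separated field) for 10D6F. This is a rough sketch only; the helper name `general_category` is hypothetical, and it does not handle the First/Last range entries in the file.

```python
def general_category(lines, code_point):
    """Return the General_Category field for code_point in UnicodeData.txt, or None."""
    target = f"{code_point:04X}"                   # UnicodeData.txt uses uppercase hex
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) > 2 and fields[0] == target:
            return fields[2]                       # field 2 is General_Category
    return None
```

With the updated file, `general_category(open("UnicodeData.txt", encoding="utf-8"), 0x10D6F)` should return `"Lm"` rather than `"So"`.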
With the updated interpretation of properties, 10D6F can be
intercalated in the extenders section of unidata.txt. I put it
between AAF4 MEETEI MAYEK WORD REPETITION MARK and
16B42 PAHAWH HMONG SIGN VOS NRUA.
Regenerate allkeys.txt and verify that 10D6F shows up with a
primary weight and is grouped with the extenders.
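
The extender check can be sketched the same way. This assumes that "grouped with the extenders" shows up in allkeys.txt as a non-variable element whose primary weight falls between those of AAF4 and 16B42. That reading, and the helper names `first_element` and `sorts_between`, are assumptions for illustration.

```python
import re

# First capture: '.' or '*' (variable flag); second capture: primary weight in hex.
CE = re.compile(r"\[([.*])([0-9A-F]+)\.[0-9A-F]+\.[0-9A-F]+\]")


def first_element(lines, code_point):
    """Return (is_variable, primary) for the first collation element, or None."""
    target = f"{code_point:04X}"
    for line in lines:
        entry = line.split("#", 1)[0]              # drop the trailing comment
        if ";" not in entry:
            continue
        cps, elements = entry.split(";", 1)
        if cps.split() == [target]:                # exact single-code-point match
            m = CE.search(elements)
            return (m.group(1) == "*", int(m.group(2), 16)) if m else None
    return None


def sorts_between(lines, before, cp, after):
    """True if cp is non-variable with a non-zero primary strictly between its neighbours."""
    lines = list(lines)                            # allow a file object to be scanned repeatedly
    found = [first_element(lines, c) for c in (before, cp, after)]
    if any(f is None for f in found):
        return False
    (_, low), (is_var, primary), (_, high) = found
    return (not is_var) and primary != 0 and low < primary < high
```

For this delta the call would be `sorts_between(open("allkeys.txt", encoding="utf-8"), 0xAAF4, 0x10D6F, 0x16B42)`.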
16 more down, 4 to go.
Archive this delta 10:
unidata-16.0.0d10.txt (1602884 bytes, 10/08/2023)