A combining mark (or ZWJ) usually attaches to the preceding character.
That makes sense: an 'a' followed by an acute accent is considered a
unit.
But marks do not attach to some classes of characters. If you have a
space followed by an acute accent, the accent stands on its own and
doesn't hang over the space.
What Unicode says to do in that case is to pretend that the mark is
actually an alphabetic character.
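
The attach-or-stand-alone behaviour described above can be sketched as
follows. This is only an illustration, not the code in this commit:
toy_lb_class() is a hypothetical, heavily simplified stand-in for the
real Line_Break property (it approximates CM by general category and
treats everything unknown as AL), and resolve_classes() is likewise a
made-up name. The set of classes that marks do not attach to is taken
from rule LB9 of UAX #14.

```python
import unicodedata

# Classes that combining marks do not attach to (UAX #14, rule LB9).
NO_ATTACH = {"BK", "CR", "LF", "NL", "SP", "ZW"}


def toy_lb_class(ch):
    """Hypothetical, simplified stand-in for the Line_Break property."""
    if ch == " ":
        return "SP"
    if ch == "\n":
        return "LF"
    if ch == "\r":
        return "CR"
    if ch == "\u200b":
        return "ZW"
    if ch == "\u200d":
        return "ZWJ"
    if unicodedata.category(ch) in ("Mn", "Mc"):
        return "CM"  # assumption: approximate CM by general category
    if ch.isdigit():
        return "NU"
    return "AL"  # assumption: everything else is alphabetic


def resolve_classes(text):
    """Return the class each character is treated as once marks are resolved."""
    resolved = []
    base = None  # class that following marks attach to, or None
    for ch in text:
        cls = toy_lb_class(ch)
        if cls in ("CM", "ZWJ"):
            # An attached mark takes the class of its base; a mark with
            # nothing to attach to is treated as alphabetic.
            resolved.append(base if base is not None else "AL")
        else:
            resolved.append(cls)
            base = None if cls in NO_ATTACH else cls
    return resolved
```

With this sketch, a digit followed by U+0301 resolves to NU, NU, while
a space followed by U+0301 resolves to SP, AL.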
The implementation of \b{lb} includes a bunch of DFAs, and several of
them didn't implement this properly.
This commit fixes that. When parsing backwards in the input to examine
the context, some DFAs are supposed to ignore intervening marks.
But when the backwards parse gets past the marks and the character it
finds is one the marks don't attach to, it should return alphabetic
instead of that character's class. This commit makes it do so.
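
A minimal sketch of that backwards lookup, assuming the input has
already been mapped to a list of line break class names. The function
name prev_effective_class() and the list representation are
hypothetical, chosen for illustration; the actual implementation is a
set of DFAs and is not reproduced here.

```python
# Classes that combining marks do not attach to (UAX #14, rule LB9).
NO_ATTACH = {"BK", "CR", "LF", "NL", "SP", "ZW"}
MARKS = {"CM", "ZWJ"}


def prev_effective_class(classes, pos):
    """Class to use for the context just before index pos.

    Returns None when there is nothing before pos.
    """
    i = pos - 1
    skipped = False
    # Walk backwards over any intervening marks.
    while i >= 0 and classes[i] in MARKS:
        skipped = True
        i -= 1
    if i < 0:
        # Only marks (or nothing) precede pos; orphaned marks act as AL.
        return "AL" if skipped else None
    if skipped and classes[i] in NO_ATTACH:
        # The marks cannot attach to this character, so what precedes
        # pos is a stand-alone mark, which is treated as alphabetic.
        return "AL"
    return classes[i]
```

For example, looking back from the end of ["SP", "CM"] yields AL rather
than SP, whereas looking back from the end of ["NU", "CM"] yields NU.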
This required changing some callers of the backwards parse routine to
handle the marks themselves.
The code passed the extensive tests furnished by Unicode for 16.0.
Unicode has provided a new test file for 17.0, which adds new tests,
and the code failed one of them.
This fix applies to 16.0 as well as 17.0.