You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
git's diff-words support has a detail that can be a little dangerous:
any text not matched by a given language's tokenization pattern is
treated as whitespace and changes in such text would go unnoticed.
Therefore each of the built-in regexes allows a special token type
consisting of a single non-whitespace character [^[:space:]].
To make sure UTF-8 sequences remain human readable, the builtin
regexes also have a special token type for runs of bytes with the high
bit set. In English, non-ASCII characters are usually isolated so
this is analogous to the [^[:space:]] pattern, except it matches a
single _multibyte_ character despite use of the C locale.
Unfortunately it is easy to make typos or forget entirely to include
these catch-all token types when adding support for new languages (see
v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes,
2010-12-18). Avoid this by including them automatically within the
PATTERNS and IPATTERN macros.
While at it, change the UTF-8 sequence token type to match exactly one
non-ASCII multi-byte character, rather than an arbitrary run of them.
Suggested-by: Thomas Rast <[email protected]>
Signed-off-by: Jonathan Nieder <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
0 commit comments