Commit c0639b8
Rewrite UTF-8 validation in shift-based DFA
This gives a significant performance increase when validating strings
with many non-ASCII codepoints, which is the normal case for almost all
non-English content.
The shift-based DFA algorithm does not use SIMD instructions and does not
rely on the branch predictor for good performance, so it works well as
a general, default, architecture-agnostic implementation.
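For readers unfamiliar with the technique, here is a minimal sketch of a shift-based DFA for UTF-8 validation. It is an illustration under stated assumptions, not the code from this commit: the state names, the state numbering, the 6-bit packing, and the function names (`next_state`, `build_table`, `is_valid_utf8`) are all hypothetical. Each byte value owns one `u64` row; the row stores, at bit offset `s * 6`, the offset of the state reached from state `s` on that byte. A transition is then a single table load and a single shift.

```rust
// Hypothetical state numbering: state `s` lives at bit offset `s * BITS` in a row.
const ERR: u64 = 0; // rejecting sink (offset 0, so its transitions stay at 0)
const ACC: u64 = 1; // start state and the only accepting state
const C1: u64 = 2; // expect 1 more continuation byte
const C2: u64 = 3; // expect 2 more continuation bytes
const C3: u64 = 4; // expect 3 more continuation bytes
const E0: u64 = 5; // saw 0xE0: next must be A0..=BF (rejects overlong forms)
const ED: u64 = 6; // saw 0xED: next must be 80..=9F (rejects surrogates)
const F0: u64 = 7; // saw 0xF0: next must be 90..=BF (rejects overlong forms)
const F4: u64 = 8; // saw 0xF4: next must be 80..=8F (rejects > U+10FFFF)
const BITS: u64 = 6; // 9 states * 6 bits = 54 bits, which fits in a u64

/// Plain transition function, used only to build the table.
const fn next_state(state: u64, byte: u8) -> u64 {
    match (state, byte) {
        (ACC, 0x00..=0x7F) => ACC,
        (ACC, 0xC2..=0xDF) => C1,
        (ACC, 0xE0) => E0,
        (ACC, 0xE1..=0xEC | 0xEE..=0xEF) => C2,
        (ACC, 0xED) => ED,
        (ACC, 0xF0) => F0,
        (ACC, 0xF1..=0xF3) => C3,
        (ACC, 0xF4) => F4,
        (C1, 0x80..=0xBF) => ACC,
        (C2, 0x80..=0xBF) => C1,
        (C3, 0x80..=0xBF) => C2,
        (E0, 0xA0..=0xBF) => C1,
        (ED, 0x80..=0x9F) => C1,
        (F0, 0x90..=0xBF) => C2,
        (F4, 0x80..=0x8F) => C2,
        _ => ERR,
    }
}

/// Packs, for every byte value, the next-state offsets of all states into one u64.
const fn build_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut byte = 0usize;
    while byte < 256 {
        let mut row = 0u64;
        let mut state = 0u64;
        while state <= F4 {
            let next_offset = next_state(state, byte as u8) * BITS;
            row |= next_offset << (state * BITS);
            state += 1;
        }
        table[byte] = row;
        byte += 1;
    }
    table
}

static TABLE: [u64; 256] = build_table();

/// Returns true if `bytes` is well-formed UTF-8.
pub fn is_valid_utf8(bytes: &[u8]) -> bool {
    // Only the low 6 bits of `state` are meaningful; the rest is leftover row data.
    let mut state = ACC * BITS;
    for &b in bytes {
        // One load and one shift per byte, with no data-dependent branch.
        state = TABLE[b as usize] >> (state & 63);
    }
    (state & 63) == ACC * BITS
}
```

In this sketch, `is_valid_utf8("héllo".as_bytes())` returns `true`, while the overlong sequence `[0xC0, 0x80]` is rejected. The upper bits of `state` are never cleared inside the loop; they are only masked off when `state` is used as a shift amount and in the final comparison, which keeps the per-byte work to a load and a shift.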
There is still a bypass for ASCII-only strings so that they benefit from
auto-vectorization, if the target supports it.
1 parent 01e4f19 commit c0639b8
1 file changed: +274 -145 lines changed