Commit e3006b5
Mozilla bug 1997049 - Accelerate more HTML tokenizer states with SIMD. r=smaug

This restores the exact old structure for Java, and only injects the
SIMD acceleration between incrementing pos and checking whether pos has
reached endPos.
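
The placement described above can be sketched roughly as follows. This is not the tokenizer's actual code: `SkipSketch`, `advance`, and `countBoring` are hypothetical names, the scalar loop only stands in for the SIMD routine, and the set of characters that stop the skip (including surrogates) is an assumption made for illustration.

```java
final class SkipSketch {
    // Hypothetical scalar stand-in for the SIMD routine: counts the leading
    // code units in buf[start, end) that need no per-character handling.
    // The stop set below is an assumption, not taken from the commit.
    static int countBoring(char[] buf, int start, int end) {
        int i = start;
        while (i < end) {
            char c = buf[i];
            if (c == '<' || c == '&' || c == '\r' || c == '\n' || c == '\0'
                    || Character.isSurrogate(c)) {
                break;
            }
            i++;
        }
        return i - start;
    }

    // One step of a text-like state: increment pos as the old loop did,
    // then let the bulk skip run, and leave the endPos check to the caller.
    static int advance(char[] buf, int pos, int endPos) {
        ++pos;                                  // old structure: increment
        pos += countBoring(buf, pos, endPos);   // injected acceleration
        return pos;                             // caller checks pos == endPos
    }
}
```

Under these assumptions, `advance` on `"abc<d"` starting from `pos = -1` lands on the `<` at index 3, while a buffer made up entirely of surrogate pairs returns index 0 without skipping anything, which mirrors the bouncing the message goes on to describe.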

This makes going back to SIMD after a character reference significantly
simpler. The downside is that wholly non-BMP text (as opposed to
isolated non-BMP emoji or Hanzi) ends up uselessly bouncing to the SIMD
code without benefiting from it when loading from the network and
counting column numbers as Unicode scalar values.

If we want to avoid this failure mode, we should change column numbers
to count UTF-16 code units instead of scalars. Either way, the column
is "wrong" in some cases.
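
To make the two column conventions concrete, here is a minimal sketch using standard `java.lang.String` methods. `ColumnSketch` and its method names are hypothetical and only illustrate the difference; they are not the parser's column-tracking code.

```java
final class ColumnSketch {
    // Column width of a line when each Unicode scalar value counts once.
    static int columnsAsScalars(String line) {
        return line.codePointCount(0, line.length());
    }

    // Column width of a line when each UTF-16 code unit counts once.
    static int columnsAsCodeUnits(String line) {
        return line.length();
    }
}
```

For the string `"a\uD83D\uDE00b"` (an `a`, one non-BMP emoji, and a `b`), the scalar count is 3 and the code-unit count is 4. The two conventions diverge only where non-BMP characters occur, so each matches some consumers' expectations and not others.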

Differential Revision: https://phabricator.services.mozilla.com/D270673

1 parent 4686aff commit e3006b5
3 files changed, +137 -237 lines changed
- src/nu/validator/htmlparser/impl
- translator-src/nu/validator/htmlparser/cpptranslate