Description
Previously discussed in #48:
"It would be nice to drop WASM [...] However: 85% [of TextMate grammars] is not enough. Needs to be 99.95% or so. And, something like 99% for me to start shipping it as an experimental thing?"
The Oniguruma-To-ES library now supports 100% of the 222 TextMate grammars provided by Shiki.
Oniguruma and JS RegExp have a ton of complicated syntax and behavior differences, so this was no easy task. But Oniguruma-To-ES is an extremely sophisticated regex translator, and it comes with detailed documentation.
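To give a rough sense of the kind of syntax difference involved, here is a toy sketch in TypeScript. It is not how Oniguruma-To-ES works internally, and the `translateToyPattern` helper and `ESCAPE_MAP` table are hypothetical names made up for this illustration. It only rewrites four Oniguruma escapes (`\h`, `\H`, `\A`, `\z`) that JS RegExp silently treats as literal letters when no `u` or `v` flag is set, and it assumes the escapes appear outside character classes and that the result is compiled without the `m` flag.

```typescript
// Toy illustration only; a real translator has to handle far more than this.
// Hypothetical table mapping a few Oniguruma escapes to JS equivalents.
const ESCAPE_MAP = new Map<string, string>([
  ["h", "[0-9A-Fa-f]"],  // Oniguruma \h: hexadecimal digit
  ["H", "[^0-9A-Fa-f]"], // Oniguruma \H: any character except a hex digit
  ["A", "^"],            // Oniguruma \A: start of input (assumes no JS `m` flag)
  ["z", "$"],            // Oniguruma \z: end of input (assumes no JS `m` flag)
]);

// Hypothetical helper: walks the pattern one escape sequence at a time so that
// an escaped backslash followed by "h" is left alone.
function translateToyPattern(pattern: string): string {
  let out = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      const next = pattern[i + 1];
      // Replace known escapes; pass every other escape through unchanged.
      out += ESCAPE_MAP.get(next) ?? (ch + next);
      i++; // skip the escaped character
    } else {
      out += ch;
    }
  }
  return out;
}

const source = String.raw`\A0x\h+\z`;

// Without translation, JS (no `u`/`v` flag) reads \A, \h, and \z as the
// literal letters "A", "h", and "z", so the pattern means something else.
const naive = new RegExp(source);
const translated = new RegExp(translateToyPattern(source));

console.log(naive.test("0xBEEF"));      // false
console.log(translated.test("0xBEEF")); // true
```

The point of the sketch is that the untranslated pattern does not throw; it compiles and quietly matches the wrong thing, which is why grammar-level testing against real TextMate grammars matters.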
Although I'd be happy to answer any questions, I won't be able to send an integration PR. However, Shiki's JS engine (which is a relatively thin wrapper around Oniguruma-To-ES) provides a solid example of how it can be used for TextMate grammar highlighting. See also Shiki docs: RegExp Engines.
If you decide for whatever reason not to pursue WASM-free Oniguruma support, you might still be interested in the Oniguruma optimizer I've also built on top of the same parser used by Oniguruma-To-ES. Shiki uses the optimizer to minify and increase the performance of all of the grammars it provides, by pre-running them through the optimizer in the tm-grammars package.
As an example, the optimizer shaves ~40,000 characters off of just the regexes for the C++ grammar (without changing their meaning at all, and despite the C++ grammar not including any insignificant whitespace or comments in its regexes). It also increases the C++ grammar's performance by ~30% (whether running it in Oniguruma via WASM or using native JS RegExp via Oniguruma-To-ES). Performance improvements are explained at a high level in the optimizer's readme.