Use oniguruma-to-es for WASM-free mode #50

@slevithan

Previously discussed in #48:

It would be nice to drop WASM [...] However: 85% [of TextMate grammars] is not enough. Needs to be 99.95% or so. And, something like 99% for me to start shipping it as an experimental thing?

The Oniguruma-To-ES library now supports 100% of the 222 TextMate grammars provided by Shiki.

Oniguruma and JS RegExp have a ton of complicated syntax and behavior differences, so this was no easy task. But Oniguruma-To-ES is an extremely sophisticated regex translator. There is detailed documentation.
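To give a feel for the kind of difference involved, here is one small, well-known case: in Oniguruma `\h` means "hexadecimal digit", while in a JS RegExp without the `u` or `v` flag it is just an escaped literal `h`. The snippet below is only an illustration using plain JS RegExp; the `handTranslated` pattern is a hand-written equivalent for this one token, not output from Oniguruma-To-ES.

```typescript
// Passing an Oniguruma pattern straight to JS RegExp silently changes its meaning.
// Built from a string so the pattern is not linted as a regex literal.
const naive = new RegExp("\\h+"); // JS reads \h as the letter "h"

// Hand-written equivalent of Oniguruma's \h+ (illustrative only).
const handTranslated = /[0-9A-Fa-f]+/;

const input = "ff00h";
const naiveMatch = naive.exec(input);
const translatedMatch = handTranslated.exec(input);

console.log(naiveMatch?.[0], naiveMatch?.index); // "h" 4
console.log(translatedMatch?.[0], translatedMatch?.index); // "ff00" 0
```

No error is thrown in the naive case, which is what makes this class of difference easy to miss without a real translator.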

Although I'd be happy to answer any questions, I won't be able to send an integration PR. However, Shiki's JS engine (which is a relatively thin wrapper for Oniguruma-To-ES) provides a solid example of how it can be used for TextMate grammar highlighting. See also Shiki docs: RegExp Engines.
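For a rough idea of the shape such a wrapper could take, here is a minimal sketch of a TextMate-style scanner over native RegExp. It is not Shiki's code. `createScanner`, `Compile`, and `ScanMatch` are hypothetical names, and the `compile` hook is a stand-in: a real integration would presumably call into Oniguruma-To-ES there (see its documentation for the actual API and options), whereas this sketch passes patterns straight to `RegExp` so it runs without dependencies.

```typescript
// Hypothetical hook: turns an Oniguruma pattern string into a JS RegExp.
// It is assumed to return a regex with the "g" flag so that lastIndex is honored.
type Compile = (onigPattern: string) => RegExp;

interface ScanMatch {
  index: number;     // where the match starts in the line
  ruleIndex: number; // which pattern produced it
  text: string;      // the matched text
}

function createScanner(patterns: string[], compile: Compile) {
  const regexes = patterns.map((p) => compile(p));
  return {
    // Find the earliest match at or after `start`; ties go to the earlier pattern.
    findNext(text: string, start: number): ScanMatch | null {
      let best: ScanMatch | null = null;
      for (let ruleIndex = 0; ruleIndex < regexes.length; ruleIndex++) {
        const re = regexes[ruleIndex];
        re.lastIndex = start;
        const m = re.exec(text);
        if (m !== null && (best === null || m.index < best.index)) {
          best = { index: m.index, ruleIndex, text: m[0] };
        }
      }
      return best;
    },
  };
}

// Stand-in compile step: only valid for patterns that are already JS-compatible.
const scanner = createScanner(["\\d+", "[a-z]+"], (p) => new RegExp(p, "g"));
console.log(scanner.findNext("  abc 123", 0)); // { index: 2, ruleIndex: 1, text: "abc" }
console.log(scanner.findNext("  abc 123", 5)); // { index: 6, ruleIndex: 0, text: "123" }
```

Keeping the translation behind a single hook means the scanning logic does not care whether a regex came from a translator or was written by hand.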


If you decide for whatever reason not to pursue WASM-free Oniguruma support, you might still be interested in the Oniguruma optimizer I've also built on top of the same parser used by Oniguruma-To-ES. Shiki uses the optimizer to minify and increase the performance of all of the grammars it provides, by pre-running them through the optimizer in the tm-grammars package.

As an example, the optimizer shaves ~40,000 characters off of just the regexes for the C++ grammar (without changing their meaning at all, and despite the C++ grammar not including any insignificant whitespace or comments in its regexes). It also increases the C++ grammar's performance by ~30% (whether running it in Oniguruma via WASM or using native JS RegExp via Oniguruma-To-ES). Performance improvements are explained at a high level in the optimizer's readme.
