Migrate RegExp to regonaut engine by auvred · Pull Request #678 · dop251/goja

auvred · 2025-09-13T08:57:00Z

This PR replaces both https://pkg.go.dev/regexp and https://github.com/dlclark/regexp2 usages with https://github.com/auvred/regonaut (an ES2025-compatible RegExp engine).

I've tried to keep the diff minimal for easier review. For now, this PR only replaces the RegExp engine and enables some TC39 tests; no new features are introduced yet.

If this PR is merged, I plan to submit a few follow-up PRs to add support for:

Unicode sets (v flag)
Named groups
Match Indices (d flag)

auvred · 2025-09-13T08:58:35Z

builtin_regexp.go

-	if limitValue != _undefined {
-		limit = int(toUint32(limitValue))
+
+	var lim int64


The regexpproto_stdSplitter implementation is copied from the regexpproto_stdSplitterGeneric algorithm.

auvred · 2025-09-13T09:02:30Z

parser/parser.go

 type Mode uint

 const (
-	IgnoreRegExpErrors Mode = 1 << iota // Ignore RegExp compatibility errors (allow backtracking)


IgnoreRegExpErrors was introduced in the initial commit (2577360). However, even back then it was only used in tests and nowhere else.

I'm not entirely sure about removing it completely, though. It has been part of the public API for 9 years. Nevertheless, I don't think anyone is actually relying on it in production code.

I don't think it would be a problem.

dop251 · 2025-09-14T15:51:02Z

This looks very promising. When I looked into named groups, I realised that regexp2 was not the right fit: it's a port of the .NET library, which has very limited ECMAScript compatibility, and making it fully compatible would require some fundamental changes. I even considered forking it at some point, but then realised I didn't have enough time to make those changes and maintain the code.

I've had a quick look and my main concern so far is performance. Even running the very limited set of benchmarks from goja shows a significant degradation in both time and memory:

                                │ regexp_before.txt │           regexp_after.txt            │
                                │      sec/op       │    sec/op     vs base                 │
RegexpSplitWithBackRef-16               3.243µ ± 1%   24.268µ ± 1%  +648.45% (p=0.000 n=10)
RegexpMatch-16                          44.84µ ± 1%   280.40µ ± 1%  +525.40% (p=0.000 n=10)
RegexpMatchCache-16                     1.417m ± 1%    4.315m ± 1%  +204.40% (p=0.000 n=10)
RegexpMatchAll-16                       1.968m ± 1%    5.059m ± 1%  +157.05% (p=0.000 n=10)
RegexpSingleExec/Re-ASCII-16            751.8n ± 1%   2196.0n ± 2%  +192.10% (p=0.000 n=10)
RegexpSingleExec/Re2-ASCII-16           1.104µ ± 1%    2.424µ ± 2%  +119.52% (p=0.000 n=10)
RegexpSingleExec/Re-Unicode-16          1.383µ ± 1%    2.115µ ± 1%   +52.93% (p=0.000 n=10)
RegexpSingleExec/Re2-Unicode-16         1.141µ ± 2%    2.365µ ± 1%  +107.27% (p=0.000 n=10)
geomean                                 12.32µ         37.55µ       +204.77%

                                │ regexp_before.txt │           regexp_after.txt            │
                                │       B/op        │     B/op      vs base                 │
RegexpMatchCache-16                    1.336Mi ± 0%   8.958Mi ± 0%  +570.49% (p=0.000 n=10)
RegexpMatchAll-16                      1.991Mi ± 0%   9.611Mi ± 0%  +382.67% (p=0.000 n=10)
RegexpSingleExec/Re-ASCII-16             759.0 ± 0%    1384.0 ± 0%   +82.35% (p=0.000 n=10)
RegexpSingleExec/Re2-ASCII-16          1.266Ki ± 0%   1.398Ki ± 0%   +10.49% (p=0.000 n=10)
RegexpSingleExec/Re-Unicode-16           793.0 ± 0%    1304.0 ± 0%   +64.44% (p=0.000 n=10)
RegexpSingleExec/Re2-Unicode-16        1.273Ki ± 0%   1.320Ki ± 0%    +3.68% (p=0.000 n=10)
geomean                                11.71Ki        25.68Ki       +119.28%

                                │ regexp_before.txt │           regexp_after.txt           │
                                │     allocs/op     │  allocs/op   vs base                 │
RegexpMatchCache-16                     22.34k ± 0%   33.37k ± 0%   +49.39% (p=0.000 n=10)
RegexpMatchAll-16                       31.86k ± 0%   42.91k ± 0%   +34.66% (p=0.000 n=10)
RegexpSingleExec/Re-ASCII-16             11.00 ± 0%    22.00 ± 0%  +100.00% (p=0.000 n=10)
RegexpSingleExec/Re2-ASCII-16            18.00 ± 0%    24.00 ± 0%   +33.33% (p=0.000 n=10)
RegexpSingleExec/Re-Unicode-16           14.00 ± 0%    21.00 ± 0%   +50.00% (p=0.000 n=10)
RegexpSingleExec/Re2-Unicode-16          20.00 ± 0%    23.00 ± 0%   +15.00% (p=0.000 n=10)
geomean                                  184.5         267.4        +44.89%

Do you plan to do any performance improvements? The thing is most people are not even aware of the ECMAScript regular expression quirks, but they would notice if their ^[a-z]+$ suddenly ran slower...

auvred · 2025-09-16T11:12:33Z

> Do you plan to do any performance improvements? The thing is most people are not even aware of the ECMAScript regular expression quirks, but they would notice if their ^[a-z]+$ suddenly ran slower...

I've been planning to implement a second engine based on finite automata (alongside the existing backtracking engine) to improve performance. Unfortunately, implementing it is quite time-consuming, and I'm short on time right now. I'll mark this PR as a draft for now and come back to it once the finite automata engine is finished.
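To illustrate why two engines can coexist, here is a small sketch that uses Go's standard regexp package as a stand-in for an automata-based engine. That package guarantees linear-time matching but rejects constructs such as backreferences, which need a backtracking engine. The helper canUseAutomaton is hypothetical, and accepting a pattern's syntax does not mean Go's semantics match ECMAScript's, so this shows only the dispatch idea, not how regonaut or goja actually decide.

```go
package main

import (
	"fmt"
	"regexp"
)

// canUseAutomaton reports whether Go's automata-based regexp package
// accepts the pattern. A pattern it rejects (for example one with a
// backreference) would have to go to a backtracking engine instead.
func canUseAutomaton(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	for _, p := range []string{`^[a-z]+$`, `(a)\1`} {
		fmt.Printf("%q automaton-compatible: %v\n", p, canUseAutomaton(p))
	}
}
```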

dop251 · 2025-09-24T20:51:39Z

Thanks. I'll add a couple of comments on the PR, as they would still apply...

dop251 · 2025-09-24T20:56:41Z

string.go

 	utf16Reader() utf16Reader
 	utf16RuneReader() io.RuneReader
 	utf16Runes() []rune
+	toUnicode() unicodeString


The current design assumes that an ASCII string is always an asciiString, never a unicodeString. There are a couple of optimisations based on this assumption (the equality operator, for example). Even though this is only used for regexp, having this method on the interface would tempt someone to misuse it at some point.

A better way would be either to have a utf16() []uint16 method instead, or to use devirtualizeString.
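A minimal sketch of the first alternative follows. It uses simplified stand-in types (asciiStr, unicodeStr) rather than goja's real asciiString and unicodeString, and the interface name utf16Source is made up for this illustration. The point is that callers receive raw code units and never obtain a Unicode string value built from ASCII content.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf16Source is a hypothetical narrow interface: it exposes code units
// without letting callers turn an ASCII value into a Unicode string value.
type utf16Source interface {
	utf16() []uint16
}

// asciiStr and unicodeStr are simplified stand-ins, not goja's real types.
type asciiStr string
type unicodeStr []uint16

// utf16 widens each ASCII byte to one UTF-16 code unit.
func (s asciiStr) utf16() []uint16 {
	out := make([]uint16, len(s))
	for i := 0; i < len(s); i++ {
		out[i] = uint16(s[i])
	}
	return out
}

// utf16 returns the code units the value already holds.
func (s unicodeStr) utf16() []uint16 { return s }

func main() {
	var a utf16Source = asciiStr("abc")
	var u utf16Source = unicodeStr(utf16.Encode([]rune("añ😀")))
	fmt.Println(a.utf16(), len(u.utf16()))
}
```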

dop251 · 2025-09-24T20:59:09Z

regexp.go

-	match, result := r.execRegexp(target)
-	if match {
-		return r.execResultToArray(target, result)
+	targetUtf16 := target.toUnicode()


It seems a little wasteful to convert every single string to unicode. A significant proportion of strings are ASCII, in some environments they are all ASCII...
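One possible way to avoid the conversion for ASCII input is sketched below. Every name here (input, asciiInput, utf16Input, newInput, countLower) is hypothetical, and countLower merely stands in for a matcher. The sketch reads ASCII strings in place, one byte per code unit, and converts to UTF-16 only when a non-ASCII byte is present.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// input lets a matcher read UTF-16 code units without knowing how the
// string is stored.
type input interface {
	length() int
	at(i int) uint16
}

// asciiInput reads bytes in place; each ASCII byte is one code unit.
type asciiInput string

func (s asciiInput) length() int     { return len(s) }
func (s asciiInput) at(i int) uint16 { return uint16(s[i]) }

// utf16Input holds code units produced by an explicit conversion.
type utf16Input []uint16

func (s utf16Input) length() int     { return len(s) }
func (s utf16Input) at(i int) uint16 { return s[i] }

// newInput converts to UTF-16 only when s contains a non-ASCII byte.
func newInput(s string) input {
	for i := 0; i < len(s); i++ {
		if s[i] >= 0x80 {
			return utf16Input(utf16.Encode([]rune(s)))
		}
	}
	return asciiInput(s)
}

// countLower stands in for a matcher: it counts code units in 'a'..'z'.
func countLower(in input) int {
	n := 0
	for i := 0; i < in.length(); i++ {
		if u := in.at(i); u >= 'a' && u <= 'z' {
			n++
		}
	}
	return n
}

func main() {
	for _, s := range []string{"goja", "naïve 😀"} {
		in := newInput(s)
		fmt.Printf("%q: %T, %d lowercase ASCII letters\n", s, in, countLower(in))
	}
}
```

The cost of this design is an interface call per code unit; whether that beats an up-front conversion would need to be measured.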

zbysir · 2026-02-14T07:51:37Z

@auvred Awesome! I've also run into some regex compatibility issues, and it looks like your library solves them well. However, dop251 has raised performance concerns, so merging might be a bit challenging.

How about modifying the code to have both modes coexist: use Go build tags, keeping the existing code as the default, and allow using your code via go build -tags regonaut.

This way, we can proceed with the merge smoothly and offer your library as an experimental option for those who need it.
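As a rough sketch of how that could look, the file below is the default half of a pair selected by build constraints. The file name, engineName, and matchString are all hypothetical, and Go's standard regexp package stands in for the existing engine. A sibling file beginning with `//go:build regonaut` would define the same identifiers on top of the other engine, so exactly one of the two is compiled.

```go
//go:build !regonaut

// File: regexp_engine_default.go (hypothetical name).
// Compiled unless the build is run with -tags regonaut.
package main

import (
	"fmt"
	"regexp"
)

// engineName reports which implementation was compiled in.
const engineName = "default"

// matchString is the single entry point the rest of the program calls;
// the regonaut variant would expose the same signature.
func matchString(pattern, s string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(s), nil
}

func main() {
	ok, err := matchString(`^[a-z]+$`, "goja")
	fmt.Println(engineName, ok, err)
}
```

With this layout, a plain `go build` picks the default file and `go build -tags regonaut` picks the alternative, with no runtime switch.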

auvred · 2026-02-14T08:05:40Z

> How about modifying the code to have both modes coexist: use Go build tags, keeping the existing code as the default, and allow using your code via go build -tags regonaut.
>
> This way, we can proceed with the merge smoothly and offer your library as an experimental option for those who need it.

This sounds nice! If dop251 is OK with it, I can try modifying this PR to use the build tags approach.

P.S. I haven't forgotten about this PR, it's still in my TODO list, but I've been swamped with other projects. Introducing a new regexp engine is a pretty complex task and it requires a lot of time, which I currently don't have :( Also, I don't want to have an LLM implement it instead of me, because every single line of regonaut was written manually with strict adherence to the ECMAScript spec, and I don't want to violate that principle.

zbysir · 2026-02-14T08:14:27Z

> > How about modifying the code to have both modes coexist: use Go build tags, keeping the existing code as the default, and allow using your code via go build -tags regonaut.
> > This way, we can proceed with the merge smoothly and offer your library as an experimental option for those who need it.
>
> This sounds nice! If dop251 is OK with it, I can try modifying this PR to use the build tags approach.
>
> P.S. I haven't forgotten about this PR, it's still in my TODO list, but I've been swamped with other projects. Introducing a new regexp engine is a pretty complex task and it requires a lot of time, which I currently don't have :( Also, I don't want to have an LLM implement it instead of me, because every single line of regonaut was written manually with strict adherence to the ECMAScript spec, and I don't want to violate that principle.

Absolutely! No need to rush this thing. Thanks a bunch for your contribution! 😊

auvred added 4 commits September 13, 2025 11:08

527d3b2 migrate to regonaut
2004565 get rid of unicodeSets for now
36c76bd cleanup
dc87376 enable language/literals/regexp/u- tests

auvred marked this pull request as draft September 16, 2025 11:13

auvred wants to merge 4 commits into dop251:master from auvred:use-regonaut-for-regexps
