Conversation
| if limitValue != _undefined { | ||
| limit = int(toUint32(limitValue)) | ||
|
|
||
| var lim int64 |
There was a problem hiding this comment.
regexpproto_stdSplitter implementation is copied from the regexpproto_stdSplitterGeneric algorithm
| type Mode uint | ||
|
|
||
| const ( | ||
| IgnoreRegExpErrors Mode = 1 << iota // Ignore RegExp compatibility errors (allow backtracking) |
There was a problem hiding this comment.
IgnoreRegExpErrors was introduced in the initial commit (2577360). However, even back then it was only used in tests and nowhere else.
I'm not entirely sure about removing it completely, though. It has been part of the public API for 9 years. Nevertheless, I don't think anyone is actually relying on it in production code.
There was a problem hiding this comment.
I don't think it would be a problem.
|
This looks very promising. When I looked into named groups I realised regexp2 was not the right fit as it's a port of the .NET library which has a very limited ECMAScript compatibility, and in order to make it fully compatible some fundamental changes are needed. I even considered forking it at some point but then I realised I didn't have enough time to make those changes and maintain the code. I've had a quick look and my main concern so far is performance. Even running the very limited set of benchmarks from goja shows a significant degradation in both time and memory: Do you plan to do any performance improvements? The thing is most people are not even aware of the ECMAScript regular expression quirks, but they would notice if their |
I've been planning to implement a second finite automata engine (alongside the existing backtracking engine) to improve performance. Unfortunately, implementing this new engine is quite time-consuming, and I'm currently limited on time. I'll make this PR a draft for now and will come back to it once the finite automata engine is finished. |
|
Thanks. I'll add a couple of comments on the PR, as they would still apply... |
| utf16Reader() utf16Reader | ||
| utf16RuneReader() io.RuneReader | ||
| utf16Runes() []rune | ||
| toUnicode() unicodeString |
There was a problem hiding this comment.
The current design assumes that an ASCII string is always an asciiString, never unicodeString. There are a couple of optimisations based on this assumption (like the equality operator for example). Even though this is only used for regexp, having this method on the interface would tempt someone to misuse it at some point.
A better way would be either to have utf16() []uint16 method instead, or, use devirtualizeString.
| match, result := r.execRegexp(target) | ||
| if match { | ||
| return r.execResultToArray(target, result) | ||
| targetUtf16 := target.toUnicode() |
There was a problem hiding this comment.
It seems a little wasteful to convert every single string to unicode. A significant proportion of strings are ASCII, in some environments they are all ASCII...
|
@auvred Awesome! I've also encountered some regex compatibility issues, and it seems your library can solve this problem well. However, dop251 also mentioned its performance concerns, so merging might be a bit challenging. How about modifying the code to have both modes coexist: use Go build tags, keeping the existing code as the default, and allow using your code via go build -tags regonaut. This way, we can proceed with the merge smoothly and offer your library as an experimental option for those who need it. |
This sounds nice! If dop251 is OK with it, I can try modifying this PR to use the build tags approach. P.S. I haven't forgotten about this PR, it's still in my TODO list, but I've been swamped with other projects. Introducing a new regexp engine is a pretty complex task and it requires a lot of time, which I currently don't have :( Also, I don't want to have an LLM implement it instead of me, because every single line of regonaut was written manually with strict adherence to the ECMAScript spec, and I don't want to violate that principle. |
Absolutely! No need to rush this thing. Thanks a bunch for your contribution! 😊 |
This PR replaces both https://pkg.go.dev/regexp and https://github.com/dlclark/regexp2 usages with https://github.com/auvred/regonaut (an ES2025-compatible RegExp engine).
I've tried to keep diff minimal for easier review. For now, this PR only replaces the RegExp engine and enables some TC39 tests - no new features are introduced yet.
If this PR is merged, I plan to submit a few follow-up PRs to add support for:
vflag)dflag)