Description
Discussion
Currently, all StringScanner scanning or checking methods delegate to the private helper #match (sometimes via #scan). All of these unconditionally perform a #byte_slice on the source string, allocating a string the length of the match.
However, many applications, especially those using #skip, completely ignore the resulting string.
Proposal
- Refactor #match to not return the string, and instead grab the source string directly in the various #scan and #check overloads, ideally using @last_match[0].
  - For regex matches, this would result in grabbing the string directly from the PCRE match object rather than re-slicing it from the source string.
- Make StringMatchData lazy - instead of holding the pre-sliced string, it should hold the source string and byte offset/length, and allocate the sliced string only when required by #[].
- Benchmark the above and verify that it actually saves time/allocations.
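
A rough sketch of what the lazy match data could look like. The name LazyStringMatch, its fields, and #byte_end are hypothetical and only for illustration; this is not the current StringMatchData implementation, and whether it actually saves time would still need the benchmark above.

```crystal
# Hypothetical sketch, not the stdlib's StringMatchData.
# Holds the source string plus a byte range instead of a pre-sliced copy.
struct LazyStringMatch
  getter byte_begin : Int32
  getter byte_size : Int32

  def initialize(@source : String, @byte_begin : Int32, @byte_size : Int32)
  end

  # Byte offset just past the match; computing it needs no allocation.
  def byte_end : Int32
    @byte_begin + @byte_size
  end

  # Slices the matched substring out of the source only when asked for it.
  # A plain string match has no capture groups, so only index 0 is valid.
  def [](n : Int) : String
    raise IndexError.new("Invalid capture group index: #{n}") unless n == 0
    @source.byte_slice(@byte_begin, @byte_size)
  end
end

source = "key = value"
match = LazyStringMatch.new(source, 6, 5)
match.byte_end # offsets are available without slicing
match[0]       # the substring is sliced from the source here, on demand
```

With something like this, a caller such as #skip that only needs the match length would never trigger the #byte_slice call.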
Alternatives
We could also avoid allocation with a StringView or StringSlice type of some kind, similar to this thread. This would complicate GC somewhat, but StringScanner-based parsers commonly keep the original source code live anyway, so that they can slice bits out of it for error reporting.