StringScanner performance proposal: Refactor to skip string allocations when not used #16625

@jneen

Discussion

Currently, all StringScanner scanning or checking methods delegate to the private helper #match (sometimes via #scan). All of these unconditionally perform a #byte_slice on the source string, allocating a string the length of the match.

However, many applications, especially those using #skip, completely ignore the resulting string.
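As a small illustration of that usage pattern, here is a sketch of a tokenizer-style loop in Crystal. It uses only the public `StringScanner` API; the input string and variable names are made up for the example. The `#skip` calls only ever observe a length, so any string built for them internally is discarded.

```crystal
require "string_scanner"

scanner = StringScanner.new("   foo = 42")

# #skip returns the length of the match; the matched text is never seen here.
skipped = scanner.skip(/\s+/)

# #scan is the case where the matched string is actually wanted.
ident = scanner.scan(/\w+/)

# Another #skip whose result is ignored entirely.
scanner.skip(/\s*=\s*/)

number = scanner.scan(/\d+/)

puts skipped
puts ident
puts number
```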

Proposal

  • Refactor #match to not return the string, and instead grab the source string directly in the various #scan and #check overloads, ideally using @last_match[0].
  • For regex matches, this would result in grabbing the string directly from the PCRE match object rather than re-slicing it from the source string.
  • Make StringMatchData lazy: instead of holding the pre-sliced string, it should hold the source string and byte offset/length, and allocate the sliced string only when required by #[].
  • Benchmark the above and verify that it actually saves time/allocations.
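The lazy match data idea from the list above could look roughly like the following. This is only a sketch under assumptions: `LazyStringMatch` is a hypothetical name, not the existing `StringMatchData`, and it models only index 0 of a plain string match.

```crystal
# Hypothetical stand-in for a lazy StringMatchData; not part of the stdlib.
struct LazyStringMatch
  getter byte_offset : Int32
  getter byte_size : Int32

  # Stores a reference to the source plus the match bounds; nothing is copied.
  def initialize(@source : String, @byte_offset : Int32, @byte_size : Int32)
  end

  # The substring is allocated only when a caller actually asks for it.
  def [](index : Int32) : String
    raise IndexError.new unless index == 0
    @source.byte_slice(@byte_offset, @byte_size)
  end
end

match = LazyStringMatch.new("hello world", 6, 5)
puts match.byte_size # no string allocated so far
puts match[0]        # the slice is created here
```

With this shape, a `#skip`-style caller would read only `byte_size` and never pay for the slice, while `#scan` would call `#[]` and allocate as before.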

Alternatives

We could also avoid allocation with a StringView or StringSlice type of some kind, similar to this thread. This would complicate GC somewhat, but StringScanner-based parsers commonly keep the original source code live anyway, to slice bits out of it for error reporting.
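A minimal sketch of what such a view type might look like, assuming a hypothetical `StringSlice` struct (no type of this name exists in Crystal's standard library today). It borrows bytes from the source string, and holding the reference is what keeps the whole source alive.

```crystal
# Hypothetical view type for illustration only.
struct StringSlice
  getter byte_offset : Int32
  getter byte_size : Int32

  def initialize(@source : String, @byte_offset : Int32, @byte_size : Int32)
  end

  # Borrowed bytes from the source string; no copy is made.
  def to_slice : Bytes
    @source.to_slice[@byte_offset, @byte_size]
  end

  # Compares against a String byte by byte without allocating a substring.
  def ==(other : String) : Bool
    to_slice == other.to_slice
  end

  # Copies the bytes out into an independent String when one is really needed.
  def materialize : String
    @source.byte_slice(@byte_offset, @byte_size)
  end
end

source = "let x = 1"
view = StringSlice.new(source, 4, 1)
puts view == "x"
puts view.materialize
```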
