
<regex>: Avoid heap allocations for few capturing groups and stack frames #5969

@muellerj2


Even if string matching involves only a few capturing groups or repetitions, the matcher still allocates several buffers on the heap:

  • The vector<bool> _Grp_valid in _Matcher3::_Tgt_state to store which capturing groups are matched.
  • The vector<_Grp_t> _Grps in _Matcher3::_Tgt_state to store the extents of capturing groups.
  • If matching follows the leftmost-longest rule, vectors of the same sizes in _Matcher3::_Res.
  • The stack frames in vector<_Rx_state_frame_t<_It>> _Matcher3::_Frames.

Only one of these vectors, _Matcher3::_Frames, can grow beyond its initial size during matching.
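Because the group buffers never grow once matching has started, a fixed-size buffer with inline storage and a single heap fallback would be enough for them. The following is a minimal sketch, assuming a hypothetical `fixed_small_buffer` class and an arbitrary inline capacity `N`. Neither is part of the STL, and the real buffers may end up looking different.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical fixed-size buffer (not an STL type). Its size is chosen once,
// when matching starts, and never changes afterwards, like _Grp_valid and _Grps.
// Up to N elements live inline without touching the heap; larger sizes fall
// back to exactly one heap allocation.
template <class T, std::size_t N>
class fixed_small_buffer {
public:
    explicit fixed_small_buffer(std::size_t count)
        : size_(count),
          heap_(count > N ? std::make_unique<T[]>(count) : nullptr) {}

    T& operator[](std::size_t i) {
        assert(i < size_);
        return heap_ ? heap_[i] : inline_[i];
    }

    std::size_t size() const noexcept { return size_; }
    bool uses_heap() const noexcept { return heap_ != nullptr; }

private:
    std::size_t size_;
    std::array<T, N> inline_{};   // storage used when size_ <= N
    std::unique_ptr<T[]> heap_;   // storage used when size_ > N
};
```

For example, `fixed_small_buffer<bool, 8> valid(3);` keeps the three group flags inline, while a pattern with 20 groups would trigger one allocation up front and none during matching.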

The heap allocations listed above are comparatively costly, especially when matching short inputs. We should therefore avoid them and instead use stack-allocated buffers whenever the required amount of memory is small enough.
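For the frame stack, which can grow, the idea of "stack buffer first, heap only when needed" can be illustrated with C++17's `std::pmr::monotonic_buffer_resource` placed on top of a local array. This is a swapped-in illustration, not the STL's implementation: `<regex>` also has to work in C++14 mode, where `std::pmr` is unavailable. The `frame` type, the 1 KiB buffer size, and the helper names below are hypothetical.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical stand-in for a backtracking stack frame.
struct frame {
    int state;
    std::ptrdiff_t position;
};

// Upstream resource that counts how often the pool had to fall back to the heap.
class counting_resource : public std::pmr::memory_resource {
public:
    std::size_t allocations = 0;

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++allocations;
        return std::pmr::new_delete_resource()->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        std::pmr::new_delete_resource()->deallocate(p, bytes, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Pushes `pushes` frames and returns how many heap allocations were needed.
// Small workloads are served entirely from the on-stack buffer.
std::size_t heap_allocations_for(std::size_t pushes) {
    counting_resource upstream;  // must outlive the pool
    alignas(std::max_align_t) std::byte buffer[1024];
    std::pmr::monotonic_buffer_resource pool(buffer, sizeof(buffer), &upstream);

    std::pmr::vector<frame> frames(&pool);
    frames.reserve(16);  // fits inside the 1 KiB stack buffer
    for (std::size_t i = 0; i < pushes; ++i) {
        frames.push_back({static_cast<int>(i), static_cast<std::ptrdiff_t>(i)});
    }
    while (!frames.empty()) {
        frames.pop_back();  // backtracking would pop frames like this
    }
    return upstream.allocations;
}
```

A monotonic resource never reuses freed memory before it is destroyed, so a real frame stack would more likely use a small-buffer vector with its own growth logic. The sketch only shows where the heap fallback kicks in: `heap_allocations_for(10)` needs no heap allocation, while `heap_allocations_for(1000)` does.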

Labels: performance, regex