Inconsistent matching with repeated backreferences and match_unset_backref #335

@addisoncrump

Discovered by #322.

The following regex demonstrates the issue:

  re> /(a)|\1+/match_unset_backref
data> ba
 0: a
 1: a
data> ba\=no_jit
 0: 

I believe the following case is related:

  re> /(a)|\1+/match_unset_backref
data> bbbb
No match
data> bbbb\=no_jit
 0: 

What's very curious is that this does not appear without the repetition:

  re> /(a)|\1/match_unset_backref
data> ba
 0: 
data> ba\=no_jit
 0:
data> a
 0: a
 1: a
data> a\=no_jit
 0: a
 1: a

Finally, it appears with fixed repetitions, but not range repetitions:

  re> /(a)|\1{128}/match_unset_backref
data> ba
 0: a
 1: a
data> ba\=no_jit
 0:
data>
  re> /(a)|\1{,128}/match_unset_backref
data> ba
 0: 
data> ba\=no_jit
 0: 
data> 

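This suggests to me that the JIT mishandles repetition of an unset backreference (which match_unset_backref treats as matching the empty string), at least for quantifiers with a nonzero minimum such as + and {128}.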
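
For reference, here is a minimal Python sketch of the result the non-JIT runs above report, under the assumption that an unset backreference matches the empty string, so repeating it any number of times still matches the empty string. It is only a hand-written model of the pattern /(a)|\1{n,}/ and is not PCRE2 code; the function name `model_match` and its `min_reps` parameter are hypothetical.

```python
from typing import Optional, Tuple


def model_match(subject: str, min_reps: int = 1) -> Optional[Tuple[int, int, Optional[str]]]:
    """Model /(a)|\\1{min_reps,}/ with match_unset_backref semantics.

    Returns (start, end, group1) for the leftmost match, or None.
    This is an illustrative model, not how PCRE2 is implemented.
    """
    for start in range(len(subject) + 1):
        # Alternative 1: (a) captures a literal 'a'.
        if subject.startswith("a", start):
            return start, start + 1, "a"

        # Alternative 2: \1 repeated at least min_reps times.
        # Group 1 is unset here (alternative 1 just failed), and the
        # assumption is that an unset group behaves like the empty string.
        group1 = ""
        pos = start
        matched = True
        for _ in range(min_reps):
            if not subject.startswith(group1, pos):
                matched = False
                break
            pos += len(group1)  # empty match: pos does not advance
        if matched:
            return start, pos, None
    return None


if __name__ == "__main__":
    for subject, reps in [("ba", 1), ("bbbb", 1), ("a", 1), ("ba", 128)]:
        print(repr(subject), reps, model_match(subject, reps))
```

Under this model, "ba" and "bbbb" both yield an empty match at offset 0 with group 1 unset, for + as well as {128}, which agrees with the no_jit output and not with the JIT output.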
