Inconsistent (?0) behaviour in the presence of endanchored #331

@addisoncrump

Discovered by #322.

I tried to minify and root cause this further, but I got a bit stuck as I've never used recursive patterns. Consider the following output:

$ ./pcre2test -jit
PCRE2 version 10.43-DEV 2023-04-14 (8-bit)
  re> /|a(?0)/endanchored
data> aaaa
 0: aaaa
data> aaaa\=no_jit
 0: a

The JIT and the non-JIT interpreter do not agree on what is matched. I suspect that they are taking different branches in the recursive pattern: the JIT greedily takes the right branch, whereas the non-JIT only takes the right branch at the last character. My guess is that the non-JIT treats the pattern's end anchor as part of the pattern (so only the final character is valid), whereas the JIT does not and reads the whole input before hitting the end. Notably, the discrepancy does not appear when the pattern ends with $ instead of using the endanchored flag.
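
To make the two outcomes concrete, here is a toy backtracking model of `|a(?0)` in Python. It is not PCRE2 code and says nothing certain about the real root cause: `search`, `ends_from`, and the `atomic_recursion` switch are hypothetical names, and the idea that the engines differ in whether they can backtrack into the recursive call is only an assumption that happens to reproduce both results above.

```python
def ends_from(subject, pos, atomic_recursion):
    """Yield each end offset the toy pattern |a(?0) can reach from pos.

    Alternatives are tried in pattern order: the empty branch first,
    then 'a' followed by a recursive call to the whole pattern.
    """
    # Branch 1: the empty alternative matches without consuming anything.
    yield pos
    # Branch 2: a literal 'a', then recurse into the whole pattern.
    if pos < len(subject) and subject[pos] == "a":
        inner = ends_from(subject, pos + 1, atomic_recursion)
        if atomic_recursion:
            # Assumption: the recursion commits to its first result and
            # cannot be re-entered on backtracking.
            first = next(inner, None)
            if first is not None:
                yield first
        else:
            # Assumption: every result of the recursion can be retried.
            yield from inner


def search(subject, atomic_recursion):
    """Return the first end-anchored match, scanning start offsets left to right."""
    for start in range(len(subject) + 1):
        for end in ends_from(subject, start, atomic_recursion):
            # Model of endanchored: accept only matches ending at the subject end.
            if end == len(subject):
                return subject[start:end]
    return None


print(search("aaaa", atomic_recursion=False))
print(search("aaaa", atomic_recursion=True))
```

With backtracking into the recursion allowed, the model matches from offset 0 and returns the whole subject, as in the JIT run. With the recursion treated as committed, every start offset before the last character fails the end check, and the first match is the final `a`, as in the `no_jit` run.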
