
Lazy matching consumes more than expected #532

@Lancelotbronner


I'm writing a lexer for C++. Block comments conveniently aren't allowed to nest, so I can use `/\*.*?\*/` to match everything between `/*` and `*/`, either with the `(?m)` flag or with `.|\n` instead of `.`.
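Logos parses `#[regex]` patterns with the `regex-syntax` crate. In that syntax `(?m)` only changes what `^` and `$` match, and the flag that lets `.` match `\n` is `(?s)`. So with `(?m)` alone, a comment spanning several lines cannot match `.*?`, which would leave its opening `/*` to be lexed as `Slash` and `Star`. The lazy pattern is meant to express "from `/*`, stop at the first `*/`". Below is a minimal, crate-free sketch of that intent. The helper `lazy_comment_len` is hypothetical and not part of Logos.

```rust
/// Hypothetical helper (not a Logos API): if `src` starts with a C++ block
/// comment, return its length in bytes, stopping at the *first* `*/`.
/// This is what the lazy pattern `(?s)/\*.*?\*/` is meant to match.
fn lazy_comment_len(src: &str) -> Option<usize> {
    // The comment must open with `/*`.
    let body = src.strip_prefix("/*")?;
    // C++ block comments do not nest, so the first `*/` closes the comment.
    let end = body.find("*/")?;
    // 2 bytes for `/*`, then the body, then 2 bytes for `*/`.
    Some(2 + end + 2)
}
```

For example, on `"/* a */ b */"` this returns `Some(7)`, covering only `/* a */`. An unterminated comment returns `None`.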

The opening `/*` was being matched as `Slash` (`/`) followed by `Star` (`*`). I tried both increasing and decreasing the priority of `Comment`, but it didn't help. When I replaced `.*?` with `(.|\n)*?`, it started matching too much.

I've tried every combination, and it either doesn't match at all or matches everything.

It seems to ignore the closing `*/` entirely and keep going.
The regex works as expected in playgrounds like regexr.com.
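Here is one plausible explanation. Assume that Logos compiles all patterns of the enum into one state machine that keeps extending a token while it can still match, which is the usual longest-match rule for lexers. Assume also that it does not give `*?` its own shortest-match behaviour. Then, with `(.|\n)*?`, every later `*/` in the input is also a valid end for `Comment`. The longest one would win, so several comments and the code between them would collapse into one token. The hypothetical functions below only contrast the two choices of end on the same input. They are a sketch, not Logos internals.

```rust
/// Hypothetical illustration: exclusive end offsets of every prefix of `src`
/// that matches "`/*`, then anything, then `*/`".
fn comment_match_ends(src: &str) -> Vec<usize> {
    if !src.starts_with("/*") {
        return Vec::new();
    }
    // Every `*/` after the opening `/*` is a candidate end of the match.
    src[2..]
        .match_indices("*/")
        .map(|(i, _)| 2 + i + 2)
        .collect()
}

/// Lazy semantics: take the shortest candidate (the first `*/`).
fn lazy_end(src: &str) -> Option<usize> {
    comment_match_ends(src).first().copied()
}

/// Longest-match semantics: take the longest candidate (the last `*/`).
fn longest_end(src: &str) -> Option<usize> {
    comment_match_ends(src).last().copied()
}
```

On `"/* a */ int x; /* b */"`, `lazy_end` gives `Some(7)` (just the first comment). `longest_end` gives `Some(22)`, which swallows `int x;` and the second comment as well.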

Did I encounter unsupported behaviour? Is there anything else I can do?

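```rust
use logos::Logos;

#[derive(Logos)]
enum Token {
    // Intended to match a whole block comment, stopping at the first `*/`.
    #[regex(r"(?m)\/\*.*?\*\/")]
    Comment,
    #[token("/")]
    Slash,
    #[token("*")]
    Star,
}
```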
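Longest-match lexers often avoid lazy quantifiers with the classic laziness-free pattern for C comments: `/\*[^*]*\*+([^/*][^*]*\*+)*/`. After a run of `*`, the only way to continue is a character that is neither `/` nor `*`. So even the longest match cannot run past the first `*/`. In the enum above that would read `#[regex(r"/\*[^*]*\*+([^/*][^*]*\*+)*/")]`. It is an assumption that your Logos version handles this exact pattern as intended, so it is worth testing. To show the idea without depending on any crate, here is a hypothetical hand-written state machine equivalent to that pattern, run with longest-match semantics:

```rust
/// States of a DFA for `/\*[^*]*\*+([^/*][^*]*\*+)*/` (hypothetical sketch).
#[derive(Clone, Copy, PartialEq)]
enum State {
    Start, // nothing consumed yet
    Slash, // seen the opening `/`
    Body,  // inside the comment, last byte was not `*`
    Stars, // inside the comment, last byte was `*`
    Done,  // seen the closing `*/` (accepting, no outgoing transitions)
}

/// Runs the DFA over `src` and returns the length of the longest prefix
/// that is a complete block comment, if any.
fn dfa_comment_len(src: &str) -> Option<usize> {
    let mut state = State::Start;
    let mut longest = None;
    for (i, b) in src.bytes().enumerate() {
        state = match (state, b) {
            (State::Start, b'/') => State::Slash,
            (State::Slash, b'*') => State::Body,
            (State::Body, b'*') => State::Stars,
            (State::Body, _) => State::Body,
            (State::Stars, b'/') => State::Done,
            (State::Stars, b'*') => State::Stars,
            // A byte other than `/` or `*` after stars: still inside the comment.
            (State::Stars, _) => State::Body,
            // No transition (including anything after `Done`): stop scanning.
            _ => break,
        };
        if state == State::Done {
            longest = Some(i + 1);
        }
    }
    longest
}
```

Because `Done` has no outgoing transitions, the longest match and the first `*/` coincide. This is why the pattern behaves "lazily" without a lazy quantifier. Another route sometimes used with Logos is to match only `/*` with `#[token]` and finish the comment in a callback. The callback searches `lex.remainder()` for `*/` and extends the token with `lex.bump(...)`. Callback return types differ between Logos versions, so treat this as a direction to check rather than a recipe.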


Labels: `T-lazy-matching` (Triage: issue related to lazy matching), `bug`, `enhancement`
