Fix backspace lexeme escaping #483
|
I guess I have two questions, Whether people can currently use/rely on the Similarly, when we make I'm actually a bit surprised that this isn't something configurable by RegexBuilder but perhaps it would be nice if we could just add a flag to CTLexerBuilder such as I'm not certain if we actually have a |
|
This PR is getting at a fairly deep point: because I was lazy, lrlex uses the I expected that ultimately we would implement a full Lex-compatible matcher. But I'm unlikely to get to it any year/decade soon, and we now have a fairly substantial userbase who, whether they intended to or not, are probably now relying on regex compatibility. I would be surprised if someone isn't using Arguably what we should do is allow lrlex to support different flavours of regexes. lrpar does this tolerably well with https://docs.rs/lrpar/0.13.8/lrpar/struct.CTParserBuilder.html#method.yacckind. lrlex should probably grow [Note: lrlex has a So summary: we can't merge this PR as-is for backwards compatibility reasons, but I would love to merge a variant of it! It would need to do something like:
I know that being asked to do more work is never fun, so I apologise for putting up barriers -- I would like to think, though, that this isn't too much work, and it would make & keep lots of people happy! |
I hadn't really thought of nimbleparse in my earlier reply, it seems like it is also already an issue with the
I had originally added these for similar reasons as are given by this patch because the default
So perhaps those code snippets should also make their way towards nimbleparse? |
|
@ratmice On reflection, you're right: I've muddled two things up. Mea culpa! Perhaps the right thing is to make this PR use the existing |
|
@ltratt Sure thing. Introducing this in a non-breaking way makes sense, even though using word break assertions in a lexer is of dubious merit. I've added the option, defaulted it to off, and added a test case that it's respected. Let me know if I can help any further :) |
In posix lex, '\b' represents the backspace character. In the rust 'regex' crate, it represents a word boundary assertion. This patch adds an option to maintain posix-lex semantics, and tests that all escapes are interpreted correctly, under both sets of semantics.
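To make the difference concrete, here is a minimal sketch of one way posix-lex semantics could be layered on top of a regex engine that treats `\b` as a word boundary: rewrite the escape to the hex escape `\x08` (the backspace character) before the pattern is compiled. The function name `rewrite_backspace_escape` is hypothetical and this is not necessarily how the patch itself does it; it uses only the standard library and assumes the target engine accepts two-digit `\xNN` escapes, as the Rust `regex` crate does.

```rust
/// Hypothetical helper, not part of lrlex's API: rewrite the escape `\b`
/// to `\x08` so that an engine which reads `\b` as a word boundary
/// matches a literal backspace instead.
fn rewrite_backspace_escape(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary character: copy it through unchanged.
            out.push(c);
            continue;
        }
        // A backslash starts an escape; inspect the escaped character.
        match chars.next() {
            // `\b` becomes the hex escape for backspace (U+0008).
            Some('b') => out.push_str("\\x08"),
            // Any other escape (including `\\`) is copied through as a
            // pair, so an escaped backslash followed by `b` is untouched.
            Some(next) => {
                out.push('\\');
                out.push(next);
            }
            // A trailing lone backslash is left for the engine to reject.
            None => out.push('\\'),
        }
    }
    out
}
```

With this sketch, the pattern `a\bc` becomes `a\x08c`, while `a\\bc` (an escaped backslash followed by a literal `b`) is returned unchanged. Consuming escapes as pairs is what keeps the second case correct; a plain string replace of `\b` would corrupt it.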
Force-pushed from c3b6c27 to b2b65ec.
|
Thanks! |
In posix lex, '\b' represents the backspace character. In the rust 'regex' crate, it represents a word boundary assertion.
This patch adds a test that all posix escapes are interpreted correctly, and a fix for the backspace escape incongruity.