Backslash sequences #5
Replies: 3 comments 16 replies
BTW, I've bought both RegexBuddy and RegexMagic from JGS (author of the website you linked), so if you need me to test some RegExs for you, I'll happily do it. Both tools have a custom engine that covers all versions of the major RegEx engines (so that you can test backward compatibility issues with any engine), plus the custom engine by JGS, which is very powerful (also documented at the website). One of these two programs also allows debugging a RegEx by stepping through it one matching step at a time, in case you need to compare the expected behaviour of your code with the actual behaviour of other engines.

As for the shorthand classes to implement, it really depends on what the goals of your engine are, which I'm guessing are mostly oriented toward creating lexers? I'm not quite sure that

Some other useful shorthands can be found here:

I know that the above don't all qualify as character shorthands, since some of them are more abstract in nature, but still...
I have now looked again at the documentation of Rust's crate
With the help of the very useful tool I have adjusted the listing at the top and the relevant issues to use Unicode character classes. For example,
A backslash sequence that matches a byte value could also be useful, so that the RegEx engine can match byte sequences (besides characters) in binary data. This is no longer unusual: the question comes up again and again on the Internet (search for: regex binary data). The NFA/DFA of the RegEx engine already works on bytes, so the implementation should not be difficult.
Example:
Syntax:
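For comparison, Python's `re` module already covers this use case through bytes patterns, where `\xhh` denotes a single byte value. The sketch below is only an illustration of the idea in another engine, not the syntax proposed for this project; the sample data is made up.

```python
import re

# Made-up binary blob: two leading bytes, a PNG signature, two trailing bytes.
data = b"\x00\x01\x89PNG\r\n\x1a\n\xff\xfe"

# In a bytes pattern, \xhh matches exactly one byte with that value.
signature = re.compile(rb"\x89PNG\r\n\x1a\n")
match = signature.search(data)
offset = match.start() if match else -1

# Byte ranges work inside character classes too: runs of bytes >= 0x80.
high_runs = re.findall(rb"[\x80-\xff]+", data)

print(offset)
print(high_runs)
```

Here `offset` is the position of the signature inside the blob, and `high_runs` collects every run of non-ASCII bytes.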
In addition to character classes, there will also be shorthand character classes. However, I'm not quite sure yet which ones there should be and which characters they should cover.
According to this website, the different RegEx engines cover different characters in the shorthand character classes:
https://www.regular-expressions.info/shorthand.html
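A small sketch of how much the coverage can differ, using Python's `re` module purely as a stand-in engine (the project's own engine may well behave differently). The same shorthand matches a different set of characters depending on whether the Unicode or the ASCII-only interpretation is active:

```python
import re

# 'a', '1', ARABIC-INDIC DIGIT THREE (U+0663), 'é' (U+00E9), '_'
text = "a1\u0663\u00e9_"

# Default for str patterns: Unicode-aware shorthands.
unicode_digits = re.findall(r"\d", text)
unicode_word = re.findall(r"\w", text)

# re.ASCII restricts \d to [0-9] and \w to [a-zA-Z0-9_].
ascii_digits = re.findall(r"\d", text, flags=re.ASCII)
ascii_word = re.findall(r"\w", text, flags=re.ASCII)

print(unicode_digits, ascii_digits)
print(unicode_word, ascii_word)
```

With the Unicode interpretation, `\d` also accepts the Arabic-Indic digit and `\w` also accepts `é`; with `re.ASCII`, both are rejected.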
The current listing:
- `\r` for carriage return (Add escape sequence `\r` (carriage return) #8)
- `\n` for new line (Add escape sequence `\n` (line feed) #9)
- `\t` for horizontal tab character (Add escape sequence `\t` (horizontal tab) #10)
- `\f` for form feed (Add escape sequence `\f` (form feed) #22)
- `\d` for digit (Add predefined character class `\d` (digit) #14)
- `\D` for no digit (Add predefined character class `\D` (no digit) #23)
- `\s` for whitespace character (Add predefined character class `\s` (whitespace) #15)
- `\S` for no whitespace character (Add predefined character class `\S` (no whitespace) #24)
- `\w` for word character (Add predefined character class `\w` (word character) #25)
- `\W` for no word character (Add predefined character class `\W` (no word character) #26)
- `\xhh` (Add escape sequence `\xhh` (character with hex code `hh`) #17)
- `\uhhhh` (Add escape sequence `\uhhhh` (character with hex code `hhhh`) #18)
- `\Q...\E` (Add escape sequence `\Q...\E` #13)
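As a rough illustration of the last entry, `\Q...\E` can be handled as a preprocessing step that escapes everything between the two markers. The sketch below is in Python, whose `re` module does not support `\Q...\E` itself; `expand_quoted` is a hypothetical helper, and treating an unterminated `\Q` as quoting to the end of the pattern is an assumption about the intended semantics.

```python
import re


def expand_quoted(pattern: str) -> str:
    """Rewrite \\Q...\\E spans as escaped literals (hypothetical helper)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith(r"\Q", i):
            end = pattern.find(r"\E", i + 2)
            if end == -1:
                # Assumption: an unterminated \Q quotes to the end of the pattern.
                end = len(pattern)
            out.append(re.escape(pattern[i + 2:end]))
            i = end + 2  # skip past \E (or past the end of the pattern)
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            # Copy any other backslash sequence unchanged, as a pair,
            # so that an escaped backslash followed by Q is not misread.
            out.append(pattern[i:i + 2])
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


expanded = expand_quoted(r"\d+\Q.*\E")
print(expanded)
```

The expanded pattern can then be passed to the regular compiler: `re.fullmatch(expanded, "42.*")` matches, while `"42xx"` does not, because `.*` is now treated literally.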