Description
Documentation
The claim at:
Lines 253 to 255 in d0c6ba9
* Special characters lose their special meaning inside sets. For example,
  ``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``,
  ``'*'``, or ``')'``.
seems wrong, at least for \.
Consider the following example:
>>> bool(re.search(string=b"a\\b",pattern=b"[\\\n\r]"))
False
My expectation would be that after backslash-unescaping the b"…"-string, pattern is assigned the sequence of:
literal \, the line-feed "character", the carriage-return "character"
If it were true that "Special characters lose their special meaning inside sets.", then the resolved \ in the unescaped pattern should match the one in my test string b"a\\b"; however, it does not.
I guess what Python actually "sees" is:
backslash-escaped line-feed "character", the carriage-return "character"
which probably effectively yields:
the line-feed "character", the carriage-return "character"
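This reading can be checked directly. The following is a minimal sketch, assuming a current CPython 3 `re` module; the helper name `matching` is hypothetical and exists only for illustration. It lists which single bytes each set matches in full.

```python
import re

def matching(pattern, candidates):
    """Return the candidates that `pattern` matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

candidates = [b"\\", b"\n", b"\r"]

# After Python's bytes-literal unescaping, the pattern is: [ \ LF CR ]
pattern = b"[\\\n\r]"
assert list(pattern) == [0x5B, 0x5C, 0x0A, 0x0D, 0x5D]

# The regex engine reads "\ LF" as an escaped line feed, so the set
# holds only LF and CR; the backslash itself is not a member.
single = matching(pattern, candidates)

# Doubling the backslash at the regex level adds a literal backslash.
doubled = matching(b"[\\\\\n\r]", candidates)

print(single)
print(doubled)
```

If the guess above is right, the first list should contain only the line feed and the carriage return, while the second should also contain the backslash.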
Now you could argue that \ is not considered a special character in terms of the regular expression syntax, but it is, if only because of:
Lines 504 to 507 in d0c6ba9
The special sequences consist of ``'\'`` and a character from the list below.
If the ordinary character is not an ASCII digit or an ASCII letter, then the
resulting RE will match the second character. For example, ``\$`` matches the
character ``'$'``.
and following.
Also, even the section that explains […] mentions the escaping functionality of it:
Lines 249 to 250 in d0c6ba9
``[0-9A-Fa-f]`` will match any hexadecimal digit. If ``-`` is escaped (e.g.
``[a\-z]``) or if it's placed as the first or last character
I think:
Lines 253 to 255 in d0c6ba9
* Special characters lose their special meaning inside sets. For example,
  ``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``,
  ``'*'``, or ``')'``.
should be improved to document that:
- \ is exempt from this
- whether or not this is only the case for characters that are actually special within an RE bracket expression, i.e. [0\-9] is 0, - and 9, because the - was special in that position. But what about [\-9]? Here, the - would not have been special, so is the result \, - and 9 or just - and 9?
- or whether this is simply the case for any character following the \ ... ones that are special outside an RE bracket expression, like \$, \D, \w or \number ... and/or ones that are never special, like \ü.
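The cases listed above can be probed empirically. This is a minimal sketch, assuming a current CPython 3 `re` module; `members` is a hypothetical helper, and the results only show observed behaviour, not anything the documentation currently promises.

```python
import re

def members(set_pattern, candidates):
    """Return the single characters that the bracket expression matches."""
    return [c for c in candidates if re.fullmatch(set_pattern, c)]

candidates = ["\\", "-", "0", "9", "$", "w", "\x01"]

# "-" between two characters would otherwise form a range.
dash_middle = members(r"[0\-9]", candidates)
# "-" in first position would already be literal without the backslash.
dash_first = members(r"[\-9]", candidates)
# "$" is special outside a set but not inside one.
dollar = members(r"[\$]", candidates)
# "\w" is a character-class escape.
word = members(r"[\w]", candidates)
# "\1" is a group reference outside a set.
number = members(r"[\1]", candidates)

for name, result in [("[0\\-9]", dash_middle), ("[\\-9]", dash_first),
                     ("[\\$]", dollar), ("[\\w]", word), ("[\\1]", number)]:
    print(name, result)
```

As far as I can tell, the backslash is never a member of the set in any of these cases: it is always consumed as an escape, whether or not the following character would have been special at that position.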
Thanks,
Chris.