| 
7 | 7 | :Author: A.M. Kuchling < [email protected]>   | 
8 | 8 | 
 
  | 
9 | 9 | .. TODO:  | 
10 |  | -   Document lookbehind assertions  | 
11 | 10 |    Better way of displaying a RE, a string, and what it matches  | 
12 | 11 |    Mention optional argument to match.groups()  | 
13 | 12 |    Unicode (at least a reference)  | 
@@ -1061,6 +1060,73 @@ end in either ``bat`` or ``exe``:  | 
1061 | 1060 | ``.*[.](?!bat$|exe$)[^.]*$``  | 
1062 | 1061 | 
 
  | 
1063 | 1062 | 
 
  | 
 | 1063 | +Lookbehind Assertions  | 
 | 1064 | +---------------------  | 
 | 1065 | + | 
 | 1066 | +Lookbehind assertions work similarly to lookahead assertions, but they look  | 
 | 1067 | +backwards in the string instead of forwards.  They are available in both  | 
 | 1068 | +positive and negative form, and look like this:  | 
 | 1069 | + | 
 | 1070 | +``(?<=...)``  | 
 | 1071 | +   Positive lookbehind assertion.  This succeeds if the contained regular  | 
 | 1072 | +   expression, represented here by ``...``, successfully matches ending at the  | 
 | 1073 | +   current location, and fails otherwise. The matching engine doesn't advance;  | 
 | 1074 | +   the rest of the pattern is tried right where the assertion started.  | 
 | 1075 | + | 
 | 1076 | +``(?<!...)``  | 
 | 1077 | +   Negative lookbehind assertion.  This is the opposite of the positive assertion;  | 
 | 1078 | +   it succeeds if the contained expression *doesn't* match ending at the current  | 
 | 1079 | +   position in the string.  | 
 | 1080 | + | 
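As a minimal sketch of the two forms described above, the following uses Python's ``re`` module; the sample strings and variable names are illustrative assumptions, not part of the original text. It shows that a lookbehind checks the preceding text without making it part of the match.

```python
import re

text = "price: $42"

# Positive lookbehind: the digits must be immediately preceded by "$".
pos = re.search(r"(?<=\$)\d+", text)
print(pos.group())  # 42
print(pos.span())   # (8, 10): the "$" at index 7 is checked but not consumed

# Negative lookbehind: a run of digits that is not preceded by "$".
neg_hit = re.search(r"(?<!\$)\b\d+", "cost 42")
neg_miss = re.search(r"(?<!\$)\b\d+", text)
print(neg_hit.group())  # 42
print(neg_miss)         # None
```

The span of the first match starts after the dollar sign, which is what "zero-width" means in practice.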
 | 1081 | +Here's a comparison of lookahead and lookbehind assertions:  | 
 | 1082 | + | 
 | 1083 | ++------------------+------------------+------------------+  | 
 | 1084 | +| Type             | Lookahead        | Lookbehind       |  | 
 | 1085 | ++==================+==================+==================+  | 
 | 1086 | +| Positive         | ``(?=...)``      | ``(?<=...)``     |  | 
 | 1087 | ++------------------+------------------+------------------+  | 
 | 1088 | +| Negative         | ``(?!...)``      | ``(?<!...)``     |  | 
 | 1089 | ++------------------+------------------+------------------+  | 
 | 1090 | +| Direction        | Forward          | Backward         |  | 
 | 1091 | ++------------------+------------------+------------------+  | 
 | 1092 | +| Checks           | What comes after | What came before |  | 
 | 1093 | ++------------------+------------------+------------------+  | 
 | 1094 | + | 
 | 1095 | +Examples  | 
 | 1096 | +~~~~~~~~  | 
 | 1097 | + | 
 | 1098 | +*Positive assertions:*  | 
 |  | + | 
 | 1099 | +- Lookahead: ``Python(?= )`` matches "Python" only when followed by a space  | 
 | 1100 | +- Lookbehind: ``(?<=Hello )Python`` matches "Python" only when preceded by "Hello "  | 
 | 1101 | + | 
 | 1102 | +*Negative assertions:*  | 
 |  | + | 
 | 1103 | +- Lookahead: ``Python(?! )`` matches "Python" only when NOT followed by a space  | 
 | 1104 | +- Lookbehind: ``(?<!Hello )Python`` matches "Python" only when NOT preceded by "Hello "  | 
 | 1105 | + | 
 | 1106 | +*Practical examples:*  | 
 |  | + | 
 | 1107 | +- Lookahead: ``\d+(?=\$)`` matches digits that are followed by a dollar sign  | 
 | 1108 | +- Lookbehind: ``(?<=\$)\d+`` matches digits that are preceded by a dollar sign  | 
 | 1109 | + | 
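The patterns listed above can be tried out with a short script. This is only a sketch: the input strings and variable names are made up for illustration.

```python
import re

greeting = "Hello Python, Goodbye Python"

# Start offsets show which occurrence of "Python" each pattern selects.
after_hello = [m.start() for m in re.finditer(r"(?<=Hello )Python", greeting)]
not_after_hello = [m.start() for m in re.finditer(r"(?<!Hello )Python", greeting)]
print(after_hello)      # [6]
print(not_after_hello)  # [22]

prices = "10$ and $20"

# Lookahead: digits followed by "$"; lookbehind: digits preceded by "$".
before_dollar = re.findall(r"\d+(?=\$)", prices)
after_dollar = re.findall(r"(?<=\$)\d+", prices)
print(before_dollar)  # ['10']
print(after_dollar)   # ['20']
```

In both cases the asserted text (``Hello `` or ``$``) is not included in the result.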
 | 1110 | +Key differences  | 
 | 1111 | +~~~~~~~~~~~~~~~  | 
 | 1112 | + | 
 | 1113 | +1. **Direction**: Lookahead checks forward in the string; lookbehind checks backward.  | 
 | 1114 | +2. **Limitations**: In Python's ``re`` module a lookbehind must match strings of a  | 
 | 1115 | +   fixed width, so ``*``, ``+``, ``?``, and ``{m,n}`` with ``m != n`` are not allowed.  | 
 | 1116 | +3. **Performance**: Lookahead follows the engine's left-to-right scan. A fixed-width  | 
 | 1117 | +   lookbehind is also cheap, since the engine steps back a known number of  | 
 | 1118 | +   characters; variable-width lookbehind, as offered by some other regex engines,  | 
 | 1119 | +   can be more expensive because several starting positions may have to be tried.  | 
 | 1120 | + | 
 | 1121 | +For example, this pattern is valid as a lookahead but not as a lookbehind:  | 
 |  | + | 
 | 1122 | +- Lookahead: ``(?=a*)def`` ✓ (valid)  | 
 | 1123 | +- Lookbehind: ``(?<=a*)def`` ✗ (``re.error``: look-behind requires fixed-width pattern)  | 
 | 1124 | + | 
 | 1125 | +This limitation exists because the engine must know how far to step back before  | 
 | 1126 | +trying the contained pattern. With a variable-width pattern that distance is not  | 
 | 1127 | +known in advance, so Python's ``re`` module rejects it when the pattern is compiled.  | 
 | 1128 | + | 
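The fixed-width rule can be checked directly by compiling a few patterns. The helper ``compiles`` below is a hypothetical name introduced only for this sketch; it relies on ``re.compile`` raising ``re.error`` for a pattern it rejects.

```python
import re

def compiles(pattern):
    """Return True if re.compile accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

results = {
    r"(?=a*)def": compiles(r"(?=a*)def"),          # lookahead: any width is allowed
    r"(?<=a*)def": compiles(r"(?<=a*)def"),        # lookbehind with "*": rejected
    r"(?<=a{3})def": compiles(r"(?<=a{3})def"),    # exact repeat count: fixed width
    r"(?<=ab|cd)def": compiles(r"(?<=ab|cd)def"),  # alternatives of equal length
    r"(?<=a|bc)def": compiles(r"(?<=a|bc)def"),    # alternatives of unequal length
}
for pattern, ok in results.items():
    print(pattern, "compiles" if ok else "is rejected")
```

Note that alternation inside a lookbehind is accepted only when every alternative has the same length.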
 | 1129 | + | 
1064 | 1130 | Modifying Strings  | 
1065 | 1131 | =================  | 
1066 | 1132 | 
 
  | 
 | 