Per /issues/83: clarify that named classes must be contained in a character class
The phrase "which must be part of a character class" in the previous version was meant to state this rule, but it was apparently not obvious to some users, so this update makes the rule explicit and adds examples. It also explicitly documents the complements of those named classes.
While modifying the searching docs, also clarified that `(?-s)` does _not_ affect character classes, because that's a common misunderstanding.
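The bracket rule this PR documents can be tried outside the editor. The sketch below assumes C++ `std::regex` with its default ECMAScript grammar. That is a different engine from the one the manual describes, but it follows the same rule: `[:name:]` only has its special meaning inside a surrounding `[...]`. The C++ standard also requires `std::regex_traits` to accept the short names `d`, `w` and `s` and to treat class names case-insensitively, which mirrors `[[:d:]]` and `[[:AlNuM:]]` in the docs. Unlike the manual's Unicode-aware classes, `std::regex` classes follow the C++ locale (plain ASCII in the default "C" locale). The helper `fullMatch` and the `k...` constants are hypothetical names for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches the whole of `text`.
// std::regex defaults to the ECMAScript grammar; it is NOT the manual's
// engine, and its classes are ASCII-only in the default "C" locale.
bool fullMatch(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Patterns mirroring the examples in the updated docs.
const std::string kUpper      = "[[:upper:]]";          // named class inside [...]
const std::string kLiteralSet = "[:upper:]";            // only ':', 'u', 'p', 'e', 'r'
const std::string kCombined   = "[_[:d:]x[:upper:]=]";  // several names in one class
const std::string kNotDigit   = "[^[:digit:]]";         // complement of a named class
const std::string kAnyCase    = "[[:AlNuM:]]";          // class names ignore case
```

With these, `fullMatch(kUpper, "Q")` is true and `fullMatch(kUpper, "u")` is false, while `fullMatch(kLiteralSet, "u")` and `fullMatch(kLiteralSet, ":")` are true but `fullMatch(kLiteralSet, "Q")` is false. `kCombined` accepts each of `7`, `Q`, `x`, `_` and `=` but not `y`, and `kNotDigit` accepts `a` but not `5`.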
content/docs/searching.md (12 additions, 2 deletions)
@@ -339,7 +339,9 @@ In a regular expression (shortened into regex throughout), special characters in
*`[^`_set_`]` ⇒ The complement of the characters in the _set_. For example, `[^A-Za-z]` means any character except an alphabetic character. Care should be taken with a complement list, as regular expressions are always multi-line, and hence `[^ABC]*` will match until the first `A`, `B` or `C` (or `a`, `b` or `c` if match case is off), including any newline characters. To confine the search to a single line, include the newline characters in the exception list, e.g. `[^ABC\r\n]`.
-*`[:`_name_`:]` ⇒ The whole character class named _name_, which must be part of a character class. For many, there is also a single-letter short class name.
+Please note that the complement of a character set often contains many more characters than you might expect: `(?-s)[^x]+` will match one or more of any non-`x` character, including newlines. The `(?-s)` [search modifier](#search-modifier) turns off "dot matches newline", but `[^x]` is _not_ a dot `.`, so that class can still match newlines.
+
+*`[[:`_name_`:]]` or `[[:☒:]]` ⇒ The whole character class named _name_. Many classes also have a single-letter "short" name, ☒. Please note: `[:`_name_`:]` and `[:☒:]` must be inside a character class `[...]` to have their special meaning.
| short | full name | description | equivalent character class |
@@ -360,7 +362,13 @@ In a regular expression (shortened into regex throughout), special characters in
Note that letters include any Unicode letters (ASCII letters, accented letters, and letters from a variety of other writing systems); digits include ASCII numeric digits, and anything else in Unicode that's classified as a digit (like superscript numbers ¹²³...).
-Note that those character class names may be written in upper or lower case without changing the results. So `[:alnum:]` is the same as `[:ALNUM:]` or the mixed-case `[:AlNuM:]`.
+Note that those character class names may be written in upper or lower case without changing the results. So `[[:alnum:]]` is the same as `[[:ALNUM:]]` or the mixed-case `[[:AlNuM:]]`.
+
+As stated earlier, `[:`_name_`:]` and `[:☒:]` (note the single brackets) must be part of a surrounding character class. You _may_, however, combine several of them in one character class: `[_[:d:]x[:upper:]=]` matches any digit, any uppercase letter, the lowercase `x`, and the literal `_` and `=` characters. So named classes will not always appear inside double brackets, but they will always be inside a character class.
+
+If `[:`_name_`:]` or `[:☒:]` is accidentally _not_ contained in a surrounding character class, it loses its special meaning. For example, `[:upper:]` is an ordinary character class matching `:`, `u`, `p`, `e`, or `r`, whereas `[[:upper:]]` is similar to `[A-Z]` (plus other Unicode uppercase letters).
+
+*`[^[:`_name_`:]]` or `[^[:☒:]]` ⇒ The complement of the character class named _name_ or ☒ (matching anything _not_ in that named class). The same long names, short names, and rules described in the previous entry apply.
##### Character Properties
@@ -528,6 +536,8 @@ The following constructs control how matches condition other matches, or otherwi
*`(?-i)caseSensitive(?i)cAsE inSenSitive` ⇒ disables case insensitivity (makes it case-sensitive) for the portion of the regex indicated by `caseSensitive`, and re-enables case-insensitive matching for the rest of the regex
*`(?m:justHere)` ⇒ `^` and `$` will match on embedded newlines, but just for the contents of this subgroup `justHere`
*`(?x)` ⇒ Allow extra whitespace in the expression for the remainder of the regex
+
+Please note that turning off "dot matches newline" with the `(?-s)` [search modifier](#search-modifier) does _not_ affect character classes: `(?-s)[^x]+` will still match one or more of any non-`x` character, including newlines, because `[^x]` is _not_ a dot `.`.
*`(?|expression)` ⇒ If an alternation expression has parenthetical subexpressions in some of its alternatives, you may want the subexpression counter not to be altered by what is in the other branches of the alternation. This construct does exactly that.
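The `(?-s)` note this PR adds (under `[^`_set_`]` and under the search modifiers) can be illustrated the same way. This is again a sketch using C++ `std::regex` rather than the manual's engine: its ECMAScript `.` never matches `\n` or `\r`, which corresponds to the state `(?-s)` selects, and it has no `(?s)`/`(?-s)` modifiers of its own. The helper `firstMatch` and the `k...` constants are hypothetical names for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the leftmost match of `pattern` in `text`,
// or an empty string when there is no match.
std::string firstMatch(const std::string& pattern, const std::string& text) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern))) {
        return m.str(0);
    }
    return "";
}

// Two lines of text: "ab", then "cd x".
const std::string kText = "ab\ncd x";

// The dot stops at the newline (ECMAScript '.' excludes '\n' and '\r').
const std::string kDot = ".+";
// A complemented set is not a dot, so it runs across the newline and
// stops only at the first 'x'.
const std::string kNotX = "[^x]+";
// Listing the line-ending characters in the complement keeps it on one line.
const std::string kNotXOneLine = R"([^x\r\n]+)";
```

Here `firstMatch(kDot, kText)` returns `"ab"`, `firstMatch(kNotX, kText)` returns `"ab\ncd "` (newline included), and `firstMatch(kNotXOneLine, kText)` returns `"ab"`, matching the advice to add `\r\n` to the exception list.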