Commit ae0de6f

Match '9-regular-expressions' to 'en' repo
2 parents ad806d4 + 5a31649 commit ae0de6f

70 files changed

+3627
-0
lines changed

Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
# Patterns and flags

Regular expressions are patterns that provide a powerful way to search and replace in text.

In JavaScript, they are available via the [RegExp](mdn:js/RegExp) object, as well as being integrated in methods of strings.

## Regular Expressions

A regular expression (also "regexp", or just "reg") consists of a *pattern* and optional *flags*.

There are two syntaxes that can be used to create a regular expression object.

The "long" syntax:

```js
regexp = new RegExp("pattern", "flags");
```

And the "short" one, using slashes `"/"`:

```js
regexp = /pattern/; // no flags
regexp = /pattern/gmi; // with flags g, m and i (to be covered soon)
```

Slashes `pattern:/.../` tell JavaScript that we are creating a regular expression. They play the same role as quotes for strings.

In both cases `regexp` becomes an instance of the built-in `RegExp` class.
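
As a small sketch of that equivalence, we can compare the two syntaxes through the standard `source` and `flags` properties. The variable names `literal` and `constructed` are made up for this illustration, and `console.log` is used instead of `alert` so the snippet also runs outside the browser:

```js
// Two ways to build what should be the same regular expression
const literal = /we/gi;
const constructed = new RegExp("we", "gi");

console.log(literal instanceof RegExp);     // true
console.log(constructed instanceof RegExp); // true

// `source` holds the pattern text, `flags` holds the flags as a string
console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true
```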

The main difference between these two syntaxes is that a pattern using slashes `/.../` does not allow expressions to be inserted (unlike string template literals with `${...}`). Such patterns are fully static.

Slashes are used when we know the regular expression at the time of writing the code -- and that's the most common situation. In contrast, `new RegExp` is more often used when we need to create a regexp "on the fly" from a dynamically generated string. For instance:

```js
let tag = prompt("What tag do you want to find?", "h2");

let regexp = new RegExp(`<${tag}>`); // same as /<h2>/ if answered "h2" in the prompt above
```
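
The same idea can be wrapped in a function so that it runs without `prompt`. This is only a sketch: `makeTagRegexp` is a hypothetical helper, and it assumes the tag name contains only letters and digits, since no escaping of special characters is done:

```js
// Hypothetical helper: builds a regexp for an opening tag such as <h2>.
// Assumption: `tag` holds only letters and digits (nothing is escaped).
function makeTagRegexp(tag) {
  return new RegExp(`<${tag}>`, "g");
}

const html = "<h2>Intro</h2><p>Text</p><h2>Details</h2>";
const found = html.match(makeTagRegexp("h2")) || [];

console.log(found.length); // 2
```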

## Flags

Regular expressions may have flags that affect the search.

This tutorial covers 6 of them (newer JavaScript versions also define `pattern:d` and `pattern:v`):

`pattern:i`
: With this flag the search is case-insensitive: no difference between `A` and `a` (see the example below).

`pattern:g`
: With this flag the search looks for all matches; without it, only the first match is returned.

`pattern:m`
: Multiline mode (covered in the chapter <info:regexp-multiline-mode>).

`pattern:s`
: Enables "dotall" mode, which allows a dot `pattern:.` to match the newline character `\n` (covered in the chapter <info:regexp-character-classes>).

`pattern:u`
: Enables full Unicode support. The flag enables correct processing of surrogate pairs. More about that in the chapter <info:regexp-unicode>.

`pattern:y`
: "Sticky" mode: searching at the exact position in the text (covered in the chapter <info:regexp-sticky>).
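
Each of these flags is also reflected as a read-only boolean property on the regexp object. The sketch below uses a made-up variable name, `flagged`, and `console.log` instead of `alert`:

```js
const flagged = /we/gi;

console.log(flagged.flags);      // "gi"
console.log(flagged.global);     // true  (flag g)
console.log(flagged.ignoreCase); // true  (flag i)
console.log(flagged.multiline);  // false (flag m)
console.log(flagged.dotAll);     // false (flag s)
console.log(flagged.unicode);    // false (flag u)
console.log(flagged.sticky);     // false (flag y)
```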

```smart header="Colors"
From here on the color scheme is:

- regexp -- `pattern:red`
- string (where we search) -- `subject:blue`
- result -- `match:green`
```

## Searching: str.match

As mentioned previously, regular expressions are integrated with string methods.

The method `str.match(regexp)` looks for matches of `regexp` in the string `str`.

It has 3 working modes:

1. If the regular expression has flag `pattern:g`, it returns an array of all matches:
    ```js run
    let str = "We will, we will rock you";

    alert( str.match(/we/gi) ); // We,we (an array of 2 substrings that match)
    ```
    Please note that both `match:We` and `match:we` are found, because flag `pattern:i` makes the regular expression case-insensitive.

2. If there's no such flag, it returns only the first match in the form of an array, with the full match at index `0` and some additional details in properties:
    ```js run
    let str = "We will, we will rock you";

    let result = str.match(/we/i); // without flag g

    alert( result[0] );     // We (1st match)
    alert( result.length ); // 1

    // Details:
    alert( result.index );  // 0 (position of the match)
    alert( result.input );  // We will, we will rock you (source string)
    ```
    The array may have other indexes besides `0` if a part of the regular expression is enclosed in parentheses. We'll cover that in the chapter <info:regexp-groups>.

3. And, finally, if there are no matches, `null` is returned (doesn't matter if there's flag `pattern:g` or not).

This is a very important nuance. If there are no matches, we don't receive an empty array, but instead receive `null`. Forgetting about that may lead to errors, e.g.:

```js run
let matches = "JavaScript".match(/HTML/); // = null

if (!matches.length) { // Error: Cannot read property 'length' of null
  alert("Error in the line above");
}
```

If we'd like the result to always be an array, we can write it this way:

```js run
let matches = "JavaScript".match(/HTML/)*!* || []*/!*;

if (!matches.length) {
  alert("No matches"); // now it works
}
```
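
The `|| []` fallback is handy inside small utilities. As a sketch, here is a hypothetical `countMatches` helper (not a built-in) that counts occurrences; it assumes the regexp is passed with flag `g`, because without it `str.match` describes only the first match:

```js
// Hypothetical helper: how many times does `regexp` match in `str`?
// Assumption: `regexp` carries the g flag.
function countMatches(str, regexp) {
  const matches = str.match(regexp) || []; // null becomes an empty array
  return matches.length;
}

console.log(countMatches("We will, we will rock you", /we/gi)); // 2
console.log(countMatches("JavaScript", /HTML/g));               // 0
```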

## Replacing: str.replace

The method `str.replace(regexp, replacement)` replaces matches found using `regexp` in string `str` with `replacement` (all matches if there's flag `pattern:g`, otherwise only the first one).

For instance:

```js run
// no flag g
alert( "We will, we will".replace(/we/i, "I") ); // I will, we will

// with flag g
alert( "We will, we will".replace(/we/ig, "I") ); // I will, I will
```

The second argument is the `replacement` string. We can use special character combinations in it to insert fragments of the match:

| Symbols | Action in the replacement string |
|--------|--------|
|`$&`|inserts the whole match|
|<code>$&#096;</code>|inserts the part of the string before the match|
|`$'`|inserts the part of the string after the match|
|`$n`|if `n` is a 1-2 digit number, inserts the contents of the n-th parentheses; more about it in the chapter <info:regexp-groups>|
|`$<name>`|inserts the contents of the parentheses with the given `name`; more about it in the chapter <info:regexp-groups>|
|`$$`|inserts the character `$`|

An example with `pattern:$&`:

```js run
alert( "I love HTML".replace(/HTML/, "$& and JavaScript") ); // I love HTML and JavaScript
```
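
The other combinations from the table that don't need parentheses can be tried the same way. In this sketch the variable names `phrase`, `framed` and `priced` are made up for illustration, and `console.log` replaces `alert`:

```js
const phrase = "I love HTML";

// $` is the text before the match, $& the match itself, $' the text after it
const framed = phrase.replace(/love/, "[$`|$&|$']");
console.log(framed); // I [I |love| HTML] HTML

// $$ inserts a literal dollar sign, here followed by the whole match ($&)
const priced = "Price: 10".replace(/10/, "$$$&");
console.log(priced); // Price: $10
```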

## Testing: regexp.test

The method `regexp.test(str)` looks for at least one match. If a match is found, it returns `true`, otherwise `false`.

```js run
let str = "I love JavaScript";
let regexp = /LOVE/i;

alert( regexp.test(str) ); // true
```
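
Because `regexp.test` returns a boolean, it fits naturally into conditions and array filters. A minimal sketch, with made-up data and variable names:

```js
const titles = ["JavaScript basics", "Learning Java", "HTML forms"];
const hasJava = /java/i; // no flag g: every call searches from the start

const javaTitles = titles.filter((title) => hasJava.test(title));

console.log(javaTitles); // [ 'JavaScript basics', 'Learning Java' ]
```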

Later in this chapter we'll study more regular expressions, walk through more examples, and also meet other methods.

Full information about the methods is given in the article <info:regexp-methods>.

## Summary

- A regular expression consists of a pattern and optional flags: `pattern:g`, `pattern:i`, `pattern:m`, `pattern:u`, `pattern:s`, `pattern:y`.
- Without flags and special symbols (that we'll study later), the search by a regexp is the same as a substring search.
- The method `str.match(regexp)` looks for matches: all of them if there's `pattern:g` flag, otherwise, only the first one.
- The method `str.replace(regexp, replacement)` replaces matches found using `regexp` with `replacement`: all of them if there's `pattern:g` flag, otherwise only the first one.
- The method `regexp.test(str)` returns `true` if there's at least one match, otherwise, it returns `false`.
Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
# Character classes

Consider a practical task -- we have a phone number like `"+7(903)-123-45-67"`, and we need to turn it into pure numbers: `79031234567`.

To do so, we can find and remove anything that's not a digit. Character classes can help with that.

A *character class* is a special notation that matches any symbol from a certain set.

To start, let's explore the "digit" class. It's written as `pattern:\d` and corresponds to "any single digit".

For instance, let's find the first digit in the phone number:

```js run
let str = "+7(903)-123-45-67";

let regexp = /\d/;

alert( str.match(regexp) ); // 7
```

Without the flag `pattern:g`, the regular expression only looks for the first match, that is the first digit `pattern:\d`.

Let's add the `pattern:g` flag to find all digits:

```js run
let str = "+7(903)-123-45-67";

let regexp = /\d/g;

alert( str.match(regexp) ); // array of matches: 7,9,0,3,1,2,3,4,5,6,7

// let's make the digits-only phone number of them:
alert( str.match(regexp).join('') ); // 79031234567
```

That was a character class for digits. There are other character classes as well.

The most used are:

`pattern:\d` ("d" is from "digit")
: A digit: a character from `0` to `9`.

`pattern:\s` ("s" is from "space")
: A space symbol: includes spaces, tabs `\t`, newlines `\n` and a few other rare characters, such as `\v`, `\f` and `\r`.

`pattern:\w` ("w" is from "word")
: A "wordly" character: either a letter of the Latin alphabet or a digit or an underscore `_`. Non-Latin letters (like Cyrillic or Hindi) do not belong to `pattern:\w`.

For instance, `pattern:\d\s\w` means a "digit" followed by a "space character" followed by a "wordly character", such as `match:1 a`.
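
A quick way to check that reading of `pattern:\d\s\w` is to try it on a few strings. In this sketch `classes` is just an illustrative variable name, and `console.log` is used instead of `alert`:

```js
const classes = /\d\s\w/;

console.log("1 a".match(classes)[0]);  // "1 a"
console.log(classes.test("7\t_"));     // true: digit, tab, underscore
console.log("1a".match(classes));      // null: no space character in between
console.log("1 я".match(classes));     // null: a Cyrillic letter is not \w
```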

**A regexp may contain both regular symbols and character classes.**

For instance, `pattern:CSS\d` matches a string `match:CSS` with a digit after it:

```js run
let str = "Is there CSS4?";
let regexp = /CSS\d/;

alert( str.match(regexp) ); // CSS4
```

We can also use several character classes:

```js run
alert( "I love HTML5!".match(/\s\w\w\w\w\d/) ); // ' HTML5'
```

The match (each regexp character class has the corresponding result character):

![](love-html5-classes.svg)

## Inverse classes

For every character class there exists an "inverse class", denoted with the same letter, but uppercased.

The "inverse" means that it matches all other characters, for instance:

`pattern:\D`
: Non-digit: any character except `pattern:\d`, for instance a letter.

`pattern:\S`
: Non-space: any character except `pattern:\s`, for instance a letter.

`pattern:\W`
: Non-wordly character: anything but `pattern:\w`, e.g. a non-Latin letter or a space.
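
To see the three inverse classes side by side, we can run them with flag `pattern:g` over one short string. The string and the name `sample` are made up for this sketch:

```js
const sample = "a1 b2";

console.log(sample.match(/\D/g)); // [ 'a', ' ', 'b' ]      (everything but digits)
console.log(sample.match(/\S/g)); // [ 'a', '1', 'b', '2' ] (everything but spaces)
console.log(sample.match(/\W/g)); // [ ' ' ]                (everything but wordly characters)
```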

At the beginning of the chapter we saw how to make a number-only phone number from a string like `subject:+7(903)-123-45-67`: find all digits and join them.

```js run
let str = "+7(903)-123-45-67";

alert( str.match(/\d/g).join('') ); // 79031234567
```

An alternative, shorter way is to find non-digits `pattern:\D` and remove them from the string:

```js run
let str = "+7(903)-123-45-67";

alert( str.replace(/\D/g, "") ); // 79031234567
```

## A dot is "any character"

A dot `pattern:.` is a special character class that matches "any character except a newline".

For instance:

```js run
alert( "Z".match(/./) ); // Z
```

Or in the middle of a regexp:

```js run
let regexp = /CS.4/;

alert( "CSS4".match(regexp) ); // CSS4
alert( "CS-4".match(regexp) ); // CS-4
alert( "CS 4".match(regexp) ); // CS 4 (space is also a character)
```

Please note that a dot means "any character", but not the "absence of a character". There must be a character to match it:

```js run
alert( "CS4".match(/CS.4/) ); // null, no match because there's no character for the dot
```

### Dot as literally any character with "s" flag

By default, a dot doesn't match the newline character `\n`.

For instance, the regexp `pattern:A.B` matches `match:A`, and then `match:B` with any character between them, except a newline `\n`:

```js run
alert( "A\nB".match(/A.B/) ); // null (no match)
```

There are many situations when we'd like a dot to mean literally "any character", newline included.

That's what flag `pattern:s` does. If a regexp has it, then a dot `pattern:.` matches literally any character:

```js run
alert( "A\nB".match(/A.B/s) ); // A\nB (match!)
```

````warn header="Not supported in IE"
The `pattern:s` flag is not supported in IE.

Luckily, there's an alternative that works everywhere. We can use a regexp like `pattern:[\s\S]` to match "any character" (this pattern will be covered in the article <info:regexp-character-sets-and-ranges>).

```js run
alert( "A\nB".match(/A[\s\S]B/) ); // A\nB (match!)
```

The pattern `pattern:[\s\S]` literally says: "a space character OR not a space character". In other words, "anything". We could use another pair of complementary classes, such as `pattern:[\d\D]`; it doesn't matter which. Or even `pattern:[^]`, which means "match any character except nothing".

We can also use this trick if we want both kinds of "dots" in the same pattern: the actual dot `pattern:.` behaving the regular way ("not including a newline"), and also a way to match "any character" with `pattern:[\s\S]` or alike.
````
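
The alternatives mentioned in the warning above can be compared directly. This is a sketch with made-up names (`text`, `mixed`); the last pattern combines a regular dot with `pattern:[\s\S]` to show both kinds of "dots" in one regexp:

```js
const text = "A\nB";

console.log(/A.B/.test(text));      // false: the dot stops at a newline
console.log(/A[\d\D]B/.test(text)); // true
console.log(/A[^]B/.test(text));    // true

// A regular dot followed by "really any character"
const mixed = /A.[\s\S]B/;
console.log(mixed.test("Ax\nB"));   // true: "x" for the dot, "\n" for [\s\S]
console.log(mixed.test("A\n\nB"));  // false: the dot cannot take the first "\n"
```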

````warn header="Pay attention to spaces"
Usually we pay little attention to spaces. For us strings `subject:1-5` and `subject:1 - 5` are nearly identical.

But if a regexp doesn't take spaces into account, it may fail to work.

Let's try to find digits separated by a hyphen:

```js run
alert( "1 - 5".match(/\d-\d/) ); // null, no match!
```

Let's fix it by adding spaces into the regexp `pattern:\d - \d`:

```js run
alert( "1 - 5".match(/\d - \d/) ); // 1 - 5, now it works
// or we can use \s class:
alert( "1 - 5".match(/\d\s-\s\d/) ); // 1 - 5, also works
```

**A space is a character, equal in importance to any other character.**

We can't add or remove spaces from a regular expression and expect it to work the same.

In other words, in a regular expression all characters matter, spaces too.
````

## Summary

The following character classes exist:

- `pattern:\d` -- digits.
- `pattern:\D` -- non-digits.
- `pattern:\s` -- space symbols, tabs, newlines.
- `pattern:\S` -- all but `pattern:\s`.
- `pattern:\w` -- Latin letters, digits, underscore `'_'`.
- `pattern:\W` -- all but `pattern:\w`.
- `pattern:.` -- any character with the regexp `'s'` flag, otherwise any character except a newline `\n`.

...But that's not all!

Unicode, the encoding JavaScript uses for strings, provides many properties for characters, such as which language a letter belongs to (if it's a letter), whether it's a punctuation sign, etc.

We can search by these properties as well. That requires flag `pattern:u`, covered in the next article.