03-regexp-unicode/article.md (4 additions & 4 deletions)
@@ -4,9 +4,9 @@ JavaScript uses [Unicode encoding](https://en.wikipedia.org/wiki/Unicode) for st
That range is not big enough to encode all possible characters, that's why some rare characters are encoded with 4 bytes, for instance like `𝒳` (mathematical X) or `😄` (a smile), some hieroglyphs and so on.
-There's a unicode property `Script` (a writing system), that may have a value: `Cyrillic`, `Greek`, `Arabic`, `Han` (Chinese) and so on, [here's the full list](https://en.wikipedia.org/wiki/Script_(Unicode)).
+There's a Unicode property `Script` (a writing system), that may have a value: `Cyrillic`, `Greek`, `Arabic`, `Han` (Chinese) and so on, [here's the full list](https://en.wikipedia.org/wiki/Script_(Unicode)).
To look for characters in a given writing system we should use `pattern:Script=<value>`, e.g. for Cyrillic letters: `pattern:\p{sc=Cyrillic}`, for Chinese hieroglyphs: `pattern:\p{sc=Han}`, and so on:
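
A minimal sketch of the `Script` lookup described in the context line above. The sample string is hypothetical, and the `u` flag is added here because Unicode property escapes are only recognized with it; this assumes a runtime that supports `\p{…}`.

```javascript
// Hypothetical sample string mixing Latin, Cyrillic and Han characters.
const sample = "Hello Привет 你好";

// The "u" flag is required for \p{…} to be read as a Unicode property escape.
const cyrillic = sample.match(/\p{sc=Cyrillic}/gu);
const han = sample.match(/\p{sc=Han}/gu);

console.log(cyrillic.join("")); // Привет
console.log(han.join(""));      // 你好
```

Without the `u` flag, `\p` is treated as a plain escaped `p`, so the same pattern would not match these characters.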
08-regexp-character-sets-and-ranges/article.md (4 additions & 4 deletions)
@@ -57,16 +57,16 @@ For instance:
- **\d** -- is the same as `pattern:[0-9]`,
- **\w** -- is the same as `pattern:[a-zA-Z0-9_]`,
-- **\s** -- is the same as `pattern:[\t\n\v\f\r ]`, plus few other rare unicode space characters.
+- **\s** -- is the same as `pattern:[\t\n\v\f\r ]`, plus few other rare Unicode space characters.
```
### Example: multi-language \w
As the character class `pattern:\w` is a shorthand for `pattern:[a-zA-Z0-9_]`, it can't find Chinese hieroglyphs, Cyrillic letters, etc.
-We can write a more universal pattern, that looks for wordly characters in any language. That's easy with unicode properties: `pattern:[\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_C}]`.
+We can write a more universal pattern, that looks for wordly characters in any language. That's easy with Unicode properties: `pattern:[\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_C}]`.
-Let's decipher it. Similar to `pattern:\w`, we're making a set of our own that includes characters with following unicode properties:
+Let's decipher it. Similar to `pattern:\w`, we're making a set of our own that includes characters with following Unicode properties:
- `Alphabetic` (`Alpha`) - for letters,
- `Mark` (`M`) - for accents,
@@ -85,7 +85,7 @@ let str = `Hi 你好 12`;
alert( str.match(regexp) ); // H,i,你,好,1,2
```
-Of course, we can edit this pattern: add unicode properties or remove them. Unicode properties are covered in more details in the article <info:regexp-unicode>.
+Of course, we can edit this pattern: add Unicode properties or remove them. Unicode properties are covered in more details in the article <info:regexp-unicode>.
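
A minimal sketch of what editing the pattern might look like. The hunk does not show the `regexp` definition, so the `g` and `u` flags and the variable names `wordly` and `noDigits` are assumptions for illustration; `str` reuses the value from the hunk header.

```javascript
const str = "Hi 你好 12";

// The character set from the article, assumed to be used with the "g" and "u" flags.
const wordly = /[\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_C}]/gu;
console.log(str.match(wordly).join(",")); // H,i,你,好,1,2

// Edited variant (hypothetical): dropping \p{Nd} excludes decimal digits.
const noDigits = /[\p{Alpha}\p{M}\p{Pc}\p{Join_C}]/gu;
console.log(str.match(noDigits).join(",")); // H,i,你,好
```

Each property contributes one category of characters to the set, so removing a property narrows the match and adding one widens it.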
```warn header="Unicode properties aren't supported in Edge and Firefox"
Unicode properties `pattern:\p{…}` are not yet implemented in Edge and Firefox. If we really need them, we can use library [XRegExp](http://xregexp.com/).