Commit 35c8153

Add regexp/unicode-property rule (#722)
* Add `regexp/unicode-property` rule
* Create nervous-lies-yawn.md
* Document exceptions to short and long names
1 parent 528d3b5 commit 35c8153

9 files changed, +1639 -0 lines changed

.changeset/nervous-lies-yawn.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
---
"eslint-plugin-regexp": minor
---

Add `regexp/unicode-property` rule to enforce consistent naming of unicode properties

README.md

Lines changed: 1 addition & 0 deletions
@@ -231,6 +231,7 @@ The `plugin.configs["flat/all"]` / `plugin:regexp/all` config enables all rules.
 | [sort-character-class-elements](https://ota-meshi.github.io/eslint-plugin-regexp/rules/sort-character-class-elements.html) | enforces elements order in character class | | | 🔧 | |
 | [sort-flags](https://ota-meshi.github.io/eslint-plugin-regexp/rules/sort-flags.html) | require regex flags to be sorted | 🟢 🔵 | | 🔧 | |
 | [unicode-escape](https://ota-meshi.github.io/eslint-plugin-regexp/rules/unicode-escape.html) | enforce consistent usage of unicode escape or unicode codepoint escape | | | 🔧 | |
+| [unicode-property](https://ota-meshi.github.io/eslint-plugin-regexp/rules/unicode-property.html) | enforce consistent naming of unicode properties | | | 🔧 | |

 <!-- end auto-generated rules list -->

docs/rules/index.md

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@ sidebarDepth: 0
 | [sort-character-class-elements](sort-character-class-elements.md) | enforces elements order in character class | | | 🔧 | |
 | [sort-flags](sort-flags.md) | require regex flags to be sorted | 🟢 🔵 | | 🔧 | |
 | [unicode-escape](unicode-escape.md) | enforce consistent usage of unicode escape or unicode codepoint escape | | | 🔧 | |
+| [unicode-property](unicode-property.md) | enforce consistent naming of unicode properties | | | 🔧 | |

 <!-- end auto-generated rules list -->

docs/rules/unicode-property.md

Lines changed: 245 additions & 0 deletions
@@ -0,0 +1,245 @@
---
pageClass: "rule-details"
sidebarDepth: 0
title: "regexp/unicode-property"
description: "enforce consistent naming of unicode properties"
---
# regexp/unicode-property

🔧 This rule is automatically fixable by the [`--fix` CLI option](https://eslint.org/docs/latest/user-guide/command-line-interface#--fix).

<!-- end auto-generated rule header -->

> enforce consistent naming of unicode properties

## :book: Rule Details

This rule helps to enforce consistent style and naming of Unicode properties.

There are many ways a single Unicode property can be expressed. E.g. `\p{L}`, `\p{Letter}`, `\p{gc=L}`, `\p{gc=Letter}`, `\p{General_Category=L}`, and `\p{General_Category=Letter}` are all equivalent. This rule can be configured in a variety of ways to control exactly which of those variants are allowed. The default configuration is intended to be a good starting point for most users.

<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: "error" */

/* ✓ GOOD */
var re = /\p{L}/u;
var re = /\p{Letter}/u;
var re = /\p{Script=Greek}/u;
var re = /\p{scx=Greek}/u;
var re = /\p{Hex}/u;
var re = /\p{Hex_Digit}/u;

/* ✗ BAD */
var re = /\p{gc=L}/u;
var re = /\p{General_Category=Letter}/u;
var re = /\p{Script=Grek}/u;
```

</eslint-code-block>
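
As a quick sanity check of the equivalence described above, the following sketch tests all six spellings against the same inputs in plain JavaScript. It is not part of the rule. The sample characters are arbitrary choices made for this example, and the snippet assumes a runtime with Unicode property escape support (ES2018 or later).

```js
// Six spellings of the General_Category value "Letter", as listed above.
const variants = [
  /^\p{L}$/u,
  /^\p{Letter}$/u,
  /^\p{gc=L}$/u,
  /^\p{gc=Letter}$/u,
  /^\p{General_Category=L}$/u,
  /^\p{General_Category=Letter}$/u,
];

// Arbitrary samples: three letters from different scripts, a digit, a space.
const samples = ["a", "λ", "漢", "1", " "];

// One row per sample, one column per variant.
const results = samples.map((ch) => variants.map((re) => re.test(ch)));

// True when every variant gives the same answer for every sample.
const allAgree = results.every((row) => row.every((r) => r === row[0]));
console.log(allAgree);
```

Because the variants behave identically at runtime, choosing between them is purely a matter of style, which is what the options below control.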

## :wrench: Options

```json
{
  "regexp/unicode-property": ["error", {
    "generalCategory": "never",
    "key": "ignore",
    "property": {
      "binary": "ignore",
      "generalCategory": "ignore",
      "script": "long"
    }
  }]
}
```

### `generalCategory: "never" | "always" | "ignore"`

Values from the `General_Category` property can be expressed in two ways: either without or with the `gc=` (or `General_Category=`) prefix. E.g. `\p{Letter}` or `\p{gc=Letter}`.

This option controls whether the `gc=` prefix is required or forbidden.

- `"never"` (default): The `gc=` (or `General_Category=`) prefix is forbidden.
  <eslint-code-block fix>

  ```js
  /* eslint regexp/unicode-property: ["error", { generalCategory: "never" }] */

  var re = /\p{Letter}/u;
  var re = /\p{gc=Letter}/u;
  var re = /\p{General_Category=Letter}/u;
  ```

  </eslint-code-block>

- `"always"`: The `gc=` (or `General_Category=`) prefix is required.
  <eslint-code-block fix>

  ```js
  /* eslint regexp/unicode-property: ["error", { generalCategory: "always" }] */

  var re = /\p{Letter}/u;
  var re = /\p{gc=Letter}/u;
  var re = /\p{General_Category=Letter}/u;
  ```

  </eslint-code-block>

- `"ignore"`: Both forms, with and without the prefix, are allowed.
  <eslint-code-block fix>

  ```js
  /* eslint regexp/unicode-property: ["error", { generalCategory: "ignore" }] */

  var re = /\p{Letter}/u;
  var re = /\p{gc=Letter}/u;
  var re = /\p{General_Category=Letter}/u;
  ```

  </eslint-code-block>
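
To illustrate what forbidding the prefix amounts to, here is a minimal sketch that rewrites the `gc=` and `General_Category=` forms to the bare form using a text replacement. `stripGeneralCategoryKey` is a hypothetical helper written for this example. It is not the rule's implementation, and it makes no attempt to handle escaped backslashes or other edge cases.

```js
// Hypothetical helper, NOT the rule's actual implementation.
// Removes a `gc=` or `General_Category=` key from \p{...} and \P{...} escapes.
// Naive by design: a pattern containing an escaped backslash before "p{"
// (i.e. a literal backslash) would be rewritten incorrectly.
function stripGeneralCategoryKey(pattern) {
  return pattern.replace(
    /\\([pP])\{(?:gc|General_Category)=([^}]+)\}/g,
    "\\$1{$2}",
  );
}

// Other keys, such as `sc=`, are left untouched.
console.log(stripGeneralCategoryKey(String.raw`\p{gc=Letter}\P{General_Category=Lu}\p{sc=Greek}`));
// prints: \p{Letter}\P{Lu}\p{sc=Greek}
```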

### `key: "short" | "long" | "ignore"`

Unicode properties in key-value form (e.g. `\p{gc=Letter}`, `\P{scx=Greek}`) have two variants for the key: a short and a long form. E.g. `\p{gc=Letter}` and `\p{General_Category=Letter}`.

This option controls whether the short or long form is required.

- `"short"`: The key must be in short form.
  <eslint-code-block fix>

  ```js
  /* eslint regexp/unicode-property: ["error", { key: "short", generalCategory: "ignore" }] */

  var re = /\p{gc=Letter}/u;
  var re = /\p{General_Category=Letter}/u;
  var re = /\p{sc=Greek}/u;
  var re = /\p{Script=Greek}/u;
  var re = /\p{scx=Greek}/u;
  var re = /\p{Script_Extensions=Greek}/u;
  ```

  </eslint-code-block>

- `"long"`: The key must be in long form.
  <eslint-code-block fix>

  ```js
  /* eslint regexp/unicode-property: ["error", { key: "long", generalCategory: "ignore" }] */

  var re = /\p{gc=Letter}/u;
  var re = /\p{General_Category=Letter}/u;
  var re = /\p{sc=Greek}/u;
  var re = /\p{Script=Greek}/u;
  var re = /\p{scx=Greek}/u;
  var re = /\p{Script_Extensions=Greek}/u;
  ```

  </eslint-code-block>

- `"ignore"` (default): The key can be in either form.
  <eslint-code-block fix>

  ```js
  /* eslint regexp/unicode-property: ["error", { key: "ignore", generalCategory: "ignore" }] */

  var re = /\p{gc=Letter}/u;
  var re = /\p{General_Category=Letter}/u;
  var re = /\p{sc=Greek}/u;
  var re = /\p{Script=Greek}/u;
  var re = /\p{scx=Greek}/u;
  var re = /\p{Script_Extensions=Greek}/u;
  ```

  </eslint-code-block>
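
The short and long keys used in the examples above come in pairs, so converting between them can be sketched as a small lookup. `KEY_ALIASES` and `convertKey` are hypothetical names introduced for this illustration. The rule's real implementation may work differently.

```js
// Key pairs that appear in this section, as [short, long].
const KEY_ALIASES = [
  ["gc", "General_Category"],
  ["sc", "Script"],
  ["scx", "Script_Extensions"],
];

// Hypothetical helper: rewrite the key of a `key=value` property
// (the text between the braces of \p{...}) to the requested form.
function convertKey(property, form) {
  const eq = property.indexOf("=");
  // No key present, or nothing to enforce.
  if (eq === -1 || form === "ignore") return property;

  const key = property.slice(0, eq);
  const pair = KEY_ALIASES.find((p) => p.includes(key));
  if (!pair) return property; // unknown key: leave untouched

  const wanted = form === "short" ? pair[0] : pair[1];
  return wanted + property.slice(eq);
}

console.log(convertKey("General_Category=Letter", "short")); // prints: gc=Letter
console.log(convertKey("scx=Greek", "long")); // prints: Script_Extensions=Greek
```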

### `property: "short" | "long" | "ignore" | object`

Similar to `key`, most property names also have long and short forms. E.g. `\p{Letter}` and `\p{L}`.

This option controls whether the short or long form is required. Which form is required can be configured for each property type via an object. The object has to be of the type:

```ts
{
    binary?: "short" | "long" | "ignore",
    generalCategory?: "short" | "long" | "ignore",
    script?: "short" | "long" | "ignore",
}
```

- `binary` controls the form of Binary Unicode properties. E.g. `ASCII`, `Any`, `Hex`.
- `generalCategory` controls the form of values from the `General_Category` property. E.g. `Letter`, `Ll`, `P`.
- `script` controls the form of values from the `Script` and `Script_Extensions` properties. E.g. `Greek`.

If the option is set to a string instead of an object, it will be used for all property types.
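
The string shorthand can be pictured as expanding into the object form. The sketch below shows one way to do that. `DEFAULTS` and `resolveOptions` are hypothetical names, the default values are assumed to be the ones shown in the options example in this document, and the fallback for keys missing from a partial object is likewise an assumption rather than documented behavior.

```js
// Assumed defaults, copied from the options example in this document.
const DEFAULTS = {
  generalCategory: "never",
  key: "ignore",
  property: { binary: "ignore", generalCategory: "ignore", script: "long" },
};

// Hypothetical helper: expand user options into a fully specified object.
function resolveOptions(user = {}) {
  const property =
    typeof user.property === "string"
      ? // A string applies to every property type.
        {
          binary: user.property,
          generalCategory: user.property,
          script: user.property,
        }
      : // Assumption: keys missing from an object fall back to the defaults.
        { ...DEFAULTS.property, ...user.property };

  return {
    generalCategory: user.generalCategory ?? DEFAULTS.generalCategory,
    key: user.key ?? DEFAULTS.key,
    property,
  };
}

console.log(resolveOptions({ property: "short" }).property);
```

With this sketch, `{ property: "short" }` and `{ property: { binary: "short", generalCategory: "short", script: "short" } }` resolve to the same object.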

> NOTE: The `"short"` and `"long"` options follow the [Unicode standard](https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt) for short and long names. However, short names aren't always shorter than long names. E.g. the short name for `\p{sc=Han}` is `\p{sc=Hani}`.
>
> There are also some properties that don't have a short name, such as `\p{sc=Thai}`, and some that have additional aliases that can be longer than the long name, such as `\p{Mark}` (long) with its short name `\p{M}` and alias `\p{Combining_Mark}`.
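
The aliases mentioned in the note can be checked directly in JavaScript. This sketch is not part of the rule, and the sample characters are arbitrary choices. It confirms that each group of spellings accepts and rejects the same inputs, assuming a runtime with Unicode property escape support.

```js
// Groups of spellings that the note describes as naming the same set.
const aliasGroups = [
  // long name, short name (the "short" one is longer here)
  [/^\p{sc=Han}$/u, /^\p{sc=Hani}$/u],
  // long name, short name, additional alias
  [/^\p{Mark}$/u, /^\p{M}$/u, /^\p{Combining_Mark}$/u],
];

// Arbitrary samples: a Han character, a combining acute accent, a Latin letter.
const aliasSamples = ["漢", "\u0301", "a"];

// For each group, check that every spelling agrees with the first one.
const groupsAgree = aliasGroups.map((group) =>
  aliasSamples.every((ch) =>
    group.every((re) => re.test(ch) === group[0].test(ch)),
  ),
);
console.log(groupsAgree);
```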

#### Examples

All set to `"long"`:

<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { property: "long" }] */

var re = /\p{Hex}/u;
var re = /\p{Hex_Digit}/u;
var re = /\p{L}/u;
var re = /\p{Letter}/u;
var re = /\p{sc=Grek}/u;
var re = /\p{sc=Greek}/u;
```

</eslint-code-block>

All set to `"short"`:

<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { property: "short" }] */

var re = /\p{Hex}/u;
var re = /\p{Hex_Digit}/u;
var re = /\p{L}/u;
var re = /\p{Letter}/u;
var re = /\p{sc=Grek}/u;
var re = /\p{sc=Greek}/u;
```

</eslint-code-block>

Binary properties and values of the `General_Category` property set to `"short"` and values of the `Script` property set to `"long"`:

<eslint-code-block fix>

```js
/* eslint regexp/unicode-property: ["error", { property: { binary: "short", generalCategory: "short", script: "long" } }] */

var re = /\p{Hex}/u;
var re = /\p{Hex_Digit}/u;
var re = /\p{L}/u;
var re = /\p{Letter}/u;
var re = /\p{sc=Grek}/u;
var re = /\p{sc=Greek}/u;
```

</eslint-code-block>

## :books: Further reading

- [MDN docs on Unicode property escapes](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape)

## :rocket: Version

:exclamation: <badge text="This rule has not been released yet." vertical="middle" type="error"> ***This rule has not been released yet.*** </badge>

## :mag: Implementation

- [Rule source](https://github.com/ota-meshi/eslint-plugin-regexp/blob/master/lib/rules/unicode-property.ts)
- [Test source](https://github.com/ota-meshi/eslint-plugin-regexp/blob/master/tests/lib/rules/unicode-property.ts)

lib/all-rules.ts

Lines changed: 2 additions & 0 deletions
@@ -78,6 +78,7 @@ import sortCharacterClassElements from "./rules/sort-character-class-elements"
 import sortFlags from "./rules/sort-flags"
 import strict from "./rules/strict"
 import unicodeEscape from "./rules/unicode-escape"
+import unicodeProperty from "./rules/unicode-property"
 import useIgnoreCase from "./rules/use-ignore-case"
 import type { RuleModule } from "./types"

@@ -162,5 +163,6 @@ export const rules: RuleModule[] = [
     sortFlags,
     strict,
     unicodeEscape,
+    unicodeProperty,
     useIgnoreCase,
 ]
