Commit 8d2fd5d

Add regexp/simplify-set-operations rule (#595)
1 parent 4696b89 commit 8d2fd5d

14 files changed: +800 -27 lines changed

.changeset/early-islands-press.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"eslint-plugin-regexp": major
+---
+
+Add `regexp/simplify-set-operations` rule

.changeset/early-islands-press2.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"eslint-plugin-regexp": minor
+---
+
+Improve `regexp/negation` rule to report nested negation character classes

README.md

Lines changed: 1 addition & 0 deletions
@@ -170,6 +170,7 @@ The `plugin:regexp/all` config enables all rules. It's meant for testing, not fo
 | [prefer-set-operation](https://ota-meshi.github.io/eslint-plugin-regexp/rules/prefer-set-operation.html) | prefer character class set operations instead of lookarounds | ✅ | | 🔧 | |
 | [require-unicode-regexp](https://ota-meshi.github.io/eslint-plugin-regexp/rules/require-unicode-regexp.html) | enforce the use of the `u` flag | | | 🔧 | |
 | [require-unicode-sets-regexp](https://ota-meshi.github.io/eslint-plugin-regexp/rules/require-unicode-sets-regexp.html) | enforce the use of the `v` flag | | | 🔧 | |
+| [simplify-set-operations](https://ota-meshi.github.io/eslint-plugin-regexp/rules/simplify-set-operations.html) | require simplify set operations | ✅ | | 🔧 | |
 | [sort-alternatives](https://ota-meshi.github.io/eslint-plugin-regexp/rules/sort-alternatives.html) | sort alternatives if order doesn't matter | | | 🔧 | |
 | [use-ignore-case](https://ota-meshi.github.io/eslint-plugin-regexp/rules/use-ignore-case.html) | use the `i` flag if it simplifies the pattern | ✅ | | 🔧 | |
 

docs/rules/index.md

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ sidebarDepth: 0
 | [prefer-set-operation](prefer-set-operation.md) | prefer character class set operations instead of lookarounds | ✅ | | 🔧 | |
 | [require-unicode-regexp](require-unicode-regexp.md) | enforce the use of the `u` flag | | | 🔧 | |
 | [require-unicode-sets-regexp](require-unicode-sets-regexp.md) | enforce the use of the `v` flag | | | 🔧 | |
+| [simplify-set-operations](simplify-set-operations.md) | require simplify set operations | ✅ | | 🔧 | |
 | [sort-alternatives](sort-alternatives.md) | sort alternatives if order doesn't matter | | | 🔧 | |
 | [use-ignore-case](use-ignore-case.md) | use the `i` flag if it simplifies the pattern | ✅ | | 🔧 | |
 

docs/rules/negation.md

Lines changed: 6 additions & 0 deletions
@@ -53,6 +53,12 @@ var foo = /[^\P{ASCII}]/u
 
 Nothing.
 
+## :couple: Related rules
+
+- [regexp/simplify-set-operations]
+
+[regexp/simplify-set-operations]: ./simplify-set-operations.md
+
 ## :rocket: Version
 
 This rule was introduced in eslint-plugin-regexp v0.4.0

docs/rules/simplify-set-operations.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+---
+pageClass: "rule-details"
+sidebarDepth: 0
+title: "regexp/simplify-set-operations"
+description: "require simplify set operations"
+---
+# regexp/simplify-set-operations
+
+💼 This rule is enabled in the ✅ `plugin:regexp/recommended` config.
+
+🔧 This rule is automatically fixable by the [`--fix` CLI option](https://eslint.org/docs/latest/user-guide/command-line-interface#--fix).
+
+<!-- end auto-generated rule header -->
+
+> require simplify set operations
+
+## :book: Rule Details
+
+This rule aims to optimize patterns by simplifying set operations in character classes (with `v` flag).
+
+This rule does not report simple nested negations. (e.g. `/[^[^abc]]/v`)\
+If you want to report simple nested negations, use the [regexp/negation] rule.
+
+<eslint-code-block fix>
+
+```js
+/* eslint regexp/simplify-set-operations: "error" */
+
+/* ✗ BAD */
+var re = /[a&&[^b]]/v; // -> /[a--b]/v
+var re = /[[^b]&&a]/v; // -> /[a--b]/v
+var re = /[a--[^b]]/v; // -> /[a&&b]/v
+var re = /[[^a]&&[^b]]/v; // -> /[^ab]/v
+var re = /[[^a][^b]]/v; // -> /[^a&&b]/v
+
+/* ✓ GOOD */
+var re = /[a--b]/v;
+var re = /[a&&b]/v;
+var re = /[^ab]/v;
+var re = /[^a&&b]/v;
+```
+
+</eslint-code-block>
+
+### How does this rule work?
+
+This rule attempts to simplify set operations in the ways listed below:
+
+#### De Morgan's laws
+
+This rule uses De Morgan's laws to find patterns in which multiple negations can be converted into a single negation, reports them, and auto-fixes them.\
+For example, `/[[^a]&&[^b]]/v` is equivalent to `/[^ab]/v`, and `/[[^a][^b]]/v` is equivalent to `/[^a&&b]/v`.
+
+See <https://en.wikipedia.org/wiki/De_Morgan's_laws>.
+
+#### Conversion from the intersection to the subtraction
+
+Intersection sets with complement operands can be converted to difference sets.\
+The rule looks for character class intersections with negated operands, reports them, and auto-fixes them.\
+For example, `/[a&&[^b]]/v` is equivalent to `/[a--b]/v`, and `/[[^a]&&b]/v` is equivalent to `/[b--a]/v`.
+
+#### Conversion from the subtraction to the intersection
+
+A difference set with a complement operand on the right side can be converted to an intersection set.\
+The rule looks for character class subtractions with a negated operand on the right side, reports them, and auto-fixes them.\
+For example, `/[a--[^b]]/v` is equivalent to `/[a&&b]/v`.
+
+### Auto Fixes
+
+This rule's auto-fix does not remove unnecessary brackets. For example, `/[[^a]&&[^b]]/v` will be automatically fixed to `/[^[a][b]]/v`.\
+If you want to remove unnecessary brackets (e.g. auto-fixed to `/[^ab]/v`), use the [regexp/no-useless-character-class] rule together with this rule.
+
+## :wrench: Options
+
+Nothing.
+
+## :couple: Related rules
+
+- [regexp/negation]
+- [regexp/no-useless-character-class]
+
+[regexp/negation]: ./negation.md
+[regexp/no-useless-character-class]: ./no-useless-character-class.md
+
+## :rocket: Version
+
+:exclamation: <badge text="This rule has not been released yet." vertical="middle" type="error"> ***This rule has not been released yet.*** </badge>
+
+## :mag: Implementation
+
+- [Rule source](https://github.com/ota-meshi/eslint-plugin-regexp/blob/master/lib/rules/simplify-set-operations.ts)
+- [Test source](https://github.com/ota-meshi/eslint-plugin-regexp/blob/master/tests/lib/rules/simplify-set-operations.ts)
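
Review note: the rewrites listed in the new rule documentation are ordinary set identities, so they can be sanity-checked without a regex engine. The sketch below is a hypothetical model, not the rule's implementation: a character class is represented as the list of characters it matches over a four-character universe, and every name in it (`UNIVERSE`, `complement`, `union`, `intersect`, `subtract`, `identitiesHold`, `allSubsets`) is introduced only for illustration.

```typescript
// Hypothetical model, not the plugin's code: a character class is the list of
// characters it matches, drawn from a tiny fixed universe.
type CharSet = ReadonlyArray<string>

const UNIVERSE: CharSet = ["a", "b", "c", "d"]

const has = (set: CharSet, ch: string): boolean => set.indexOf(ch) >= 0

// Every operation filters UNIVERSE, so all results share one canonical order.
const complement = (x: CharSet): CharSet =>
    UNIVERSE.filter((ch) => !has(x, ch)) // models [^x]
const union = (x: CharSet, y: CharSet): CharSet =>
    UNIVERSE.filter((ch) => has(x, ch) || has(y, ch)) // models [xy]
const intersect = (x: CharSet, y: CharSet): CharSet =>
    UNIVERSE.filter((ch) => has(x, ch) && has(y, ch)) // models [x&&y]
const subtract = (x: CharSet, y: CharSet): CharSet =>
    UNIVERSE.filter((ch) => has(x, ch) && !has(y, ch)) // models [x--y]

const same = (x: CharSet, y: CharSet): boolean => x.join("") === y.join("")

/** Returns true when all four documented rewrites preserve the matched set. */
function identitiesHold(a: CharSet, b: CharSet): boolean {
    return (
        // [[^a]&&[^b]] == [^ab]   (De Morgan)
        same(intersect(complement(a), complement(b)), complement(union(a, b))) &&
        // [[^a][^b]] == [^a&&b]   (De Morgan)
        same(union(complement(a), complement(b)), complement(intersect(a, b))) &&
        // [a&&[^b]] == [a--b]     (intersection to subtraction)
        same(intersect(a, complement(b)), subtract(a, b)) &&
        // [a--[^b]] == [a&&b]     (subtraction to intersection)
        same(subtract(a, complement(b)), intersect(a, b))
    )
}

/** Enumerates every subset of UNIVERSE via bit masks. */
function allSubsets(): CharSet[] {
    const subsets: CharSet[] = []
    for (let mask = 0; mask < 1 << UNIVERSE.length; mask++) {
        subsets.push(UNIVERSE.filter((_, i) => (mask & (1 << i)) !== 0))
    }
    return subsets
}

const holdsEverywhere = allSubsets().every((a) =>
    allSubsets().every((b) => identitiesHold(a, b)),
)
console.log(holdsEverywhere) // true
```

The check is exhaustive over all 16 x 16 operand pairs of the toy universe. It says nothing about how the rule handles case folding or string-valued elements in real `v`-flag patterns.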

lib/configs/recommended.ts

Lines changed: 1 addition & 0 deletions
@@ -66,6 +66,7 @@ export const rules = {
     "regexp/prefer-star-quantifier": "error",
     "regexp/prefer-unicode-codepoint-escapes": "error",
     "regexp/prefer-w": "error",
+    "regexp/simplify-set-operations": "error",
     "regexp/sort-flags": "error",
     "regexp/strict": "error",
     "regexp/use-ignore-case": "error",
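
Usage note: the rule documentation in this commit says the auto-fix leaves redundant brackets in place and suggests pairing the rule with `regexp/no-useless-character-class`. Below is a minimal sketch of a config object that enables both explicitly. It assumes the legacy `.eslintrc`-style shape with the plugin registered as `regexp`, and the variable name `config` is illustrative only. Projects that extend `plugin:regexp/recommended` would pick up the new rule from the change above without listing it.

```typescript
// Illustrative legacy (.eslintrc-style) config object; `config` is a hypothetical name.
const config = {
    plugins: ["regexp"],
    rules: {
        // Per the rule docs, rewrites e.g. /[[^a]&&[^b]]/v to /[^[a][b]]/v.
        "regexp/simplify-set-operations": "error",
        // Per the rule docs, removes the leftover brackets, giving /[^ab]/v.
        "regexp/no-useless-character-class": "error",
    },
}

console.log(Object.keys(config.rules).join(", "))
```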

lib/rules/negation.ts

Lines changed: 46 additions & 19 deletions
@@ -1,11 +1,33 @@
-import { toCharSet, toUnicodeSet } from "regexp-ast-analysis"
+import { toUnicodeSet } from "regexp-ast-analysis"
 import type {
+    CharacterClass,
+    CharacterClassElement,
+    CharacterUnicodePropertyCharacterSet,
     EscapeCharacterSet,
-    UnicodePropertyCharacterSet,
+    ExpressionCharacterClass,
 } from "@eslint-community/regexpp/ast"
 import type { RegExpVisitor } from "@eslint-community/regexpp/visitor"
 import type { RegExpContext } from "../utils"
 import { createRule, defineRegexpVisitor } from "../utils"
+import { assertNever } from "../utils/util"
+
+type NegatableCharacterClassElement =
+    | CharacterClass
+    | ExpressionCharacterClass
+    | EscapeCharacterSet
+    | CharacterUnicodePropertyCharacterSet
+
+/** Checks whether the given character class is negatable. */
+function isNegatableCharacterClassElement<N extends CharacterClassElement>(
+    node: N,
+): node is N & NegatableCharacterClassElement {
+    return (
+        node.type === "CharacterClass" ||
+        node.type === "ExpressionCharacterClass" ||
+        (node.type === "CharacterSet" &&
+            (node.kind !== "property" || !node.strings))
+    )
+}
 
 export default createRule("negation", {
     meta: {
@@ -36,19 +58,17 @@ export default createRule("negation", {
                     }
 
                     const element = ccNode.elements[0]
-                    if (element.type !== "CharacterSet") {
+                    if (!isNegatableCharacterClassElement(element)) {
                         return
                     }
-                    if (element.kind === "property" && element.strings) {
-                        // Unicode property escape with property of strings.
-                        // Actually the pattern passing through this branch is an invalid pattern,
-                        // but it has to be checked because of the type guards.
+                    if (element.type !== "CharacterSet" && !element.negate) {
                         return
                     }
 
                     if (
                         flags.ignoreCase &&
                         !flags.unicodeSets &&
+                        element.type === "CharacterSet" &&
                         element.kind === "property"
                     ) {
                         // The ignore case canonicalization affects negated
@@ -61,7 +81,7 @@ export default createRule("negation", {
                         // (/./, /\s/, /\d/) or inconsistent (/\w/).
                         const ccSet = toUnicodeSet(ccNode, flags)
 
-                        const negatedElementSet = toCharSet(
+                        const negatedElementSet = toUnicodeSet(
                             {
                                 ...element,
                                 negate: !element.negate,
@@ -96,17 +116,24 @@ export default createRule("negation", {
 /**
  * Gets the text that negation the CharacterSet.
  */
-function getNegationText(
-    node: EscapeCharacterSet | UnicodePropertyCharacterSet,
-) {
-    // they are all of the form: /\\[dswp](?:\{[^{}]+\})?/
-    let kind = node.raw[1]
+function getNegationText(node: NegatableCharacterClassElement) {
+    if (node.type === "CharacterSet") {
+        // they are all of the form: /\\[dswp](?:\{[^{}]+\})?/
+        let kind = node.raw[1]
 
-    if (kind.toLowerCase() === kind) {
-        kind = kind.toUpperCase()
-    } else {
-        kind = kind.toLowerCase()
-    }
+        if (kind.toLowerCase() === kind) {
+            kind = kind.toUpperCase()
+        } else {
+            kind = kind.toLowerCase()
+        }
 
-    return `\\${kind}${node.raw.slice(2)}`
+        return `\\${kind}${node.raw.slice(2)}`
+    }
+    if (node.type === "CharacterClass") {
+        return `[${node.elements.map((e) => e.raw).join("")}]`
+    }
+    if (node.type === "ExpressionCharacterClass") {
+        return `[${node.raw.slice(2, -1)}]`
+    }
+    return assertNever(node)
 }
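
Review note: to make the three branches of the updated `getNegationText` easier to follow, here is a standalone sketch of the same string manipulation. It is an approximation under stated assumptions: `SketchNode` and `negationTextSketch` are hypothetical names, the node shapes are simplified stand-ins for the regexpp AST types (only `type`, `raw`, and a flattened `elementRaws` list are modelled), and the class branches assume the node is already known to be negated, as the early return added in the visitor suggests.

```typescript
// Simplified, hypothetical stand-ins for the regexpp AST nodes used by the rule.
type SketchNode =
    | { type: "CharacterSet"; raw: string } // e.g. \d, \W, \p{ASCII}
    | { type: "CharacterClass"; elementRaws: string[] } // e.g. [^abc]
    | { type: "ExpressionCharacterClass"; raw: string } // e.g. [^a&&b]

/** Returns the source text of the negated form of `node`. */
function negationTextSketch(node: SketchNode): string {
    if (node.type === "CharacterSet") {
        // Escapes have the form \d, \s, \w, \p{...}: flip the case of the letter.
        const kind = node.raw.charAt(1)
        const flipped =
            kind.toLowerCase() === kind ? kind.toUpperCase() : kind.toLowerCase()
        return `\\${flipped}${node.raw.slice(2)}`
    }
    if (node.type === "CharacterClass") {
        // A negated class [^...] becomes [...] by re-emitting its elements.
        return `[${node.elementRaws.join("")}]`
    }
    // A negated expression class: drop the leading "[^" and the trailing "]".
    return `[${node.raw.slice(2, -1)}]`
}

console.log(negationTextSketch({ type: "CharacterSet", raw: "\\d" })) // \D
console.log(negationTextSketch({ type: "CharacterSet", raw: "\\P{ASCII}" })) // \p{ASCII}
console.log(negationTextSketch({ type: "CharacterClass", elementRaws: ["a", "b", "c"] })) // [abc]
console.log(negationTextSketch({ type: "ExpressionCharacterClass", raw: "[^a&&b]" })) // [a&&b]
```

Read this way, a pattern such as `/[^[^abc]]/v` has a single negated class as its only element, so the replacement text is `[abc]`. This matches the nested-negation case the second changeset describes.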
