Commit 70393ef

Add regexp/no-useless-assertions rule (#137)
1 parent 51027f7 commit 70393ef

6 files changed: +732 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -103,6 +103,7 @@ The rules with the following star :star: are included in the `plugin:regexp/reco
 | [regexp/no-potentially-useless-backreference](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-potentially-useless-backreference.html) | disallow backreferences that reference a group that might not be matched | |
 | [regexp/no-trivially-nested-assertion](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-trivially-nested-assertion.html) | disallow trivially nested assertions | :wrench: |
 | [regexp/no-unused-capturing-group](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-unused-capturing-group.html) | disallow unused capturing group | |
+| [regexp/no-useless-assertions](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-useless-assertions.html) | disallow assertions that are known to always accept (or reject) | |
 | [regexp/no-useless-backreference](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-useless-backreference.html) | disallow useless backreferences in regular expressions | :star: |
 | [regexp/no-useless-character-class](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-useless-character-class.html) | disallow character class with one character | :wrench: |
 | [regexp/no-useless-dollar-replacements](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-useless-dollar-replacements.html) | disallow useless `$` replacements in replacement string | |

docs/rules/README.md

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ The rules with the following star :star: are included in the `plugin:regexp/reco
 | [regexp/no-potentially-useless-backreference](./no-potentially-useless-backreference.md) | disallow backreferences that reference a group that might not be matched | |
 | [regexp/no-trivially-nested-assertion](./no-trivially-nested-assertion.md) | disallow trivially nested assertions | :wrench: |
 | [regexp/no-unused-capturing-group](./no-unused-capturing-group.md) | disallow unused capturing group | |
+| [regexp/no-useless-assertions](./no-useless-assertions.md) | disallow assertions that are known to always accept (or reject) | |
 | [regexp/no-useless-backreference](./no-useless-backreference.md) | disallow useless backreferences in regular expressions | :star: |
 | [regexp/no-useless-character-class](./no-useless-character-class.md) | disallow character class with one character | :wrench: |
 | [regexp/no-useless-dollar-replacements](./no-useless-dollar-replacements.md) | disallow useless `$` replacements in replacement string | |

docs/rules/no-useless-assertions.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+---
+pageClass: "rule-details"
+sidebarDepth: 0
+title: "regexp/no-useless-assertions"
+description: "disallow assertions that are known to always accept (or reject)"
+---
+# regexp/no-useless-assertions
+
+> disallow assertions that are known to always accept (or reject)
+
+- :exclamation: <badge text="This rule has not been released yet." vertical="middle" type="error"> ***This rule has not been released yet.*** </badge>
+
+## :book: Rule Details
+
+Some assertions are unnecessary because the rest of the pattern forces them to
+always accept (or reject).
+
+<eslint-code-block>
+
+```js
+/* eslint regexp/no-useless-assertions: "error" */
+
+/* ✓ GOOD */
+var foo = /\bfoo\b/;
+
+/* ✗ BAD */
+var foo = /#\bfoo/; // \b will always accept
+var foo = /foo\bbar/; // \b will always reject
+var foo = /$foo/; // $ will always reject
+var foo = /(?=\w)\d+/; // (?=\w) will always accept
+```
+
+</eslint-code-block>
+
+### Limitations
+
+Right now, this rule is implemented by looking only a single character ahead and
+behind. This is enough to determine whether the builtin assertions (`\b`, `\B`,
+`^`, `$`) trivially reject or accept, but it is not enough for all lookarounds.
+The algorithm that determines the characters ahead and behind is very
+conservative, which can lead to false negatives.
+
+## :wrench: Options
+
+Nothing.
+
+## :heart: Compatibility
+
+This rule was taken from [eslint-plugin-clean-regex].
+This rule is compatible with the [clean-regex/no-unnecessary-assertions] rule.
+
+[eslint-plugin-clean-regex]: https://github.com/RunDevelopment/eslint-plugin-clean-regex
+[clean-regex/no-unnecessary-assertions]: https://github.com/RunDevelopment/eslint-plugin-clean-regex/blob/master/docs/rules/no-unnecessary-assertions.md
+
+## :mag: Implementation
+
+- [Rule source](https://github.com/ota-meshi/eslint-plugin-regexp/blob/master/lib/rules/no-useless-assertions.ts)
+- [Test source](https://github.com/ota-meshi/eslint-plugin-regexp/blob/master/tests/lib/rules/no-useless-assertions.ts)
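
Review note: the "BAD" examples in the rule documentation can be checked at runtime with plain regular expressions. The sketch below is not part of the commit. The `samples` list and the variable names are made up for illustration, and a handful of sample strings only demonstrates the claim, it does not prove it.

```typescript
// Hypothetical sample inputs, chosen only for illustration.
const samples: string[] = ["#foo", "a#foo", "foobar", "foo bar", "$foo", "123", "a1"]

// `\b` between `#` (non-word) and `f` (word) always accepts,
// so removing it should not change any result.
const boundaryIsRedundant = samples.every(
    (s) => /#\bfoo/.test(s) === /#foo/.test(s),
)

// `\b` between `o` and `b` (both word characters) always rejects,
// so the pattern should never match.
const boundaryNeverMatches = samples.every((s) => !/foo\bbar/.test(s))

// Without the `m` flag, `$` followed by a character always rejects.
const endNeverMatches = samples.every((s) => !/$foo/.test(s))

// Every digit is a word character, so `(?=\w)` before `\d+` always accepts.
const lookaheadIsRedundant = samples.every(
    (s) => /(?=\w)\d+/.test(s) === /\d+/.test(s),
)

console.log({
    boundaryIsRedundant,
    boundaryNeverMatches,
    endNeverMatches,
    lookaheadIsRedundant,
})
```

An assertion that always accepts can be deleted without changing what the pattern matches, while one that always rejects makes its whole alternative unmatchable, which usually points to a mistake in the pattern.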

lib/rules/no-useless-assertions.ts

Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
+import type { Expression } from "estree"
+import type { RegExpVisitor } from "regexpp/visitor"
+import type {
+    Assertion,
+    EdgeAssertion,
+    LookaroundAssertion,
+    WordBoundaryAssertion,
+} from "regexpp/ast"
+import {
+    createRule,
+    defineRegexpVisitor,
+    getRegexpLocation,
+    parseFlags,
+} from "../utils"
+import {
+    Chars,
+    getFirstCharAfter,
+    getFirstConsumedChar,
+    getLengthRange,
+    getMatchingDirectionFromAssertionKind,
+    hasSomeDescendant,
+    isPotentiallyEmpty,
+} from "regexp-ast-analysis"
+
+const messages = {
+    alwaysRejectByChar:
+        "'{{assertion}}' will always reject because it is {{followedOrPreceded}} by a character.",
+    alwaysRejectByNonLineTerminator:
+        "'{{assertion}}' will always reject because it is {{followedOrPreceded}} by a non-line-terminator character.",
+    alwaysAcceptByLineTerminator:
+        "'{{assertion}}' will always accept because it is {{followedOrPreceded}} by a line-terminator character.",
+    alwaysAcceptOrRejectFollowedByWord:
+        "'{{assertion}}' will always {{acceptOrReject}} because it is preceded by a non-word character and followed by a word character.",
+    alwaysAcceptOrRejectFollowedByNonWord:
+        "'{{assertion}}' will always {{acceptOrReject}} because it is preceded by a non-word character and followed by a non-word character.",
+    alwaysAcceptOrRejectPrecededByWordFollowedByNonWord:
+        "'{{assertion}}' will always {{acceptOrReject}} because it is preceded by a word character and followed by a non-word character.",
+    alwaysAcceptOrRejectPrecededByWordFollowedByWord:
+        "'{{assertion}}' will always {{acceptOrReject}} because it is preceded by a word character and followed by a word character.",
+    alwaysForLookaround:
+        "The {{kind}} '{{assertion}}' will always {{acceptOrReject}}.",
+    alwaysForNegativeLookaround:
+        "The negative {{kind}} '{{assertion}}' will always {{acceptOrReject}}.",
+}
+
+export default createRule("no-useless-assertions", {
+    meta: {
+        docs: {
+            description:
+                "disallow assertions that are known to always accept (or reject)",
+            // TODO Switch to recommended in the major version.
+            // recommended: true,
+            recommended: false,
+        },
+        schema: [],
+        messages,
+        type: "problem",
+    },
+    create(context) {
+        const sourceCode = context.getSourceCode()
+
+        /**
+         * Create visitor
+         * @param node
+         */
+        function createVisitor(
+            node: Expression,
+            _pattern: string,
+            flagsStr: string,
+        ): RegExpVisitor.Handlers {
+            const flags = parseFlags(flagsStr)
+            const flagsWithoutDotAll = parseFlags(flagsStr.replace(/s/g, ""))
+
+            /** Report */
+            function report(
+                assertion: Assertion,
+                messageId: keyof typeof messages,
+                data: Record<string, string>,
+            ) {
+                context.report({
+                    node,
+                    loc: getRegexpLocation(sourceCode, node, assertion),
+                    messageId,
+                    data: {
+                        assertion: assertion.raw,
+                        ...data,
+                    },
+                })
+            }
+
+            /**
+             * Verify for `^` or `$`
+             */
+            function verifyStartOrEnd(assertion: EdgeAssertion): void {
+                // Note: /^/ is the same as /(?<!.)/s and /^/m is the same as /(?<!.)/
+                // Note: /$/ is the same as /(?!.)/s and /$/m is the same as /(?!.)/
+
+                // get the "next" character
+                const direction = getMatchingDirectionFromAssertionKind(
+                    assertion.kind,
+                )
+                const next = getFirstCharAfter(assertion, direction, flags)
+
+                const followedOrPreceded =
+                    assertion.kind === "end" ? "followed" : "preceded"
+
+                if (!next.edge) {
+                    // there is always some character of `node`
+
+                    if (!flags.multiline) {
+                        // since the m flag isn't present, any character will result in trivial rejection
+                        report(assertion, "alwaysRejectByChar", {
+                            followedOrPreceded,
+                        })
+                    } else {
+                        // the assertion trivially rejects only if the character is a subset of /./
+
+                        // with this little flag hack, we can easily create the dot set.
+                        const dot = Chars.lineTerminator(
+                            flagsWithoutDotAll,
+                        ).negate()
+
+                        if (next.char.isSubsetOf(dot)) {
+                            report(
+                                assertion,
+                                "alwaysRejectByNonLineTerminator",
+                                { followedOrPreceded },
+                            )
+                        } else if (next.char.isDisjointWith(dot)) {
+                            report(assertion, "alwaysAcceptByLineTerminator", {
+                                followedOrPreceded,
+                            })
+                        }
+                    }
+                }
+            }
+
+            /**
+             * Verify for `\b` or `\B`
+             */
+            function verifyWordBoundary(
+                assertion: WordBoundaryAssertion,
+            ): void {
+                const word = Chars.word(flags)
+
+                const next = getFirstCharAfter(assertion, "ltr", flags)
+                const prev = getFirstCharAfter(assertion, "rtl", flags)
+
+                if (prev.edge || next.edge) {
+                    // we can only do this analysis if we know the previous and next character
+                    return
+                }
+
+                const nextIsWord = next.char.isSubsetOf(word)
+                const prevIsWord = prev.char.isSubsetOf(word)
+                const nextIsNonWord = next.char.isDisjointWith(word)
+                const prevIsNonWord = prev.char.isDisjointWith(word)
+
+                // Note: /\b/ == /(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))/ (other flags may apply)
+
+                // the idea here is that \B accepts when \b rejects and vice versa.
+                const accept = assertion.negate ? "reject" : "accept"
+                const reject = assertion.negate ? "accept" : "reject"
+
+                if (prevIsNonWord) {
+                    // current branch: /(?<!\w)(?=\w)/
+
+                    if (nextIsWord) {
+                        report(
+                            assertion,
+                            "alwaysAcceptOrRejectFollowedByWord",
+                            {
+                                acceptOrReject: accept,
+                            },
+                        )
+                    }
+                    if (nextIsNonWord) {
+                        report(
+                            assertion,
+                            "alwaysAcceptOrRejectFollowedByNonWord",
+                            {
+                                acceptOrReject: reject,
+                            },
+                        )
+                    }
+                }
+                if (prevIsWord) {
+                    // current branch: /(?<=\w)(?!\w)/
+
+                    if (nextIsNonWord) {
+                        report(
+                            assertion,
+                            "alwaysAcceptOrRejectPrecededByWordFollowedByNonWord",
+                            {
+                                acceptOrReject: accept,
+                            },
+                        )
+                    }
+                    if (nextIsWord) {
+                        report(
+                            assertion,
+                            "alwaysAcceptOrRejectPrecededByWordFollowedByWord",
+                            {
+                                acceptOrReject: reject,
+                            },
+                        )
+                    }
+                }
+            }
+
+            /**
+             * Verify for LookaroundAssertion
+             */
+            function verifyLookaround(assertion: LookaroundAssertion): void {
+                if (isPotentiallyEmpty(assertion.alternatives)) {
+                    // we don't handle trivial accept/reject based on emptiness
+                    return
+                }
+
+                const direction = getMatchingDirectionFromAssertionKind(
+                    assertion.kind,
+                )
+                const after = getFirstCharAfter(assertion, direction, flags)
+                if (after.edge) {
+                    return
+                }
+
+                const firstOf = getFirstConsumedChar(
+                    assertion.alternatives,
+                    direction,
+                    flags,
+                )
+                if (firstOf.empty) {
+                    return
+                }
+
+                // the idea here is that a negated lookaround accepts when the non-negated version rejects and vice versa.
+                const accept = assertion.negate ? "reject" : "accept"
+                const reject = assertion.negate ? "accept" : "reject"
+
+                // Careful now! If exact is false, we are only guaranteed to have a superset of the actual character.
+                // False negatives are fine but we can't have false positives.
+
+                if (after.char.isDisjointWith(firstOf.char)) {
+                    report(
+                        assertion,
+                        assertion.negate
+                            ? "alwaysForNegativeLookaround"
+                            : "alwaysForLookaround",
+                        {
+                            kind: assertion.kind,
+                            acceptOrReject: reject,
+                        },
+                    )
+                }
+
+                // accept is harder because that can't generally be decided by the first character
+
+                // if this contains another assertion then that might reject. It's out of our control
+                if (
+                    !hasSomeDescendant(
+                        assertion,
+                        (d) => d !== assertion && d.type === "Assertion",
+                    )
+                ) {
+                    const range = getLengthRange(assertion.alternatives)
+                    // we only check the first character, so it's only correct if the assertion requires only one
+                    // character
+                    if (range && range.max === 1) {
+                        // require exactness
+                        if (
+                            firstOf.exact &&
+                            after.char.isSubsetOf(firstOf.char)
+                        ) {
+                            report(
+                                assertion,
+                                assertion.negate
+                                    ? "alwaysForNegativeLookaround"
+                                    : "alwaysForLookaround",
+                                {
+                                    kind: assertion.kind,
+                                    acceptOrReject: accept,
+                                },
+                            )
+                        }
+                    }
+                }
+            }
+
+            return {
+                onAssertionEnter(assertion) {
+                    switch (assertion.kind) {
+                        case "start":
+                        case "end":
+                            verifyStartOrEnd(assertion)
+                            break
+
+                        case "word":
+                            verifyWordBoundary(assertion)
+                            break
+
+                        case "lookahead":
+                        case "lookbehind":
+                            verifyLookaround(assertion)
+                            break
+                        default:
+                    }
+                },
+            }
+        }
+
+        return defineRegexpVisitor(context, {
+            createVisitor,
+        })
+    },
+})
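
Review note: the decision table in `verifyWordBoundary` can be modelled without any parser. The sketch below is a simplified stand-in, not the rule's implementation. It represents "the set of characters that may appear here" as a plain string array, whereas the rule uses the character sets returned by `getFirstCharAfter`. The names `Verdict`, `classify` and `wordBoundaryVerdict` are hypothetical, and `\w` is taken in its default ASCII sense.

```typescript
type Verdict = "accept" | "reject" | "unknown"
type CharKind = "word" | "nonWord" | "mixed"

// True if the single character `c` is matched by `\w`.
const isWordChar = (c: string): boolean => /^\w$/.test(c)

// Classify a non-empty set of possible characters.
// "word" plays the role of `isSubsetOf(word)` and "nonWord" the role of
// `isDisjointWith(word)`. Empty sets are treated as "mixed" for simplicity.
function classify(chars: readonly string[]): CharKind {
    if (chars.length === 0) return "mixed"
    if (chars.every(isWordChar)) return "word"
    if (chars.every((c) => !isWordChar(c))) return "nonWord"
    return "mixed"
}

// Decide whether `\b` (or `\B` when `negate` is true) is trivially decided
// by the characters that can precede and follow it.
function wordBoundaryVerdict(
    prev: readonly string[],
    next: readonly string[],
    negate = false,
): Verdict {
    const p = classify(prev)
    const n = classify(next)
    if (p === "mixed" || n === "mixed") {
        // Not enough information: stay silent rather than risk a false positive.
        return "unknown"
    }
    // `\b` accepts exactly when one side is a word character and the other is not.
    const boundaryAccepts = p !== n
    const accepts = negate ? !boundaryAccepts : boundaryAccepts
    return accepts ? "accept" : "reject"
}

console.log(wordBoundaryVerdict(["#"], ["f"])) // models /#\bfoo/
console.log(wordBoundaryVerdict(["o"], ["b"])) // models /foo\bbar/
console.log(wordBoundaryVerdict(["o"], ["b"], true)) // models /foo\Bbar/
console.log(wordBoundaryVerdict(["a", "-"], ["b"])) // models /[a-]\bb/
```

The "unknown" result corresponds to the cases where the rule reports nothing. The rule also returns early when either neighbour may be the edge of the input (`prev.edge || next.edge`), which this model leaves out.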
