Commit 17f9d5a

no-useless-backreference: Made compatible with clean-regex/no-empty-backreference (#111)
* `no-useless-backreference`: Made compatible with `clean-regex/no-empty-alternative`
* More efficient implementation
* Removed `isEmptyBackreference` call
* Updated docs
* Update
* Fixed editor
* Added more docs
1 parent a11643f commit 17f9d5a

7 files changed

+208
-112
lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -100,7 +100,7 @@ The rules with the following star :star: are included in the `plugin:regexp/reco
100100
| [regexp/no-octal](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-octal.html) | disallow octal escape sequence | :star: |
101101
| [regexp/no-trivially-nested-assertion](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-trivially-nested-assertion.html) | disallow trivially nested assertions | :wrench: |
102102
| [regexp/no-unused-capturing-group](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-unused-capturing-group.html) | disallow unused capturing group | |
103-
| [regexp/no-useless-backreference](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-useless-backreference.html) | disallow useless backreferences in regular expressions | |
103+
| [regexp/no-useless-backreference](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-useless-backreference.html) | disallow useless backreferences in regular expressions | :star: |
104104
| [regexp/no-useless-character-class](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-useless-character-class.html) | disallow character class with one character | :wrench: |
105105
| [regexp/no-useless-dollar-replacements](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-useless-dollar-replacements.html) | disallow useless `$` replacements in replacement string | |
106106
| [regexp/no-useless-escape](https://ota-meshi.github.io/eslint-plugin-regexp/rules/no-useless-escape.html) | disallow unnecessary escape characters in RegExp | |

docs/rules/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@ The rules with the following star :star: are included in the `plugin:regexp/reco
2828
| [regexp/no-octal](./no-octal.md) | disallow octal escape sequence | :star: |
2929
| [regexp/no-trivially-nested-assertion](./no-trivially-nested-assertion.md) | disallow trivially nested assertions | :wrench: |
3030
| [regexp/no-unused-capturing-group](./no-unused-capturing-group.md) | disallow unused capturing group | |
31-
| [regexp/no-useless-backreference](./no-useless-backreference.md) | disallow useless backreferences in regular expressions | |
31+
| [regexp/no-useless-backreference](./no-useless-backreference.md) | disallow useless backreferences in regular expressions | :star: |
3232
| [regexp/no-useless-character-class](./no-useless-character-class.md) | disallow character class with one character | :wrench: |
3333
| [regexp/no-useless-dollar-replacements](./no-useless-dollar-replacements.md) | disallow useless `$` replacements in replacement string | |
3434
| [regexp/no-useless-escape](./no-useless-escape.md) | disallow unnecessary escape characters in RegExp | |

docs/rules/no-useless-backreference.md

Lines changed: 69 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -9,15 +9,80 @@ since: "v0.1.0"
99

1010
> disallow useless backreferences in regular expressions
1111
12+
- :gear: This rule is included in `"plugin:regexp/recommended"`.
13+
1214
## :book: Rule Details
1315

14-
This rule is a copy of the ESLint core [no-useless-backreference] rule.
15-
The [no-useless-backreference] rule was added in ESLint 7.x, but this plugin supports ESLint 6.x.
16-
Copied to this plugin to allow the same [no-useless-backreference] rules to be used in ESLint 6.x.
16+
Backreferences that will always trivially accept serve no function and can be removed.
17+
18+
This rule is based on the ESLint core [no-useless-backreference] rule. It reports everything the ESLint core rule reports, plus some additional cases.
19+
20+
### Causes
21+
22+
Backreferences can be useless for multiple reasons.
23+
24+
#### Empty capturing groups
25+
26+
This is the simplest reason. The referenced capturing group does not consume any characters (e.g. `/(\b)a\1/`). Since the capturing group can only capture the empty string, the backreference is guaranteed to accept any input.
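
As a rough illustration (not part of the rule's implementation), the snippet below compares `/(\b)a\1/` with the same pattern without the backreference. The variable names and sample inputs are made up for this sketch.

```js
// `(\b)` can only ever capture the empty string, so `\1` never consumes anything.
const withBackref = /(\b)a\1/;
const withoutBackref = /(\b)a/;

// Hypothetical sample inputs, chosen only for illustration.
const inputs = ["a", "is a", "xyz", ""];

const sameResults = inputs.every((input) => {
    const m1 = withBackref.exec(input);
    const m2 = withoutBackref.exec(input);
    // Compare the matched text, the captured text, and the match position.
    return (
        JSON.stringify(m1) === JSON.stringify(m2) && m1?.index === m2?.index
    );
});

console.log(sameResults); // true
```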
27+
28+
#### Nested backreferences
29+
30+
If a backreference is inside the group it references (e.g. `/(a\1)/`), then it is guaranteed to trivially accept.
31+
32+
This is because the regex engine only sets the text of a capturing group **after** the group has been matched. Since the regex engine is still in the process of matching the group, its capture text is undefined.
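
A small sketch of this behavior, using the example pattern from above on a made-up input string:

```js
// While `(a\1)` is still being matched, group 1 has no captured text yet,
// so the nested `\1` accepts without consuming any characters.
const nestedMatch = /(a\1)/.exec("aa");

console.log(nestedMatch[0]); // "a"
console.log(nestedMatch[1]); // "a"

// The pattern therefore behaves like plain `/(a)/` on this input.
const plainMatch = /(a)/.exec("aa");
console.log(plainMatch[0] === nestedMatch[0]); // true
```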
33+
34+
#### Different alternatives
35+
36+
If a backreference and its referenced capturing group are in different alternatives (e.g. `/(a)|\1/`), then the backreference will always trivially accept because the captured text of the referenced group is undefined.
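
To make this concrete, here is a minimal sketch that runs the example pattern on an input the first alternative cannot match (the input string is an arbitrary choice):

```js
// On "b", the first alternative `(a)` fails, so the second alternative is tried.
// Group 1 never participated in the match, so `\1` accepts the empty string.
const altMatch = /(a)|\1/.exec("b");

console.log(altMatch[0]); // ""
console.log(altMatch[1]); // undefined
console.log(altMatch.index); // 0
```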
37+
38+
#### Forward references and backward references
39+
40+
Backreferences are supposed to be matched **after** their referenced capturing group. Since regular expressions are matched from left to right, backreferences usually appear to the right of their referenced capturing groups (e.g. `/(a)\1/`). However, backreferences can also be placed before (to the left of) their referenced capturing group (e.g. `/\1(a)/`). These backreferences are guaranteed to trivially accept because the captured text of their referenced groups is undefined. We call these backreferences _forward references_.
41+
42+
Inside **lookbehind assertions**, regular expressions are matched from right to left and not from left to right. This means that backreferences now have to appear to the left of their respective capturing group to be matched after them (e.g. `/(?<=\1(a))/`). Backreferences placed before (to the right of) their referenced capturing group inside lookbehinds are guaranteed to trivially accept. We call these backreferences _backward references_.
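
The following sketch contrasts useful and useless placements in both matching directions. The anchors, the trailing `b`, and the test strings are additions made for this illustration only, and the engine is assumed to support lookbehind assertions.

```js
// Left to right: the group must be matched before the backreference.
const normal = /^(a)\1$/;
const forward = /^\1(a)$/; // forward reference: `\1` is always empty here

const normalResults = [normal.test("aa"), normal.test("a")];
const forwardResults = [forward.test("aa"), forward.test("a")];
console.log(normalResults); // [ true, false ]
console.log(forwardResults); // [ false, true ]

// Inside a lookbehind, matching runs right to left, so the roles flip.
const lookbehindOk = /(?<=\1(a))b/; // `(a)` is matched first, then `\1`
const backward = /(?<=(a)\1)b/; // backward reference: `\1` is always empty here

const okResults = [lookbehindOk.test("aab"), lookbehindOk.test("ab")];
const backwardResults = [backward.test("aab"), backward.test("ab")];
console.log(okResults); // [ true, false ]
console.log(backwardResults); // [ true, true ]
```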
43+
44+
#### Negated lookaround assertions
45+
46+
If the referenced capturing group of a backreference is inside a negated lookaround assertion that the backreference is not also part of, then the backreference is guaranteed to trivially accept.
47+
48+
To understand why this is the case, let's look at the example `/(?!(a))\w\1/y`.
49+
50+
1. Let's assume the input string is `ab`. <br>
51+
Since `(a)` accepts the character `a`, `(?!(a))` will reject. So the input is rejected before the backreference `\1` can be reached.
52+
53+
The result of `/(?!(a))\w\1/y.exec("ab")` is `null`.
54+
2. Let's assume the input string is `bc`. <br>
55+
Since `(a)` rejects the character `b`, its captured text will be undefined and `(?!(a))` will accept. Then `\w` will accept and consume the character `b`. Since the captured text of `(a)` is undefined, the backreference `\1` will trivially accept without consuming characters.
56+
57+
The result of `/(?!(a))\w\1/y.exec("bc")` is `[ 'b', undefined, index: 0, input: 'bc' ]`.
58+
59+
Note that this is only a problem if the backreference is not part of the negated lookaround assertion. I.e. `/(?!(a)\1)\w/` is okay.
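
The two cases above can be reproduced with a short script. The helper `matchAtStart` and the simplified pattern `/(?!a)\w/y` are hypothetical names and constructs introduced for this sketch, not something the rule provides.

```js
// Hypothetical helper: run a sticky regex from the start of the input.
function matchAtStart(regex, input) {
    regex.lastIndex = 0; // sticky regexes remember their position between calls
    return regex.exec(input);
}

const useless = /(?!(a))\w\1/y;
const simplified = /(?!a)\w/y; // same pattern without the group and backreference

const onAb = matchAtStart(useless, "ab");
const onBc = matchAtStart(useless, "bc");

console.log(onAb); // null
console.log(onBc[0]); // "b"
console.log(onBc[1]); // undefined

// On these two inputs, the simplified pattern matches the same text.
console.log(matchAtStart(simplified, "ab")); // null
console.log(matchAtStart(simplified, "bc")[0]); // "b"
```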
60+
61+
<eslint-code-block>
62+
63+
```js
64+
/* eslint regexp/no-useless-backreference: "error" */
65+
66+
/* ✓ GOOD */
67+
var foo = /(a)b\1/;
68+
var foo = /(a?)b\1/;
69+
var foo = /(\b|a)+b\1/;
70+
var foo = /(a)?(?:a|\1)/;
71+
72+
/* ✗ BAD */
73+
var foo = /\1(a)/;
74+
var foo = /(a\1)/;
75+
var foo = /(a)|\1/;
76+
var foo = /(?:(a)|\1)+/;
77+
var foo = /(?<=(a)\1)/;
78+
var foo = /(\b)a\1/;
79+
```
80+
81+
</eslint-code-block>
1782

1883
## :wrench: Options
1984

20-
See [no-useless-backreference] document.
85+
Nothing.
2186

2287
## :books: Further reading
2388

lib/configs/recommended.ts

Lines changed: 3 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,3 @@
1-
import eslint from "eslint"
2-
31
export = {
42
plugins: ["regexp"],
53
rules: {
@@ -9,10 +7,8 @@ export = {
97
"no-misleading-character-class": "error",
108
"no-regex-spaces": "error",
119
"prefer-regex-literals": "error",
12-
// If ESLint is 7 or higher, use core rule. If it is 6 or less, use the copied rule.
13-
[parseInt(eslint.Linter?.version?.[0] ?? "6", 10) >= 7
14-
? "no-useless-backreference"
15-
: "regexp/no-useless-backreference"]: "error",
10+
// The ESLint rule will report fewer cases than our rule
11+
"no-useless-backreference": "off",
1612

1713
// eslint-plugin-regexp rules
1814
"regexp/match-any": "error",
@@ -23,6 +19,7 @@ export = {
2319
"regexp/no-escape-backspace": "error",
2420
"regexp/no-invisible-character": "error",
2521
"regexp/no-octal": "error",
22+
"regexp/no-useless-backreference": "error",
2623
"regexp/no-useless-exactly-quantifier": "error",
2724
"regexp/no-useless-two-nums-quantifier": "error",
2825
"regexp/prefer-d": "error",

lib/rules/no-useless-backreference.ts

Lines changed: 75 additions & 93 deletions
Original file line numberDiff line numberDiff line change
@@ -1,63 +1,89 @@
11
import type { Expression } from "estree"
22
import type { RegExpVisitor } from "regexpp/visitor"
3-
import type { Node as RegExpNode, LookaroundAssertion } from "regexpp/ast"
4-
import { createRule, defineRegexpVisitor } from "../utils"
3+
import type {
4+
Node as RegExpNode,
5+
Backreference,
6+
Alternative,
7+
CapturingGroup,
8+
} from "regexpp/ast"
9+
import { createRule, defineRegexpVisitor, getRegexpLocation } from "../utils"
10+
import {
11+
getClosestAncestor,
12+
getMatchingDirection,
13+
isZeroLength,
14+
} from "regexp-ast-analysis"
515

6-
/* istanbul ignore file */
716
/**
8-
* Finds the path from the given `regexpp` AST node to the root node.
9-
* @param {regexpp.Node} node Node.
10-
* @returns {regexpp.Node[]} Array that starts with the given node and ends with the root node.
17+
* Returns whether the list of ancestors from `from` to `to` contains a negated
18+
* lookaround.
1119
*/
12-
function getPathToRoot(node: RegExpNode) {
13-
const path = []
14-
let current = node
15-
16-
while (current) {
17-
path.push(current)
18-
if (!current.parent) {
19-
break
20+
function hasNegatedLookaroundInBetween(
21+
from: CapturingGroup,
22+
to: Alternative,
23+
): boolean {
24+
for (let p: RegExpNode | null = from.parent; p && p !== to; p = p.parent) {
25+
if (
26+
p.type === "Assertion" &&
27+
(p.kind === "lookahead" || p.kind === "lookbehind") &&
28+
p.negate
29+
) {
30+
return true
2031
}
21-
current = current.parent
2232
}
23-
24-
return path
33+
return false
2534
}
2635

2736
/**
28-
* Determines whether the given `regexpp` AST node is a lookaround node.
29-
* @param {regexpp.Node} node Node.
30-
* @returns {boolean} `true` if it is a lookaround node.
37+
* Returns the message id specifying the reason why the backreference is
38+
* useless.
3139
*/
32-
function isLookaround(node: RegExpNode): node is LookaroundAssertion {
33-
return (
34-
node.type === "Assertion" &&
35-
(node.kind === "lookahead" || node.kind === "lookbehind")
36-
)
37-
}
40+
function getUselessMessageId(backRef: Backreference): string | null {
41+
const group = backRef.resolved
3842

39-
/**
40-
* Determines whether the given `regexpp` AST node is a negative lookaround node.
41-
* @param {regexpp.Node} node Node.
42-
* @returns {boolean} `true` if it is a negative lookaround node.
43-
*/
44-
function isNegativeLookaround(node: RegExpNode) {
45-
return isLookaround(node) && node.negate
46-
}
43+
const closestAncestor = getClosestAncestor(backRef, group)
4744

48-
/**
49-
* Get last element
50-
*/
51-
function last<T>(arr: T[]): T {
52-
return arr[arr.length - 1]
45+
if (closestAncestor === group) {
46+
return "nested"
47+
} else if (closestAncestor.type !== "Alternative") {
48+
// if the closest common ancestor isn't an alternative => they're disjunctive.
49+
return "disjunctive"
50+
}
51+
52+
if (hasNegatedLookaroundInBetween(group, closestAncestor)) {
53+
// if there are negated lookarounds between the group and the closest ancestor
54+
// => group has already failed when backRef starts to match.
55+
// e.g. `/(?!(a))\w\1/`
56+
return "intoNegativeLookaround"
57+
}
58+
59+
const matchingDir = getMatchingDirection(closestAncestor)
60+
61+
if (matchingDir === "ltr" && backRef.end <= group.start) {
62+
// backRef is left, group is right ('forward reference')
63+
// => group hasn't matched yet when backRef starts to match.
64+
return "forward"
65+
} else if (matchingDir === "rtl" && group.end <= backRef.start) {
66+
// the opposite of the previous when the regex is matching backwards
67+
// in a lookbehind context.
68+
return "backward"
69+
}
70+
71+
if (isZeroLength(group)) {
72+
// if the referenced group does not consume characters, then any
73+
// backreference will trivially be replaced with the empty string
74+
return "empty"
75+
}
76+
77+
// not useless
78+
return null
5379
}
5480

5581
export default createRule("no-useless-backreference", {
5682
meta: {
5783
docs: {
5884
description:
5985
"disallow useless backreferences in regular expressions",
60-
recommended: false,
86+
recommended: true,
6187
},
6288
schema: [],
6389
messages: {
@@ -71,75 +97,31 @@ export default createRule("no-useless-backreference", {
7197
"Backreference '{{ bref }}' will be ignored. It references group '{{ group }}' which is in another alternative.",
7298
intoNegativeLookaround:
7399
"Backreference '{{ bref }}' will be ignored. It references group '{{ group }}' which is in a negative lookaround.",
100+
empty:
101+
"Backreference '{{ bref }}' will be ignored. It references group '{{ group }}' which always captures zero characters.",
74102
},
75103
type: "suggestion", // "problem",
76104
},
77105
create(context) {
106+
const sourceCode = context.getSourceCode()
107+
78108
/**
79109
* Create visitor
80110
* @param node
81111
*/
82112
function createVisitor(node: Expression): RegExpVisitor.Handlers {
83113
return {
84-
onBackreferenceEnter(bref) {
85-
const group = bref.resolved
86-
const brefPath = getPathToRoot(bref)
87-
const groupPath = getPathToRoot(group)
88-
let messageId = null
89-
90-
if (brefPath.includes(group)) {
91-
// group is bref's ancestor => bref is nested ('nested reference') => group hasn't matched yet when bref starts to match.
92-
messageId = "nested"
93-
} else {
94-
// Start from the root to find the lowest common ancestor.
95-
let i = brefPath.length - 1
96-
let j = groupPath.length - 1
97-
98-
do {
99-
i--
100-
j--
101-
} while (brefPath[i] === groupPath[j])
102-
103-
const indexOfLowestCommonAncestor = j + 1
104-
const groupCut = groupPath.slice(
105-
0,
106-
indexOfLowestCommonAncestor,
107-
)
108-
const commonPath = groupPath.slice(
109-
indexOfLowestCommonAncestor,
110-
)
111-
const lowestCommonLookaround = commonPath.find(
112-
isLookaround,
113-
)
114-
const isMatchingBackward =
115-
lowestCommonLookaround &&
116-
lowestCommonLookaround.kind === "lookbehind"
117-
118-
if (!isMatchingBackward && bref.end <= group.start) {
119-
// bref is left, group is right ('forward reference') => group hasn't matched yet when bref starts to match.
120-
messageId = "forward"
121-
} else if (
122-
isMatchingBackward &&
123-
group.end <= bref.start
124-
) {
125-
// the opposite of the previous when the regex is matching backward in a lookbehind context.
126-
messageId = "backward"
127-
} else if (last(groupCut).type === "Alternative") {
128-
// group's and bref's ancestor nodes below the lowest common ancestor are sibling alternatives => they're disjunctive.
129-
messageId = "disjunctive"
130-
} else if (groupCut.some(isNegativeLookaround)) {
131-
// group is in a negative lookaround which isn't bref's ancestor => group has already failed when bref starts to match.
132-
messageId = "intoNegativeLookaround"
133-
}
134-
}
114+
onBackreferenceEnter(backRef) {
115+
const messageId = getUselessMessageId(backRef)
135116

136117
if (messageId) {
137118
context.report({
138119
node,
120+
loc: getRegexpLocation(sourceCode, node, backRef),
139121
messageId,
140122
data: {
141-
bref: bref.raw,
142-
group: group.raw,
123+
bref: backRef.raw,
124+
group: backRef.resolved.raw,
143125
},
144126
})
145127
}
