Commit 417ca1a

Enhanced js/regex/duplicate-in-character-class's qhelp
1 parent 42a880b commit 417ca1a

1 file changed, +39 -10 lines changed

javascript/ql/src/RegExp/DuplicateCharacterInCharacterClass.qhelp

Lines changed: 39 additions & 10 deletions
@@ -5,26 +5,42 @@
55

66
<overview>
77
<p>
8-
Character classes in regular expressions represent sets of characters, so there is no need to specify
9-
the same character twice in one character class. Duplicate characters in character classes are at best
10-
useless, and may even indicate a latent bug.
8+
Character classes in regular expressions (denoted by square brackets <code>[]</code>) represent sets of characters where the pattern matches any single character from that set. Since character classes are sets, specifying the same character multiple times is redundant and often indicates a programming error.
119
</p>
1210

11+
<p>
12+
Common mistakes include:
13+
</p>
14+
<ul>
15+
<li>Using square brackets <code>[]</code> instead of parentheses <code>()</code> for grouping alternatives</li>
16+
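<li>Assuming that special regex characters such as <code>|</code>, <code>*</code>, <code>+</code>, and <code>()</code> keep their special meaning inside character classes, where they are instead treated as literal characters (<code>-</code> is different: between two characters it denotes a range)</li>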
17+
<li>Accidentally duplicating characters or escape sequences that represent the same character</li>
18+
</ul>
19+
1320
</overview>
1421
<recommendation>
1522

16-
<p>If the character was accidentally duplicated, remove it. If the character class was meant to be a
17-
group, replace the brackets with parentheses.</p>
23+
<p>
24+
Examine each duplicate character to determine the intended behavior:
25+
</p>
26+
<ul>
27+
<li><strong>If you see <code>|</code> inside square brackets (e.g., <code>[a|b|c]</code>)</strong>: This is usually a mistake. The author likely intended alternation. Replace the character class with a group: <code>(a|b|c)</code></li>
28+
<li>If trying to match alternative strings, use parentheses <code>()</code> for grouping instead of square brackets</li>
29+
<li>If the duplicate was truly accidental, remove the redundant characters</li>
30+
<li>If trying to use special regex operators inside square brackets, note that most operators (like <code>|</code>) are treated as literal characters</li>
31+
</ul>
1832

33+
<p>
34+
<strong>Important:</strong> Simply removing <code>|</code> characters from character classes is rarely the correct fix. Instead, analyze the pattern to understand what the author intended to match.
35+
</p>
1936

2037
</recommendation>
2138
<example>
2239
<p>
23-
In the following example, the character class <code>[password|pwd]</code> contains two instances each
24-
of the characters <code>d</code>, <code>p</code>, <code>s</code>, and <code>w</code>. The programmer
25-
most likely meant to write <code>(password|pwd)</code> (a pattern that matches either the string
26-
<code>"password"</code> or the string <code>"pwd"</code>), and accidentally mistyped the enclosing
27-
brackets.
40+
<strong>Example 1: Confusing character classes with groups</strong>
41+
</p>
42+
<p>
43+
The pattern <code>[password|pwd]</code> does not match "password" or "pwd" as intended. Instead, it matches any single character from the set <code>{p, a, s, w, o, r, d, |}</code>. Note that <code>|</code> has no special meaning inside character classes.
2844
</p>
2945

3046
<sample src="examples/DuplicateCharacterInCharacterClass.js" />
@@ -33,10 +49,23 @@ brackets.
3349
To fix this problem, the regular expression should be rewritten to <code>/(password|pwd) =/</code>.
3450
</p>
3551

52+
<p>
53+
<strong>Example 2: CSS unit matching</strong>
54+
</p>
55+
<p>
56+
The pattern <code>r?e[m|x]</code> appears to be trying to match "em", "ex", "rem", or "rex", but it actually matches an optional "r", then "e", followed by any one of the characters <code>{m, |, x}</code>, so it also accepts strings such as "e|". The correct pattern is <code>r?e[mx]</code> or, equivalently, <code>r?e(m|x)</code>.
57+
</p>
58+
59+
<p>
60+
Similarly, <code>v[h|w|min|max]</code> should be <code>v(h|w|min|max)</code> to properly match "vh", "vw", "vmin", or "vmax".
61+
</p>
62+
3663
</example>
3764
<references>
3865

3966
<li>Mozilla Developer Network: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions">JavaScript Regular Expressions</a>.</li>
67+
<li>MDN: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes">Character Classes</a> - Details on how character classes work.</li>
68+
<li>MDN: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges">Groups and Ranges</a> - Proper use of grouping with parentheses.</li>
4069

4170
</references>
4271
</qhelp>
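
The behaviour described in the updated qhelp can be checked directly in JavaScript. The following is a minimal sketch, not part of the commit. The variable names are hypothetical, and the <code>^</code>/<code>$</code> anchors on the CSS unit patterns are an assumption added here so that the difference between a character class and a group shows up in whole-string matches.

```javascript
// Character class: matches ONE character from the set {p, a, s, w, o, r, d, |}.
const charClass = /[password|pwd] =/;
// Group with alternation: matches the whole string "password" or "pwd".
const group = /(password|pwd) =/;

// The class accepts any single member of the set before " =", including "|".
const classMatchesPipe = charClass.test("x| =");
// The group requires a full alternative, so the same input is rejected.
const groupMatchesPipe = group.test("x| =");
const groupMatchesPwd = group.test("pwd =");

// CSS unit patterns from the qhelp, anchored (an assumption) for clarity.
const unitClass = /^v[h|w|min|max]$/; // "v" plus exactly one character
const unitGroup = /^v(h|w|min|max)$/; // "v" plus one whole alternative

const classMatchesVmin = unitClass.test("vmin");
const classMatchesVPipe = unitClass.test("v|");
const groupMatchesVmin = unitGroup.test("vmin");
const groupMatchesVPipe = unitGroup.test("v|");

// For single-character alternatives, a class without "|" is enough.
const remClass = /^r?e[m|x]$/;
const remFixed = /^r?e[mx]$/;
const remClassMatchesPipe = remClass.test("e|");
const remFixedMatchesPipe = remFixed.test("e|");
const remFixedMatchesRem = remFixed.test("rem");

console.log({ classMatchesPipe, groupMatchesPipe, groupMatchesPwd });
console.log({ classMatchesVmin, classMatchesVPipe, groupMatchesVmin, groupMatchesVPipe });
console.log({ remClassMatchesPipe, remFixedMatchesPipe, remFixedMatchesRem });
```

In each pair, the bracketed form accepts a literal <code>|</code> and cannot match multi-character alternatives such as "min", which is why replacing the brackets with a group (or dropping the <code>|</code> only when every alternative is a single character) is the fix the qhelp recommends.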
