/**
 * Finds regex patterns with a character class that contains the same character multiple
 * times. This is redundant and might indicate that the string was not supposed to represent
 * a character class.
 *
 * For example, in the pattern `[ERROR] some message.*` the part `[ERROR]` is actually a
 * character class which matches any one of these characters. The `[` and `]` should be
 * escaped with a `\` in this case.
 *
 * Note that a `|` _inside a character class_ does not represent an 'either' and is instead
 * matched literally. E.g. the pattern `[ab|cd|ef]` also matches the string `"|"`.
 *
 * This issue is also reported by IntelliJ as `RegExpDuplicateCharacterInClass`.
 *
 * @id todo
 * @kind problem
 */

import java
// Uses alias `re` to avoid conflicting declarations
import semmle.code.java.regex.RegexTreeView as re

// Note: This does not match all regex patterns, see
// https://github.com/github/codeql/blob/codeql-cli/v2.15.5/java/ql/lib/semmle/code/java/regex/RegexFlowConfigs.qll#L161-L162
from
  re::RegExpCharacterClass charClass, int indexA, re::RegExpNormalChar charA, int indexB,
  re::RegExpNormalChar charB, string charValue
where
  charA = charClass.getChild(indexA) and
  charB = charClass.getChild(indexB) and
  // Prevent reporting the same pair twice with the order reversed
  indexA < indexB and
  charValue = charA.getRawValue() and
  charValue = charB.getRawValue() and
  // Ignore false positives for `&&`, which does not seem to be recognized by the CodeQL regex library yet
  not (charValue = "&" and indexA + 1 = indexB)
// TODO: Maybe check for parse errors to reduce false positives, with `not charClass.getRegex().failedToParse(_)`?
select charClass, "Contains '" + charValue + "' twice $@ and $@", charA, "here", charB, "here"
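
The two pitfalls described in the query's documentation comment can be reproduced with plain `java.util.regex`. The following is a minimal sketch, not part of the query itself; the class name `CharClassDemo` and its `matches` helper are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Unescaped brackets: `[ERROR]` is a character class matching one of E, R or O.
    static final Pattern UNESCAPED = Pattern.compile("[ERROR] some message.*");
    // Escaped brackets: matches the literal text "[ERROR]".
    static final Pattern ESCAPED = Pattern.compile("\\[ERROR\\] some message.*");
    // `|` inside a character class is a literal character, not an alternation.
    static final Pattern PIPE_IN_CLASS = Pattern.compile("[ab|cd|ef]");

    // Hypothetical helper: true if the whole string matches the pattern.
    static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(UNESCAPED, "[ERROR] some message here")); // false
        System.out.println(matches(UNESCAPED, "E some message here"));       // true
        System.out.println(matches(ESCAPED, "[ERROR] some message here"));   // true
        System.out.println(matches(PIPE_IN_CLASS, "|"));                     // true
        System.out.println(matches(PIPE_IN_CLASS, "ab"));                    // false
    }
}
```

The unescaped pattern is the kind of string the query is meant to flag, because `R` occurs three times inside the same character class.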