[Bug]: Overly permissive regular expression range #734

@ekuboo100

Description

CLI Version

v5.56.2

Command

assert re.search("[0-9]+,([A-z]|[0-9])+,True", output[0])

It's easy to write a regular expression range that matches a wider range of characters than you intended. For example, /[a-zA-z]/ matches all lowercase and all uppercase letters, as you would expect, but it also matches the six characters `[ \ ] ^ _` and the backtick, which sit between Z and a in ASCII.
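To make the extra matches concrete, here is a minimal sketch that enumerates the ASCII characters accepted by `[A-z]` that are not letters. The variable names are illustrative only and are not taken from the project.

```python
import re

# [A-z] spans code points 65 ('A') through 122 ('z'), so it also covers
# the punctuation that sits between 'Z' and 'a'.
pattern = re.compile(r"[A-z]")

# Collect every ASCII character the class accepts that is not a letter.
extras = [chr(c) for c in range(128)
          if pattern.fullmatch(chr(c)) and not chr(c).isalpha()]

print(extras)  # ['[', '\\', ']', '^', '_', '`']
```

Writing the class as `[A-Za-z]` removes these six characters from the match set.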

Another common problem is failing to escape the dash character in a regular expression. An unescaped dash between two characters is interpreted as a range. For example, in the character class [a-zA-Z0-9%=.,-_] the last character range, ,-_, matches the 52 characters between , and _ (both included), which overlaps with the range [0-9] and is clearly not intended by the writer.
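The effect of the unescaped dash can be checked by comparing the class against a variant where the dash comes last and is therefore a literal. This is a sketch under the assumption that the writer wanted a literal `,`, `-` and `_`; the names `loose` and `strict` are made up for illustration.

```python
import re

ascii_chars = [chr(c) for c in range(128)]

# ",-_" in the middle of the class is parsed as a range from ',' to '_'.
loose = {ch for ch in ascii_chars if re.fullmatch(r"[a-zA-Z0-9%=.,-_]", ch)}

# With the dash moved to the end it is a literal character, not a range.
strict = {ch for ch in ascii_chars if re.fullmatch(r"[a-zA-Z0-9%=.,_-]", ch)}

# Size of the accidental range on its own.
range_size = sum(1 for ch in ascii_chars if re.fullmatch(r"[,-_]", ch))

# Characters that only the accidental range lets through.
unintended = sorted(loose - strict)
print(range_size, "".join(unintended))
```

Escaping the dash (`\-`) has the same effect as moving it to the end of the class; which form to prefer is a style choice.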

Output

No response

Expected Behavior

CWE-20

Actual Behavior

Improper Neutralization of Special Elements used in a Command in Shell-quote
Exploiting CVE-2021-42740
no-obscure-range
The regex [,-.]
CWE-20.

Steps to Reproduce

POC

The following code is intended to check whether a string is a valid 6 digit hex color.

import re
def is_valid_hex_color(color):
    return re.match(r'^#[0-9a-fA-f]{6}$', color) is not None

However, the A-f range is overly large: it matches every uppercase letter, along with the characters `[ \ ] ^ _` and the backtick. The function would accept a "color" like #XXYYZZ as valid.

The fix is to use an uppercase A-F range instead.

import re
def is_valid_hex_color(color):
    return re.match(r'^#[0-9a-fA-F]{6}$', color) is not None
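A quick side-by-side check of the two patterns above shows the difference. The sample inputs are invented for this sketch and do not come from the project's tests.

```python
import re

BUGGY = re.compile(r'^#[0-9a-fA-f]{6}$')  # A-f covers code points 65..102
FIXED = re.compile(r'^#[0-9a-fA-F]{6}$')  # A-F covers only the hex letters

# Hypothetical inputs: one valid color and two that should be rejected.
samples = ["#1a2B3c", "#XXYYZZ", "#12_45^"]

results = {s: (bool(BUGGY.match(s)), bool(FIXED.match(s))) for s in samples}
for color, (buggy_ok, fixed_ok) in results.items():
    print(f"{color}: buggy={buggy_ok} fixed={fixed_ok}")
```

Only the first sample is accepted by the fixed pattern, while the buggy pattern accepts all three.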

Recommendation

Avoid any confusion about which characters are included in the range by writing unambiguous regular expressions. Always check that character ranges match only the expected characters.
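One way to act on this recommendation is a small guard in the test suite that reports any character a class matches outside an explicit allow-list. The helper `unexpected_matches` below is a hypothetical name, not an existing API, and the sketch only covers the ASCII range.

```python
import re
import string


def unexpected_matches(char_class, allowed):
    """Return ASCII characters matched by char_class that are not in allowed."""
    pattern = re.compile(char_class)
    return [chr(c) for c in range(128)
            if pattern.fullmatch(chr(c)) and chr(c) not in allowed]


HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F

# The corrected class matches nothing outside the allow-list.
print(unexpected_matches(r"[0-9a-fA-F]", HEX_DIGITS))  # []

# The overly permissive class leaks G-Z and the punctuation before 'a'.
leaked = unexpected_matches(r"[0-9a-fA-f]", HEX_DIGITS)
print("".join(leaked))
```

An assertion such as `assert not unexpected_matches(cls, allowed)` would then fail as soon as a range is widened by mistake.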
