
Commit 61719cf

Python: Fix a bug in glob conversion
If you have a filter like `**/foo/**` set in the `paths-ignore` bit of your config file, then currently the following happens:

- First, the CodeQL CLI observes that this string ends in `/**` and strips off the `**`, leaving `**/foo/`.
- Then the Python extractor strips off leading and trailing `/` characters and proceeds to convert `**/foo` into a regex that is matched against files to (potentially) extract.

The trouble with this is that it leaves us unable to distinguish between, say, a file `foo.py` and a file `foo/bar.py`. In other words, we have lost the ability to exclude only the _folder_ `foo` and not any files that happen to start with `foo`.

To fix this, we instead make a note of whether the glob ends in a forward slash or not, and adjust the regex correspondingly.
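The idea behind the fix, recording whether the glob ends in `/` before that slash is stripped, can be sketched with a simplified converter. This is a hypothetical illustration, not the extractor's `glob_to_regex`: the names `glob_to_regex_sketch` and `_component` are made up, the sketch assumes relative paths that use `/` as the only separator, and it imitates the CLI step described above by reducing a trailing `/**` to `/`. The real change adjusts a separator variable (`end_sep`); the sketch instead simply requires a separator after the last component when the glob named a folder.

```python
import re


def _component(part: str) -> str:
    """Hypothetical: translate one glob path component (no '/') into a regex fragment."""
    out = []
    for ch in part:
        if ch == "*":
            out.append("[^/]*")  # any run of characters within one path component
        elif ch == "?":
            out.append("[^/]")  # exactly one character within a path component
        else:
            out.append(re.escape(ch))
    return "".join(out)


def glob_to_regex_sketch(glob: str) -> "re.Pattern[str]":
    """Hypothetical, simplified glob-to-regex converter for '/'-separated relative paths."""
    glob = glob.strip()
    # Imitate the CLI step described in the commit message: '/**' at the end becomes '/'.
    if glob.endswith("/**"):
        glob = glob[:-2]
    # Record the trailing '/' *before* stripping it, so the information is not lost.
    folder_only = glob.endswith("/")
    parts = glob.strip("/").split("/")

    regex = ""
    for i, part in enumerate(parts):
        last = i == len(parts) - 1
        if part == "**":
            # As the final part, '**' matches anything; elsewhere it matches
            # zero or more whole directories, e.g. '', 'src/', 'src/lib/'.
            regex += ".*" if last else "(?:[^/]+/)*"
        else:
            regex += _component(part) + ("" if last else "/")

    # A folder-only glob must be followed by a separator; otherwise the final
    # component may also match a file name exactly.
    tail = "/.*" if folder_only else "(?:/.*|$)"
    return re.compile(regex + tail)
```

Under these assumptions, the pattern built from `**/foo/**` matches `src/foo/bar.py` but neither `src/foo.py` nor a file named `src/foo`, whereas the pattern built from `**/foo` also matches the file `src/foo`.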
1 parent 2ded42c commit 61719cf


1 file changed, +4 -1 lines changed

python/extractor/semmle/path_filters.py

Lines changed: 4 additions & 1 deletion
```diff
@@ -41,6 +41,9 @@ def glob_part_to_regex(glob, add_sep):
 
 def glob_to_regex(glob, prefix=""):
     '''Convert entire glob to a compiled regex'''
+    # When the glob ends in `/`, we need to remember this so that we don't accidentally add an
+    # extra separator to the final regex.
+    end_sep = "" if glob.endswith("/") else SEP
     glob = glob.strip().strip("/")
     parts = glob.split("/")
     #Trailing '**' is redundant, so strip it off.
@@ -53,7 +56,7 @@ def glob_to_regex(glob, prefix=""):
     # something like `C:\\folder\\subfolder\\` and without escaping the
     # backslash-path-separators will get interpreted as regex escapes (which might be
     # invalid sequences, causing the extractor to crash)
-    full_pattern = escape(prefix) + ''.join(parts) + "(?:" + SEP + ".*|$)"
+    full_pattern = escape(prefix) + ''.join(parts) + "(?:" + end_sep + ".*|$)"
     return re.compile(full_pattern)
 
 def filter_from_pattern(pattern, prev_filter, prefix):
```
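The comment in the second hunk gives the reason `prefix` is escaped before being concatenated into the pattern. The sketch below assumes `escape` behaves like the standard library's `re.escape` (the diff does not show where it comes from); the helper `compile_with_prefix` and the prefix `C:\Users\dev\` are hypothetical. With the comment's own example, `\f` and `\s` happen to be valid regex escapes (form feed and whitespace), so the unescaped pattern compiles but matches the wrong text; a `\U` followed by non-hex characters, as here, makes `re.compile` raise `re.error`.

```python
import re
from typing import Optional


def compile_with_prefix(prefix: str, body: str, escape_prefix: bool) -> "Optional[re.Pattern[str]]":
    """Hypothetical helper: compile prefix + body, returning None if the regex is invalid."""
    head = re.escape(prefix) if escape_prefix else prefix
    try:
        return re.compile(head + body)
    except re.error:
        # e.g. '\U' followed by non-hex characters is an invalid escape in a str pattern
        return None


windows_prefix = "C:\\Users\\dev\\"  # hypothetical prefix; the string is C:\Users\dev\

# Without escaping, the backslashes in the prefix are parsed as regex escapes.
raw = compile_with_prefix(windows_prefix, "foo(?:/.*|$)", escape_prefix=False)
# With escaping, each backslash in the prefix is matched literally.
safe = compile_with_prefix(windows_prefix, "foo(?:/.*|$)", escape_prefix=True)
```

Here `raw` is `None` because compilation fails, while `safe` matches the literal path `C:\Users\dev\foo`.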
