Commit 775ed41

Python: Update SensitiveDataHeuristics with newer JS version
which also prompted me to rewrite the QLDoc for `nameIndicatesSensitiveData`
1 parent 16b6248 commit 775ed41

2 files changed: +12 -8 lines changed

javascript/ql/src/semmle/javascript/security/internal/SensitiveDataHeuristics.qll

Lines changed: 6 additions & 4 deletions
@@ -93,10 +93,11 @@ module HeuristicNames {
 
   /**
    * Gets a regular expression that identifies strings that may indicate the presence of data
-   * that is hashed or encrypted, and hence rendered non-sensitive.
+   * that is hashed or encrypted, and hence rendered non-sensitive, or contains special characters
+   * suggesting nouns within the string do not represent the meaning of the whole string (e.g. a URL or a SQL query).
    */
   string notSensitiveRegexp() {
-    result = "(?is).*(redact|censor|obfuscate|hash|md5|sha|((?<!un)(en))?(crypt|code)).*"
+    result = "(?is).*([^\\w$.-]|redact|censor|obfuscate|hash|md5|sha|((?<!un)(en))?(crypt|code)).*"
   }
 
   /**
@@ -113,8 +114,9 @@ module HeuristicNames {
 
   /**
    * Holds if `name` may indicate the presence of sensitive data, and
-   * `name` does not indicate the presence of data that is hashed or encrypted, which would have
-   * rendered the data non-sensitive. `classification` describes the kind of sensitive data involved.
+   * `name` does not indicate that the data is in fact non-sensitive (for example since
+   * it is hashed or encrypted). `classification` describes the kind of sensitive data
+   * involved.
    *
    * That is, one of the rexeps from `maybeSensitiveRegexp` matches `name` (with the
    * given classification), and none of the regexps from `notSensitiveRegexp` matches

python/ql/src/semmle/python/security/internal/SensitiveDataHeuristics.qll

Lines changed: 6 additions & 4 deletions
@@ -93,10 +93,11 @@ module HeuristicNames {
 
   /**
    * Gets a regular expression that identifies strings that may indicate the presence of data
-   * that is hashed or encrypted, and hence rendered non-sensitive.
+   * that is hashed or encrypted, and hence rendered non-sensitive, or contains special characters
+   * suggesting nouns within the string do not represent the meaning of the whole string (e.g. a URL or a SQL query).
    */
   string notSensitiveRegexp() {
-    result = "(?is).*(redact|censor|obfuscate|hash|md5|sha|((?<!un)(en))?(crypt|code)).*"
+    result = "(?is).*([^\\w$.-]|redact|censor|obfuscate|hash|md5|sha|((?<!un)(en))?(crypt|code)).*"
   }
 
   /**
@@ -113,8 +114,9 @@ module HeuristicNames {
 
   /**
    * Holds if `name` may indicate the presence of sensitive data, and
-   * `name` does not indicate the presence of data that is hashed or encrypted, which would have
-   * rendered the data non-sensitive. `classification` describes the kind of sensitive data involved.
+   * `name` does not indicate that the data is in fact non-sensitive (for example since
+   * it is hashed or encrypted). `classification` describes the kind of sensitive data
+   * involved.
    *
    * That is, one of the rexeps from `maybeSensitiveRegexp` matches `name` (with the
    * given classification), and none of the regexps from `notSensitiveRegexp` matches
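
Note on the regexp change: the new `[^\\w$.-]` alternative makes `notSensitiveRegexp` match any string containing a character other than a word character, `$`, `.` or `-`, so URLs and SQL fragments are treated as non-sensitive even when they contain a word like "password". The sketch below compares the old and new patterns using Python's `re` module as a stand-in for the regex engine QL uses. It assumes the pattern must match the whole string and that the two engines agree on these ASCII inputs. The helper `is_not_sensitive` and the sample strings are hypothetical, not part of the commit.

```python
import re

# Patterns copied from the diff. The QL string literal writes "\\w" to escape
# the backslash; the regex engine itself sees \w.
OLD_NOT_SENSITIVE = (
    r"(?is).*(redact|censor|obfuscate|hash|md5|sha|((?<!un)(en))?(crypt|code)).*"
)
NEW_NOT_SENSITIVE = (
    r"(?is).*([^\w$.-]|redact|censor|obfuscate|hash|md5|sha|((?<!un)(en))?(crypt|code)).*"
)


def is_not_sensitive(pattern: str, name: str) -> bool:
    """Return True if `pattern` matches the whole of `name` (assumed semantics)."""
    return re.fullmatch(pattern, name) is not None


# Hypothetical sample names, chosen to exercise the new character class.
SAMPLES = [
    "password",
    "user.password",
    "password_hash",
    "https://example.com/reset?password=1",
    "SELECT password FROM users",
]

for name in SAMPLES:
    old = is_not_sensitive(OLD_NOT_SENSITIVE, name)
    new = is_not_sensitive(NEW_NOT_SENSITIVE, name)
    print(f"{name!r}: old={old} new={new}")
```

Under these assumptions, only the URL and the SQL string change classification between the old and new patterns. Plain identifiers such as `password` and `user.password` are still not matched, and `password_hash` is matched by both.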
