Commit 2bb44d4

Python: Perform more deduplication
This cut the evaluation time on `django` down from 1.2 seconds to ~0.8 seconds (but the impact will likely be greater on bigger projects).
1 parent 0999340 commit 2bb44d4

1 file changed: +20 -5 lines changed

python/ql/src/semmle/python/dataflow/new/SensitiveDataSources.qll

Lines changed: 20 additions & 5 deletions
@@ -195,18 +195,33 @@ private module SensitiveDataModeling {
   }

   /**
-   * Returns strings (primarily the names of various program entities) that may contain sensitive data
-   * with the classification `classification`.
+   * This helper predicate serves to deduplicate the results of the preceding predicates. This
+   * means that if, say, an attribute and a function parameter have the same name, then that name will
+   * only be matched once, which greatly cuts down on the number of regexp matches that have to be
+   * performed.
    *
-   * This is a helper predicate, used to limit the number of regexp matches that have to be performed.
+   * Under normal circumstances, deduplication is only performed when a predicate is materialized, and
+   * so to see the effect of this we must create a separate predicate that calculates the union of the
+   * preceding predicates.
    */
   pragma[nomagic]
-  private string sensitiveString(SensitiveDataClassification classification) {
+  private string sensitiveStringCandidate() {
     result in [
         sensitiveNameCandidate(), sensitiveAttributeNameCandidate(),
         sensitiveParameterNameCandidate(), sensitiveFunctionNameCandidate(),
         sensitiveStrConstCandidate()
-      ] and
+      ]
+  }
+
+  /**
+   * Returns strings (primarily the names of various program entities) that may contain sensitive data
+   * with the classification `classification`.
+   *
+   * This is a helper predicate, used to limit the number of regexp matches that have to be performed.
+   */
+  pragma[nomagic]
+  private string sensitiveString(SensitiveDataClassification classification) {
+    result = sensitiveStringCandidate() and
     result.regexpMatch(maybeSensitiveRegexp(classification))
   }
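
The effect described in the new doc comment can be sketched outside QL. The following is a rough Python analogy, not the QL evaluator's actual behavior: the candidate lists and the regular expression are hypothetical stand-ins for the `*Candidate()` predicates and `maybeSensitiveRegexp(classification)`. It only illustrates why taking the union of the candidates before matching reduces the number of regexp evaluations.

```python
import re

# Hypothetical candidate sources, standing in for the *Candidate() predicates.
name_candidates = ["password", "user", "token"]
attribute_candidates = ["password", "email", "token"]
parameter_candidates = ["password", "user", "secret_key"]

# Hypothetical pattern, standing in for maybeSensitiveRegexp(classification).
SENSITIVE = re.compile(r".*(pass|secret|token).*", re.IGNORECASE)


def count_matches(candidates):
    """Match every candidate; return (set of matching strings, number of regexp calls)."""
    calls = 0
    matched = set()
    for s in candidates:
        calls += 1
        # fullmatch mirrors regexpMatch, which must match the whole string.
        if SENSITIVE.fullmatch(s):
            matched.add(s)
    return matched, calls


all_candidates = name_candidates + attribute_candidates + parameter_candidates

# Without deduplication: one regexp call per occurrence of a name.
matched_raw, calls_raw = count_matches(all_candidates)

# With deduplication: build the union first (loosely analogous to materializing
# sensitiveStringCandidate()), then match each distinct string exactly once.
matched_dedup, calls_dedup = count_matches(set(all_candidates))

print(calls_raw, calls_dedup, sorted(matched_dedup))
```

Both runs yield the same set of matching strings; only the number of regexp calls differs, which is the saving the commit message attributes to the extra predicate.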
