Commit ac83c69

Python: Add py/weak-sensitive-data-hashing query
1 parent 499adc2 commit ac83c69

12 files changed

+488
-0
lines changed

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
<!DOCTYPE qhelp PUBLIC
  "-//Semmle//qhelp//EN"
  "qhelp.dtd">
<qhelp>
  <overview>
    <p>
      A broken or weak cryptographic hash function can leave data
      vulnerable, and should not be used in security-related code.
    </p>

    <p>
      A strong cryptographic hash function should be resistant to:
    </p>
    <ul>
      <li>
        pre-image attacks: if you know a hash value <code>h(x)</code>,
        you should not be able to easily find the input <code>x</code>.
      </li>
      <li>
        collision attacks: if you know a hash value <code>h(x)</code>,
        you should not be able to easily find a different input <code>y</code>
        such that the hash value is the same: <code>h(x) = h(y)</code>.
      </li>
    </ul>
    <p>
      In cases with a limited input space, such as for passwords, the hash
      function also needs to be computationally expensive to be resistant to
      brute-force attacks.
    </p>

    <p>
      As an example, both MD5 and SHA-1 are known to be vulnerable to collision attacks.
    </p>

    <p>
      Since it's OK to use a weak cryptographic hash function in a non-security
      context, this query only alerts when such a function is used to hash
      sensitive data (such as passwords, certificates, usernames).
    </p>

    <p>
      Use of broken or weak cryptographic algorithms that are not hashing algorithms is
      handled by the <code>py/weak-cryptographic-algorithm</code> query.
    </p>

  </overview>
  <recommendation>

    <p>
      Ensure that you use a strong, modern cryptographic hash function:
    </p>

    <ul>
      <li>
        such as Argon2, scrypt, bcrypt, or PBKDF2 for passwords and other data with limited input space.
      </li>
      <li>
        such as SHA-2 or SHA-3 in other cases.
      </li>
    </ul>

  </recommendation>
  <example>

    <p>
      The following example shows two functions for checking whether the hash
      of a certificate matches a known value, to prevent tampering.

      The first function uses MD5, which is known to be vulnerable to collision attacks.

      The second function uses SHA-256, which is a strong cryptographic hash function.
    </p>

    <sample src="examples/weak_certificate_hashing.py" />

  </example>
  <example>
    <p>
      The following example shows two functions for hashing passwords.

      The first function uses SHA-256 to hash passwords. Although SHA-256 is a
      strong cryptographic hash function, it is not suitable for password
      hashing since it is not computationally expensive.
    </p>

    <sample src="examples/weak_password_hashing_bad.py" />


    <p>
      The second function uses Argon2 (through the <code>argon2-cffi</code>
      PyPI package), which is a strong password hashing algorithm (and
      includes a per-password salt by default).
    </p>

    <sample src="examples/weak_password_hashing_good.py" />

  </example>

  <references>
    <li>OWASP: <a href="https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html">Password Storage Cheat Sheet</a></li>
  </references>

</qhelp>
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
/**
 * @name Use of a broken or weak cryptographic hashing algorithm on sensitive data
 * @description Using broken or weak cryptographic hashing algorithms can compromise security.
 * @kind path-problem
 * @problem.severity warning
 * @precision high
 * @id py/weak-sensitive-data-hashing
 * @tags security
 *       external/cwe/cwe-327
 *       external/cwe/cwe-916
 */

import python
import semmle.python.security.dataflow.WeakSensitiveDataHashing
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import DataFlow::PathGraph

from
  DataFlow::PathNode source, DataFlow::PathNode sink, string ending, string algorithmName,
  string classification
where
  exists(NormalHashFunction::Configuration config |
    config.hasFlowPath(source, sink) and
    algorithmName = sink.getNode().(NormalHashFunction::Sink).getAlgorithmName() and
    classification = source.getNode().(NormalHashFunction::Source).getClassification() and
    ending = "."
  )
  or
  exists(ComputationallyExpensiveHashFunction::Configuration config |
    config.hasFlowPath(source, sink) and
    algorithmName = sink.getNode().(ComputationallyExpensiveHashFunction::Sink).getAlgorithmName() and
    classification =
      source.getNode().(ComputationallyExpensiveHashFunction::Source).getClassification() and
    (
      sink.getNode().(ComputationallyExpensiveHashFunction::Sink).isComputationallyExpensive() and
      ending = "."
      or
      not sink.getNode().(ComputationallyExpensiveHashFunction::Sink).isComputationallyExpensive() and
      ending =
        " for " + classification +
          " hashing, since it is not a computationally expensive hash function."
    )
  )
select sink.getNode(), source, sink,
  "$@ is used in a hashing algorithm (" + algorithmName + ") that is insecure" + ending,
  source.getNode(), "Sensitive data (" + classification + ")"
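To make the two alert wordings concrete, the sketch below models the query's `where`/`select` logic in plain Python. It is an illustration only, not CodeQL: `alert_message` and its parameters are hypothetical names introduced here, and the boolean inputs merely stand in for which configuration matched and for what `isComputationallyExpensive()` reports.

```python
def alert_message(algorithm_name: str, classification: str,
                  needs_expensive_hash: bool, is_expensive: bool) -> str:
    """Model the alert text assembled by the query's select clause.

    needs_expensive_hash: True when the flow was found by the
        ComputationallyExpensiveHashFunction configuration (for example passwords).
    is_expensive: stands in for Sink.isComputationallyExpensive().
    """
    if needs_expensive_hash and not is_expensive:
        ending = (" for " + classification +
                  " hashing, since it is not a computationally expensive hash function.")
    else:
        ending = "."
    source_label = "Sensitive data (" + classification + ")"
    # "$@" is the placeholder that CodeQL fills with the linked source label.
    template = ("$@ is used in a hashing algorithm (" + algorithm_name +
                ") that is insecure" + ending)
    return template.replace("$@", source_label)


print(alert_message("MD5", "certificate", False, False))
print(alert_message("SHA256", "password", True, False))
```

The first call mirrors the `NormalHashFunction` branch; the second mirrors a password reaching a fast hash, which gets the longer explanation.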
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
import hashlib

def certificate_matches_known_hash_bad(certificate, known_hash):
    hash = hashlib.md5(certificate).hexdigest() # BAD
    return hash == known_hash

def certificate_matches_known_hash_good(certificate, known_hash):
    hash = hashlib.sha256(certificate).hexdigest() # GOOD
    return hash == known_hash
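A short usage sketch for the SHA-256 variant follows, assuming the certificate is passed as `bytes` and the known hash as a hex string. The certificate bytes are placeholders, and the use of `hmac.compare_digest` for a constant-time comparison is an addition made here, not something the commit's example does.

```python
import hashlib
import hmac


def certificate_matches_known_hash(certificate: bytes, known_hash: str) -> bool:
    digest = hashlib.sha256(certificate).hexdigest()
    # Constant-time comparison; goes beyond the plain `==` used in the example file.
    return hmac.compare_digest(digest, known_hash)


# Placeholder data standing in for a real certificate.
original = b"-----BEGIN CERTIFICATE-----\nplaceholder bytes\n-----END CERTIFICATE-----\n"
known_hash = hashlib.sha256(original).hexdigest()
tampered = original.replace(b"placeholder", b"replaced")

print(certificate_matches_known_hash(original, known_hash))
print(certificate_matches_known_hash(tampered, known_hash))
```

Any change to the certificate bytes changes the digest, so the tampered copy no longer matches the recorded hash.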
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
import hashlib

def get_password_hash(password: str, salt: str):
    return hashlib.sha256((password + salt).encode()).hexdigest() # BAD
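The sketch below illustrates why a fast hash is a poor fit for a limited input space. It assumes a hypothetical 4-digit PIN and a salt that leaked together with the hash; `fast_hash` and `brute_force_pin` are names introduced here for illustration.

```python
import hashlib
from typing import Optional


def fast_hash(password: str, salt: str) -> str:
    # Same construction as the BAD example: one salted SHA-256 pass.
    return hashlib.sha256((password + salt).encode()).hexdigest()


def brute_force_pin(target_hash: str, salt: str) -> Optional[str]:
    # Only 10,000 candidates exist, so trying them all is cheap with a fast hash.
    for candidate in range(10_000):
        pin = f"{candidate:04d}"
        if fast_hash(pin, salt) == target_hash:
            return pin
    return None


leaked_hash = fast_hash("4821", "example-salt")
recovered = brute_force_pin(leaked_hash, "example-salt")
print(recovered)
```

A salt prevents precomputed tables from being reused, but it does not slow down this per-hash search. Slowing it down is what a computationally expensive function is for.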
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
from argon2 import PasswordHasher

def get_initial_hash(password: str):
    ph = PasswordHasher()
    return ph.hash(password) # GOOD

def check_password(password: str, known_hash):
    ph = PasswordHasher()
    return ph.verify(known_hash, password) # GOOD
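Where a third-party dependency is not an option, PBKDF2 (one of the algorithms listed in the recommendation) is available in the Python standard library. The following is a minimal sketch of that alternative, not the commit's example: the function names, the 16-byte salt, and the iteration count are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 600_000  # illustrative work factor (assumption); tune for your hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> Tuple[bytes, bytes]:
    # A fresh random salt per password, stored alongside the derived key.
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, key


def check_password(password: str, salt: bytes, known_key: bytes,
                   iterations: int = ITERATIONS) -> bool:
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Constant-time comparison of the derived keys.
    return hmac.compare_digest(key, known_key)
```

Unlike Argon2's encoded hash string, this sketch returns the salt and key separately, so the caller has to store the salt and the iteration count alongside the key.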
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
/**
 * Provides a taint-tracking configuration for detecting use of a broken or weak
 * cryptographic hashing algorithm on sensitive data.
 *
 * Note, for performance reasons: only import this file if
 * `WeakSensitiveDataHashing::Configuration` is needed, otherwise
 * `WeakSensitiveDataHashingCustomizations` should be imported instead.
 */

private import python
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.TaintTracking
private import semmle.python.Concepts
private import semmle.python.dataflow.new.RemoteFlowSources
private import semmle.python.dataflow.new.BarrierGuards

/**
 * Provides a taint-tracking configuration for detecting use of a broken or weak
 * cryptographic hash function on sensitive data, that does NOT require a
 * computationally expensive hash function.
 */
module NormalHashFunction {
  import WeakSensitiveDataHashingCustomizations::NormalHashFunction

  /**
   * A taint-tracking configuration for detecting use of a broken or weak
   * cryptographic hashing algorithm on sensitive data.
   */
  class Configuration extends TaintTracking::Configuration {
    Configuration() { this = "NormalHashFunction" }

    override predicate isSource(DataFlow::Node source) { source instanceof Source }

    override predicate isSink(DataFlow::Node sink) { sink instanceof Sink }

    override predicate isSanitizer(DataFlow::Node node) {
      super.isSanitizer(node)
      or
      node instanceof Sanitizer
    }
  }
}

/**
 * Provides a taint-tracking configuration for detecting use of a broken or weak
 * cryptographic hashing algorithm on passwords.
 *
 * Passwords have stricter requirements on the hashing algorithm used (must be
 * computationally expensive to prevent brute-force attacks).
 */
module ComputationallyExpensiveHashFunction {
  import WeakSensitiveDataHashingCustomizations::ComputationallyExpensiveHashFunction

  /**
   * A taint-tracking configuration for detecting use of a broken or weak
   * cryptographic hashing algorithm on passwords.
   *
   * Passwords have stricter requirements on the hashing algorithm used (must be
   * computationally expensive to prevent brute-force attacks).
   */
  class Configuration extends TaintTracking::Configuration {
    Configuration() { this = "ComputationallyExpensiveHashFunction" }

    override predicate isSource(DataFlow::Node source) { source instanceof Source }

    override predicate isSink(DataFlow::Node sink) { sink instanceof Sink }

    override predicate isSanitizer(DataFlow::Node node) {
      super.isSanitizer(node)
      or
      node instanceof Sanitizer
    }
  }
}
