Conversation
| def vuln_crypto(secret): | ||
| # vulnerable: using md5 for security and random for tokens | ||
| token = str(random.random()) | ||
| h = hashlib.md5((secret + token).encode()).hexdigest() |
Check failure
Code scanning / CodeQL
Use of a broken or weak cryptographic hashing algorithm on sensitive data High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 5 months ago
The best way to fix this problem is to replace the insecure MD5 hash with a strong cryptographic hash function suitable for the data being processed. For general-purpose cryptographic hashing of secrets, SHA-256 or higher (e.g., SHA-3) is recommended. If the purpose is password storage or token generation for authentication, an even better approach is to use a password hashing algorithm that is slow and salted, such as Argon2, bcrypt, PBKDF2, or scrypt.
Since the code is only shown in one function (vuln_crypto(secret)), and based on current context, simply switching from hashlib.md5 to hashlib.sha256 is the minimal fix, preserving current behavior (token generation based on a secret and a random value) but removing the use of a broken hash algorithm.
To implement the change:
- Update line 44 in
presidio-analyzer/presidio_analyzer/codeqlcheck.pyto usehashlib.sha256instead ofhashlib.md5. - No new imports are necessary, as
hashlibis already imported. - No change is required elsewhere unless there is logic dependent on MD5-specific output format (e.g., fixed string length). In this case, both
md5().hexdigest()andsha256().hexdigest()return strings, but the length will increase (from 32 to 64 chars).
| @@ -41,7 +41,7 @@ | ||
| def vuln_crypto(secret): | ||
| # vulnerable: using md5 for security and random for tokens | ||
| token = str(random.random()) | ||
| h = hashlib.md5((secret + token).encode()).hexdigest() | ||
| h = hashlib.sha256((secret + token).encode()).hexdigest() | ||
| return h | ||
|
|
||
|
|
| q = request.args.get("q") # taint source | ||
| conn = sqlite3.connect(":memory:") | ||
| c = conn.cursor() | ||
| c.execute("SELECT * FROM items WHERE name = '%s'" % q) # vulnerable sink |
Check failure
Code scanning / CodeQL
SQL query built from user-controlled sources High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 5 months ago
To fix the problem, the code must not interpolate user input into the SQL query string. Instead, the query should be parameterized so the database library (sqlite3 in this case) safely escapes/quotes the user value before including it in the SQL sent to the database.
In the sqlite3 module, parameterized queries are written using ? as placeholders, and the user-provided arguments are passed as a tuple in the second argument to cursor.execute().
To fix this, edit line 58 in presidio-analyzer/presidio_analyzer/codeqlcheck.py to:
c.execute("SELECT * FROM items WHERE name = ?", (q,))No imports or new methods are needed.
| @@ -55,7 +55,7 @@ | ||
| q = request.args.get("q") # taint source | ||
| conn = sqlite3.connect(":memory:") | ||
| c = conn.cursor() | ||
| c.execute("SELECT * FROM items WHERE name = '%s'" % q) # vulnerable sink | ||
| c.execute("SELECT * FROM items WHERE name = ?", (q,)) | ||
| return str(c.fetchall()) | ||
|
|
||
|
|
Change Description
Describe your changes
Issue reference
Fixes #XX
Checklist