Create codeqlcheck.py by SharonHart · Pull Request #1756 · microsoft/presidio

SharonHart · 2025-10-21T13:16:25Z

Change Description

Describe your changes

Issue reference

Fixes #XX

Checklist

I have reviewed the contribution guidelines
I have signed the CLA (if required)
My code includes unit tests
All unit tests and lint checks pass locally
My PR contains documentation updates / additions if required

presidio-analyzer/presidio_analyzer/codeqlcheck.py

+def vuln_crypto(secret):
+    # vulnerable: using md5 for security and random for tokens
+    token = str(random.random())
+    h = hashlib.md5((secret + token).encode()).hexdigest()


The best way to fix this problem is to replace the insecure MD5 hash with a strong cryptographic hash function suited to the data being processed. For general-purpose cryptographic hashing, use SHA-256 or another member of the SHA-2 or SHA-3 families. If the value being hashed is a password or other low-entropy secret that will be stored for authentication, use a deliberately slow, salted password hashing algorithm such as Argon2, bcrypt, scrypt, or PBKDF2 instead of a plain hash.
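As an illustration of the slow, salted option mentioned above, here is a minimal sketch using PBKDF2 from the standard library. The function names `hash_password` and `verify_password` are hypothetical and do not appear in the PR, and the iteration count is an assumption to be tuned for the target hardware.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(
    password: str,
    salt: Optional[bytes] = None,
    iterations: int = 600_000,  # assumed value; tune for your hardware
) -> Tuple[bytes, bytes]:
    """Derive a salted PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(
    password: str, salt: bytes, expected: bytes, iterations: int = 600_000
) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

The salt and iteration count have to be stored alongside the digest so that verification can repeat the same derivation.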

Because the finding is confined to a single function (vuln_crypto(secret)), switching from hashlib.md5 to hashlib.sha256 is the minimal fix. It preserves the current behavior (a digest computed from a secret and a random value) while removing the broken hash algorithm. It does not change the use of random.random() for the token, which the code comment also flags as vulnerable.

To implement the change:

Update line 44 in presidio-analyzer/presidio_analyzer/codeqlcheck.py to use hashlib.sha256 instead of hashlib.md5.

No new imports are necessary, as hashlib is already imported.

No change is required elsewhere unless there is logic dependent on MD5-specific output format (e.g., fixed string length). In this case, both md5().hexdigest() and sha256().hexdigest() return strings, but the length will increase (from 32 to 64 chars).
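A minimal sketch of the change described above, under some assumptions: the function name `safer_token_hash` is hypothetical, and the return value is assumed because the diff does not show what `vuln_crypto` does with `h`. The sketch also swaps `random.random()` for the standard-library `secrets` module, which goes beyond the one-line autofix.

```python
import hashlib
import secrets


def safer_token_hash(secret: str) -> str:
    """Hash a secret together with an unpredictable token using SHA-256."""
    # secrets draws from the OS CSPRNG; random.random() is not meant for security use.
    token = secrets.token_hex(16)
    # sha256().hexdigest() yields 64 hex characters (md5 yields 32).
    return hashlib.sha256((secret + token).encode()).hexdigest()
```

Any caller that stores or validates the digest with a fixed width of 32 characters would need to be updated for the 64-character result.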

presidio-analyzer/presidio_analyzer/codeqlcheck.py

+    q = request.args.get("q")   # taint source
+    conn = sqlite3.connect(":memory:")
+    c = conn.cursor()
+    c.execute("SELECT * FROM items WHERE name = '%s'" % q)  # vulnerable sink


To fix the problem, the code must not interpolate user input into the SQL query string. Instead, the query should be parameterized so that the database library (sqlite3 in this case) binds the user value as data, separately from the SQL text, and the value can never be interpreted as part of the statement.

In the sqlite3 module, parameterized queries are written using ? as placeholders, and the user-provided arguments are passed as a tuple in the second argument to cursor.execute().

To fix this, edit line 58 in presidio-analyzer/presidio_analyzer/codeqlcheck.py to:

c.execute("SELECT * FROM items WHERE name = ?", (q,))

No imports or new methods are needed.
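The effect of the parameterized form can be checked with a small self-contained sketch. The `items` schema, the sample rows, and the helper `find_items` are assumptions for illustration; the PR does not show how the table is created.

```python
import sqlite3


def find_items(conn: sqlite3.Connection, name: str) -> list:
    """Look up items by name using a bound parameter instead of string formatting."""
    cur = conn.cursor()
    cur.execute("SELECT * FROM items WHERE name = ?", (name,))
    return cur.fetchall()


# Hypothetical in-memory table for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)", [("apple",), ("pear",)])

# A classic injection payload is compared as a literal string and matches nothing.
payload = "x' OR '1'='1"
rows_for_payload = find_items(conn, payload)
rows_for_apple = find_items(conn, "apple")
```

With the original `%s` formatting, the same payload would have rewritten the WHERE clause and returned every row.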

Create codeqlcheck.py

445eae1

github-advanced-security bot found potential problems Oct 21, 2025

SharonHart closed this Oct 21, 2025

SharonHart deleted the SharonHart-patch-1 branch October 21, 2025 14:02

SharonHart wants to merge 1 commit into main from SharonHart-patch-1
