Skip to content

Create codeqlcheck.py#1756

Closed
SharonHart wants to merge 1 commit intomainfrom
SharonHart-patch-1
Closed

Create codeqlcheck.py#1756
SharonHart wants to merge 1 commit intomainfrom
SharonHart-patch-1

Conversation

@SharonHart
Copy link
Contributor

Change Description

Describe your changes

Issue reference

Fixes #XX

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

def vuln_crypto(secret):
# vulnerable: using md5 for security and random for tokens
token = str(random.random())
h = hashlib.md5((secret + token).encode()).hexdigest()

Check failure

Code scanning / CodeQL

Use of a broken or weak cryptographic hashing algorithm on sensitive data High

Sensitive data (secret)
is used in a hashing algorithm (MD5) that is insecure.

Copilot Autofix

AI 5 months ago

The best way to fix this problem is to replace the insecure MD5 hash with a strong cryptographic hash function suitable for the data being processed. For general-purpose cryptographic hashing of secrets, SHA-256 or higher (e.g., SHA-3) is recommended. If the purpose is password storage or token generation for authentication, an even better approach is to use a password hashing algorithm that is slow and salted, such as Argon2, bcrypt, PBKDF2, or scrypt.

Since the code is only shown in one function (vuln_crypto(secret)), and based on current context, simply switching from hashlib.md5 to hashlib.sha256 is the minimal fix, preserving current behavior (token generation based on a secret and a random value) but removing the use of a broken hash algorithm.

To implement the change:

  • Update line 44 in presidio-analyzer/presidio_analyzer/codeqlcheck.py to use hashlib.sha256 instead of hashlib.md5.
  • No new imports are necessary, as hashlib is already imported.
  • No change is required elsewhere unless there is logic dependent on MD5-specific output format (e.g., fixed string length). In this case, both md5().hexdigest() and sha256().hexdigest() return strings, but the length will increase (from 32 to 64 chars).

Suggested changeset 1
presidio-analyzer/presidio_analyzer/codeqlcheck.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/presidio-analyzer/presidio_analyzer/codeqlcheck.py b/presidio-analyzer/presidio_analyzer/codeqlcheck.py
--- a/presidio-analyzer/presidio_analyzer/codeqlcheck.py
+++ b/presidio-analyzer/presidio_analyzer/codeqlcheck.py
@@ -41,7 +41,7 @@
 def vuln_crypto(secret):
     # vulnerable: using md5 for security and random for tokens
     token = str(random.random())
-    h = hashlib.md5((secret + token).encode()).hexdigest()
+    h = hashlib.sha256((secret + token).encode()).hexdigest()
     return h
 
 
EOF
@@ -41,7 +41,7 @@
def vuln_crypto(secret):
# vulnerable: using md5 for security and random for tokens
token = str(random.random())
h = hashlib.md5((secret + token).encode()).hexdigest()
h = hashlib.sha256((secret + token).encode()).hexdigest()
return h


Copilot is powered by AI and may make mistakes. Always verify output.
q = request.args.get("q") # taint source
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("SELECT * FROM items WHERE name = '%s'" % q) # vulnerable sink

Check failure

Code scanning / CodeQL

SQL query built from user-controlled sources High

This SQL query depends on a
user-provided value
.

Copilot Autofix

AI 5 months ago

To fix the problem, the code must not interpolate user input into the SQL query string. Instead, the query should be parameterized so the database library (sqlite3 in this case) safely escapes/quotes the user value before including it in the SQL sent to the database.

In the sqlite3 module, parameterized queries are written using ? as placeholders, and the user-provided arguments are passed as a tuple in the second argument to cursor.execute().

To fix this, edit line 58 in presidio-analyzer/presidio_analyzer/codeqlcheck.py to:

c.execute("SELECT * FROM items WHERE name = ?", (q,))

No imports or new methods are needed.


Suggested changeset 1
presidio-analyzer/presidio_analyzer/codeqlcheck.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/presidio-analyzer/presidio_analyzer/codeqlcheck.py b/presidio-analyzer/presidio_analyzer/codeqlcheck.py
--- a/presidio-analyzer/presidio_analyzer/codeqlcheck.py
+++ b/presidio-analyzer/presidio_analyzer/codeqlcheck.py
@@ -55,7 +55,7 @@
     q = request.args.get("q")   # taint source
     conn = sqlite3.connect(":memory:")
     c = conn.cursor()
-    c.execute("SELECT * FROM items WHERE name = '%s'" % q)  # vulnerable sink
+    c.execute("SELECT * FROM items WHERE name = ?", (q,))
     return str(c.fetchall())
 
 
EOF
@@ -55,7 +55,7 @@
q = request.args.get("q") # taint source
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("SELECT * FROM items WHERE name = '%s'" % q) # vulnerable sink
c.execute("SELECT * FROM items WHERE name = ?", (q,))
return str(c.fetchall())


Copilot is powered by AI and may make mistakes. Always verify output.
def vuln_crypto(secret):
# vulnerable: using md5 for security and random for tokens
token = str(random.random())
h = hashlib.md5((secret + token).encode()).hexdigest()

Check warning

Code scanning / Bandit

Use of weak MD5 hash for security. Consider usedforsecurity=False Warning

Use of weak MD5 hash for security. Consider usedforsecurity=False
@SharonHart SharonHart closed this Oct 21, 2025
@SharonHart SharonHart deleted the SharonHart-patch-1 branch October 21, 2025 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant