Skip to content

Comments

Fix SQL injection in cache modules#1365

Open
codingfrog wants to merge 1 commit intoarsenetar:masterfrom
codingfrog:security/fix-sql-injection
Open

Fix SQL injection in cache modules#1365
codingfrog wants to merge 1 commit intoarsenetar:masterfrom
codingfrog:security/fix-sql-injection

Conversation

@codingfrog
Copy link

@codingfrog codingfrog commented Feb 14, 2026

Context

During a security audit of dupeGuru, we identified SQL injection risks in two cache modules that build SQL queries using string formatting instead of parameterized queries.

Vulnerability

1. core/pe/cache_sqlite.py — Picture cache

The get_multiple() and purge_outdated() methods build SQL WHERE ... IN (...) clauses using Python string formatting:

# get_multiple() — rowids come from the database itself
ids = ",".join(map(str, rowids))
sql = f"... where rowid in ({ids})"
self.con.execute(sql)

# purge_outdated() — same pattern with todelete list
sql = "delete from pictures where rowid in (%s)" % ",".join(str(x) for x in todelete)

While the values currently originate from the database (limiting practical exploitability), this pattern is fragile — any future code path that passes untrusted values would become exploitable. Parameterized queries are the correct practice regardless.

2. core/fs.py — File hash cache (FilesDB)

The get() and put() methods use .format(key=key) to insert column names into SQL queries:

cursor = conn.execute(self.select_query.format(key=key), {...})

The key parameter (e.g., "digest", "digest_partial") is used as a column name. Since SQL parameterization (? placeholders) cannot be used for column names, the standard defense is a whitelist of allowed values.

Fix

cache_sqlite.py

Replace string formatting with ? parameterized placeholders:

placeholders = ",".join("?" * len(rowids))
sql = f"... where rowid in ({placeholders})"
self.con.execute(sql, list(rowids))

fs.py

Add a class-level whitelist and validate before use:

VALID_KEYS = {"digest", "digest_partial", "digest_samples"}

def get(self, path, key):
    if key not in self.VALID_KEYS:
        raise ValueError(f"Invalid cache key: {key}")
    # ... existing logic using .format(key=key)

Test plan

  • pytest core/tests/cache_test.py core/tests/fs_test.py — 23 tests pass
  • No behavioral changes — only query construction is modified

🤖 Generated with Claude Code

Replace string interpolation with parameterized queries in
cache_sqlite.py and add whitelist validation in fs.py to
prevent SQL injection vulnerabilities.

- Use parameterized placeholders in get_multiple()
- Use parameterized placeholders in purge_outdated()
- Add VALID_KEYS whitelist in FilesDB class
- Add key validation in get() and put() methods

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant