Fix SQL injection in cache modules #1365
Open
codingfrog wants to merge 1 commit into arsenetar:master from
Replace string interpolation with parameterized queries in cache_sqlite.py and add whitelist validation in fs.py to prevent SQL injection vulnerabilities.

- Use parameterized placeholders in get_multiple()
- Use parameterized placeholders in purge_outdated()
- Add VALID_KEYS whitelist in FilesDB class
- Add key validation in get() and put() methods

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Context
During a security audit of dupeGuru, we identified SQL injection risks in two cache modules that build SQL queries using string formatting instead of parameterized queries.
Vulnerability
1. core/pe/cache_sqlite.py: Picture cache
The get_multiple() and purge_outdated() methods build SQL WHERE ... IN (...) clauses using Python string formatting.
While the values currently originate from the database (limiting practical exploitability), this pattern is fragile: any future code path that passes untrusted values would become exploitable. Parameterized queries are the correct practice regardless.
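To make the difference concrete, here is a minimal sketch of an IN (...) lookup that binds every value through a ? placeholder. It is an illustration, not the PR's actual diff: the pictures table, its blocks column, and this standalone get_multiple() signature are assumptions chosen for the example.

```python
import sqlite3


def get_multiple(con, rowids):
    """Return (rowid, blocks) rows for the given ids.

    Hypothetical stand-in for the cache method: the `pictures` table and
    `blocks` column are assumptions made for this sketch.
    """
    rowids = list(rowids)
    if not rowids:
        return []  # nothing to look up
    # Only "?" characters are formatted into the SQL text; the values
    # themselves are passed separately and bound by the sqlite3 driver.
    placeholders = ", ".join("?" for _ in rowids)
    sql = f"SELECT rowid, blocks FROM pictures WHERE rowid IN ({placeholders})"
    return con.execute(sql, rowids).fetchall()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE pictures (path TEXT, blocks BLOB)")
    con.executemany(
        "INSERT INTO pictures (path, blocks) VALUES (?, ?)",
        [("a.png", b"\x01"), ("b.png", b"\x02"), ("c.png", b"\x03")],
    )
    print(get_multiple(con, [1, 3]))
    # A hostile string is compared as a plain value, not parsed as SQL.
    print(get_multiple(con, ["1) OR 1=1 --"]))
```

Because the hostile string is bound as data, it can only fail to match; it cannot alter the statement. SQLite limits the number of bound parameters per statement (SQLITE_MAX_VARIABLE_NUMBER), so very long id lists may need to be processed in chunks.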
2. core/fs.py: File hash cache (FilesDB)
The get() and put() methods use .format(key=key) to insert column names into SQL queries.
The key parameter (e.g., "digest", "digest_partial") is used as a column name. Since SQL parameterization (? placeholders) cannot be used for column names, the standard defense is a whitelist of allowed values.
Fix
cache_sqlite.py
Replace string formatting with parameterized ? placeholders.
fs.py
Add a class-level whitelist and validate before use:
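A minimal sketch of that approach follows. The names FilesDB, VALID_KEYS, get(), put(), "digest", and "digest_partial" come from this description; the files table layout, the constructor, the _check_key() helper, and the ValueError are assumptions made for illustration and may differ from the real class.

```python
import sqlite3


class FilesDB:
    # Column names that may be formatted into SQL text. Everything else is
    # rejected before any query is built.
    VALID_KEYS = frozenset({"digest", "digest_partial"})

    def __init__(self, con):
        # Hypothetical schema, assumed for this sketch only.
        self.con = con
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(path TEXT PRIMARY KEY, digest BLOB, digest_partial BLOB)"
        )

    def _check_key(self, key):
        # Hypothetical helper: fail loudly on any column name not whitelisted.
        if key not in self.VALID_KEYS:
            raise ValueError(f"invalid key: {key!r}")

    def get(self, path, key):
        self._check_key(key)
        # `key` is now known to be a fixed column name; `path` is still bound.
        row = self.con.execute(
            f"SELECT {key} FROM files WHERE path = ?", (path,)
        ).fetchone()
        return row[0] if row else None

    def put(self, path, key, value):
        self._check_key(key)
        self.con.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
        self.con.execute(
            f"UPDATE files SET {key} = ? WHERE path = ?", (value, path)
        )


if __name__ == "__main__":
    db = FilesDB(sqlite3.connect(":memory:"))
    db.put("/tmp/a.bin", "digest", b"\xaa")
    print(db.get("/tmp/a.bin", "digest"))
    try:
        db.get("/tmp/a.bin", "digest FROM files; --")
    except ValueError as exc:
        print("rejected:", exc)
```

Validating against a fixed set keeps the column name out of attacker control, while the row values continue to go through ? placeholders as usual.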
Test plan
pytest core/tests/cache_test.py core/tests/fs_test.py: 23 tests pass
🤖 Generated with Claude Code