Skip to content

Commit 17ac67a

Browse files
Fix SQL injection via untrusted dictionary keys in metadata generator (#241)
Adds strict schema validation using PRAGMA table_info to filter untrusted dictionary keys before dynamically constructing the SQL INSERT statement. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: thebearwithabite <216692431+thebearwithabite@users.noreply.github.com>
1 parent 613e4ba commit 17ac67a

2 files changed

Lines changed: 24 additions & 3 deletions

File tree

.jules/sentinel.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,3 +19,13 @@
1919
**Prevention:**
2020
1. Always convert `pathlib.Path` objects to absolute strings using `str(path.absolute())` before passing them as arguments to `subprocess.run`.
2121
2. Absolute paths always begin with a directory separator (`/` on Unix) or a drive letter (`C:\` on Windows), guaranteeing the command-line tool parses them as file paths rather than flags or options.
22+
## 2024-05-30 - SQL Injection via Dictionary Keys in Dynamic Queries
23+
24+
**Vulnerability:** The `save_file_metadata` function in `metadata_generator.py` accepted an untrusted dictionary of metadata and dynamically constructed an `INSERT` statement using the dictionary's keys as column names (`column_names = ', '.join(columns)`). This allowed SQL injection if an attacker supplied a malformed key (e.g., `invalid_column) VALUES (?); DROP TABLE files; --`).
25+
26+
**Learning:** While parameterization (`?`) protects against SQL injection in values, it does not protect against injection in table or column names. When dynamically building queries where column names are derived from user input or external dictionaries, the keys must be strictly validated against an explicit schema allowlist.
27+
28+
**Prevention:**
29+
1. Never use untrusted input directly as column names or table names in SQL queries.
30+
2. Fetch valid columns dynamically using `PRAGMA table_info(table_name)` in SQLite (or hardcode an allowlist).
31+
3. Pre-filter dictionaries to only include keys that match the explicitly validated schema allowlist before building dynamic queries.

metadata_generator.py

Lines changed: 14 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -463,13 +463,24 @@ def _analyze_multimedia(self, file_path: Path) -> Dict[str, Any]:
463463
return multimedia_data
464464

465465
def save_file_metadata(self, metadata: Dict[str, Any]) -> bool:
466-
"""Save file metadata to database"""
466+
"""Save file metadata to database safely, preventing SQL injection"""
467467

468468
try:
469469
with sqlite3.connect(self.db_path) as conn:
470+
# Validate column names to prevent SQL injection
471+
cursor = conn.execute("PRAGMA table_info(file_metadata)")
472+
valid_columns = {row[1] for row in cursor.fetchall()}
473+
474+
# Filter metadata to only include valid columns
475+
safe_metadata = {k: v for k, v in metadata.items() if k in valid_columns}
476+
477+
if not safe_metadata:
478+
print("Warning: No valid metadata fields to save")
479+
return False
480+
470481
# Convert to database format
471-
columns = list(metadata.keys())
472-
values = list(metadata.values())
482+
columns = list(safe_metadata.keys())
483+
values = list(safe_metadata.values())
473484
placeholders = ', '.join(['?' for _ in values])
474485
column_names = ', '.join(columns)
475486

0 commit comments

Comments
 (0)