Skip to content

Commit 3000255

Browse files
Fix potential SQL injection in metadata_generator.py
This commit adds a defense-in-depth enhancement to `_migrate_database_schema` in `metadata_generator.py`. The `ALTER TABLE` DDL statement uses an f-string to insert column names. By validating the column name with `str.isidentifier()`, we ensure that it is a safe SQL column name, preventing SQL injection if the column configuration is ever exposed to user input. Co-authored-by: thebearwithabite <216692431+thebearwithabite@users.noreply.github.com>
1 parent 613e4ba commit 3000255

2 files changed

Lines changed: 18 additions & 0 deletions

File tree

.jules/sentinel.md

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,3 +19,19 @@
1919
**Prevention:**
2020
1. Always convert `pathlib.Path` objects to absolute strings using `str(path.absolute())` before passing them as arguments to `subprocess.run`.
2121
2. Absolute paths always begin with a directory separator (`/` on Unix) or a drive letter (`C:\` on Windows), guaranteeing the command-line tool parses them as file paths rather than flags or options.
22+
23+
## 2024-05-30 - Widespread Argument Injection Vulnerability in File Operations
24+
25+
**Vulnerability:** Several modules (`system_storage_cleanup.py`, `dev-archive/emergency_bulk_staging.py`, `dev-archive/repair_mislabeled_jsons.py`) passed unvalidated `pathlib.Path` strings directly to `subprocess.run` calls (e.g., for `du`, `rm`, `file`). If an attacker controlled the filename, they could name a file starting with `-` (e.g., `-rf`), leading to argument injection.
26+
27+
**Learning:** The argument injection vulnerability found in `vision_content_extractor.py` and `main.py` is a widespread pattern in this codebase. Any invocation of external command-line tools using `subprocess.run` with user-controlled file paths must use absolute paths to prevent the tool from interpreting the filename as an option.
28+
29+
**Prevention:** Always convert `pathlib.Path` objects to absolute strings using `str(path.absolute())` before passing them as arguments to `subprocess.run`.
30+
31+
## 2024-05-30 - SQL Injection via unvalidated dynamic column names
32+
33+
**Vulnerability:** The `_migrate_database_schema` function in `metadata_generator.py` uses an f-string to construct an `ALTER TABLE` query. The variables are hardcoded keys in a dictionary, however if the keys in the dictionary are user-controlled in the future, it is an avenue for SQL injection.
34+
35+
**Learning:** When dynamically generating SQL `ALTER TABLE` statements to add columns, standard `?` parameterization does not protect column keys. You must validate the column name to prevent SQL injection.
36+
37+
**Prevention:** Use Python's `str.isidentifier()` to validate column names and prevent SQL injection.

metadata_generator.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -231,6 +231,8 @@ def _migrate_database_schema(self, conn):
231231
migration_count = 0
232232
for col_name, col_type in gdrive_columns.items():
233233
if col_name not in existing_columns:
234+
if not col_name.isidentifier():
235+
raise ValueError(f"Invalid column name: {col_name}")
234236
conn.execute(f"ALTER TABLE file_metadata ADD COLUMN {col_name} {col_type}")
235237
migration_count += 1
236238
print(f" ✅ Added column: {col_name}")

0 commit comments

Comments
 (0)