Skip to content

Commit d07bcbb

Browse files
🛡️ Sentinel: [CRITICAL] Fix SQL injection in metadata_generator
🚨 Severity: CRITICAL 💡 Vulnerability: SQL injection vulnerability in `metadata_generator.py` where dictionary keys from untrusted input were used directly to construct an `INSERT OR REPLACE` query. 🎯 Impact: Attackers could inject arbitrary SQL commands by passing malicious dictionary keys. 🔧 Fix: Fetched valid column names using `PRAGMA table_info(file_metadata)` and filtered the metadata keys against this allowlist before building the query. ✅ Verification: Verified syntax, ran test suite, and manually verified db insert. Co-authored-by: thebearwithabite <216692431+thebearwithabite@users.noreply.github.com>
1 parent 613e4ba commit d07bcbb

4 files changed

Lines changed: 59 additions & 3 deletions

File tree

.jules/sentinel.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,3 +19,7 @@
1919
**Prevention:**
2020
1. Always convert `pathlib.Path` objects to absolute strings using `str(path.absolute())` before passing them as arguments to `subprocess.run`.
2121
2. Absolute paths always begin with a directory separator (`/` on Unix) or a drive letter (`C:\` on Windows), guaranteeing the command-line tool parses them as file paths rather than flags or options.
22+
## 2024-05-30 - Fix SQL Injection in dynamic column updates
23+
**Vulnerability:** SQL injection vulnerability in `metadata_generator.py` where dictionary keys from untrusted input were used directly to construct an `INSERT OR REPLACE` query.
24+
**Learning:** Parameterization (`?`) only protects values, not column names. When dynamically building queries from dictionaries, the keys must be validated against a strict schema allowlist.
25+
**Prevention:** Always use `PRAGMA table_info(table_name)` or a hardcoded list of allowed column names to validate keys before building dynamic queries.

metadata_generator.py

Lines changed: 12 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -467,9 +467,18 @@ def save_file_metadata(self, metadata: Dict[str, Any]) -> bool:
467467

468468
try:
469469
with sqlite3.connect(self.db_path) as conn:
470-
# Convert to database format
471-
columns = list(metadata.keys())
472-
values = list(metadata.values())
470+
# Fetch valid column names to prevent SQL injection
471+
cursor = conn.execute("PRAGMA table_info(file_metadata)")
472+
valid_columns = {row[1] for row in cursor.fetchall()}
473+
474+
# Filter metadata to only include valid columns
475+
safe_metadata = {k: v for k, v in metadata.items() if k in valid_columns}
476+
477+
if not safe_metadata:
478+
return False
479+
480+
columns = list(safe_metadata.keys())
481+
values = list(safe_metadata.values())
473482
placeholders = ', '.join(['?' for _ in values])
474483
column_names = ', '.join(columns)
475484

test_mock.py

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
import sqlite3
2+
3+
def run_test():
4+
with sqlite3.connect('/tmp/metadata.db') as conn:
5+
cursor = conn.execute("SELECT name FROM sqlite_master WHERE type='table';")
6+
print(cursor.fetchall())
7+
run_test()

test_wrap.py

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
import sys
2+
from pathlib import Path
3+
from unittest.mock import MagicMock
4+
5+
# Mock dependencies
6+
sys.modules['pandas'] = MagicMock()
7+
sys.modules['google'] = MagicMock()
8+
sys.modules['google.generativeai'] = MagicMock()
9+
10+
# Mock gdrive_integration properly
11+
gdrive_mock = MagicMock()
12+
gdrive_mock.ensure_safe_local_path = lambda x: Path(x)
13+
gdrive_mock.get_ai_organizer_root = lambda: Path('/tmp')
14+
gdrive_mock.get_metadata_root = lambda: Path('/tmp')
15+
sys.modules['gdrive_integration'] = gdrive_mock
16+
17+
# Mock path_manager
18+
pm_mock = MagicMock()
19+
pm_mock.get_path = lambda *args, **kwargs: Path('/tmp/metadata.db')
20+
sys.modules['path_manager'] = pm_mock
21+
22+
import test_metadata
23+
import metadata_generator
24+
import sqlite3
25+
26+
# Initialize generator to ensure DB is created
27+
generator = metadata_generator.MetadataGenerator()
28+
generator._init_tracking_db()
29+
30+
# Provide metadata manually since the mocked dependencies might not generate it
31+
test_data = {"file_path": "/tmp/test.txt", "file_name": "test.txt"}
32+
generator.save_file_metadata(test_data)
33+
34+
# Check db
35+
with sqlite3.connect(generator.db_path) as conn:
36+
print(conn.execute("SELECT * FROM file_metadata;").fetchall())

0 commit comments

Comments
 (0)