thebearwithabite · thebearwithabite · Mar 31, 2026
diff --git a/.jules/sentinel.md b/.jules/sentinel.md
@@ -1,21 +1,4 @@
-## 2024-05-30 - Local File Inclusion and Argument Injection in File Reading API
-
-**Vulnerability:** The API endpoints `/api/open-file`, `/api/files/content`, and `/api/files/preview-text` accepted unvalidated absolute file paths directly from API requests, allowing Local File Inclusion (LFI) via paths like `/etc/passwd`. Additionally, the `/api/open-file` endpoint passed raw strings to `subprocess.run(['open', path])`, enabling argument injection if a filename started with `-`.
-
-**Learning:** When APIs expose raw file system operations (like reading or passing paths to system commands), depending on clients to send "valid" paths is insufficient. Python's `Path.resolve()` combined with `is_relative_to` provides a robust mechanism to evaluate the *final* destination of a path, neutralizing relative traversal (`../`). However, for endpoints supporting custom folders (like an organizer app), hardcoded whitelists can break functionality. Instead, limiting access to a broad safe boundary (like the user's home directory `Path.home()`) strikes a balance. Furthermore, treating strings as arguments requires strict validation; resolving local paths to absolute strings inherently prefixes them with root (`/`) or drive letters, naturally neutralizing argument injection (`-rf`).
-
-**Prevention:**
-1. Always resolve paths to absolute destinations using `.resolve()` before operating on them.
-2. Verify path containment within allowed boundaries using `is_relative_to` (or `security_utils.validate_path_within_base`).
-3. For endpoints invoking system commands with user-provided paths, ensure paths are absolute to prevent them from being parsed as options (flags starting with `-`), or explicitly block paths where `.name.startswith('-')`.
-4. Special care must be given to URL support to prevent bypasses like `file:///etc/passwd` when filtering `http`/`https`.
-
-## 2024-05-30 - Argument Injection Vulnerability in Video Processing Tools
-
-**Vulnerability:** The `vision_content_extractor.py` module passed unvalidated string file paths directly to `subprocess.run` calls for `ffprobe` and `ffmpeg` when preparing video samples. If an attacker controlled the filename, they could name a file starting with `-` (e.g., `-someflag`), leading to argument injection where the command-line tool interprets the filename as an option.
-
-**Learning:** When invoking external command-line tools (like `ffmpeg` or `ffprobe`) using `subprocess.run` with user-controlled file paths, using `str(file_path)` is insufficient to prevent argument injection. If a path string happens to be a relative filename like `-v`, it can alter the tool's behavior, potentially leading to unauthorized operations or command execution depending on the tool's supported flags.
-
-**Prevention:**
-1. Always convert `pathlib.Path` objects to absolute strings using `str(path.absolute())` before passing them as arguments to `subprocess.run`.
-2. Absolute paths always begin with a directory separator (`/` on Unix) or a drive letter (`C:\` on Windows), guaranteeing the command-line tool parses them as file paths rather than flags or options.
+## 2024-05-18 - Prevent SQL Injection in Dynamic Column Names
+**Vulnerability:** The `save_file_metadata` method in `metadata_generator.py` dynamically constructed SQL `INSERT OR REPLACE INTO` statements using unvalidated dictionary keys (`metadata.keys()`) as column names. This exposed the application to SQL injection if an attacker could control the keys in the `metadata` dictionary, as standard `?` parameterization does not protect column identifiers.
+**Learning:** `?` placeholders bind values only, never identifiers, so parameterization alone cannot secure a query whose column names vary at runtime. Any user-provided keys used as column names must first be filtered against an explicit schema allowlist.
+**Prevention:** Fetch the valid column names directly from the database schema (e.g., using `PRAGMA table_info(table_name)`) to create a strict allowlist. Filter the incoming dictionary keys against this allowlist before constructing the SQL query string.
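The allowlist approach described in the journal entry above can be sketched in isolation as follows. This is a minimal illustration, not the project's actual code: the helper names `allowed_columns` and `insert_filtered`, the two-column `file_metadata` schema, and the in-memory database are all assumptions made for the example. Quoting each column name is an extra precaution that the patch itself does not take.

```python
import sqlite3
from typing import Any, Dict, Set


def allowed_columns(conn: sqlite3.Connection, table: str) -> Set[str]:
    """Return the column names SQLite reports for `table`.

    `table` must be a trusted constant: PRAGMA arguments cannot be
    bound with `?`, so it is interpolated into the statement.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1] for row in rows}  # index 1 of each row is the column name


def insert_filtered(conn: sqlite3.Connection, table: str,
                    metadata: Dict[str, Any]) -> bool:
    """Insert only the keys that match real columns (hypothetical helper)."""
    valid = allowed_columns(conn, table)
    filtered = {k: v for k, v in metadata.items() if k in valid}
    if not filtered:
        return False  # nothing safe to write

    # Identifiers come from the allowlist; values still go through `?`.
    columns = ", ".join(f'"{name}"' for name in filtered)
    placeholders = ", ".join("?" for _ in filtered)
    conn.execute(
        f"INSERT OR REPLACE INTO {table} ({columns}) VALUES ({placeholders})",
        list(filtered.values()),
    )
    return True


conn = sqlite3.connect(":memory:")
# Assumed schema for illustration only.
conn.execute(
    "CREATE TABLE file_metadata (file_path TEXT PRIMARY KEY, category TEXT)"
)

untrusted = {
    "file_path": "/tmp/a.txt",
    "category": "docs",
    "category) VALUES ('x', 'y'); --": "injected",  # dropped by the filter
}
ok = insert_filtered(conn, "file_metadata", untrusted)
rows = conn.execute("SELECT file_path, category FROM file_metadata").fetchall()
```

Note that keys outside the allowlist are dropped silently in this sketch; a caller that needs to detect malformed input may prefer to log or reject them instead.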
diff --git a/metadata_generator.py b/metadata_generator.py
@@ -467,9 +467,19 @@ def save_file_metadata(self, metadata: Dict[str, Any]) -> bool:
 
         try:
             with sqlite3.connect(self.db_path) as conn:
+                # Get valid columns to prevent SQL injection
+                cursor = conn.execute("PRAGMA table_info(file_metadata)")
+                valid_columns = {row[1] for row in cursor.fetchall()}
+
+                # Filter metadata to only include valid columns
+                filtered_metadata = {k: v for k, v in metadata.items() if k in valid_columns}
+
+                if not filtered_metadata:
+                    return False
+
                 # Convert to database format
-                columns = list(metadata.keys())
-                values = list(metadata.values())
+                columns = list(filtered_metadata.keys())
+                values = list(filtered_metadata.values())
                 placeholders = ', '.join(['?' for _ in values])
                 column_names = ', '.join(columns)