Conversation
…schema fix Prevents path traversal attacks on get_content/get_chunk by validating file paths against indexed directories. Bumps MCP SDK to 1.27.1 for security fixes, adds tool annotations, changes index tool input from comma-separated string to proper array, fixes directory prefix collision in SQL queries, and enables SQLite WAL mode.
…recovery hints Adds SIGTERM/SIGINT handlers to close SQLite connections cleanly, per-directory indexing mutex to prevent concurrent indexing of the same path, structured JSON logging to stderr, MCP logging capability for client notifications, and descriptive error messages with recovery guidance.
…NC path validation Adds a delete_index tool (gated by DISABLE_DESTRUCTIVE env var) to remove directory indexes, normalizes CRLF line endings before chunking for consistent results, skips non-UTF-8 files during indexing with a warning, and validates Windows UNC path format in path validation.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds structured JSON logging, per-directory indexing mutexes, multi-directory index API, an index-deletion tool, path-validation utilities, storage lifecycle and directory-management APIs, content normalization/UTF-8 guards, .gitignore updates, a dependency bump, and extensive tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as MCP Client
participant Server as MCP Server
participant Handlers as MCP Handlers
participant Storage as SQLiteStorage
participant Qdrant as Qdrant Search
Client->>Server: delete_index(directory_path)
Server->>Handlers: handleDeleteIndexTool({directory_path}, config)
Handlers->>Storage: getDirectory(directory_path)
alt Directory exists
Storage-->>Handlers: directory record
Handlers->>Storage: deleteFilesByDirectory(directory_path)
Storage-->>Handlers: deleted files count
Handlers->>Storage: deleteDirectory(directory_path)
Storage-->>Handlers: ✓
Handlers->>Qdrant: delete points for directory
Qdrant-->>Handlers: ✓
Handlers->>Handlers: refreshIndexedDirsCache()
Handlers->>Server: sendLoggingMessage(deletion summary)
Server-->>Client: Tool result (summary)
else Directory not indexed
Storage-->>Handlers: null
Handlers-->>Server: Error response
Server-->>Client: Error: Not indexed
end
sequenceDiagram
participant Client1 as Client A
participant Client2 as Client B
participant Server as MCP Server
participant Mutex as Indexing Mutex
participant Handlers as MCP Handlers
participant Indexer as Indexing Engine
Client1->>Server: index({directory_paths: ['/path']})
Client2->>Server: index({directory_paths: ['/path']})
Server->>Mutex: Acquire lock for /path
Mutex-->>Server: Locked (Client A)
Server->>Handlers: handleIndexTool (Client A)
Handlers->>Indexer: perform indexing (normalize, chunk, embed)
Note over Handlers: Client B waits for lock
Indexer-->>Handlers: Indexing complete
Handlers->>Mutex: Release lock
Mutex-->>Server: Available
Server->>Client1: Tool result
Server->>Mutex: Acquire lock for /path
Mutex-->>Server: Locked (Client B)
Server->>Handlers: handleIndexTool (Client B)
Handlers->>Indexer: perform indexing
Indexer-->>Handlers: Indexing complete
Handlers->>Mutex: Release lock
Server->>Client2: Tool result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
Update transitive deps (hono, rollup, minimatch, ajv, flatted) to resolve 6 security advisories.
d223b05 to
a3944c7
Compare
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (1)
src/indexing.ts (1)
255-263: Replace fs.readFile with TextDecoder to properly detect invalid UTF-8.The current approach at line 258 checks for
\uFFFDafter decoding with Node's 'utf-8' option, which silently replaces invalid byte sequences with that character. This causes false positives: any legitimate file containing U+FFFD gets incorrectly skipped. UseTextDecoderwith{ fatal: true }to throw on actual invalid byte sequences instead.Suggested fix
- const rawContent = await fs.readFile(file.path, 'utf-8'); - - // Skip non-UTF-8 files (Node.js inserts U+FFFD for invalid byte sequences) - if (rawContent.includes('\uFFFD')) { + const rawBytes = await fs.readFile(file.path); + let rawContent: string; + try { + rawContent = new TextDecoder('utf-8', { fatal: true }).decode(rawBytes); + } catch { log('warning', 'Skipping non-UTF-8 file', { path: file.path }); await sqlite.upsertFile(file, [], ['Skipped: file appears to be non-UTF-8 encoded']); skipped++; continue; - } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/indexing.ts` around lines 255 - 263, The code currently reads files using fs.readFile(..., 'utf-8') and checks rawContent for '\uFFFD', which misclassifies legitimate files containing U+FFFD as invalid; replace the reading step to read raw bytes (fs.readFile without encoding or Buffer) and decode with new TextDecoder('utf-8', { fatal: true }) so invalid UTF-8 throws instead of being replaced, catch that specific decode error to run log('warning', ...) and sqlite.upsertFile(file, [], ['Skipped: file appears to be non-UTF-8 encoded']) and only increment skipped/continue on decode failures, otherwise use the decoded string as rawContent for downstream processing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/mcp-handlers.ts`:
- Around line 348-363: The current loop that calls
qdrant.deletePointsByFilePath(file.path) suppresses failures (logging warnings)
and then proceeds to call sqlite.deleteFilesByDirectory(dirPath) and
sqlite.deleteDirectory(dirPath), which can leave orphaned vectors; change this
so that in the code that iterates over files (the files const and the for..of
loop calling qdrant.deletePointsByFilePath) you collect any errors (or
short-circuit on first error), and if any qdrant deletion fails return or throw
an error/partial-failure result instead of proceeding to
sqlite.deleteFilesByDirectory and sqlite.deleteDirectory; only perform
sqlite.deleteFilesByDirectory(dirPath) and sqlite.deleteDirectory(dirPath) when
all qdrant.deletePointsByFilePath calls succeeded (or include clear
partial-failure reporting when some succeeded and some failed).
- Around line 121-140: The per-directory lock uses un-normalized keys
(args.directory_paths.map(p => p.trim())) and handleDeleteIndexTool doesn't use
the same lock, allowing races and bypass via trailing slashes, symlinks, or case
differences; fix by canonicalizing each directory path before using it as a
mutex key (e.g., call a shared normalize function that does
path.resolve/path.normalize, fs.realpathSync or async realpath to follow
symlinks, strip trailing slashes, and on Windows apply .toLowerCase()) and
replace usages of args.directory_paths/map trim with that normalizer when
creating indexingPromise and setting/getting indexingMutex (symbols:
indexingMutex, indexingPromise, resolveIndexing, args.directory_paths). Also
update handleDeleteIndexTool to normalize its input path and to acquire/await
the same per-directory mutex (or set a delete-specific mutex entry) before
performing deletions so deletes serialize with in-flight indexing; ensure both
index and delete code always set and clear the mutex (resolveIndexing) in
finally blocks to avoid stale locks.
- Around line 39-47: ensureIndexedDirsCache currently calls initializeStorage
which always initializes Qdrant; change it so the cache population uses
SQLite-only initialization and mark the cache as initialized even if empty: add
a new SQLite-only initializer (or add an option to initializeStorage, e.g.,
initializeStorage(config, { skipVectorStore: true })) and call that from
ensureIndexedDirsCache, then call refreshIndexedDirsCache(sqlite) and set a
boolean flag (e.g., indexedDirsCacheInitialized = true) after refresh so
indexedDirsCache.size === 0 no longer triggers repeated startup work; reference
ensureIndexedDirsCache, initializeStorage (or new initializeSqliteOnly),
refreshIndexedDirsCache, and indexedDirsCache in your change.
In `@src/mcp.ts`:
- Around line 341-344: The DISABLE_DESTRUCTIVE flag is only preventing
DELETE_INDEX_TOOL from being listed but not from being executed; update the
CallTool execution path to enforce the same flag check so destructive tools
cannot be invoked by name. In the CallTool handler (the function that processes
tool calls), add a guard that checks process.env.DISABLE_DESTRUCTIVE === 'true'
and rejects/throws when the requested tool matches DELETE_INDEX_TOOL (or any
tool with a destructive marker), or introduce an isDestructive(tool) helper and
use it to block execution; keep the existing ListTools logic but ensure both
listing and execution use the same guard so DELETE_INDEX_TOOL cannot be called
when destructive operations are disabled.
In `@src/storage.ts`:
- Around line 442-443: Queries that use a hardcoded '/' in the LIKE pattern must
be made platform-agnostic and must escape SQL LIKE wildcards; update any
occurrences in getFilesByDirectory(), deleteFilesByDirectory(), and the
status/count queries that prepare statements like this.db.prepare('SELECT * FROM
files WHERE path = ? OR path LIKE ?') to: normalize the input directory (e.g.,
path.normalize(directoryPath)), escape '%' and '_' (and the escape char) in that
normalized directory (e.g., dir.replace(/([%_\\])/g, '\\$1')), then build two
LIKE patterns using both path.sep and its alternate separator (e.g.,
`${escapedDir}${sep}%` and `${escapedDir}${altSep}%`) so both "C:\\a\\b\\%" and
"C:/a/b/%" match cross-platform, pass those as parameters to the prepared
statement, and add an ESCAPE '\\' clause or use parameter escaping consistent
with your DB driver so the escaped wildcards are treated literally.
In `@tests/mcp-handlers.test.ts`:
- Around line 435-459: The test's mock of indexDirectories and current
assertions don't guarantee serialization; update the test that uses
handleIndexTool and indexDirectories to assert ordering by inspecting callOrder:
after running the two concurrent calls, verify that the first "start:/same"
occurs before the first "end:/same" and that the second "start:/same" occurs
after that first "end:/same" (e.g., check indices of "start:/same" and
"end:/same" entries or assert the full sequence is start,end,start,end) so the
test fails if the second index starts before the first finishes.
In `@tests/storage.unit.test.ts`:
- Around line 89-90: The test hard-codes a Unix /tmp path (tmpPath) before
assigning it to config.storage.sqlitePath, which fails on Windows; change the
test to create a platform-safe temporary file path using the OS temp directory
(e.g., Node's os.tmpdir() or a temp-file helper) and assign that generated path
to config.storage.sqlitePath so the test runs cross-platform (locate the tmpPath
assignment and update how tmpPath is constructed in the test).
- Around line 117-118: The test relies on insertion order but getDirectories()
(storage.getDirectories) does not guarantee ORDER BY, so make the assertion
order-independent: either sort the returned array (const dirs = (await
storage.getDirectories()).sort()) and compare to a sorted expected array, or use
expect(dirs).toEqual(expect.arrayContaining(['/dir1','/dir2'])) plus a length
check to ensure no extra entries; update the test accordingly where
storage.getDirectories() is called.
---
Nitpick comments:
In `@src/indexing.ts`:
- Around line 255-263: The code currently reads files using fs.readFile(...,
'utf-8') and checks rawContent for '\uFFFD', which misclassifies legitimate
files containing U+FFFD as invalid; replace the reading step to read raw bytes
(fs.readFile without encoding or Buffer) and decode with new
TextDecoder('utf-8', { fatal: true }) so invalid UTF-8 throws instead of being
replaced, catch that specific decode error to run log('warning', ...) and
sqlite.upsertFile(file, [], ['Skipped: file appears to be non-UTF-8 encoded'])
and only increment skipped/continue on decode failures, otherwise use the
decoded string as rawContent for downstream processing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9e2b1754-f5ca-4c3a-b710-a6841a1a57d9
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (16)
.gitignore_docs/mcp-modernization/1-security-and-correctness/pr.md_docs/mcp-modernization/2-operational-hardening/pr.md_docs/mcp-modernization/3-features-and-cross-platform/pr.md_docs/mcp-modernization/plan.mdpackage.jsonsrc/indexing.tssrc/logger.tssrc/mcp-handlers.tssrc/mcp.tssrc/path-validation.tssrc/storage.tstests/delete-index.unit.test.tstests/mcp-handlers.test.tstests/path-validation.unit.test.tstests/storage.unit.test.ts
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #22 +/- ##
==========================================
- Coverage 81.26% 81.06% -0.21%
==========================================
Files 13 15 +2
Lines 2306 2672 +366
Branches 332 519 +187
==========================================
+ Hits 1874 2166 +292
- Misses 432 506 +74 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/path-validation.unit.test.ts (1)
165-170: Redundant assertion.Line 167 asserts
dirs.sizeequals 2, then line 170 assertsdirs.sizeis greater than or equal to 2. The second assertion is redundant and weaker.Proposed fix
expect(dirs.size).toBe(2); // The resolved paths should contain the original paths (resolved) // Since these directories don't exist, they fall back to path.resolve() - expect(dirs.size).toBeGreaterThanOrEqual(2);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/path-validation.unit.test.ts` around lines 165 - 170, The test contains a redundant assertion: after calling resolveIndexedDirectories(storage), the test first asserts expect(dirs.size).toBe(2) and then redundantly asserts expect(dirs.size).toBeGreaterThanOrEqual(2); remove the weaker second assertion so only the precise check remains. Locate the test using resolveIndexedDirectories and the variable dirs in tests/path-validation.unit.test.ts and delete the expect(dirs.size).toBeGreaterThanOrEqual(2) line.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/path-validation.unit.test.ts`:
- Around line 165-170: The test contains a redundant assertion: after calling
resolveIndexedDirectories(storage), the test first asserts
expect(dirs.size).toBe(2) and then redundantly asserts
expect(dirs.size).toBeGreaterThanOrEqual(2); remove the weaker second assertion
so only the precise check remains. Locate the test using
resolveIndexedDirectories and the variable dirs in
tests/path-validation.unit.test.ts and delete the
expect(dirs.size).toBeGreaterThanOrEqual(2) line.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ae261f14-b24b-42ad-8fe4-3855f8f56266
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (12)
.gitignorepackage.jsonsrc/indexing.tssrc/logger.tssrc/mcp-handlers.tssrc/mcp.tssrc/path-validation.tssrc/storage.tstests/delete-index.unit.test.tstests/mcp-handlers.test.tstests/path-validation.unit.test.tstests/storage.unit.test.ts
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/storage.unit.test.ts
- src/path-validation.ts
- src/logger.ts
- .gitignore
- tests/delete-index.unit.test.ts
- Guard delete_index at CallTool time when DISABLE_DESTRUCTIVE is set - Use SQLite-only init for path validation cache (no Qdrant dependency) - Add initialized flag so empty cache doesn't re-trigger init - Normalize mutex keys (resolve, realpath, strip trailing sep) - Acquire per-directory mutex in handleDeleteIndexTool - Abort delete when vector cleanup fails to prevent orphaned vectors
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/pre-push`:
- Line 6: The pre-push script's echo ("Linting and type checking...") is
misleading because the script only runs npm run lint; either update the message
to reflect linting only or add a typecheck step. Edit the scripts/pre-push file
to either change the echo to e.g. "Running lint..." (or similar) to match the
existing npm run lint invocation, or append a type check command (e.g. run your
project’s typecheck script such as npm run typecheck or tsc --noEmit) after npm
run lint so the log accurately reflects both actions; update the echo
accordingly if you add the typecheck.
- LIKE queries now match both / and \ separators with proper escaping - Replace hard-coded /tmp with os.tmpdir() in tests - Sort getDirectories() assertion to avoid non-deterministic order - Assert sequential execution order in mutex serialization test
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/storage.ts (1)
906-907:⚠️ Potential issue | 🟡 MinorEscape
%and_in directory statusLIKEpredicates.On Line 906 and Line 907, the status subqueries only escape backslashes. Directory paths containing
%or_will be treated as wildcards, causing incorrectfiles_count/chunks_count.Suggested fix
- (SELECT COUNT(*) FROM files f WHERE f.path = d.path OR f.path LIKE REPLACE(d.path, '\\', '\\\\') || '/%' ESCAPE '\\' OR f.path LIKE REPLACE(d.path, '\\', '\\\\') || '\\%' ESCAPE '\\') as files_count, - (SELECT COALESCE(SUM(json_array_length(f.chunks_json)), 0) FROM files f WHERE (f.path = d.path OR f.path LIKE REPLACE(d.path, '\\', '\\\\') || '/%' ESCAPE '\\' OR f.path LIKE REPLACE(d.path, '\\', '\\\\') || '\\%' ESCAPE '\\') AND f.chunks_json IS NOT NULL) as chunks_count + (SELECT COUNT(*) FROM files f WHERE f.path = d.path OR f.path LIKE REPLACE(REPLACE(REPLACE(d.path, '\\', '\\\\'), '%', '\\%'), '_', '\\_') || '/%' ESCAPE '\\' OR f.path LIKE REPLACE(REPLACE(REPLACE(d.path, '\\', '\\\\'), '%', '\\%'), '_', '\\_') || '\\%' ESCAPE '\\') as files_count, + (SELECT COALESCE(SUM(json_array_length(f.chunks_json)), 0) FROM files f WHERE (f.path = d.path OR f.path LIKE REPLACE(REPLACE(REPLACE(d.path, '\\', '\\\\'), '%', '\\%'), '_', '\\_') || '/%' ESCAPE '\\' OR f.path LIKE REPLACE(REPLACE(REPLACE(d.path, '\\', '\\\\'), '%', '\\%'), '_', '\\_') || '\\%' ESCAPE '\\') AND f.chunks_json IS NOT NULL) as chunks_count#!/bin/bash set -euo pipefail # Verify current status query only escapes backslashes in LIKE branch nl -ba src/storage.ts | sed -n '901,909p' rg -n "REPLACE\\(d\\.path, '\\\\\\\\', '\\\\\\\\\\\\\\\\'\\).*LIKE" src/storage.ts -C2🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/storage.ts` around lines 906 - 907, The LIKE predicates in the files_count and chunks_count subqueries treat % and _ in directory names as wildcards because only backslashes are escaped; update the REPLACE chain for d.path used in both subqueries (the expressions used to build the LIKE patterns in files_count and chunks_count) to also escape '%' and '_' (e.g., REPLACE(REPLACE(REPLACE(d.path, '\\', '\\\\'), '%', '\\%'), '_', '\\_')) so the pattern treats literal percent/underscore characters, and keep the existing ESCAPE '\' clause; make the same change for both places where REPLACE(d.path, '\\', '\\\\') is currently used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/mcp-handlers.test.ts`:
- Around line 297-305: Add an assertion that validation runs before file-read by
comparing mock invocation order: after calling handleGetContentTool, grab the
mocked functions validatePathWithinIndexedDirs and getFileContent (and for the
chunk test getChunkContent) and assert their invocation order using their
mock.invocationCallOrder (e.g.
expect(validatePathWithinIndexedDirs.mock.invocationCallOrder[0]).toBeLessThan(getFileContent.mock.invocationCallOrder[0]));
update both tests that currently only check toHaveBeenCalled to also perform
this ordering assertion referencing validatePathWithinIndexedDirs,
getFileContent, and getChunkContent as appropriate.
---
Duplicate comments:
In `@src/storage.ts`:
- Around line 906-907: The LIKE predicates in the files_count and chunks_count
subqueries treat % and _ in directory names as wildcards because only
backslashes are escaped; update the REPLACE chain for d.path used in both
subqueries (the expressions used to build the LIKE patterns in files_count and
chunks_count) to also escape '%' and '_' (e.g., REPLACE(REPLACE(REPLACE(d.path,
'\\', '\\\\'), '%', '\\%'), '_', '\\_')) so the pattern treats literal
percent/underscore characters, and keep the existing ESCAPE '\' clause; make the
same change for both places where REPLACE(d.path, '\\', '\\\\') is currently
used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 73da9e2b-f775-401c-b48e-9849c36351d0
📒 Files selected for processing (3)
src/storage.tstests/mcp-handlers.test.tstests/storage.unit.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/storage.unit.test.ts
curl --max-time kills active downloads because it counts wall-clock time, not idle time. The streaming API keeps the connection open the entire download, so --max-time always fires before completion. ollama pull CLI handles retries and stall detection internally.
Integration tests pass locally at 30s against ollama:latest, but CI runners (2-core, shared resources) need more headroom for cold model loading on first embedding call.
The MCP SDK has evolved considerably. This bumps up the sdk version in use to benefit from improvements to the SDK. Also improves cross platform supprot
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests