Commit 32ddbaf

Document MCP Server tools
1 parent 594e09a commit 32ddbaf

3 files changed: +150 -17 lines changed


src/indexing.ts

Lines changed: 1 addition & 1 deletion
@@ -186,7 +186,7 @@ export async function indexDirectories(paths: string[], config: Config): Promise
       if (config.verbose) {
         console.log(`Found ${files.length} files to process in ${path}`);
       }
-    } catch (error) {
+    } catch {
       // Continue with other directories even if one fails to scan
     }
   }
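
Note on the hunk above: the change swaps `catch (error)` for an optional catch binding, since the error value is never read. A minimal sketch of the same skip-and-continue pattern follows. It is synchronous for brevity, and `Scanner` and `scanAll` are hypothetical names, not taken from `src/indexing.ts`.

```typescript
// Hypothetical scanner type: returns the file paths found under `dir`,
// or throws if the directory cannot be scanned.
type Scanner = (dir: string) => string[];

// Scan each directory in turn; a directory that fails to scan is skipped.
function scanAll(dirs: string[], scan: Scanner): string[] {
  const found: string[] = [];
  for (const dir of dirs) {
    try {
      found.push(...scan(dir));
    } catch {
      // Optional catch binding: the error value is unused, so it is not named.
      // Continue with the remaining directories.
    }
  }
  return found;
}
```

One trade-off of this pattern is that the failure is silent; a caller that needs to report skipped directories would have to keep the binding and record the error.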

src/mcp.ts

Lines changed: 148 additions & 14 deletions
@@ -54,17 +54,41 @@ Performance note: Initial indexing may take time for large directories, but subs
   },
   {
     name: 'search',
-    description: 'Search indexed content semantically',
+    description: `Perform semantic search across indexed files using natural language queries. This tool uses vector similarity to find the most relevant content, going beyond simple keyword matching to understand intent and context.
+
+When to use this tool:
+- Finding code examples, functions, or patterns ("error handling in Python", "JWT authentication implementation")
+- Locating documentation or explanations ("how to configure Redis", "API rate limiting guide")
+- Discovering similar functionality across files ("database connection patterns", "logging utilities")
+- Research and exploration of codebases ("machine learning models", "test utilities")
+- Finding files related to specific features or topics
+
+How semantic search works:
+- Searches by meaning and context, not just exact keywords
+- Finds conceptually related content even with different terminology
+- Returns files ranked by relevance with similarity scores
+- Groups results by file to avoid duplicates from multiple matching sections
+
+Response format:
+- Returns lightweight metadata including file paths, relevance scores, and chunk IDs
+- Use 'get_chunk' or 'get_content' tools to fetch actual content from search results
+- Chunks are sorted by relevance score within each file
+- Average similarity score calculated across all matching chunks per file
+
+Example queries:
+- "error handling patterns" (finds try/catch, error classes, logging)
+- "database migration scripts" (finds SQL, schema changes, migration files)
+- "authentication middleware" (finds auth logic, JWT handling, middleware functions)`,
     inputSchema: {
       type: 'object',
       properties: {
         query: {
           type: 'string',
-          description: 'Search query'
+          description: 'Natural language search query describing what you are looking for. Can be concepts, functionality, or specific technical terms.'
         },
         limit: {
           type: 'number',
-          description: 'Maximum number of results (default: 10)',
+          description: 'Maximum number of files to return (default: 10). Each file may contain multiple matching chunks.',
           default: 10
         }
       },
@@ -73,17 +97,40 @@ Performance note: Initial indexing may take time for large directories, but subs
   },
   {
     name: 'similar_files',
-    description: 'Find files similar to a given file',
+    description: `Find files that are semantically similar to a given reference file. This tool analyzes the content and context of a file to discover other files with related functionality, similar patterns, or comparable content.
+
+When to use this tool:
+- Discovering related implementations across a codebase ("find files similar to this authentication module")
+- Locating alternative approaches or patterns ("find other components like this React component")
+- Finding documentation or examples related to a specific file
+- Identifying code duplication or similar functionality that could be refactored
+- Exploring unfamiliar codebases by finding files similar to known examples
+- Locating test files, configuration files, or documentation related to a source file
+
+How similarity detection works:
+- Analyzes the semantic content of the reference file
+- Compares against all indexed files using vector similarity
+- Considers code patterns, function signatures, imports, and documentation
+- Returns files ranked by content similarity, not just filename or location similarity
+- Works across different file types and programming languages
+
+Use cases:
+- Code analysis: "Find files similar to this database model to understand the schema patterns"
+- Learning: "Show me other API controllers similar to this one"
+- Maintenance: "Find files with similar error handling patterns"
+- Architecture: "Locate other services that follow this microservice pattern"
+
+Note: The reference file must be indexed for this tool to work. If the file is not found in the index, an error will be returned.`,
     inputSchema: {
       type: 'object',
       properties: {
         file_path: {
           type: 'string',
-          description: 'Path to the file to find similar files for'
+          description: 'Absolute or relative path to the reference file. This file must have been previously indexed.'
         },
         limit: {
           type: 'number',
-          description: 'Maximum number of results (default: 10)',
+          description: 'Maximum number of similar files to return (default: 10). Results are sorted by similarity score.',
           default: 10
         }
       },
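
Note on the `similar_files` description above: it says the reference file is compared against all indexed files using vector similarity and that an unindexed reference file produces an error. A minimal sketch of that ranking follows, assuming one embedding vector per file and cosine similarity as the metric. The diff confirms neither assumption, and `IndexedFile`, `cosineSimilarity`, and `similarFiles` are hypothetical names.

```typescript
// Hypothetical in-memory index entry: one embedding vector per file.
interface IndexedFile {
  path: string;
  vector: number[];
}

interface SimilarFile {
  path: string;
  score: number;
}

// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Rank every other indexed file by similarity to the reference file.
function similarFiles(referencePath: string, index: IndexedFile[], limit = 10): SimilarFile[] {
  const reference = index.find((f) => f.path === referencePath);
  if (!reference) {
    // Mirrors the documented behavior: the reference file must be indexed.
    throw new Error(`File not indexed: ${referencePath}`);
  }
  return index
    .filter((f) => f.path !== referencePath)
    .map((f) => ({ path: f.path, score: cosineSimilarity(reference.vector, f.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

A vector database would run this comparison with an index instead of a linear scan; the sketch only illustrates the ranking and the error case.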
@@ -92,46 +139,133 @@ Performance note: Initial indexing may take time for large directories, but subs
   },
   {
     name: 'get_content',
-    description: 'Get file content',
+    description: `Retrieve the full content of a file or specific chunks within a file. This tool reads files directly from the filesystem and can optionally return only specific portions of indexed files.
+
+When to use this tool:
+- After performing a search, to retrieve the actual content of relevant files
+- Reading complete files that were identified through semantic search
+- Extracting specific sections of large files using chunk ranges
+- Accessing source code, documentation, or configuration files for analysis
+- Following up on search results with detailed content examination
+
+How chunk selection works:
+- If no chunks parameter is provided, returns the entire file content
+- Chunk ranges allow selective reading of large files (e.g., "2-5" returns chunks 2, 3, 4, and 5)
+- Single chunks can be specified (e.g., "3" returns only chunk 3)
+- Chunks are the same segments created during indexing for semantic search
+- Useful for large files where you only need specific sections identified by search
+
+File access:
+- Reads files directly from the filesystem (not from the search index)
+- Works with any readable file, whether indexed or not
+- Supports all text-based file formats
+- Preserves original formatting and content exactly as stored
+
+Workflow integration:
+1. Use 'search' to find relevant files and identify interesting chunk IDs
+2. Use 'get_content' to retrieve full file content or specific chunks
+3. Analyze the content to understand context and implementation details
+
+Performance note: For large files, using chunk ranges can be more efficient than reading entire files.`,
     inputSchema: {
       type: 'object',
       properties: {
         file_path: {
           type: 'string',
-          description: 'Path to the file to retrieve'
+          description: 'Absolute or relative path to the file to retrieve. File must be readable and text-based.'
         },
         chunks: {
           type: 'string',
-          description: 'Optional chunk range (e.g., "2-5")'
+          description: 'Optional chunk range specification. Examples: "3" (single chunk), "2-5" (chunks 2 through 5), "1-3" (first three chunks). Only works for indexed files.'
         }
       },
       required: ['file_path']
     }
   },
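
Note on the `chunks` parameter above: the description gives two accepted forms, a single chunk ("3") and an inclusive range ("2-5"). A minimal sketch of parsing those two forms follows. `parseChunkRange` is a hypothetical name, and the sketch returns the numbers as written without deciding whether numbering starts at 0 or 1, because the descriptions in this diff do not settle that ("1-3" is called the first three chunks here, while `get_chunk` lists IDs starting at "0").

```typescript
// Parse a chunk specification such as "3" or "2-5" into the list of
// chunk numbers it names. Ranges are inclusive on both ends.
function parseChunkRange(spec: string): number[] {
  const match = /^(\d+)(?:-(\d+))?$/.exec(spec.trim());
  if (!match) {
    throw new Error(`Invalid chunk range: ${spec}`);
  }
  const start = Number(match[1]);
  // A single number is treated as a range of length one.
  const end = match[2] !== undefined ? Number(match[2]) : start;
  if (end < start) {
    throw new Error(`Invalid chunk range: ${spec}`);
  }
  const ids: number[] = [];
  for (let id = start; id <= end; id++) {
    ids.push(id);
  }
  return ids;
}
```

Rejecting reversed ranges such as "5-2" is a design choice of this sketch; the diff does not say how the server treats them.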
   {
     name: 'get_chunk',
-    description: 'Get content of a specific chunk by file path and chunk ID',
+    description: `Retrieve the content of a specific chunk from an indexed file. This tool provides precise access to individual text segments that were identified during semantic search, allowing efficient retrieval of only the most relevant content.
+
+When to use this tool:
+- After performing a 'search' operation, to fetch the actual content of specific chunks that matched your query
+- When you want to examine only the most relevant sections of a file rather than reading the entire file
+- For targeted content analysis where you need specific text segments identified by their chunk IDs
+- To build contextual responses using only the most semantically relevant portions of files
+- When working with large files and you only need particular sections
+
+How chunks work:
+- Files are divided into overlapping text segments during indexing for better search granularity
+- Each chunk represents a coherent section of text (typically 512 characters with overlap)
+- Chunk IDs are sequential strings ("0", "1", "2", etc.) within each file
+- Search results include chunk IDs for the most relevant sections
+- This tool retrieves the exact content that was semantically matched
+
+Typical workflow:
+1. Use 'search' to find files and get chunk IDs with high relevance scores
+2. Use 'get_chunk' to retrieve the specific content of the most relevant chunks
+3. Analyze or process only the most pertinent text segments
+
+Efficiency benefits:
+- Avoids transferring unnecessary content from large files
+- Provides precise access to semantically relevant text
+- Reduces token usage by fetching only needed sections
+- Enables focused analysis on the most important content
+
+Note: Both the file and the specific chunk must exist in the search index for this tool to work.`,
     inputSchema: {
       type: 'object',
       properties: {
         file_path: {
           type: 'string',
-          description: 'Path to the file'
+          description: 'Absolute or relative path to the indexed file containing the desired chunk.'
         },
         chunk_id: {
           type: 'string',
-          description: 'ID of the chunk to retrieve'
+          description: 'ID of the specific chunk to retrieve. This is typically obtained from search results and is a sequential string like "0", "1", "2", etc.'
         }
       },
       required: ['file_path', 'chunk_id']
     }
   },
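
Note on "How chunks work" above: files are split into overlapping segments of typically 512 characters, with sequential string IDs. A minimal sketch of such a splitter follows. The overlap of 64 characters is an assumption (the description does not give a value), and `Chunk` and `chunkText` are hypothetical names rather than the indexer's own.

```typescript
interface Chunk {
  id: string; // sequential string ID: "0", "1", "2", ...
  content: string;
}

// Split text into fixed-size character windows where each window repeats
// the last `overlap` characters of the previous one.
function chunkText(text: string, size = 512, overlap = 64): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ id: String(chunks.length), content: text.slice(start, start + size) });
    // Stop once a window has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", 4, 1)` yields three chunks, "abcd", "defg", and "ghij", with IDs "0", "1", and "2".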
   {
     name: 'server_info',
-    description: 'Get server information and status',
+    description: `Get comprehensive information about the directory indexer server status, configuration, and indexed content. This tool provides a complete overview of the current state of the semantic search system.
+
+When to use this tool:
+- To check if the indexer is properly set up and operational
+- Before starting work to understand what content is already indexed
+- To verify indexing operations completed successfully
+- When debugging search issues or unexpected results
+- To get an overview of available content for semantic search
+- To check system health and identify any configuration problems
+
+Information provided:
+- Server version and operational status
+- Total count of indexed directories, files, and searchable chunks
+- Database size and storage information
+- Most recent indexing timestamp
+- List of all indexed directories with individual statistics
+- File counts and chunk counts per directory
+- Indexing status for each directory (completed, failed, in progress)
+- Error reports and processing issues
+- System consistency checks between database components
+
+Status indicators:
+- Operational status of vector database (Qdrant) connection
+- Embedding service availability
+- Data consistency between SQLite metadata and vector storage
+- Recent errors or warnings that may affect search quality
+
+Use this tool to:
+- Verify setup before performing search operations
+- Understand the scope of available content
+- Troubleshoot search or indexing issues
+- Plan additional indexing operations
+- Monitor system health and performance`,
     inputSchema: {
       type: 'object',
-      properties: {}
+      properties: {},
+      additionalProperties: false
     }
   }
 ];
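
Note on the workflows described in the tool descriptions above ('search' first, then 'get_chunk' or 'get_content'): a client invokes an MCP tool by sending a JSON-RPC request with method `tools/call`. A minimal sketch of building such requests for the tools in this file follows. The `toolCall` helper, the query, and the file path `src/auth.ts` are hypothetical; only the tool names and argument keys come from the schemas in this diff.

```typescript
// Shape of an MCP tool invocation as a JSON-RPC 2.0 request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Hypothetical helper: wrap a tool name and its arguments in a request.
function toolCall(name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Step 1: search returns file paths, scores, and chunk IDs (no content).
const searchRequest = toolCall("search", { query: "authentication middleware", limit: 5 });

// Step 2: fetch one matching chunk; the path and ID would come from step 1.
const chunkRequest = toolCall("get_chunk", { file_path: "src/auth.ts", chunk_id: "0" });

// server_info takes no arguments; its schema sets additionalProperties: false.
const infoRequest = toolCall("server_info");
```

How the requests are transported (stdio or HTTP) depends on how the server is launched, which this diff does not show.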

src/search.ts

Lines changed: 1 addition & 2 deletions
@@ -47,7 +47,7 @@ export async function searchContent(query: string, options: SearchOptions = {}):
   const points = await qdrant.searchPoints(queryEmbedding, limit * 5);
 
   // Group points by file path
-  const fileGroups = new Map<string, Array<{ score: number; chunkId: string; content: string; parentDirectories: string[] }>>();
+  const fileGroups = new Map<string, Array<{ score: number; chunkId: string; parentDirectories: string[] }>>();
 
   for (const point of points) {
     const score = point.score ?? 0;
@@ -61,7 +61,6 @@ export async function searchContent(query: string, options: SearchOptions = {}):
     fileGroups.get(filePath)!.push({
       score,
       chunkId: point.payload.chunkId,
-      content: point.payload.content || '',
       parentDirectories: point.payload.parentDirectories
     });
   }
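
Note on the hunks above: after this change each grouped entry carries only a score, a chunk ID, and parent directories, which matches the "lightweight metadata" wording in the `search` tool description. A minimal sketch of the grouping and ranking that description outlines follows: chunks sorted by score within each file, an average score per file, and at most `limit` files returned. Ranking files by their average score is an assumption, and `ScoredPoint`, `FileResult`, and `groupByFile` are hypothetical names; the rest of `searchContent` is not shown in this diff.

```typescript
// Hypothetical flattened form of a vector search hit.
interface ScoredPoint {
  filePath: string;
  chunkId: string;
  score: number;
}

interface FileResult {
  filePath: string;
  averageScore: number;
  chunks: Array<{ chunkId: string; score: number }>;
}

function groupByFile(points: ScoredPoint[], limit = 10): FileResult[] {
  // Group hits by file path, keeping only metadata (no chunk content).
  const groups = new Map<string, Array<{ chunkId: string; score: number }>>();
  for (const point of points) {
    const group = groups.get(point.filePath) ?? [];
    group.push({ chunkId: point.chunkId, score: point.score });
    groups.set(point.filePath, group);
  }

  const results: FileResult[] = [];
  groups.forEach((chunks, filePath) => {
    // Best-matching chunks first within each file.
    chunks.sort((a, b) => b.score - a.score);
    const averageScore = chunks.reduce((sum, c) => sum + c.score, 0) / chunks.length;
    results.push({ filePath, averageScore, chunks });
  });

  // Assumption: files are ranked by average score, then truncated to `limit`.
  return results.sort((a, b) => b.averageScore - a.averageScore).slice(0, limit);
}
```

Requesting `limit * 5` points before grouping, as the first hunk does, leaves room for several chunks per file while still being able to fill `limit` distinct files.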
