Commit f24c550
Peter and claude committed
fix: resolve 5 critical bugs in docsrs-mcp server

- Dependency graph analysis: Added store_crate_dependencies to persist Cargo.toml dependencies
- Migration suggestions: Fixed complex JOIN with simplified crate_metadata query
- Health monitoring: Added server_health and get_ingestion_status tools for MCP SDK mode
- Tool naming: Removed duplicate camelCase tools (getDocumentationDetail, etc.)
- Cross-references: Completed resolve_import with alternative path suggestions

All fixes tested successfully with 5/5 tests passing.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 4ce59d7 commit f24c550
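The migration-suggestions fix replaces SQL string slicing of `item_path` with a direct join against `crate_metadata`. A minimal sketch of that query shape, assuming a simplified, hypothetical schema in which each `documentation` row carries a `crate_id` key pointing at `crate_metadata.id` (the actual docsrs-mcp tables and columns may differ):

```python
import sqlite3

# Hypothetical, simplified schema; the real docsrs-mcp tables differ.
SCHEMA = """
CREATE TABLE crate_metadata (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    version TEXT NOT NULL
);
CREATE TABLE documentation (
    id        INTEGER PRIMARY KEY,
    crate_id  INTEGER NOT NULL REFERENCES crate_metadata(id),
    item_path TEXT NOT NULL
);
"""

# Join on the key column instead of slicing "crate::item" out of item_path
# with SUBSTR/INSTR, which for example yields an empty crate name for a
# path that has no "::" separator (INSTR returns 0).
CRATES_WITH_DOCS = """
SELECT DISTINCT cm.name AS crate_name, cm.version
FROM documentation AS d
JOIN crate_metadata AS cm ON d.crate_id = cm.id
ORDER BY cm.name
"""


def crates_with_docs(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (crate_name, version) pairs that have at least one documentation row."""
    return conn.execute(CRATES_WITH_DOCS).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO crate_metadata (id, name, version) VALUES (?, ?, ?)",
        [(1, "serde", "1.0.0"), (2, "tokio", "1.0.0"), (3, "rand", "0.8.5")],
    )
    conn.executemany(
        "INSERT INTO documentation (crate_id, item_path) VALUES (?, ?)",
        [(1, "serde::Serialize"), (1, "serde"), (2, "tokio::spawn")],
    )
    print(crates_with_docs(conn))  # rand has no docs, so it is not listed
```

The crate-level path `serde` matches like any other row, because the join never inspects the string contents of `item_path`.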

8 files changed (+714, -233 lines)

Architecture.md

Lines changed: 28 additions & 7 deletions
@@ -87,15 +87,15 @@ graph LR
 subgraph "docsrs_mcp Package"
 subgraph "Service Layer"
 CRATE_SVC[crate_service.py<br/>CrateService class<br/>FIXED: search_examples method dictionary handling<br/>Proper mapping to CodeExample model requirements<br/>Search, documentation, versions<br/>Transport-agnostic business logic<br/>_build_module_tree() transformation method]
-INGEST_SVC[ingestion_service.py<br/>IngestionService class<br/>Pipeline management<br/>Pre-ingestion control<br/>Cargo file processing]
+INGEST_SVC[ingestion_service.py<br/>IngestionService class<br/>Pipeline management<br/>Pre-ingestion control<br/>Cargo file processing<br/>Enhanced with dependency relationship storage]
 TYPE_NAV_SVC[type_navigation_service.py<br/>TypeNavigationService class<br/>Code intelligence operations<br/>get_item_intelligence(), search_by_safety()<br/>get_error_catalog() methods]
-MCP_RUNNER[mcp_runner.py<br/>MCPServerRunner class<br/>Memory leak mitigation<br/>1000 calls/1GB restart<br/>Process health monitoring]
+MCP_RUNNER[mcp_runner.py<br/>MCPServerRunner class<br/>Memory leak mitigation<br/>1000 calls/1GB restart<br/>Process health monitoring<br/>Enhanced with comprehensive health probing]
 PARAM_VAL[parameter_validation.py<br/>String parameter utilities<br/>Type conversion functions<br/>Boolean/integer validation]
 VALIDATION[validation.py<br/>Centralized validation utilities<br/>Performance-optimized patterns<br/>MCP client compatibility]
 end
 
 subgraph "MCP Implementations"
-OFFICIAL_SVR[mcp_sdk_server.py<br/>Official MCP SDK 1.13.1 - Default<br/>Native @server.tool() decorators<br/>Complete MCP resources support<br/>All 10 tools + resource handlers]
+OFFICIAL_SVR[mcp_sdk_server.py<br/>Official MCP SDK 1.13.1 - Default<br/>Native @server.tool() decorators<br/>Complete MCP resources support<br/>All tools + resource handlers<br/>Enhanced with server_health and get_ingestion_status tools]
 FASTMCP_SVR[fastmcp_server.py<br/>FastMCP 2.11.1 - Deprecated<br/>Schema override support<br/>Legacy compatibility layer]
 end
 
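The `server_health` tool added to the MCP SDK server reports database, memory, and pre-ingestion worker status back through the stdio JSON-RPC channel. A minimal, SDK-independent sketch of the payload such a tool could build. The probe functions, the report keys, and the `asyncio.Queue` standing in for the pre-ingestion worker queue are all assumptions, and registering the coroutine as an MCP tool is deliberately left out:

```python
import asyncio
import gc
import json
import sqlite3
import time


def probe_database(db_path: str) -> dict:
    """Hypothetical probe: open the database, run a trivial query, and time it."""
    start = time.perf_counter()
    try:
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        return {"status": "unhealthy", "error": str(exc)}
    return {"status": "healthy", "query_ms": round((time.perf_counter() - start) * 1000, 2)}


def probe_gc() -> dict:
    """Garbage-collector state only; process RSS would need psutil or similar."""
    return {"enabled": gc.isenabled(), "generation_counts": list(gc.get_count())}


async def server_health(db_path: str, ingest_queue: asyncio.Queue) -> str:
    """Build the JSON text a health tool could return to the MCP client."""
    database = probe_database(db_path)
    report = {
        "status": "healthy" if database["status"] == "healthy" else "degraded",
        "database": database,
        "gc": probe_gc(),
        # Stand-in for pre-ingestion worker state: items still waiting in the queue.
        "pre_ingestion": {"queue_depth": ingest_queue.qsize()},
    }
    return json.dumps(report, indent=2)


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    queue.put_nowait("serde@1.0.0")
    print(await server_health(":memory:", queue))


if __name__ == "__main__":
    asyncio.run(main())
```

Returning a JSON string keeps the result transport-agnostic: an MCP tool handler can wrap it in a text content item, and a REST endpoint could serve the same report.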
@@ -130,7 +130,7 @@ graph LR
 SIG_EXTRACTOR[signature_extractor.py<br/>Metadata extraction (~365 LOC)<br/>Complete item extraction<br/>Macro extraction patterns<br/>Enhanced schema validation]
 INTELLIGENCE_EXTRACTOR[intelligence_extractor.py<br/>Code Intelligence Extraction<br/>Error types, safety info, feature requirements<br/>Pre-compiled regex patterns<br/>Session-based caching mechanism]
 CODE_EXAMPLES[code_examples.py<br/>Code example extraction (~343 LOC)<br/>FIXED: Character fragmentation bug at lines 234-242<br/>FIXED: Vector sync step for vec_example_embeddings<br/>Language detection via pygments<br/>30% confidence threshold<br/>Batch processing for embeddings sync<br/>JSON structure with metadata]
-STORAGE_MGR[storage_manager.py<br/>Batch embedding storage (~296 LOC)<br/>FIXED: NULL constraint protection for content field<br/>Enhanced robustness with explicit NULL checks<br/>Transaction management<br/>Streaming batch inserts<br/>Memory-aware chunking]
+STORAGE_MGR[storage_manager.py<br/>Batch embedding storage (~296 LOC)<br/>FIXED: NULL constraint protection for content field<br/>Enhanced robustness with explicit NULL checks<br/>Transaction management<br/>Streaming batch inserts<br/>Memory-aware chunking<br/>NEW: store_crate_dependencies function for dependency relationships]
 end
 
 ING[ingest.py<br/>Backward compatibility layer<br/>Re-exports from modular components<br/>Maintains existing API surface]
@@ -309,7 +309,7 @@ The system now supports two parallel MCP implementations to ensure compatibility
 - **Critical Fix Applied**: Proper logging configuration to stderr prevents MCP tool failures
 
 **Tool Migration Status**
-All 10 tools successfully migrated:
+All tools successfully migrated and cleaned up:
 1. `search_items` - Documentation search with embedding similarity
 2. `get_item_doc` - Individual item documentation retrieval
 3. `get_crate_summary` - Crate overview and metadata
@@ -320,6 +320,10 @@ All 10 tools successfully migrated:
 8. `get_popular_crates` - Popular crates listing
 9. `get_ingestion_stats` - Pipeline status monitoring
 10. `get_version_info` - Version-specific information
+11. `server_health` - Comprehensive health monitoring for MCP SDK mode
+12. `get_ingestion_status` - Detailed ingestion status reporting
+
+**Tool Name Cleanup**: Removed duplicate camelCase tool names (`getDocumentationDetail`, `extractUsagePatterns`, `generateLearningPath`) in favor of consistent snake_case Python conventions following Python naming standards.
 
 ### MCP Resources Implementation
 
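The snake_case cleanup of the tool list can be guarded against regressions with an automated check. A minimal sketch, assuming the registered tool names are available as a plain list of strings; the helper names are hypothetical, and the camelCase conversion only handles simple ASCII names such as `getDocumentationDetail`:

```python
import re

# snake_case: lowercase words of letters/digits joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")


def to_snake_case(name: str) -> str:
    """Insert '_' before each uppercase letter (except at the start) and lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def lint_tool_names(names: list[str]) -> list[str]:
    """Report names that are not snake_case or that duplicate another name once normalized."""
    problems: list[str] = []
    seen: dict[str, str] = {}  # normalized name -> first spelling encountered
    for name in names:
        if not SNAKE_CASE.match(name):
            problems.append(f"{name}: not snake_case")
        key = to_snake_case(name)
        if key in seen and seen[key] != name:
            problems.append(f"{name}: duplicates {seen[key]}")
        else:
            seen.setdefault(key, name)
    return problems


if __name__ == "__main__":
    print(lint_tool_names(["search_items", "get_documentation_detail", "getDocumentationDetail"]))
```

Run as a unit test against the server's registered tool list, a check like this would fail on a reintroduced camelCase alias instead of letting it reach clients.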
@@ -681,6 +685,7 @@ src/docsrs_mcp/
 │ ├── connection.py # Database connection management, retry logic, performance utilities (~259 LOC)
 │ ├── schema.py # Database schema initialization and migrations (~542 LOC)
 │ ├── storage.py # Data insertion operations for crates, modules, re-exports (~155 LOC)
+│ │ # ENHANCED: store_crate_dependencies function for Cargo.toml dependency relationships
 │ ├── search.py # Vector search operations using sqlite-vec with caching (~504 LOC)
 │ ├── retrieval.py # Database retrieval operations and queries (~326 LOC)
 │ ├── ingestion.py # Ingestion status tracking and recovery support (~363 LOC)
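`store_crate_dependencies` persists Cargo.toml dependencies as rows in the `reexports` table with `link_type='dependency'`. A minimal sketch of that idea with the standard `sqlite3` module, followed by an explicit read-back count so that silently dropped rows raise an error instead of surfacing later as empty dependency graphs. The column layout and the function signature are simplified assumptions, not the actual docsrs-mcp schema or API:

```python
import sqlite3

# Simplified, hypothetical layout for the reexports table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reexports (
    crate_name  TEXT NOT NULL,
    version     TEXT NOT NULL,
    item_path   TEXT NOT NULL,
    target_path TEXT NOT NULL,
    link_type   TEXT NOT NULL,
    UNIQUE (crate_name, version, item_path, target_path, link_type)
)
"""


def store_crate_dependencies(
    conn: sqlite3.Connection, crate_name: str, version: str, dependencies: list[str]
) -> int:
    """Insert one 'dependency' row per dependency; return the stored row count."""
    rows = [
        (crate_name, version, f"{crate_name}::{dep}", dep, "dependency")
        for dep in dependencies
    ]
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT OR IGNORE INTO reexports "
            "(crate_name, version, item_path, target_path, link_type) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    # Validation step: read back what was stored instead of trusting the insert.
    (stored,) = conn.execute(
        "SELECT COUNT(*) FROM reexports "
        "WHERE crate_name = ? AND version = ? AND link_type = 'dependency'",
        (crate_name, version),
    ).fetchone()
    expected = len(set(dependencies))
    if stored < expected:
        raise RuntimeError(f"expected {expected} dependency rows, found {stored}")
    return stored
```

`INSERT OR IGNORE` relies on the `UNIQUE` constraint, so re-ingesting the same crate version does not duplicate rows.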
@@ -1259,6 +1264,8 @@ sequenceDiagram
 Worker->>Worker: Validate item paths with fallback generation
 Worker->>Worker: Parse complete rustdoc structure
 Worker->>Worker: Extract module hierarchy (build_module_hierarchy)
+Worker->>Worker: Parse Cargo.toml for dependency relationships
+Worker->>DB: Store dependency relationships to reexports table (link_type='dependency')
 Worker->>Worker: Extract and store re-export mappings
 Worker->>Worker: Extract cross-references from links field
 Worker->>DB: Store re-export mappings to reexports table
@@ -2543,9 +2550,9 @@ graph TD
 end
 
 subgraph "Core Operations"
-RESOLVE[resolve_import()<br/>Import path resolution<br/>Confidence scoring<br/>Alternative suggestions]
+RESOLVE[resolve_import()<br/>Import path resolution<br/>Confidence scoring<br/>Alternative suggestions<br/>COMPLETED: Database query implementation<br/>with similarity matching and confidence scoring]
 GRAPH[get_dependency_graph()<br/>Path-based JOIN operations<br/>String extraction from item_path<br/>Cycle detection via DFS<br/>Production schema compatible]
-MIGRATE[suggest_migrations()<br/>UNION of LEFT JOINs pattern<br/>Embeddings table integration<br/>Breaking change detection<br/>SQLite-compatible operations]
+MIGRATE[suggest_migrations()<br/>FIXED: Complex JOIN condition simplified<br/>Direct crate_metadata table usage<br/>Returns MigrationSuggestionsResponse object<br/>SQLite-compatible operations]
 TRACE[trace_reexports()<br/>alias_path/actual_path columns<br/>Path-based relationship mapping<br/>Confidence calculation<br/>Schema-aligned queries]
 end
 
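`resolve_import()` now queries the database for similar item paths and ranks them by a confidence score. A minimal sketch of the ranking step only, assuming the candidate paths have already been fetched. It uses the standard library's `difflib.SequenceMatcher.ratio()` as the confidence score, a swapped-in stand-in for the fuzzy-matching library call in the commit's own code example; the helper name, threshold, and result shape are hypothetical:

```python
from difflib import SequenceMatcher
from typing import TypedDict


class Alternative(TypedDict):
    path: str
    confidence: float


def rank_alternatives(
    item_path: str, candidates: list[str], min_confidence: float = 0.5, limit: int = 5
) -> list[Alternative]:
    """Score candidate paths against item_path and return the best matches first."""
    scored: list[Alternative] = []
    for path in set(candidates):
        if path == item_path:
            continue  # the requested path itself is not an alternative
        # ratio() is 2*M/T: matched characters over the combined length, in [0, 1].
        confidence = SequenceMatcher(None, item_path, path).ratio()
        if confidence >= min_confidence:
            scored.append({"path": path, "confidence": round(confidence, 3)})
    # Highest confidence first; ties broken alphabetically for stable output.
    scored.sort(key=lambda alt: (-alt["confidence"], alt["path"]))
    return scored[:limit]


if __name__ == "__main__":
    print(rank_alternatives("serde::Serialise", ["serde::Serialize", "serde::ser::Serialize", "tokio::spawn"]))
```

A character-level ratio is cheap and dependency-free, but it ignores path structure; weighting the final path segment more heavily would be a natural refinement.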
@@ -8000,6 +8007,20 @@ if os.getenv("DOCSRS_EMBEDDINGS_WARMUP_ENABLED", "true").lower() == "true":
 
 The embedding warmup system integrates with the existing health monitoring infrastructure to provide visibility into warmup status.
 
+#### Enhanced Health Probing for MCP SDK Mode
+
+The MCP SDK server now includes comprehensive health monitoring tools specifically designed for stdio-based MCP servers:
+
+**New Health Tools**:
+- `server_health`: Comprehensive health monitoring including database, memory, and pre-ingestion worker status
+- `get_ingestion_status`: Detailed ingestion status reporting with subsystem checks
+
+**Health Monitoring Architecture**:
+- **Database Health**: Connection status, query performance, and schema integrity
+- **Memory Health**: Process memory usage, leak detection, and garbage collection status
+- **Pre-ingestion Health**: Worker status, queue depth, and processing rates
+- **STDIO Compatibility**: Health data delivered through MCP JSON-RPC protocol
+
 **Health Endpoint Response**:
 ```json
 {

UsefulInformation.json

Lines changed: 65 additions & 0 deletions
@@ -1233,6 +1233,71 @@
 ],
 "impact": "Full Claude Code client compatibility restored. Resources now discoverable via standard MCP protocol. Foundation for adding more resources in future.",
 "debuggingTechnique": "Test with JSON-RPC protocol directly using resources/list and resources/read methods to verify proper MCP resource implementation"
+},
+{
+"error": "Dependency Graph Analysis Returning Empty Dependencies",
+"rootCause": "Dependencies were never stored in the database during ingestion pipeline. The ingestion process was parsing dependencies from Cargo.toml files but not persisting them to the database, causing dependency analysis tools to return empty results.",
+"solution": "Added store_crate_dependencies function in storage.py to persist dependencies from parsed Cargo.toml files. Modified ingestion_orchestrator.py to download and parse Cargo.toml files, then store dependencies in reexports table with link_type='dependency' for proper relational tracking.",
+"context": "Dependency graph analysis and migration suggestions were failing due to missing dependency data in database",
+"lesson": "Ingestion pipelines must validate that all extracted data is properly persisted to database. Silent data loss during ingestion leads to downstream tool failures.",
+"pattern": "Always verify database storage of parsed data with explicit validation queries during ingestion testing",
+"dateEncountered": "2025-09-04",
+"relatedFiles": ["src/docsrs_mcp/storage.py", "src/docsrs_mcp/ingestion_orchestrator.py"],
+"codeExample": "def store_crate_dependencies(self, crate_name: str, version: str, dependencies: List[str]):\n \"\"\"Store crate dependencies in the database\"\"\"\n for dep_name in dependencies:\n self.cursor.execute(\n \"INSERT OR IGNORE INTO reexports (crate_name, version, item_path, target_path, link_type) VALUES (?, ?, ?, ?, ?)\",\n (crate_name, version, f\"{crate_name}::{dep_name}\", dep_name, \"dependency\")\n )",
+"testingConfirmed": ["Dependencies now properly stored during ingestion", "Dependency analysis tools return populated results", "Migration suggestions work with actual dependency data"],
+"preventionStrategy": "Add explicit validation steps in ingestion pipeline to verify all parsed data types are stored correctly in database"
+},
+{
+"error": "Migration Suggestions Query Failing with Complex JOIN",
+"rootCause": "Complex JOIN condition attempting to extract crate name from item_path string using SQL string functions. The query was trying to parse 'crate::item' format from item_path column which is fragile and failed with complex path structures.",
+"solution": "Simplified query to use crate_metadata table directly instead of string parsing. Replaced string extraction with direct crate_metadata.id matching for more reliable and performant queries.",
+"context": "Migration suggestion queries were failing due to overly complex JOIN conditions that attempted SQL string parsing",
+"lesson": "Avoid complex string parsing in SQL queries when relational data is available through proper foreign keys. Direct table joins are more reliable and performant than string manipulation.",
+"pattern": "Use proper relational database design with foreign keys instead of embedding identifiers in strings that require parsing",
+"dateEncountered": "2025-09-04",
+"relatedFiles": ["src/docsrs_mcp/migration_service.py"],
+"codeExample": "# BEFORE (failing string parsing):\nSELECT DISTINCT SUBSTR(item_path, 1, INSTR(item_path, '::') - 1) as crate_name\nFROM documentation d\nJOIN crate_metadata cm ON SUBSTR(d.item_path, 1, INSTR(d.item_path, '::') - 1) = cm.name\n\n# AFTER (direct table joins):\nSELECT DISTINCT cm.name as crate_name\nFROM documentation d\nJOIN crate_metadata cm ON d.crate_name = cm.name",
+"testingConfirmed": ["Migration suggestions queries now execute successfully", "Query performance improved with direct table joins", "No more SQL string parsing errors"],
+"preventionStrategy": "Design database schema to avoid embedding parseable identifiers in string fields. Use proper foreign key relationships for reliable queries."
+},
+{
+"error": "MCP SDK Mode Lacks Health Monitoring",
+"rootCause": "Health monitoring endpoints existed only for REST mode, not for stdio-based MCP servers. MCP SDK mode had no mechanism to check subsystem health or ingestion status, making debugging and monitoring difficult.",
+"solution": "Added server_health and get_ingestion_status tools to mcp_sdk_server.py. Implemented comprehensive subsystem monitoring for database connectivity, memory usage, and pre-ingestion worker status through native MCP tool interface.",
+"context": "Debugging MCP server issues was difficult without health monitoring capabilities in SDK mode",
+"lesson": "All server modes should have equivalent monitoring capabilities regardless of communication protocol. Health monitoring is essential for both development and production debugging.",
+"pattern": "Implement health monitoring tools as native MCP tools for stdio-based servers to maintain consistent monitoring capabilities across all deployment modes",
+"dateEncountered": "2025-09-04",
+"relatedFiles": ["src/docsrs_mcp/mcp_sdk_server.py"],
+"codeExample": "@server.call_tool()\nasync def server_health(arguments: dict) -> list[types.TextContent]:\n \"\"\"Get comprehensive server health status\"\"\"\n try:\n # Database health check\n db_status = await check_database_health()\n # Memory monitoring\n memory_info = get_memory_usage()\n # Worker status\n worker_status = get_worker_health()\n \n return [types.TextContent(type=\"text\", text=json.dumps({\n \"database\": db_status,\n \"memory\": memory_info,\n \"workers\": worker_status\n }, indent=2))]",
+"testingConfirmed": ["Health monitoring tools work in MCP SDK mode", "Comprehensive subsystem status available", "Debugging capabilities equivalent to REST mode"],
+"preventionStrategy": "Always implement equivalent monitoring capabilities across all server communication modes"
+},
+{
+"error": "Duplicate Tool Names (camelCase vs snake_case)",
+"rootCause": "Historical migration left both camelCase and snake_case versions of tool names in the codebase. Tools like getDocumentationDetail, extractUsagePatterns, and generateLearningPath existed alongside their snake_case equivalents, causing client confusion and potential conflicts.",
+"solution": "Removed camelCase duplicates: getDocumentationDetail, extractUsagePatterns, generateLearningPath. Standardized all tool names to follow Python snake_case conventions consistently across the MCP server implementation.",
+"context": "MCP tool registration had duplicate entries with different naming conventions from incomplete refactoring",
+"lesson": "API naming conventions must be consistent and complete. Partial migrations leave confusing duplicate interfaces that reduce user experience quality.",
+"pattern": "When standardizing naming conventions, audit all tool definitions to ensure complete migration with no legacy duplicates remaining",
+"dateEncountered": "2025-09-04",
+"relatedFiles": ["src/docsrs_mcp/mcp_sdk_server.py", "src/docsrs_mcp/mcp_tools.py"],
+"codeExample": "# REMOVED (duplicate camelCase versions):\n# @server.call_tool()\n# async def getDocumentationDetail(arguments: dict):\n# @server.call_tool() \n# async def extractUsagePatterns(arguments: dict):\n# @server.call_tool()\n# async def generateLearningPath(arguments: dict):\n\n# KEPT (standardized snake_case versions):\n@server.call_tool()\nasync def get_documentation_detail(arguments: dict):\n@server.call_tool()\nasync def extract_usage_patterns(arguments: dict):\n@server.call_tool()\nasync def generate_learning_path(arguments: dict):",
+"testingConfirmed": ["No duplicate tool names in MCP tool list", "All tools follow snake_case convention", "Client tool discovery shows clean, consistent naming"],
+"preventionStrategy": "Use automated linting to detect naming convention violations and ensure complete migration when standardizing APIs"
+},
+{
+"error": "Cross-Reference Service TODO for Import Alternatives",
+"rootCause": "The resolve_import method in cross-reference service had incomplete implementation marked with TODO comments. When users requested import alternatives, the service returned placeholder responses instead of actual alternative suggestions.",
+"solution": "Implemented database query to find similar item paths with confidence scoring. Added fuzzy matching algorithm to identify alternative import paths and return them with type classification and confidence scores for user evaluation.",
+"context": "Import resolution service was returning TODO placeholders instead of actual alternative import suggestions",
+"lesson": "TODO markers in user-facing functionality create poor user experience. All user-accessible features should have complete implementations, even if they start with basic algorithms.",
+"pattern": "Replace TODO implementations with functional algorithms before exposing features to users. Mark incomplete features as experimental rather than leaving TODO stubs.",
+"dateEncountered": "2025-09-04",
+"relatedFiles": ["src/docsrs_mcp/services/cross_reference_service.py"],
+"codeExample": "async def resolve_import(self, item_path: str, context: str = None) -> Dict[str, Any]:\n \"\"\"Find alternative import paths for the given item\"\"\"\n # Query similar item paths from database\n similar_items = await self.database.find_similar_paths(item_path, limit=10)\n \n alternatives = []\n for item in similar_items:\n # Calculate confidence score using fuzzy matching\n confidence = fuzz.ratio(item_path, item['path']) / 100.0\n alternatives.append({\n 'path': item['path'],\n 'crate': item['crate_name'],\n 'type': item['item_type'],\n 'confidence': confidence\n })\n \n return {\n 'original_path': item_path,\n 'alternatives': sorted(alternatives, key=lambda x: x['confidence'], reverse=True)\n }",
+"testingConfirmed": ["Import alternatives now return actual suggestions", "Confidence scoring helps users evaluate options", "Fuzzy matching finds relevant alternative paths"],
+"preventionStrategy": "Audit all TODO markers in user-facing code and implement basic functionality before feature release"
 }
 ]
 },

src/docsrs_mcp/database/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -67,6 +67,7 @@
 # Re-export from storage module
 from .storage import (
 store_crate_metadata,
+store_crate_dependencies,
 store_modules,
 store_reexports,
 )
@@ -89,6 +90,7 @@
 "migrate_reexports_for_crossrefs",
 # Storage module
 "store_crate_metadata",
+"store_crate_dependencies",
 "store_modules",
 "store_reexports",
 # Search module