|
| 1 | +# MCP Architecture Fix - Critical CLI Compatibility |
| 2 | + |
| 3 | +## Epic Overview |
| 4 | + |
| 5 | +Fix the MCP server implementation to ensure proper initialization handshake timing, resolving the "Method not found" errors in CLI and ensuring compatibility between extension and CLI versions. |
| 6 | + |
| 7 | +## Root Cause |
| 8 | + |
| 9 | +CLI connections are attempting MCP capability discovery (tools/list, resources/list) before the MCP initialization handshake completes, even though transport connections are established. |
| 10 | + |
| 11 | +## Success Criteria |
| 12 | + |
| 13 | +- [ ] CLI can successfully connect to MCP servers without -32601 errors |
| 14 | +- [ ] Extension MCP functionality remains fully functional |
| 15 | +- [ ] Both stdio and SSE connections work properly |
| 16 | +- [ ] Proper error handling and recovery mechanisms |
| 17 | +- [ ] Unified connection architecture between CLI and extension |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## Story 1: Create Base MCP Connection Interface |
| 22 | + |
| 23 | +**Priority**: P0 - Critical |
| 24 | +**Estimate**: 2 story points |
| 25 | + |
| 26 | +### Description |
| 27 | + |
| 28 | +Create a unified base interface and abstract class for MCP connections that handles proper initialization handshake timing. |
| 29 | + |
| 30 | +### Acceptance Criteria |
| 31 | + |
| 32 | +- [ ] Create `BaseMcpConnection` abstract class with proper handshake flow |
| 33 | +- [ ] Implement `initializeHandshake()` method that waits for MCP ready state |
| 34 | +- [ ] Add `waitForReady()` method with configurable timeout |
| 35 | +- [ ] Include proper error handling for handshake failures |
| 36 | +- [ ] Add connection state management (connecting -> handshaking -> ready -> connected) |
| 37 | + |
| 38 | +### Technical Details |
| 39 | + |
| 40 | +```typescript |
| 41 | +abstract class BaseMcpConnection implements McpConnection { |
| 42 | + protected abstract setupTransport(): Promise<void> |
| 43 | + protected abstract createClient(): Client |
| 44 | + |
| 45 | + public async connect(): Promise<void> { |
| 46 | + await this.setupTransport() |
| 47 | + await this.initializeHandshake() |
| 48 | + await this.waitForReady() |
| 49 | + this.status = "connected" |
| 50 | + } |
| 51 | + |
| 52 | + private async initializeHandshake(): Promise<void> { |
| 53 | + // Perform MCP initialization protocol |
| 54 | + } |
| 55 | + |
| 56 | + private async waitForReady(timeout: number = 30000): Promise<void> { |
| 57 | + // Wait for server to be ready for capability requests |
| 58 | + } |
| 59 | +} |
| 60 | +``` |
| 61 | + |
| 62 | +### Files to Modify |
| 63 | + |
| 64 | +- `src/core/mcp/BaseMcpConnection.ts` (new) |
| 65 | +- `src/core/mcp/types.ts` (update interfaces) |
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +## Story 2: Fix CLI Stdio Connection Implementation |
| 70 | + |
| 71 | +**Priority**: P0 - Critical |
| 72 | +**Estimate**: 3 story points |
| 73 | + |
| 74 | +### Description |
| 75 | + |
| 76 | +Update the CLI Stdio connection to use the new base class and implement proper handshake timing. |
| 77 | + |
| 78 | +### Acceptance Criteria |
| 79 | + |
| 80 | +- [ ] Extend `BaseMcpConnection` instead of implementing interface directly |
| 81 | +- [ ] Override `setupTransport()` to create stdio transport properly |
| 82 | +- [ ] Add stderr monitoring with proper error classification |
| 83 | +- [ ] Implement retry logic for failed handshakes |
| 84 | +- [ ] Add timeout handling for slow server starts |
| 85 | + |
| 86 | +### Technical Details |
| 87 | + |
| 88 | +- Follow the extension's pattern for stderr handling and transport setup |
| 89 | +- Add delay after transport connection before attempting handshake |
| 90 | +- Implement exponential backoff for retries |
| 91 | + |
| 92 | +### Files to Modify |
| 93 | + |
| 94 | +- `src/cli/connections/StdioMcpConnection.ts` |
| 95 | +- `src/cli/connections/__tests__/StdioMcpConnection.test.ts` |
| 96 | + |
| 97 | +--- |
| 98 | + |
| 99 | +## Story 3: Fix CLI SSE Connection Implementation |
| 100 | + |
| 101 | +**Priority**: P0 - Critical |
| 102 | +**Estimate**: 2 story points |
| 103 | + |
| 104 | +### Description |
| 105 | + |
| 106 | +Update the CLI SSE connection to use the new base class and implement proper handshake timing. |
| 107 | + |
| 108 | +### Acceptance Criteria |
| 109 | + |
| 110 | +- [ ] Extend `BaseMcpConnection` instead of implementing interface directly |
| 111 | +- [ ] Override `setupTransport()` to create SSE transport properly |
| 112 | +- [ ] Add proper event handling for SSE-specific errors |
| 113 | +- [ ] Implement connection health monitoring |
| 114 | +- [ ] Add reconnection logic for dropped SSE connections |
| 115 | + |
| 116 | +### Files to Modify |
| 117 | + |
| 118 | +- `src/cli/connections/SseMcpConnection.ts` |
| 119 | +- `src/cli/connections/__tests__/SseMcpConnection.test.ts` |
| 120 | + |
| 121 | +--- |
| 122 | + |
| 123 | +## Story 4: Update CLI MCP Service for Proper Timing |
| 124 | + |
| 125 | +**Priority**: P0 - Critical |
| 126 | +**Estimate**: 2 story points |
| 127 | + |
| 128 | +### Description |
| 129 | + |
| 130 | +Update the CLI MCP service to properly handle the new connection timing and avoid premature capability discovery. |
| 131 | + |
| 132 | +### Acceptance Criteria |
| 133 | + |
| 134 | +- [ ] Add connection readiness checks before capability discovery |
| 135 | +- [ ] Implement proper error handling for handshake failures |
| 136 | +- [ ] Add retry logic for failed connections |
| 137 | +- [ ] Update server discovery to wait for full connection readiness |
| 138 | +- [ ] Add proper logging for connection states |
| 139 | + |
| 140 | +### Technical Details |
| 141 | + |
| 142 | +```typescript |
| 143 | +async discoverServers(): Promise<McpServerInfo[]> { |
| 144 | + for (const config of configs) { |
| 145 | + const connection = this.connections.get(config.id) |
| 146 | + if (connection?.status === "connected" && connection.isReady) { |
| 147 | + // Only attempt capability discovery if fully ready |
| 148 | + try { |
| 149 | + await this.discoverCapabilities(connection) |
| 150 | + } catch (error) { |
| 151 | + // Handle with proper retry logic |
| 152 | + } |
| 153 | + } |
| 154 | + } |
| 155 | +} |
| 156 | +``` |
| 157 | + |
| 158 | +### Files to Modify |
| 159 | + |
| 160 | +- `src/cli/services/CLIMcpService.ts` |
| 161 | +- `src/cli/services/__tests__/CLIMcpService.test.ts` |
| 162 | + |
| 163 | +--- |
| 164 | + |
| 165 | +## Story 5: Ensure Extension Compatibility |
| 166 | + |
| 167 | +**Priority**: P0 - Critical |
| 168 | +**Estimate**: 3 story points |
| 169 | + |
| 170 | +### Description |
| 171 | + |
| 172 | +Verify that the extension MCP implementation remains fully functional and optionally migrate to use the new base classes. |
| 173 | + |
| 174 | +### Acceptance Criteria |
| 175 | + |
| 176 | +- [ ] Extension MCP servers continue to work without regression |
| 177 | +- [ ] All existing extension tests pass |
| 178 | +- [ ] Consider migrating extension to use new base classes (optional) |
| 179 | +- [ ] Ensure no breaking changes to extension API |
| 180 | +- [ ] Add integration tests for both CLI and extension |
| 181 | + |
| 182 | +### Technical Details |
| 183 | + |
| 184 | +- The extension's `McpHub.ts` has sophisticated initialization that works correctly |
| 185 | +- We need to ensure our changes don't break the extension |
| 186 | +- Consider whether to migrate the extension to use the new base classes or keep it separate |
| 187 | + |
| 188 | +### Files to Verify |
| 189 | + |
| 190 | +- `src/services/mcp/McpHub.ts` |
| 191 | +- `src/services/mcp/__tests__/McpHub.test.ts` |
| 192 | +- All extension MCP integration tests |
| 193 | + |
| 194 | +--- |
| 195 | + |
| 196 | +## Story 6: Add Integration Tests and Error Recovery |
| 197 | + |
| 198 | +**Priority**: P1 - High |
| 199 | +**Estimate**: 3 story points |
| 200 | + |
| 201 | +### Description |
| 202 | + |
| 203 | +Add comprehensive integration tests and improve error recovery mechanisms for both CLI and extension. |
| 204 | + |
| 205 | +### Acceptance Criteria |
| 206 | + |
| 207 | +- [ ] Add end-to-end tests for CLI MCP server connections |
| 208 | +- [ ] Test both stdio and SSE connection types |
| 209 | +- [ ] Add tests for connection failure scenarios |
| 210 | +- [ ] Test handshake timeout scenarios |
| 211 | +- [ ] Add tests for server restart and reconnection |
| 212 | +- [ ] Verify error messages are helpful for debugging |
| 213 | + |
| 214 | +### Files to Create/Modify |
| 215 | + |
| 216 | +- `src/cli/integration-tests/mcp-connections.test.ts` (new) |
| 217 | +- `src/core/mcp/__tests__/integration.test.ts` (new) |
| 218 | + |
| 219 | +--- |
| 220 | + |
| 221 | +## Implementation Order |
| 222 | + |
| 223 | +1. **Story 1**: Create base connection architecture |
| 224 | +2. **Story 2**: Fix CLI Stdio connection |
| 225 | +3. **Story 3**: Fix CLI SSE connection |
| 226 | +4. **Story 4**: Update CLI service timing |
| 227 | +5. **Story 5**: Verify extension compatibility |
| 228 | +6. **Story 6**: Add comprehensive testing |
| 229 | + |
| 230 | +## Risk Mitigation |
| 231 | + |
| 232 | +- **Risk**: Breaking extension functionality |
| 233 | + - **Mitigation**: Keep extension code separate initially, thorough testing |
| 234 | +- **Risk**: CLI still has timing issues |
| 235 | + |
| 236 | + - **Mitigation**: Add configurable delays and extensive logging for debugging |
| 237 | + |
| 238 | +- **Risk**: Different MCP server implementations behave differently |
| 239 | + - **Mitigation**: Test with multiple server types, add server-specific handling |
| 240 | + |
| 241 | +## Testing Strategy |
| 242 | + |
| 243 | +1. **Unit Tests**: Each connection class and service method |
| 244 | +2. **Integration Tests**: Full CLI workflows with real MCP servers |
| 245 | +3. **Regression Tests**: Ensure extension functionality unchanged |
| 246 | +4. **Manual Testing**: Test with the problematic mssql-dpsp and github servers |
0 commit comments