|
| 1 | +# CLI MCP Hanging Issue Fix |
| 2 | + |
| 3 | +## Problem Statement |
| 4 | + |
| 5 | +The CLI hangs after task completion when MCP servers are connected, requiring manual termination (Ctrl+C). This occurs because MCP connections are established but never properly cleaned up, keeping the Node.js event loop alive. |
| 6 | + |
| 7 | +## Root Cause Analysis |
| 8 | + |
| 9 | +### Current Issues |
| 10 | + |
| 11 | +1. **Resource Leak**: `CLIMcpService` instance is created as local variable and never disposed |
| 12 | +2. **Missing Tool Implementations**: `use_mcp_tool` and `access_mcp_resource` return "not implemented for CLI mode" |
| 13 | +3. **Event Loop Persistence**: Active MCP connections (TCP sockets, timers, health checkers) prevent process termination |
| 14 | +4. **No Cleanup Chain**: Missing lifecycle management from task completion to process exit |
| 15 | + |
| 16 | +### Evidence |
| 17 | + |
| 18 | +- Console output shows task completion but process hangs |
| 19 | +- Memory pressure detected: 117% of limit used |
| 20 | +- Tool execution failures for MCP operations |
| 21 | +- Manual Ctrl+C required to terminate CLI |
| 22 | + |
| 23 | +## Solution Architecture |
| 24 | + |
| 25 | +```mermaid |
| 26 | +graph TD |
| 27 | + A[CLI Starts] --> B[Task Created] |
| 28 | + B --> C[MCP Service Initialized] |
| 29 | + C --> D[MCP Connections Established] |
| 30 | + D --> E[Task Execution] |
| 31 | + E --> F[Tool Calls - MCP Tools Available] |
| 32 | + F --> G[Task Completes] |
| 33 | + G --> H[MCP Service Disposed] |
| 34 | + H --> I[Connections Closed] |
| 35 | + I --> J[Process Exits Gracefully] |
| 36 | +
|
| 37 | + K[Error/Abort] --> H |
| 38 | +
|
| 39 | + style H fill:#ff9999 |
| 40 | + style I fill:#ff9999 |
| 41 | + style J fill:#99ff99 |
| 42 | +``` |
| 43 | + |
| 44 | +## Implementation Stories |
| 45 | + |
| 46 | +### Story 1: Task MCP Service Lifecycle Management |
| 47 | + |
| 48 | +**Priority**: HIGH |
| 49 | +**File**: `src/core/task/Task.ts` |
| 50 | + |
| 51 | +**Description**: Implement proper lifecycle management for MCP service in Task class |
| 52 | + |
| 53 | +**Acceptance Criteria**: |
| 54 | + |
| 55 | +- [ ] Add `cliMcpService` as instance property to Task class |
| 56 | +- [ ] Store MCP service reference during initialization instead of local variable |
| 57 | +- [ ] Add `dispose()` method to Task class that calls `cliMcpService.dispose()` |
| 58 | +- [ ] Call cleanup on `taskCompleted` event |
| 59 | +- [ ] Call cleanup on `taskAborted` event |
| 60 | +- [ ] Ensure no memory leaks from unreferenced MCP connections |
| 61 | + |
| 62 | +**Technical Implementation**: |
| 63 | + |
| 64 | +```typescript |
| 65 | +class Task extends EventEmitter { |
| 66 | + private cliMcpService?: CLIMcpService |
| 67 | + |
| 68 | + // In initialization code (around line 1171) |
| 69 | + this.cliMcpService = new CLIMcpService(this.mcpConfigPath) |
| 70 | + |
| 71 | + async dispose() { |
| 72 | + if (this.cliMcpService) { |
| 73 | + await this.cliMcpService.dispose() |
| 74 | + this.cliMcpService = undefined |
| 75 | + } |
| 76 | + } |
| 77 | + |
| 78 | + // Add event handlers for cleanup |
| 79 | + this.on('taskCompleted', () => this.dispose()) |
| 80 | + this.on('taskAborted', () => this.dispose()) |
| 81 | +} |
| 82 | +``` |
| 83 | + |
| 84 | +### Story 2: Implement CLI MCP Tools |
| 85 | + |
| 86 | +**Priority**: HIGH |
| 87 | +**Files**: |
| 88 | + |
| 89 | +- `src/core/task/Task.ts` (executeCliTool method) |
| 90 | +- `src/cli/services/CLIMcpService.ts` (tool delegation methods) |
| 91 | + |
| 92 | +**Description**: Implement missing MCP tools for CLI mode to enable resource listing and tool execution |
| 93 | + |
| 94 | +**Acceptance Criteria**: |
| 95 | + |
| 96 | +- [ ] Implement `use_mcp_tool` in `executeCliTool` method |
| 97 | +- [ ] Implement `access_mcp_resource` in `executeCliTool` method |
| 98 | +- [ ] Add `executeTool(serverName, toolName, arguments)` method to CLIMcpService |
| 99 | +- [ ] Add `accessResource(serverName, uri)` method to CLIMcpService |
| 100 | +- [ ] Handle tool errors gracefully with proper error messages |
| 101 | +- [ ] Return properly formatted responses for CLI context |
| 102 | + |
| 103 | +**Technical Implementation**: |
| 104 | + |
| 105 | +```typescript |
| 106 | +// In Task.ts executeCliTool method |
| 107 | +case 'use_mcp_tool': |
| 108 | + if (!this.cliMcpService) { |
| 109 | + throw new Error('MCP service not available in CLI mode') |
| 110 | + } |
| 111 | + return await this.cliMcpService.executeTool( |
| 112 | + params.server_name, |
| 113 | + params.tool_name, |
| 114 | + JSON.parse(params.arguments) |
| 115 | + ) |
| 116 | + |
| 117 | +case 'access_mcp_resource': |
| 118 | + if (!this.cliMcpService) { |
| 119 | + throw new Error('MCP service not available in CLI mode') |
| 120 | + } |
| 121 | + return await this.cliMcpService.accessResource( |
| 122 | + params.server_name, |
| 123 | + params.uri |
| 124 | + ) |
| 125 | +``` |
| 126 | + |
| 127 | +### Story 3: BatchProcessor Cleanup Integration |
| 128 | + |
| 129 | +**Priority**: MEDIUM |
| 130 | +**File**: `src/cli/commands/batch.ts` |
| 131 | + |
| 132 | +**Description**: Ensure BatchProcessor properly handles cleanup and doesn't hang after task completion |
| 133 | + |
| 134 | +**Acceptance Criteria**: |
| 135 | + |
| 136 | +- [ ] Add cleanup handling in BatchProcessor event listeners |
| 137 | +- [ ] Ensure MCP disposal on task completion events |
| 138 | +- [ ] Implement timeout-based cleanup for hung processes |
| 139 | +- [ ] Handle cleanup on process termination signals (SIGINT, SIGTERM) |
| 140 | +- [ ] Remove or increase timeout to prevent premature termination |
| 141 | + |
| 142 | +**Technical Implementation**: |
| 143 | + |
| 144 | +```typescript |
| 145 | +// In BatchProcessor.executeTask method |
| 146 | +task.on("taskCompleted", async (taskId: string, tokenUsage: any, toolUsage: any) => { |
| 147 | + this.logDebug(`[BatchProcessor] Task completed: ${taskId}`) |
| 148 | + this.logDebug(`[BatchProcessor] Token usage:`, tokenUsage) |
| 149 | + this.logDebug(`[BatchProcessor] Tool usage:`, toolUsage) |
| 150 | + |
| 151 | + // Ensure cleanup before resolving |
| 152 | + try { |
| 153 | + await task.dispose() |
| 154 | + } catch (error) { |
| 155 | + this.logDebug("Cleanup error:", error) |
| 156 | + } |
| 157 | + |
| 158 | + resolve() |
| 159 | +}) |
| 160 | + |
| 161 | +task.on("taskAborted", async () => { |
| 162 | + this.logDebug("[BatchProcessor] Task was aborted") |
| 163 | + try { |
| 164 | + await task.dispose() |
| 165 | + } catch (error) { |
| 166 | + this.logDebug("Cleanup error:", error) |
| 167 | + } |
| 168 | + reject(new Error("Task was aborted")) |
| 169 | +}) |
| 170 | +``` |
| 171 | + |
| 172 | +### Story 4: CLI Main Process Exit Handling |
| 173 | + |
| 174 | +**Priority**: MEDIUM |
| 175 | +**File**: `src/cli/index.ts` |
| 176 | + |
| 177 | +**Description**: Implement graceful shutdown sequence and ensure CLI exits automatically |
| 178 | + |
| 179 | +**Acceptance Criteria**: |
| 180 | + |
| 181 | +- [ ] Add process exit handlers for cleanup operations |
| 182 | +- [ ] Implement graceful shutdown sequence with timeout |
| 183 | +- [ ] Add timeout for forced exit if cleanup hangs (10 seconds) |
| 184 | +- [ ] Ensure proper exit codes reflect task success/failure status |
| 185 | +- [ ] Handle SIGINT and SIGTERM signals gracefully |
| 186 | + |
| 187 | +**Technical Implementation**: |
| 188 | + |
| 189 | +```typescript |
| 190 | +// In main CLI action handler |
| 191 | +process.on("SIGINT", async () => { |
| 192 | + console.log("\nReceived SIGINT, cleaning up...") |
| 193 | + // Trigger cleanup and exit |
| 194 | + setTimeout(() => process.exit(130), 10000) // Force exit after 10s |
| 195 | +}) |
| 196 | + |
| 197 | +process.on("SIGTERM", async () => { |
| 198 | + console.log("\nReceived SIGTERM, cleaning up...") |
| 199 | + setTimeout(() => process.exit(143), 10000) // Force exit after 10s |
| 200 | +}) |
| 201 | + |
| 202 | +// After task execution |
| 203 | +try { |
| 204 | + if (options.batch || options.stdin || !options.interactive) { |
| 205 | + // ... existing code ... |
| 206 | + } else { |
| 207 | + // ... existing code ... |
| 208 | + } |
| 209 | + |
| 210 | + // Ensure process exits |
| 211 | + setTimeout(() => { |
| 212 | + if (options.verbose) { |
| 213 | + logger.debug("Forcing process exit after timeout") |
| 214 | + } |
| 215 | + process.exit(0) |
| 216 | + }, 5000) |
| 217 | +} catch (error) { |
| 218 | + // ... existing error handling ... |
| 219 | + process.exit(1) |
| 220 | +} |
| 221 | +``` |
| 222 | + |
| 223 | +### Story 5: Connection Health and Cleanup Enhancement |
| 224 | + |
| 225 | +**Priority**: LOW |
| 226 | +**File**: `src/cli/services/CLIMcpService.ts` |
| 227 | + |
| 228 | +**Description**: Enhance disposal method with robust cleanup and timeout handling |
| 229 | + |
| 230 | +**Acceptance Criteria**: |
| 231 | + |
| 232 | +- [ ] Add timeout handling to dispose method (default 5 seconds) |
| 233 | +- [ ] Implement connection state tracking for cleanup verification |
| 234 | +- [ ] Add forced cleanup for unresponsive connections |
| 235 | +- [ ] Enhance logging for cleanup operations in verbose mode |
| 236 | +- [ ] Ensure dispose is idempotent (safe to call multiple times) |
| 237 | + |
| 238 | +**Technical Implementation**: |
| 239 | + |
| 240 | +```typescript |
| 241 | +async dispose(): Promise<void> { |
| 242 | + if (this.isDisposed) { |
| 243 | + return // Already disposed |
| 244 | + } |
| 245 | + |
| 246 | + const logger = getCLILogger() |
| 247 | + logger.debug('CLIMcpService: Starting disposal...') |
| 248 | + |
| 249 | + // Stop all health checkers |
| 250 | + for (const [serverId, interval] of this.healthCheckIntervals) { |
| 251 | + clearInterval(interval) |
| 252 | + this.healthCheckIntervals.delete(serverId) |
| 253 | + } |
| 254 | + |
| 255 | + // Disconnect all servers with timeout |
| 256 | + const disconnectPromises = Array.from(this.connections.keys()).map(async (serverId) => { |
| 257 | + try { |
| 258 | + const timeoutPromise = new Promise((_, reject) => |
| 259 | + setTimeout(() => reject(new Error('Disconnect timeout')), 3000) |
| 260 | + ) |
| 261 | + |
| 262 | + await Promise.race([ |
| 263 | + this.disconnectFromServer(serverId), |
| 264 | + timeoutPromise |
| 265 | + ]) |
| 266 | + |
| 267 | + logger.debug(`CLIMcpService: Disconnected from ${serverId}`) |
| 268 | + } catch (error) { |
| 269 | + logger.debug(`CLIMcpService: Force disconnect ${serverId}:`, error) |
| 270 | + // Force remove connection |
| 271 | + this.connections.delete(serverId) |
| 272 | + } |
| 273 | + }) |
| 274 | + |
| 275 | + await Promise.allSettled(disconnectPromises) |
| 276 | + |
| 277 | + this.isDisposed = true |
| 278 | + logger.debug('CLIMcpService: Disposal complete') |
| 279 | +} |
| 280 | +``` |
| 281 | + |
| 282 | +## Implementation Priority |
| 283 | + |
| 284 | +1. **Story 1 (HIGH)**: Task Lifecycle - Fixes immediate hanging issue |
| 285 | +2. **Story 2 (HIGH)**: CLI Tools - Fixes "not implemented" errors that trigger the hanging |
| 286 | +3. **Story 3 (MEDIUM)**: BatchProcessor - Improves reliability and proper cleanup |
| 287 | +4. **Story 4 (MEDIUM)**: Main Process - Ensures graceful exit as fallback |
| 288 | +5. **Story 5 (LOW)**: Enhancement - Improves robustness for edge cases |
| 289 | + |
| 290 | +## Risk Mitigation |
| 291 | + |
| 292 | +1. **Timeout Handling**: All cleanup operations have timeouts to prevent infinite hanging |
| 293 | +2. **Error Recovery**: Cleanup continues even if individual connections fail |
| 294 | +3. **Forced Exit**: Process will exit after cleanup timeout expires |
| 295 | +4. **Backward Compatibility**: Changes don't affect existing VSCode extension functionality |
| 296 | +5. **Idempotent Operations**: Cleanup methods can be called multiple times safely |
| 297 | + |
| 298 | +## Testing Strategy |
| 299 | + |
| 300 | +### Unit Tests |
| 301 | + |
| 302 | +- [ ] Test Task.dispose() method |
| 303 | +- [ ] Test CLIMcpService tool implementations |
| 304 | +- [ ] Test CLIMcpService.dispose() with timeout scenarios |
| 305 | +- [ ] Test error handling in cleanup operations |
| 306 | + |
| 307 | +### Integration Tests |
| 308 | + |
| 309 | +- [ ] Test full CLI execution with MCP servers connected |
| 310 | +- [ ] Test CLI exits cleanly after task completion |
| 311 | +- [ ] Test CLI handles SIGINT/SIGTERM during execution |
| 312 | +- [ ] Test tool execution with connected MCP servers |
| 313 | + |
| 314 | +### Process Tests |
| 315 | + |
| 316 | +- [ ] Verify CLI process terminates automatically |
| 317 | +- [ ] Test memory cleanup after task completion |
| 318 | +- [ ] Verify no orphaned processes or connections |
| 319 | +- [ ] Test timeout-based forced exit scenarios |
| 320 | + |
| 321 | +## Success Criteria |
| 322 | + |
| 323 | +1. **Primary**: CLI exits automatically after task completion without manual intervention |
| 324 | +2. **Secondary**: MCP tools (`use_mcp_tool`, `access_mcp_resource`) work in CLI mode |
| 325 | +3. **Tertiary**: No memory leaks or resource accumulation during CLI execution |
| 326 | +4. **Quality**: Robust error handling and graceful degradation when MCP operations fail |
| 327 | + |
| 328 | +## Dependencies |
| 329 | + |
| 330 | +- No breaking changes to existing MCP interfaces |
| 331 | +- Maintains compatibility with VSCode extension MCP functionality |
| 332 | +- Requires proper testing with both MCP server types (stdio and SSE) |
0 commit comments