Skip to content

Commit 9c2d7ce

Browse files
author
Eric Oliver
committed
fixes for memory leaks and hangs
1 parent e4eac7e commit 9c2d7ce

File tree

5 files changed

+489
-14
lines changed

5 files changed

+489
-14
lines changed
Lines changed: 332 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,332 @@
1+
# CLI MCP Hanging Issue Fix
2+
3+
## Problem Statement
4+
5+
The CLI hangs after task completion when MCP servers are connected, requiring manual termination (Ctrl+C). This occurs because MCP connections are established but never properly cleaned up, keeping the Node.js event loop alive.
6+
7+
## Root Cause Analysis
8+
9+
### Current Issues
10+
11+
1. **Resource Leak**: `CLIMcpService` instance is created as local variable and never disposed
12+
2. **Missing Tool Implementations**: `use_mcp_tool` and `access_mcp_resource` return "not implemented for CLI mode"
13+
3. **Event Loop Persistence**: Active MCP connections (TCP sockets, timers, health checkers) prevent process termination
14+
4. **No Cleanup Chain**: Missing lifecycle management from task completion to process exit
15+
16+
### Evidence
17+
18+
- Console output shows task completion but process hangs
19+
- Memory pressure detected: 117% of limit used
20+
- Tool execution failures for MCP operations
21+
- Manual Ctrl+C required to terminate CLI
22+
23+
## Solution Architecture
24+
25+
```mermaid
26+
graph TD
27+
A[CLI Starts] --> B[Task Created]
28+
B --> C[MCP Service Initialized]
29+
C --> D[MCP Connections Established]
30+
D --> E[Task Execution]
31+
E --> F[Tool Calls - MCP Tools Available]
32+
F --> G[Task Completes]
33+
G --> H[MCP Service Disposed]
34+
H --> I[Connections Closed]
35+
I --> J[Process Exits Gracefully]
36+
37+
K[Error/Abort] --> H
38+
39+
style H fill:#ff9999
40+
style I fill:#ff9999
41+
style J fill:#99ff99
42+
```
43+
44+
## Implementation Stories
45+
46+
### Story 1: Task MCP Service Lifecycle Management
47+
48+
**Priority**: HIGH
49+
**File**: `src/core/task/Task.ts`
50+
51+
**Description**: Implement proper lifecycle management for MCP service in Task class
52+
53+
**Acceptance Criteria**:
54+
55+
- [ ] Add `cliMcpService` as instance property to Task class
56+
- [ ] Store MCP service reference during initialization instead of local variable
57+
- [ ] Add `dispose()` method to Task class that calls `cliMcpService.dispose()`
58+
- [ ] Call cleanup on `taskCompleted` event
59+
- [ ] Call cleanup on `taskAborted` event
60+
- [ ] Ensure no memory leaks from unreferenced MCP connections
61+
62+
**Technical Implementation**:
63+
64+
```typescript
65+
class Task extends EventEmitter {
66+
private cliMcpService?: CLIMcpService
67+
68+
// In initialization code (around line 1171)
69+
this.cliMcpService = new CLIMcpService(this.mcpConfigPath)
70+
71+
async dispose() {
72+
if (this.cliMcpService) {
73+
await this.cliMcpService.dispose()
74+
this.cliMcpService = undefined
75+
}
76+
}
77+
78+
// Add event handlers for cleanup
79+
this.on('taskCompleted', () => this.dispose())
80+
this.on('taskAborted', () => this.dispose())
81+
}
82+
```
83+
84+
### Story 2: Implement CLI MCP Tools
85+
86+
**Priority**: HIGH
87+
**Files**:
88+
89+
- `src/core/task/Task.ts` (executeCliTool method)
90+
- `src/cli/services/CLIMcpService.ts` (tool delegation methods)
91+
92+
**Description**: Implement missing MCP tools for CLI mode to enable resource listing and tool execution
93+
94+
**Acceptance Criteria**:
95+
96+
- [ ] Implement `use_mcp_tool` in `executeCliTool` method
97+
- [ ] Implement `access_mcp_resource` in `executeCliTool` method
98+
- [ ] Add `executeTool(serverName, toolName, arguments)` method to CLIMcpService
99+
- [ ] Add `accessResource(serverName, uri)` method to CLIMcpService
100+
- [ ] Handle tool errors gracefully with proper error messages
101+
- [ ] Return properly formatted responses for CLI context
102+
103+
**Technical Implementation**:
104+
105+
```typescript
106+
// In Task.ts executeCliTool method
107+
case 'use_mcp_tool':
108+
if (!this.cliMcpService) {
109+
throw new Error('MCP service not available in CLI mode')
110+
}
111+
return await this.cliMcpService.executeTool(
112+
params.server_name,
113+
params.tool_name,
114+
JSON.parse(params.arguments)
115+
)
116+
117+
case 'access_mcp_resource':
118+
if (!this.cliMcpService) {
119+
throw new Error('MCP service not available in CLI mode')
120+
}
121+
return await this.cliMcpService.accessResource(
122+
params.server_name,
123+
params.uri
124+
)
125+
```
126+
127+
### Story 3: BatchProcessor Cleanup Integration
128+
129+
**Priority**: MEDIUM
130+
**File**: `src/cli/commands/batch.ts`
131+
132+
**Description**: Ensure BatchProcessor properly handles cleanup and doesn't hang after task completion
133+
134+
**Acceptance Criteria**:
135+
136+
- [ ] Add cleanup handling in BatchProcessor event listeners
137+
- [ ] Ensure MCP disposal on task completion events
138+
- [ ] Implement timeout-based cleanup for hung processes
139+
- [ ] Handle cleanup on process termination signals (SIGINT, SIGTERM)
140+
- [ ] Remove or increase timeout to prevent premature termination
141+
142+
**Technical Implementation**:
143+
144+
```typescript
145+
// In BatchProcessor.executeTask method
146+
task.on("taskCompleted", async (taskId: string, tokenUsage: any, toolUsage: any) => {
147+
this.logDebug(`[BatchProcessor] Task completed: ${taskId}`)
148+
this.logDebug(`[BatchProcessor] Token usage:`, tokenUsage)
149+
this.logDebug(`[BatchProcessor] Tool usage:`, toolUsage)
150+
151+
// Ensure cleanup before resolving
152+
try {
153+
await task.dispose()
154+
} catch (error) {
155+
this.logDebug("Cleanup error:", error)
156+
}
157+
158+
resolve()
159+
})
160+
161+
task.on("taskAborted", async () => {
162+
this.logDebug("[BatchProcessor] Task was aborted")
163+
try {
164+
await task.dispose()
165+
} catch (error) {
166+
this.logDebug("Cleanup error:", error)
167+
}
168+
reject(new Error("Task was aborted"))
169+
})
170+
```
171+
172+
### Story 4: CLI Main Process Exit Handling
173+
174+
**Priority**: MEDIUM
175+
**File**: `src/cli/index.ts`
176+
177+
**Description**: Implement graceful shutdown sequence and ensure CLI exits automatically
178+
179+
**Acceptance Criteria**:
180+
181+
- [ ] Add process exit handlers for cleanup operations
182+
- [ ] Implement graceful shutdown sequence with timeout
183+
- [ ] Add timeout for forced exit if cleanup hangs (10 seconds)
184+
- [ ] Ensure proper exit codes reflect task success/failure status
185+
- [ ] Handle SIGINT and SIGTERM signals gracefully
186+
187+
**Technical Implementation**:
188+
189+
```typescript
190+
// In main CLI action handler
191+
process.on("SIGINT", async () => {
192+
console.log("\nReceived SIGINT, cleaning up...")
193+
// Trigger cleanup and exit
194+
setTimeout(() => process.exit(130), 10000) // Force exit after 10s
195+
})
196+
197+
process.on("SIGTERM", async () => {
198+
console.log("\nReceived SIGTERM, cleaning up...")
199+
setTimeout(() => process.exit(143), 10000) // Force exit after 10s
200+
})
201+
202+
// After task execution
203+
try {
204+
if (options.batch || options.stdin || !options.interactive) {
205+
// ... existing code ...
206+
} else {
207+
// ... existing code ...
208+
}
209+
210+
// Ensure process exits
211+
setTimeout(() => {
212+
if (options.verbose) {
213+
logger.debug("Forcing process exit after timeout")
214+
}
215+
process.exit(0)
216+
}, 5000)
217+
} catch (error) {
218+
// ... existing error handling ...
219+
process.exit(1)
220+
}
221+
```
222+
223+
### Story 5: Connection Health and Cleanup Enhancement
224+
225+
**Priority**: LOW
226+
**File**: `src/cli/services/CLIMcpService.ts`
227+
228+
**Description**: Enhance disposal method with robust cleanup and timeout handling
229+
230+
**Acceptance Criteria**:
231+
232+
- [ ] Add timeout handling to dispose method (default 5 seconds)
233+
- [ ] Implement connection state tracking for cleanup verification
234+
- [ ] Add forced cleanup for unresponsive connections
235+
- [ ] Enhance logging for cleanup operations in verbose mode
236+
- [ ] Ensure dispose is idempotent (safe to call multiple times)
237+
238+
**Technical Implementation**:
239+
240+
```typescript
241+
async dispose(): Promise<void> {
242+
if (this.isDisposed) {
243+
return // Already disposed
244+
}
245+
246+
const logger = getCLILogger()
247+
logger.debug('CLIMcpService: Starting disposal...')
248+
249+
// Stop all health checkers
250+
for (const [serverId, interval] of this.healthCheckIntervals) {
251+
clearInterval(interval)
252+
this.healthCheckIntervals.delete(serverId)
253+
}
254+
255+
// Disconnect all servers with timeout
256+
const disconnectPromises = Array.from(this.connections.keys()).map(async (serverId) => {
257+
try {
258+
const timeoutPromise = new Promise((_, reject) =>
259+
setTimeout(() => reject(new Error('Disconnect timeout')), 3000)
260+
)
261+
262+
await Promise.race([
263+
this.disconnectFromServer(serverId),
264+
timeoutPromise
265+
])
266+
267+
logger.debug(`CLIMcpService: Disconnected from ${serverId}`)
268+
} catch (error) {
269+
logger.debug(`CLIMcpService: Force disconnect ${serverId}:`, error)
270+
// Force remove connection
271+
this.connections.delete(serverId)
272+
}
273+
})
274+
275+
await Promise.allSettled(disconnectPromises)
276+
277+
this.isDisposed = true
278+
logger.debug('CLIMcpService: Disposal complete')
279+
}
280+
```
281+
282+
## Implementation Priority
283+
284+
1. **Story 1 (HIGH)**: Task Lifecycle - Fixes immediate hanging issue
285+
2. **Story 2 (HIGH)**: CLI Tools - Fixes "not implemented" errors that trigger the hanging
286+
3. **Story 3 (MEDIUM)**: BatchProcessor - Improves reliability and proper cleanup
287+
4. **Story 4 (MEDIUM)**: Main Process - Ensures graceful exit as fallback
288+
5. **Story 5 (LOW)**: Enhancement - Improves robustness for edge cases
289+
290+
## Risk Mitigation
291+
292+
1. **Timeout Handling**: All cleanup operations have timeouts to prevent infinite hanging
293+
2. **Error Recovery**: Cleanup continues even if individual connections fail
294+
3. **Forced Exit**: Process will exit after cleanup timeout expires
295+
4. **Backward Compatibility**: Changes don't affect existing VSCode extension functionality
296+
5. **Idempotent Operations**: Cleanup methods can be called multiple times safely
297+
298+
## Testing Strategy
299+
300+
### Unit Tests
301+
302+
- [ ] Test Task.dispose() method
303+
- [ ] Test CLIMcpService tool implementations
304+
- [ ] Test CLIMcpService.dispose() with timeout scenarios
305+
- [ ] Test error handling in cleanup operations
306+
307+
### Integration Tests
308+
309+
- [ ] Test full CLI execution with MCP servers connected
310+
- [ ] Test CLI exits cleanly after task completion
311+
- [ ] Test CLI handles SIGINT/SIGTERM during execution
312+
- [ ] Test tool execution with connected MCP servers
313+
314+
### Process Tests
315+
316+
- [ ] Verify CLI process terminates automatically
317+
- [ ] Test memory cleanup after task completion
318+
- [ ] Verify no orphaned processes or connections
319+
- [ ] Test timeout-based forced exit scenarios
320+
321+
## Success Criteria
322+
323+
1. **Primary**: CLI exits automatically after task completion without manual intervention
324+
2. **Secondary**: MCP tools (`use_mcp_tool`, `access_mcp_resource`) work in CLI mode
325+
3. **Tertiary**: No memory leaks or resource accumulation during CLI execution
326+
4. **Quality**: Robust error handling and graceful degradation when MCP operations fail
327+
328+
## Dependencies
329+
330+
- No breaking changes to existing MCP interfaces
331+
- Maintains compatibility with VSCode extension MCP functionality
332+
- Requires proper testing with both MCP server types (stdio and SSE)

src/cli/commands/batch.ts

Lines changed: 24 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -191,15 +191,37 @@ export class BatchProcessor {
191191
this.logDebug("[BatchProcessor] Setting up task event handlers...")
192192

193193
// Set up event handlers
194-
task.on("taskCompleted", (taskId: string, tokenUsage: any, toolUsage: any) => {
194+
task.on("taskCompleted", async (taskId: string, tokenUsage: any, toolUsage: any) => {
195195
this.logDebug(`[BatchProcessor] Task completed: ${taskId}`)
196196
this.logDebug(`[BatchProcessor] Token usage:`, tokenUsage)
197197
this.logDebug(`[BatchProcessor] Tool usage:`, toolUsage)
198+
199+
// Ensure cleanup before resolving
200+
try {
201+
if (typeof task.dispose === "function") {
202+
await task.dispose()
203+
this.logDebug("[BatchProcessor] Task cleanup completed")
204+
}
205+
} catch (error) {
206+
this.logDebug("[BatchProcessor] Error during cleanup:", error)
207+
}
208+
198209
resolve()
199210
})
200211

201-
task.on("taskAborted", () => {
212+
task.on("taskAborted", async () => {
202213
this.logDebug("[BatchProcessor] Task was aborted")
214+
215+
// Ensure cleanup before rejecting
216+
try {
217+
if (typeof task.dispose === "function") {
218+
await task.dispose()
219+
this.logDebug("[BatchProcessor] Task cleanup completed after abort")
220+
}
221+
} catch (error) {
222+
this.logDebug("[BatchProcessor] Error during cleanup:", error)
223+
}
224+
203225
reject(new Error("Task was aborted"))
204226
})
205227

0 commit comments

Comments
 (0)