Skip to content

Commit 710dd34

Browse files
author
Eric Oliver
committed
mcp improvements
1 parent bcdb8c4 commit 710dd34

File tree

9 files changed

+2644
-6
lines changed

9 files changed

+2644
-6
lines changed
Lines changed: 374 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,374 @@
1+
# CLI Hanging Fix - Comprehensive Solution
2+
3+
## Overview
4+
5+
Fix the CLI hanging issue by addressing task completion detection, MCP disposal, and process exit enforcement.
6+
7+
## Problem Statement
8+
9+
The CLI hangs after processing batch commands and never exits naturally, requiring users to manually terminate with Ctrl+C. This affects user experience and prevents automated usage.
10+
11+
## Root Cause Analysis
12+
13+
### 1. Task Completion Detection Issues
14+
15+
- BatchProcessor relies on event-driven completion detection
16+
- Simple Q&A tasks may not emit expected completion events
17+
- Complex timeout logic has edge cases
18+
- No fallback for non-tool response scenarios
19+
20+
### 2. MCP Server Disposal Problems
21+
22+
- Stdio child processes (dotnet, npx) don't terminate cleanly
23+
- 3-second disposal timeout insufficient for some processes
24+
- No force-kill mechanism for stubborn processes
25+
- Incomplete cleanup of event listeners and resources
26+
27+
### 3. Event Loop Prevention
28+
29+
- Timers, intervals, and event listeners keep process alive
30+
- No guaranteed exit mechanism
31+
- Async operations may hang indefinitely
32+
- Missing force-exit as last resort
33+
34+
## Solution Architecture
35+
36+
### Phase 1: Task Completion Detection Enhancement
37+
38+
**File: `src/cli/commands/batch.ts`**
39+
40+
#### Changes:
41+
42+
1. **Response-based Completion Detection**
43+
44+
- Detect completion when meaningful response is generated
45+
- Don't rely solely on task events for simple Q&A
46+
- Add completion heuristics for different response types
47+
48+
2. **Improved Timeout Logic**
49+
50+
- Simplify sliding timeout implementation
51+
- Add response completion triggers
52+
- Better handling of non-interactive responses
53+
54+
3. **Fallback Completion**
55+
- Detect when response generation completes
56+
- Set maximum response time limits
57+
- Auto-complete for information queries
58+
59+
#### Implementation:
60+
61+
```typescript
62+
// Enhanced completion detection
63+
private detectResponseCompletion(response: string): boolean {
64+
// Detect complete responses for information queries
65+
const informationIndicators = [
66+
'Available MCP servers',
67+
'Connected MCP servers',
68+
'No MCP servers',
69+
'I have access to'
70+
];
71+
72+
return informationIndicators.some(indicator =>
73+
response.toLowerCase().includes(indicator.toLowerCase())
74+
);
75+
}
76+
77+
// Response-triggered completion
78+
private setupResponseCompletionDetection(task: Task): void {
79+
let responseBuffer = '';
80+
81+
task.on('message', (event) => {
82+
if (event.action === 'response') {
83+
responseBuffer += event.message?.text || '';
84+
85+
// Check if response appears complete
86+
if (this.detectResponseCompletion(responseBuffer)) {
87+
// Trigger completion after brief delay
88+
setTimeout(() => {
89+
this.completeTask('Response completion detected');
90+
}, 1000);
91+
}
92+
}
93+
});
94+
}
95+
```
96+
97+
### Phase 2: MCP Disposal Enhancement
98+
99+
**Files: `src/cli/services/CLIMcpService.ts`, `src/cli/connections/BaseMcpConnection.ts`**
100+
101+
#### Changes:
102+
103+
1. **Force Process Termination**
104+
105+
- Add SIGTERM followed by SIGKILL for stdio processes
106+
- Increase disposal timeout to 5 seconds
107+
- Track child process PIDs for forced cleanup
108+
109+
2. **Enhanced Connection Cleanup**
110+
111+
- Properly close all transport connections
112+
- Clear all event listeners and timers
113+
- Force-dispose stubborn connections
114+
115+
3. **Disposal Monitoring**
116+
- Log disposal progress for debugging
117+
- Track which processes are hanging
118+
- Provide disposal status feedback
119+
120+
#### Implementation:
121+
122+
```typescript
123+
// Enhanced stdio process termination
124+
async forceTerminateStdioProcess(): Promise<void> {
125+
const transport = this.transport as StdioClientTransport;
126+
const childProcess = transport.process;
127+
128+
if (childProcess && !childProcess.killed) {
129+
// Graceful termination first
130+
childProcess.kill('SIGTERM');
131+
132+
// Wait up to 2 seconds for graceful exit
133+
await new Promise((resolve) => {
134+
const timeout = setTimeout(() => {
135+
// Force kill if still running
136+
if (!childProcess.killed) {
137+
childProcess.kill('SIGKILL');
138+
}
139+
resolve(void 0);
140+
}, 2000);
141+
142+
childProcess.on('exit', () => {
143+
clearTimeout(timeout);
144+
resolve(void 0);
145+
});
146+
});
147+
}
148+
}
149+
150+
// Enhanced disposal with force cleanup
151+
async dispose(): Promise<void> {
152+
const disposalTimeout = 5000; // 5 seconds
153+
154+
const disposalPromise = Promise.allSettled([
155+
...Array.from(this.connections.keys()).map(id =>
156+
this.forceDisconnectServer(id)
157+
)
158+
]);
159+
160+
// Race disposal against timeout
161+
const timeoutPromise = new Promise((_, reject) =>
162+
setTimeout(() => reject(new Error('Disposal timeout')), disposalTimeout)
163+
);
164+
165+
try {
166+
await Promise.race([disposalPromise, timeoutPromise]);
167+
} catch (error) {
168+
console.warn('Disposal timeout, forcing cleanup...');
169+
// Force cleanup will happen in finally block
170+
} finally {
171+
// Clear all remaining references
172+
this.connections.clear();
173+
this.healthCheckers.clear();
174+
this.isDisposed = true;
175+
}
176+
}
177+
```
178+
179+
### Phase 3: Process Exit Enforcement
180+
181+
**File: `src/cli/index.ts`**
182+
183+
#### Changes:
184+
185+
1. **Guaranteed Exit Mechanism**
186+
187+
- Force exit after maximum operation time
188+
- Clean shutdown with fallback to process.exit()
189+
- Exit code management for different scenarios
190+
191+
2. **Event Loop Cleanup**
192+
193+
- Clear all timers and intervals
194+
- Remove event listeners
195+
- Close remaining handles
196+
197+
3. **Exit Coordination**
198+
- Coordinate between task completion and MCP disposal
199+
- Ensure proper cleanup order
200+
- Log exit process for debugging
201+
202+
#### Implementation:
203+
204+
```typescript
205+
// Global exit enforcement
206+
private setupGlobalExitEnforcement(): void {
207+
const maxExecutionTime = 60000; // 1 minute max for any operation
208+
209+
const forceExitTimer = setTimeout(() => {
210+
console.warn('Force exiting due to execution timeout');
211+
process.exit(1);
212+
}, maxExecutionTime);
213+
214+
// Clear timer on normal completion
215+
process.on('beforeExit', () => {
216+
clearTimeout(forceExitTimer);
217+
});
218+
}
219+
220+
// Enhanced cleanup sequence
221+
private async performComprehensiveCleanup(): Promise<void> {
222+
const cleanupTasks = [
223+
this.disposeMcpServices(),
224+
this.clearGlobalTimers(),
225+
this.removeEventListeners(),
226+
this.closeOpenHandles()
227+
];
228+
229+
// Execute cleanup with timeout
230+
try {
231+
await Promise.race([
232+
Promise.allSettled(cleanupTasks),
233+
new Promise((_, reject) =>
234+
setTimeout(() => reject(new Error('Cleanup timeout')), 5000)
235+
)
236+
]);
237+
} catch (error) {
238+
console.warn('Cleanup timeout, forcing exit');
239+
} finally {
240+
// Guarantee exit
241+
setTimeout(() => process.exit(0), 1000);
242+
}
243+
}
244+
```
245+
246+
### Phase 4: Integration and Testing
247+
248+
**Files: Multiple**
249+
250+
#### Changes:
251+
252+
1. **Coordinated Shutdown**
253+
254+
- Integrate all cleanup phases
255+
- Proper error handling and logging
256+
- Graceful degradation for cleanup failures
257+
258+
2. **Debug Logging**
259+
260+
- Add comprehensive debug logging for exit process
261+
- Track hanging operations
262+
- Identify remaining event loop items
263+
264+
3. **Testing Framework**
265+
- Add tests for hanging scenarios
266+
- Verify clean exit in various conditions
267+
- Performance testing for disposal operations
268+
269+
## Implementation Steps
270+
271+
### Step 1: Task Completion Enhancement
272+
273+
- [ ] Implement response-based completion detection
274+
- [ ] Add completion heuristics for information queries
275+
- [ ] Simplify timeout logic with fallback completion
276+
- [ ] Test with various question types
277+
278+
### Step 2: MCP Disposal Enhancement
279+
280+
- [ ] Add force termination for stdio processes
281+
- [ ] Implement comprehensive connection cleanup
282+
- [ ] Increase disposal timeouts and add monitoring
283+
- [ ] Test with multiple MCP server types
284+
285+
### Step 3: Process Exit Enforcement
286+
287+
- [ ] Implement guaranteed exit mechanisms
288+
- [ ] Add event loop cleanup utilities
289+
- [ ] Coordinate shutdown sequence
290+
- [ ] Add comprehensive debug logging
291+
292+
### Step 4: Integration Testing
293+
294+
- [ ] Test complete fix with original hanging scenario
295+
- [ ] Verify clean exit in various conditions
296+
- [ ] Performance test disposal operations
297+
- [ ] Add automated tests for regression prevention
298+
299+
## Acceptance Criteria
300+
301+
### Functional Requirements
302+
303+
- [ ] CLI exits cleanly after batch command completion
304+
- [ ] No hanging processes or child processes remain
305+
- [ ] All MCP servers disconnect properly
306+
- [ ] Exit codes reflect operation success/failure
307+
308+
### Performance Requirements
309+
310+
- [ ] Total exit time < 10 seconds under normal conditions
311+
- [ ] Force exit triggers within 1 minute maximum
312+
- [ ] MCP disposal completes within 5 seconds
313+
- [ ] No memory leaks or resource leaks
314+
315+
### Reliability Requirements
316+
317+
- [ ] 100% exit success rate for batch commands
318+
- [ ] Graceful handling of disposal failures
319+
- [ ] Proper cleanup even when services hang
320+
- [ ] Comprehensive error logging and debugging info
321+
322+
## Testing Strategy
323+
324+
### Unit Tests
325+
326+
- Task completion detection logic
327+
- MCP disposal mechanisms
328+
- Process cleanup utilities
329+
- Timeout and fallback logic
330+
331+
### Integration Tests
332+
333+
- End-to-end batch command execution
334+
- Multiple MCP server scenarios
335+
- Error condition handling
336+
- Resource cleanup verification
337+
338+
### Regression Tests
339+
340+
- Original hanging scenario
341+
- Various MCP server configurations
342+
- Different task types and responses
343+
- Timeout and error conditions
344+
345+
## Risk Mitigation
346+
347+
### Potential Issues
348+
349+
- Force-killing processes may cause data loss
350+
- Aggressive timeouts might interrupt legitimate operations
351+
- Changes might affect interactive mode behavior
352+
353+
### Mitigation Strategies
354+
355+
- Graduated escalation (graceful -> force -> kill)
356+
- Separate timeout values for different operations
357+
- Extensive testing in both batch and interactive modes
358+
- Rollback plan for compatibility issues
359+
360+
## Success Metrics
361+
362+
### Primary Metrics
363+
364+
- CLI hanging incidents: 0%
365+
- Clean exit success rate: 100%
366+
- User-reported hanging issues: 0
367+
368+
### Secondary Metrics
369+
370+
- Average exit time: < 3 seconds
371+
- MCP disposal success rate: > 99%
372+
- Resource cleanup success: 100%
373+
374+
This comprehensive fix addresses all identified root causes while maintaining system reliability and user experience.

0 commit comments

Comments
 (0)