|
| 1 | +# CLI Process Exit Cleanup - Architectural Solution |
| 2 | + |
| 3 | +## Executive Summary |
| 4 | + |
| 5 | +The reviewer's feedback is **absolutely correct**. The `process.exit(0)` call at line 354 in `src/cli/cli-entry.ts` is a band-aid solution that was added when the CLI was hanging and not exiting naturally. This prevents proper cleanup of resources and is technically debt that needs to be addressed. |
| 6 | + |
| 7 | +## Root Cause Validation |
| 8 | + |
| 9 | +### Evidence of Band-Aid Solution |
| 10 | + |
| 11 | +1. **Banner text**: "Press Ctrl+C twice to **force exit**" (banner.ts:23) |
| 12 | +2. **Existing force exit logic**: Timeout mechanism already exists (index.ts:210) |
| 13 | +3. **Test comments**: "Complete the task to avoid hanging test" (BatchProcessor.test.ts:250) |
| 14 | +4. **Recovery mentions**: "hanging operations" (RecoveryManager.ts:72) |
| 15 | + |
| 16 | +### Resources Keeping Event Loop Alive |
| 17 | + |
| 18 | +Our analysis identified exactly what's preventing natural process exit: |
| 19 | + |
| 20 | +```typescript |
| 21 | +// setInterval timers keeping process alive: |
| 22 | +- SessionManager.autoSaveTimer (line 330) |
| 23 | +- CLIMcpService.healthChecker (line 507) |
| 24 | +- MemoryOptimizer.gcInterval (line 51) |
| 25 | +- CLIUIService.spinner intervals (line 378) |
| 26 | + |
| 27 | +// EventEmitters with active listeners: |
| 28 | +- SessionManager extends EventEmitter |
| 29 | +- BatchProcessor extends EventEmitter |
| 30 | +- NonInteractiveModeService extends EventEmitter |
| 31 | + |
| 32 | +// Active network connections: |
| 33 | +- SSE/WebSocket MCP connections |
| 34 | +- StdioMcpConnection child processes |
| 35 | + |
| 36 | +// Various setTimeout calls throughout codebase |
| 37 | +``` |
| 38 | + |
| 39 | +## Recommended Architecture |
| 40 | + |
| 41 | +### Phase 1: Create Centralized CleanupManager |
| 42 | + |
| 43 | +```mermaid |
| 44 | +graph TD |
| 45 | + A[CLI Entry Point] --> B[CleanupManager.performShutdown] |
| 46 | + B --> C[Timer Cleanup] |
| 47 | + B --> D[MCP Service Cleanup] |
| 48 | + B --> E[Performance Monitoring Cleanup] |
| 49 | + B --> F[Event Listener Cleanup] |
| 50 | +
|
| 51 | + C --> C1[Clear setInterval timers] |
| 52 | + C --> C2[Clear setTimeout timers] |
| 53 | + C --> C3[Cancel pending promises] |
| 54 | +
|
| 55 | + D --> D1[GlobalCLIMcpService.dispose] |
| 56 | + D --> D2[Force close SSE connections] |
| 57 | + D --> D3[SIGTERM → SIGKILL child processes] |
| 58 | +
|
| 59 | + E --> E1[memoryOptimizer.stopMonitoring] |
| 60 | + E --> E2[Clear GC intervals] |
| 61 | + E --> E3[Stop performance timers] |
| 62 | +
|
| 63 | + F --> F1[Remove all EventEmitter listeners] |
| 64 | + F --> F2[Close browser sessions] |
| 65 | + F --> F3[Emergency recovery cleanup] |
| 66 | +``` |
| 67 | + |
| 68 | +### Phase 2: Implementation Strategy |
| 69 | + |
| 70 | +```typescript |
| 71 | +// New CleanupManager service |
| 72 | +export class CleanupManager { |
| 73 | + private static instance: CleanupManager |
| 74 | + private cleanupTasks: Array<() => Promise<void>> = [] |
| 75 | + private isShuttingDown = false |
| 76 | + |
| 77 | + static getInstance(): CleanupManager { |
| 78 | + if (!this.instance) { |
| 79 | + this.instance = new CleanupManager() |
| 80 | + } |
| 81 | + return this.instance |
| 82 | + } |
| 83 | + |
| 84 | + registerCleanupTask(task: () => Promise<void>): void { |
| 85 | + this.cleanupTasks.push(task) |
| 86 | + } |
| 87 | + |
| 88 | + async performShutdown(timeoutMs: number = 10000): Promise<void> { |
| 89 | + if (this.isShuttingDown) return |
| 90 | + this.isShuttingDown = true |
| 91 | + |
| 92 | + getCLILogger().debug("[CleanupManager] Starting graceful shutdown...") |
| 93 | + |
| 94 | + try { |
| 95 | + // Execute all cleanup tasks with timeout |
| 96 | + await Promise.race([ |
| 97 | + Promise.allSettled(this.cleanupTasks.map((task) => task())), |
| 98 | + new Promise((_, reject) => setTimeout(() => reject(new Error("Cleanup timeout")), timeoutMs)), |
| 99 | + ]) |
| 100 | + |
| 101 | + getCLILogger().debug("[CleanupManager] Graceful shutdown completed") |
| 102 | + |
| 103 | + // Allow event loop to drain naturally |
| 104 | + setImmediate(() => { |
| 105 | + if (!process.exitCode) { |
| 106 | + process.exitCode = 0 |
| 107 | + } |
| 108 | + }) |
| 109 | + } catch (error) { |
| 110 | + getCLILogger().warn("[CleanupManager] Cleanup timeout, forcing exit:", error) |
| 111 | + // Last resort - but with proper logging |
| 112 | + setTimeout(() => process.exit(process.exitCode || 0), 1000) |
| 113 | + } |
| 114 | + } |
| 115 | +} |
| 116 | +``` |
| 117 | + |
| 118 | +### Phase 3: Replace process.exit(0) with Proper Cleanup |
| 119 | + |
| 120 | +**Current problematic code (cli-entry.ts:352-354):** |
| 121 | + |
| 122 | +```typescript |
| 123 | +// Exit successfully after non-interactive execution completes |
| 124 | +getCLILogger().debug("[cli-entry] Non-interactive execution completed successfully, exiting...") |
| 125 | +process.exit(0) // ❌ BAND-AID - Prevents cleanup! |
| 126 | +``` |
| 127 | + |
| 128 | +**Proposed replacement:** |
| 129 | + |
| 130 | +```typescript |
| 131 | +// Perform proper cleanup before exit |
| 132 | +getCLILogger().debug("[cli-entry] Non-interactive execution completed, performing cleanup...") |
| 133 | + |
| 134 | +try { |
| 135 | + const cleanupManager = CleanupManager.getInstance() |
| 136 | + |
| 137 | + // Register essential cleanup tasks |
| 138 | + cleanupManager.registerCleanupTask(async () => { |
| 139 | + if (options.mcpAutoConnect) { |
| 140 | + const { GlobalCLIMcpService } = await import("./services/GlobalCLIMcpService") |
| 141 | + const globalMcpService = GlobalCLIMcpService.getInstance() |
| 142 | + await globalMcpService.dispose() |
| 143 | + } |
| 144 | + }) |
| 145 | + |
| 146 | + cleanupManager.registerCleanupTask(async () => { |
| 147 | + memoryOptimizer.stopMonitoring() |
| 148 | + }) |
| 149 | + |
| 150 | + cleanupManager.registerCleanupTask(async () => { |
| 151 | + // Clear any remaining timers/intervals |
| 152 | + // This would be implemented in each service |
| 153 | + }) |
| 154 | + |
| 155 | + // Perform graceful shutdown |
| 156 | + await cleanupManager.performShutdown() |
| 157 | + |
| 158 | + getCLILogger().debug("[cli-entry] Cleanup completed successfully") |
| 159 | +} catch (error) { |
| 160 | + getCLILogger().error("[cli-entry] Cleanup failed:", error) |
| 161 | + process.exitCode = 1 |
| 162 | +} |
| 163 | + |
| 164 | +// Process will exit naturally once event loop drains |
| 165 | +``` |
| 166 | + |
| 167 | +## Implementation Priority |
| 168 | + |
| 169 | +### High Priority (Immediate) |
| 170 | + |
| 171 | +1. **Create CleanupManager service** - Centralized cleanup coordination |
| 172 | +2. **Replace process.exit(0)** - Remove the band-aid solution |
| 173 | +3. **Add MCP disposal cleanup** - Ensure GlobalCLIMcpService.dispose() is called |
| 174 | +4. **Add timer cleanup** - Clear setInterval/setTimeout keeping process alive |
| 175 | + |
| 176 | +### Medium Priority |
| 177 | + |
| 178 | +1. **Enhanced MCP disposal** - Force terminate stubborn child processes |
| 179 | +2. **EventEmitter cleanup** - Remove all listeners before exit |
| 180 | +3. **Browser session cleanup** - Close any open browser sessions |
| 181 | +4. **Performance monitoring cleanup** - Stop all monitoring services |
| 182 | + |
| 183 | +### Low Priority |
| 184 | + |
| 185 | +1. **Comprehensive testing** - Regression tests for hanging scenarios |
| 186 | +2. **Debug logging enhancement** - Better visibility into cleanup process |
| 187 | +3. **Timeout tuning** - Optimize cleanup timeouts for performance |
| 188 | + |
| 189 | +## Benefits of This Approach |
| 190 | + |
| 191 | +### ✅ Technical Benefits |
| 192 | + |
| 193 | +- **Removes technical debt** - Eliminates the band-aid `process.exit(0)` |
| 194 | +- **Prevents resource leaks** - Proper cleanup of all resources |
| 195 | +- **Enables natural exit** - Process terminates when event loop drains |
| 196 | +- **Maintains reliability** - Fallback force-exit if cleanup hangs |
| 197 | + |
| 198 | +### ✅ User Experience Benefits |
| 199 | + |
| 200 | +- **Faster CLI response** - No hanging while waiting for cleanup |
| 201 | +- **Cleaner automation** - Reliable exit codes for scripts |
| 202 | +- **Better error handling** - Graceful shutdown with proper logging |
| 203 | +- **Reduced memory usage** - No lingering processes or connections |
| 204 | + |
| 205 | +## Risk Mitigation |
| 206 | + |
| 207 | +### Potential Risks |
| 208 | + |
| 209 | +- Cleanup might take too long in some scenarios |
| 210 | +- Some services might resist graceful shutdown |
| 211 | +- Changes could affect interactive mode |
| 212 | + |
| 213 | +### Mitigation Strategies |
| 214 | + |
| 215 | +- **Graduated timeouts** - Quick cleanup with fallback force-exit |
| 216 | +- **Comprehensive testing** - Test both batch and interactive modes |
| 217 | +- **Rollback plan** - Keep existing logic as emergency fallback |
| 218 | +- **Monitoring** - Log cleanup performance to identify issues |
| 219 | + |
| 220 | +## Success Criteria |
| 221 | + |
| 222 | +- ✅ CLI exits naturally without `process.exit(0)` |
| 223 | +- ✅ All MCP connections disposed properly |
| 224 | +- ✅ No hanging timers or intervals |
| 225 | +- ✅ Clean exit codes (0 for success, 1 for error) |
| 226 | +- ✅ Total cleanup time < 10 seconds |
| 227 | +- ✅ No regression in interactive mode functionality |
| 228 | + |
| 229 | +This architectural approach transforms the band-aid solution into a robust, maintainable cleanup system that addresses the root causes of the hanging issue. |
0 commit comments