-
Notifications
You must be signed in to change notification settings - Fork 513
fix(server): prevent Windows memory leak from orphaned Node.js processes #711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v0.14.0rc
Are you sure you want to change the base?
fix(server): prevent Windows memory leak from orphaned Node.js processes #711
Conversation
On Windows, child processes spawned via cmd.exe or sh.exe don't terminate when the parent process is killed (unlike Unix where SIGTERM propagates). This caused dev servers and MCP processes to accumulate, consuming 30GB+ memory over time. Changes: - Add tree-kill integration for cross-platform process tree termination - Add cleanupOrphanedProcesses() to kill lingering processes when features complete - Add startup cleanup to catch orphans from previous runs - Fix PowerShell pattern matching (no backslash escaping needed for -like) - Fix PowerShell script syntax (semicolons required for single-line execution) - Apply tree-kill to subprocess.ts, dev-server-service, codex-app-server-service - Add completion tracking for graceful feature cleanup on stop - Add pid to mock process in tests for tree-kill condition Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Create completionPromise and signalCompletion in acquireRunningFeature - Call signalCompletion in finally blocks of all execute methods - Fixes CodeRabbit review: completionPromise was defined but never wired Co-Authored-By: Claude Opus 4.5 <[email protected]>
PowerShell returns the literal string "null" when no processes match, which JSON.parse accepts as valid JSON (returning null). Added check for this case to prevent null reference errors. Co-Authored-By: Claude Opus 4.5 <[email protected]>
📝 WalkthroughWalkthroughAdds cross-platform process-tree management and Windows-specific orphan cleanup; exposes AutoModeService.initialize() for startup cleanup; adds per-feature completion signaling and an enhanced stopFeature API that can wait (with timeout) for cleanup before releasing features. Changes
Sequence Diagram(s)sequenceDiagram
actor Server
participant AutoModeService
participant ProcessCleanup
participant RunningFeature
Server->>AutoModeService: initialize()
activate AutoModeService
AutoModeService->>ProcessCleanup: cleanupOrphanedProcesses()
ProcessCleanup-->>AutoModeService: cleanup complete
deactivate AutoModeService
Server->>AutoModeService: startFeature(featureId)
activate AutoModeService
AutoModeService->>RunningFeature: create (completionPromise)
RunningFeature->>RunningFeature: run feature
RunningFeature-->>AutoModeService: signalCompletion()
AutoModeService-->>Server: feature finished
deactivate AutoModeService
sequenceDiagram
actor Client
participant StopFeatureAPI
participant AutoModeService
participant RunningFeature
participant ProcessCleanup
Client->>StopFeatureAPI: POST /stop-feature (waitForCleanup=true)
activate StopFeatureAPI
StopFeatureAPI->>AutoModeService: stopFeature(featureId, true, 10000)
activate AutoModeService
AutoModeService->>RunningFeature: abort execution
AutoModeService->>RunningFeature: await completionPromise (timeout)
alt completion within timeout
RunningFeature-->>AutoModeService: resolved
AutoModeService->>ProcessCleanup: cleanup resources
ProcessCleanup-->>AutoModeService: done
AutoModeService-->>StopFeatureAPI: true
else timeout / not resolved
AutoModeService->>AutoModeService: force release running feature
AutoModeService-->>StopFeatureAPI: false
end
deactivate AutoModeService
StopFeatureAPI-->>Client: 200/timeout response
deactivate StopFeatureAPI
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@apps/server/src/services/auto-mode-service.ts`:
- Around line 103-120: The PowerShell command in psCommand interpolates
workDir-derived strings directly (normalizedDir, unixStyleDir) which allows
single quotes or wildcard chars to break or over-match the -like filters; fix by
escaping those values before embedding: implement an escape routine used to
produce escapedNormalizedDir and escapedUnixStyleDir that doubles single quotes
and escapes PowerShell wildcard/special characters (*, ?, [, ]) (e.g., prefix
with backtick or otherwise neutralize them for -like) and then use those escaped
variables in psCommand (retain processFilter and the Get-CimInstance |
Where-Object ... construction).
🧹 Nitpick comments (5)
apps/server/src/services/dev-server-service.ts (1)
597-606: Consider usingforceKillProcessTreeon Unix for robustness.The current implementation uses
SIGTERMon Unix, which may leave processes running if they ignore the signal. TheforceKillProcessTreeutility was added specifically to handle escalation fromSIGTERMtoSIGKILL. This would provide more consistent cleanup behavior across platforms.♻️ Optional: Use forceKillProcessTree for consistent cleanup
-import { killProcessTree } from '@automaker/platform'; +import { killProcessTree, forceKillProcessTree } from '@automaker/platform';// Kill the process tree (important on Windows where child processes aren't auto-terminated) if (server.process && !server.process.killed && server.process.pid) { if (IS_WINDOWS) { // Use tree-kill to terminate the entire process tree // This prevents orphaned child processes (e.g., Next.js start-server.js) await killProcessTree(server.process.pid); } else { - server.process.kill('SIGTERM'); + // Use forceKillProcessTree for graceful termination with escalation + await forceKillProcessTree(server.process.pid, 3000); } }libs/platform/src/subprocess.ts (1)
261-272: Consider reusing thekillProcesshelper to reduce duplication.The abort handler in
spawnProcessduplicates the Windows/Unix conditional logic that's already encapsulated in thekillProcesshelper defined earlier in this file.♻️ Optional: Reuse killProcess helper
if (abortController) { abortHandler = () => { cleanupAbortListener(); - // Use tree-kill on Windows to terminate entire process tree - if (IS_WINDOWS && childProcess.pid) { - killProcessTree(childProcess.pid).catch((err) => { - console.error('[SubprocessManager] Error killing process tree:', err); - }); - } else { - childProcess.kill('SIGTERM'); + if (childProcess.pid) { + killProcess().catch((err) => { + console.error('[SubprocessManager] Error killing process:', err); + }); } reject(new Error('Process aborted')); };Note: This would require moving
killProcessto be defined outside the generator function or passingchildProcessas a parameter.apps/server/tests/unit/services/dev-server-service.test.ts (1)
271-277: Platform-specific assertion may not exercise Windows path in CI.The test correctly branches on
process.platform, but if CI runs on Linux/macOS, the Windows code path withkillProcessTreewon't be exercised. Consider adding a dedicated test that mocksprocess.platformto ensure the Windows branch is covered.♻️ Add explicit Windows path test
it('should use killProcessTree on Windows', async () => { // Temporarily override platform check const originalPlatform = process.platform; Object.defineProperty(process, 'platform', { value: 'win32' }); vi.resetModules(); // Re-import to get fresh module with Windows detection const { getDevServerService } = await import('@/services/dev-server-service.js'); // ... test setup and assertions for killProcessTree // Restore Object.defineProperty(process, 'platform', { value: originalPlatform }); });Alternatively, consider refactoring the service to accept the platform as a dependency for easier testing.
apps/server/src/services/auto-mode-service.ts (1)
2095-2109: Clear the timeout afterPromise.raceto avoid stray timers.
Even if cleanup finishes early, the timeout still fires later.♻️ Suggested refactor
- try { - await Promise.race([ - running.completionPromise, - new Promise<void>((_, reject) => - setTimeout(() => reject(new Error('Cleanup timeout')), timeoutMs) - ), - ]); + let timeoutId: NodeJS.Timeout | undefined; + const timeoutPromise = new Promise<void>((_, reject) => { + timeoutId = setTimeout(() => reject(new Error('Cleanup timeout')), timeoutMs); + }); + try { + await Promise.race([running.completionPromise, timeoutPromise]); logger.info(`Feature ${featureId} cleanup completed gracefully`); } catch (error) { // Timeout or other error - force release logger.warn(`Feature ${featureId} cleanup timed out or failed, forcing removal:`, error); + } finally { + if (timeoutId) clearTimeout(timeoutId); }apps/server/src/index.ts (1)
262-265: Preferlogger.erroroverconsole.errorfor consistency.
Keeps log routing/levels consistent with the rest of the server.♻️ Suggested tweak
-autoModeService.initialize().catch((err) => { - console.error('[AutoModeService] Initialization error:', err); -}); +autoModeService.initialize().catch((err) => { + logger.error('[AutoModeService] Initialization error:', err); +});
Re: PowerShell
|
Address CodeRabbit review feedback: - Clear the timeout timer after Promise.race resolves to avoid stray timers - Use logger.error instead of console.error for consistency Co-Authored-By: Claude Opus 4.5 <[email protected]>
Summary
Fixes a critical Windows memory leak where Node.js processes spawned during auto mode persist indefinitely, consuming 100% memory over time. Even after stopping features, orphaned processes remain because:
cmd /corsh.execreate process trees where killing the parent doesn't kill childrenstopFeature()didn't wait for cleanup: Calledabort()and immediately removed from Map, leaving async streams runningChanges
New: Cross-platform process tree management (
libs/platform/src/process-utils.ts)killProcessTree(pid, signal)- Usestree-killpackage to terminate entire process treeswaitForProcessExit(pid, timeoutMs)- Polls for process exit with timeoutforceKillProcessTree(pid, gracePeriodMs)- SIGTERM → SIGKILL escalation (Windows uses SIGKILL immediately)Fix: Graceful feature cleanup (
apps/server/src/services/auto-mode-service.ts)completionPromise/signalCompletiontoRunningFeaturefor tracking when execution actually finishesstopFeature()now waits for completion with configurable timeout before force-releasingcleanupOrphanedProcesses()for per-feature Node.js process cleanup on WindowscleanupOrphanedProcessesOnStartup()to kill leftover processes from previous sessionsinitialize()method called at server startupFix: Windows process termination in services
dev-server-service.ts- UseskillProcessTreeinstead ofprocess.kill('SIGTERM')on Windowscodex-app-server-service.ts- UseskillProcessTreeinstead ofprocess.kill('SIGTERM')on Windowssubprocess.ts- UseskillProcessTreefor abort handling and timeout kills on WindowsOther
stop-feature.tsroute accepts optionalwaitForCleanupparameterapps/server/src/index.tscallsautoModeService.initialize()at startuptree-killdependency to@automaker/platformkillProcessTreein dev-server-service testsTest plan
dev-server-service.test.tsupdated withkillProcessTreemock)Platform notes
tree-killwith SIGKILL (force) since SIGTERM isn't reliably supported"null"string output, empty results)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.