-
Notifications
You must be signed in to change notification settings - Fork 2.5k
fix: improve IPC stability in headless Docker environments with xvfb #7815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add robust error handling and recovery mechanisms to IPC server/client - Implement graceful shutdown handling for IPC connections - Add connection timeouts and retry logic for headless environments - Enhance logging for debugging in virtual display scenarios - Handle SIGTERM, SIGINT, and SIGHUP signals properly - Add socket cleanup and directory creation for Docker containers - Implement reconnection logic with exponential backoff This fix addresses the issue where VSCode gets killed (signal 9) when running RooCode in headless Docker environments with xvfb during IPC operations. The improvements make IPC communication more resilient to the challenges of virtual display environments. Fixes #7814
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed my own code and found issues. Shocking, I know.
| // Connection timeout for headless environments (ms) | ||
| connectionTimeout: 30000, | ||
| // Enable verbose logging in headless mode | ||
| verboseLogging: process.env.DISPLAY === ":99" || process.env.XVFB_DISPLAY !== undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentional? The hardcoded check for DISPLAY === ":99" is too specific. Different xvfb setups might use different display numbers (:0, :1, :99, etc.). Consider:
| verboseLogging: process.env.DISPLAY === ":99" || process.env.XVFB_DISPLAY !== undefined, | |
| verboseLogging: process.env.DISPLAY?.startsWith(':') || process.env.XVFB_DISPLAY !== undefined, |
| // Socket timeout for headless environments (ms) | ||
| socketTimeout: 30000, | ||
| // Enable verbose logging in headless mode | ||
| verboseLogging: process.env.DISPLAY === ":99" || process.env.XVFB_DISPLAY !== undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentional? Same issue here - the display detection is too specific. Should match any virtual display, not just :99.
| private readonly _log: (...args: unknown[]) => void | ||
| private readonly _clients: Map<string, Socket> | ||
| private _shutdownInProgress = false | ||
| private _connectionTimeouts: Map<string, NodeJS.Timeout> = new Map() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we approach this differently to improve maintainability? The connection timeout Map could accumulate entries if clients connect/disconnect rapidly. Consider adding a periodic cleanup or limiting the map size.
| }) | ||
| try { | ||
| // Configure socket for headless environments | ||
| socket.setKeepAlive(true, 5000) // Keep-alive every 5 seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These magic numbers (5000ms for keep-alive) should be named constants at the top of the file for better maintainability:
| socket.setKeepAlive(true, 5000) // Keep-alive every 5 seconds | |
| const SOCKET_KEEPALIVE_INTERVAL = 5000 // ms | |
| socket.setKeepAlive(true, SOCKET_KEEPALIVE_INTERVAL) |
| let apiInstance: API | undefined | ||
|
|
||
| // Get the API instance from the extension exports | ||
| const extension = vscode.extensions.getExtension(Package.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this approach reliable? Getting the API instance from extension exports during deactivation might not work if the extension is already partially deactivated. Consider storing the API instance as a module-level variable when it's created in activate().
| } | ||
| break | ||
| try { | ||
| const ipc = (this.ipc = new IpcServer(socketPath, this.log)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make these timeout and retry values configurable via environment variables? This would allow different deployment scenarios to tune these values without code changes:
| const ipc = (this.ipc = new IpcServer(socketPath, this.log)) | |
| maxRetries: parseInt(process.env.IPC_MAX_RETRIES || '10'), | |
| retryDelay: parseInt(process.env.IPC_RETRY_DELAY || '1000'), | |
| connectionTimeout: parseInt(process.env.IPC_CONNECTION_TIMEOUT || '30000'), |
|
The issue needs some info and then scoping |
Summary
This PR addresses Issue #7814 where VSCode gets killed (signal 9) when running RooCode in headless Docker environments with xvfb during IPC operations.
Problem
When running RooCode in a headless Docker environment (Ubuntu 22.04 with VSCode and RooCode) using xvfb to create a virtual display, the VSCode process was getting killed with signal 9 when IPC calls were made. This wasn't deterministic - sometimes a few calls would work before the crash, sometimes the kill signal was sent just as the task started.
Solution
This fix implements comprehensive improvements to make IPC communication more resilient in virtual display environments:
Key Changes
Headless Environment Detection & Configuration
Robust Connection Management
Graceful Shutdown Handling
Error Recovery Mechanisms
Enhanced Logging
Testing
Future Improvements
As noted in the review, future enhancements could include:
Related Issue
Fixes #7814
Feedback Welcome
This PR attempts to address the IPC stability issues in headless environments. Feedback and testing in actual Docker environments with xvfb would be greatly appreciated!
Important
Improves IPC stability in headless Docker environments by enhancing connection management, error handling, and logging in
ipc-client.ts,ipc-server.ts, andextension.ts.ipc-client.tsandipc-server.ts.ipc-client.tsandipc-server.ts.cleanup()method inapi.tsfor graceful shutdown of IPC server and clearing task maps.deactivate()inextension.tsto utilize the newcleanup()method for resource cleanup.This description was created by
for ba2077c. You can customize this summary. It will automatically update as commits are pushed.