|
| 1 | +# Orphan Process Cleanup Implementation |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This document describes the implementation of a sophisticated orphan process cleanup mechanism for the Deepnote server starter that prevents terminating active servers from other VS Code windows. |
| 6 | + |
| 7 | +## Problem Statement |
| 8 | + |
| 9 | +Previously, the cleanup logic in `cleanupOrphanedProcesses()` would force-kill **every** process matching "deepnote_toolkit server", which could terminate active servers from other VS Code windows, causing disruption to users working in multiple windows. |
| 10 | + |
| 11 | +## Solution |
| 12 | + |
| 13 | +The new implementation uses a lock file system combined with parent process verification to only kill genuine orphan processes. |
| 14 | + |
| 15 | +## Key Components |
| 16 | + |
| 17 | +### 1. Session Management |
| 18 | + |
| 19 | +- **Session ID**: Each VS Code window instance generates a unique session ID using `generateUuid()` |
| 20 | +- **Lock File Directory**: Lock files are stored in `${os.tmpdir()}/vscode-deepnote-locks/` |
| 21 | +- **Lock File Format**: JSON files named `server-{pid}.json` containing: |
| 22 | + ```typescript |
| 23 | + interface ServerLockFile { |
| 24 | + sessionId: string; // Unique ID for the VS Code window |
| 25 | + pid: number; // Process ID of the server |
| 26 | + timestamp: number; // When the server was started |
| 27 | + } |
| 28 | + ``` |
| 29 | + |
| 30 | +### 2. Lock File Lifecycle |
| 31 | + |
| 32 | +#### Creation |
| 33 | + |
| 34 | +- When a server starts successfully, a lock file is written with the server's PID and current session ID |
| 35 | +- Location: `writeLockFile()` called in `startServerImpl()` after the server process is spawned |
| 36 | + |
| 37 | +#### Deletion |
| 38 | + |
| 39 | +- Lock files are deleted when: |
| 40 | + 1. The server is explicitly stopped via `stopServerImpl()` |
| 41 | + 2. The extension is disposed and all servers are shut down |
| 42 | + 3. An orphaned process is successfully killed during cleanup |
| 43 | + |
| 44 | +### 3. Orphan Detection Logic |
| 45 | + |
| 46 | +The `isProcessOrphaned()` method checks if a process is truly orphaned by verifying its parent process: |
| 47 | + |
| 48 | +#### Unix/Linux/macOS |
| 49 | + |
| 50 | +```bash |
| 51 | +# Get parent process ID |
| 52 | +ps -o ppid= -p <pid> |
| 53 | + |
| 54 | +# Check if parent exists (using -o pid= to get only PID with no header) |
| 55 | +ps -p <ppid> -o pid= |
| 56 | +``` |
| 57 | + |
| 58 | +- If PPID is 1 (init/systemd), the process is orphaned |
| 59 | +- If parent process doesn't exist (empty stdout from `ps -o pid=`), the process is orphaned |
| 60 | +- The `-o pid=` flag ensures no header is printed, so empty output reliably indicates a missing process |
| 61 | + |
| 62 | +#### Windows |
| 63 | + |
| 64 | +```cmd |
| 65 | +# Get parent process ID |
| 66 | +wmic process where ProcessId=<pid> get ParentProcessId |
| 67 | +
|
| 68 | +# Check if parent exists |
| 69 | +tasklist /FI "PID eq <ppid>" /FO CSV /NH |
| 70 | +``` |
| 71 | + |
| 72 | +- If parent process doesn't exist or PPID is 0, the process is orphaned |
| 73 | + |
| 74 | +### 4. Cleanup Decision Flow |
| 75 | + |
| 76 | +When `cleanupOrphanedProcesses()` runs (at extension startup): |
| 77 | + |
| 78 | +1. **Find all deepnote_toolkit server processes** |
| 79 | + |
| 80 | + - Use `ps aux` (Unix) or `tasklist` (Windows) |
| 81 | + - Extract PIDs of matching processes |
| 82 | + |
| 83 | +2. **For each candidate PID:** |
| 84 | + |
| 85 | + a. **Check for lock file** |
| 86 | + |
| 87 | + - If lock file exists: |
| 88 | + |
| 89 | + - If session ID matches current session → **SKIP** (shouldn't happen at startup) |
| 90 | + - If session ID differs: |
| 91 | + - Check if process is orphaned |
| 92 | + - If orphaned → **KILL** |
| 93 | + - If not orphaned → **SKIP** (active in another window) |
| 94 | + |
| 95 | + - If no lock file exists: |
| 96 | + - Check if process is orphaned |
| 97 | + - If orphaned → **KILL** |
| 98 | + - If not orphaned → **SKIP** (might be from older version without lock files) |
| 99 | + |
| 100 | +3. **Kill orphaned processes** |
| 101 | + |
| 102 | + - Use `kill -9` (Unix) or `taskkill /F /T` (Windows) |
| 103 | + - Delete lock file after successful kill |
| 104 | + |
| 105 | +4. **Log all decisions** |
| 106 | + - Processes to kill: logged with reason |
| 107 | + - Processes to skip: logged with reason |
| 108 | + - Provides full audit trail for debugging |
| 109 | + |
| 110 | +## Code Changes |
| 111 | + |
| 112 | +### Modified Files |
| 113 | + |
| 114 | +- `src/kernels/deepnote/deepnoteServerStarter.node.ts` |
| 115 | + |
| 116 | +### New Imports |
| 117 | + |
| 118 | +```typescript |
| 119 | +import { IExtensionSyncActivationService } from '../../platform/activation/types'; |
| 120 | +import * as fs from 'fs-extra'; |
| 121 | +import * as os from 'os'; |
| 122 | +import * as path from '../../platform/vscode-path/path'; |
| 123 | +import { generateUuid } from '../../platform/common/uuid'; |
| 124 | +``` |
| 125 | + |
| 126 | +### New Class Members |
| 127 | + |
| 128 | +```typescript |
| 129 | +private readonly sessionId: string = generateUuid(); |
| 130 | +private readonly lockFileDir: string = path.join(os.tmpdir(), 'vscode-deepnote-locks'); |
| 131 | +``` |
| 132 | + |
| 133 | +### New Methods |
| 134 | + |
| 135 | +1. `initializeLockFileDirectory()` - Creates lock file directory |
| 136 | +2. `getLockFilePath(pid)` - Returns path to lock file for a PID |
| 137 | +3. `writeLockFile(pid)` - Creates lock file for a server process |
| 138 | +4. `readLockFile(pid)` - Reads lock file data |
| 139 | +5. `deleteLockFile(pid)` - Removes lock file |
| 140 | +6. `isProcessOrphaned(pid)` - Checks if process is orphaned by verifying parent |
| 141 | + |
| 142 | +### Modified Methods |
| 143 | + |
| 144 | +1. `constructor()` - Minimal initialization (dependency injection only) |
| 145 | +2. `activate()` - Initializes lock file directory and triggers cleanup (implements IExtensionSyncActivationService) |
| 146 | +3. `startServerImpl()` - Writes lock file after server starts |
| 147 | +4. `stopServerImpl()` - Deletes lock file when server stops |
| 148 | +5. `dispose()` - Deletes lock files for all stopped servers |
| 149 | +6. `cleanupOrphanedProcesses()` - Implements sophisticated orphan detection |
| 150 | + |
| 151 | +## Benefits |
| 152 | + |
| 153 | +1. **Multi-Window Safety**: Active servers in other VS Code windows are never killed |
| 154 | +2. **Backward Compatible**: Handles processes from older versions without lock files |
| 155 | +3. **Robust Orphan Detection**: Uses OS-level parent process verification |
| 156 | +4. **Full Audit Trail**: Comprehensive logging of all cleanup decisions |
| 157 | +5. **Automatic Cleanup**: Stale lock files are removed when processes are killed |
| 158 | +6. **Session Isolation**: Each VS Code window operates independently |
| 159 | + |
| 160 | +## Testing Recommendations |
| 161 | + |
| 162 | +1. **Single Window**: Verify servers start and stop correctly |
| 163 | +2. **Multiple Windows**: Open multiple VS Code windows with Deepnote files, verify servers in other windows aren't killed |
| 164 | +3. **Orphan Cleanup**: Kill VS Code process forcefully, restart, verify orphaned servers are cleaned up |
| 165 | +4. **Lock File Cleanup**: Verify lock files are created and deleted appropriately |
| 166 | +5. **Cross-Platform**: Test on Windows, macOS, and Linux |
| 167 | + |
| 168 | +## Edge Cases Handled |
| 169 | + |
| 170 | +1. **No lock file + active parent**: Process is skipped (might be from older version) |
| 171 | +2. **Lock file + different session + active parent**: Process is skipped (active in another window) |
| 172 | +3. **Lock file + same session**: Process is skipped (shouldn't happen at startup) |
| 173 | +4. **No lock file + orphaned**: Process is killed (genuine orphan) |
| 174 | +5. **Lock file + different session + orphaned**: Process is killed (orphaned from crashed window) |
| 175 | +6. **Failed parent check**: Process is assumed not orphaned (safer default) |
| 176 | + |
| 177 | +## Future Enhancements |
| 178 | + |
| 179 | +1. **Stale Lock File Cleanup**: Periodically clean up lock files for non-existent processes |
| 180 | +2. **Lock File Expiry**: Add TTL to lock files to handle edge cases |
| 181 | +3. **Health Monitoring**: Periodic checks to ensure servers are still responsive |
| 182 | +4. **Graceful Shutdown**: Try SIGTERM before SIGKILL for orphaned processes |
0 commit comments