-
Notifications
You must be signed in to change notification settings - Fork 35
perf: add persistent playwright executor daemon #113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @@ -0,0 +1,266 @@ | |||
| /** | |||
| * Persistent Playwright Executor Daemon | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if i want to add tracing functionality a first class citizen, is this a place to add that given the architecture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
potentially! Currently there are two main ways users can execute playwright code against a Kernel browser:
(1) spin up a kernel browser, and then use the CDP url from a playwright script. Here we have no control over playwright tracing config since it's user code
(2) use the playwright execute API (https://www.kernel.sh/docs/browsers/playwright-execution) which is what this PR modifies. This is an API that takes typescript an executes it against the browser
i'd say most people use (1) right now so if we added a parameter to enable tracing in this endpoint it might not be useful in practice
Replace per-request tsx subprocess with a persistent Node.js daemon that maintains a warm CDP connection and uses esbuild for fast TypeScript transformation. Performance improvement: ~94% faster for simple operations - Before: ~750ms per request (tsx startup + module load + CDP connect) - After: ~45ms per request (reuses daemon + warm connection) The daemon is lazy-started on first request and communicates via Unix socket using newline-delimited JSON.
7ce91a2 to
eba83f5
Compare
Test that the playwright daemon recovers after chromium is restarted via supervisorctl. Moves playwright tests to their own file.
|
|
||
| const SOCKET_PATH = process.env.PLAYWRIGHT_DAEMON_SOCKET || '/tmp/playwright-daemon.sock'; | ||
| const CDP_ENDPOINT = process.env.CDP_ENDPOINT || 'ws://127.0.0.1:9222'; | ||
| const USE_PATCHRIGHT = process.env.PLAYWRIGHT_ENGINE !== 'playwright-core'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default Playwright engine changed from playwright-core to patchright
High Severity
The PLAYWRIGHT_ENGINE environment variable logic was inverted. The old executor checked === 'patchright' to use patchright, defaulting to playwright-core when unset. The new daemon checks !== 'playwright-core', which means when PLAYWRIGHT_ENGINE is undefined (the common case), patchright is now used instead of playwright-core. This silently changes the default browser automation library for all existing deployments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intentional
| const result = await Promise.race([ | ||
| userFunction(page, context, browserInstance), | ||
| timeoutPromise, | ||
| ]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timeout timer causes unhandled rejection and memory leak
Medium Severity
The timeoutPromise uses setTimeout without storing the timer reference. When the user function completes before the timeout, Promise.race returns but the timer continues running. When it fires, it rejects a promise that nothing is listening to, causing an unhandled rejection. In Node.js 22, this generates warnings in logs and can crash the daemon depending on configuration. The timer reference needs to be stored and cleared via clearTimeout after the race completes.
| const result = await Promise.race([ | ||
| userFunction(page, context, browserInstance), | ||
| timeoutPromise, | ||
| ]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timed-out user code continues running and corrupts page state
High Severity
When a timeout occurs, Promise.race returns immediately but the user function continues executing with access to the shared page object. Since the daemon reuses the same page for subsequent requests (lines 148-149 prefer existing pages), a timed-out request's code can still be running when the next request starts, causing race conditions and unpredictable behavior. The old subprocess-based executor was killed by the OS on timeout via exec.CommandContext, providing clean isolation. The daemon lacks any mechanism to abort user code when timeout fires.
Additional Locations (1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged. This is an inherent tradeoff of the persistent daemon architecture. A few mitigating factors:
- The Go API serializes all requests with a mutex, so concurrent execution isn't possible
- Timed-out code is typically blocked on I/O (awaiting network), not CPU-bound, so it won't starve subsequent requests
- The practical risk of page state corruption is low - most timeouts occur during navigation/network issues where the page is already in a bad state
True isolation would require per-request subprocesses (the old model) or worker threads with their own CDP connections, both of which sacrifice the ~94% latency improvement this PR provides.
We accept this tradeoff for the performance gains. If it becomes a problem in practice, we can revisit with a "fresh page after timeout" mitigation.
| @@ -0,0 +1,266 @@ | |||
| /** | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just manage the daemon via supervisord instead? that way we can also automatically resume from failures, etc. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since CDP connections present some bot detection risk i figured we should only run the daemon if the user is using this playwright endpoint, so that's why it's not running by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg! exciting stuff!
|
|
||
| // If we can't parse the output, return a generic error | ||
| success := false | ||
| errorMsg := fmt.Sprintf("execution failed: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be worth distinguishing daemon vs code execution errors to make debugging easier
Sayan-
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sweet! qq on daemon management, lgtm otherwise
Summary
Replace per-request
tsxsubprocess spawning with a persistent Node.js daemon that maintains a warm CDP connection to the browser.Performance improvement: ~94% faster for simple operations
Changes
playwright-daemon.ts: persistent daemon that listens on Unix socket, maintains warm CDP connection, uses esbuild for fast TypeScript transformationplaywright.go: communicate with daemon via Unix socket instead of spawning tsx subprocess; lazy-start daemon on first requestArchitecture
Test plan
Note
Introduces a persistent Playwright execution daemon and switches the API to use it over a Unix socket, removing per-request tsx subprocesses.
server/runtime/playwright-daemon.ts(compiled to/usr/local/lib/playwright-daemon.js) that listens on/tmp/playwright-daemon.sock, keeps a warm CDP connection, transforms TS viaesbuild, and reconnects on browser restarts.api/playwright.gonow lazy-starts the daemon, communicates via Unix socket with request/response JSON, and returns structured errors;ApiServicegains fields for daemon process control.esbuild, removetsxfrom globals, and compile+embed the daemon in both headful/headless images; update neko base image reference.e2e_playwright_test.goand add a recovery test that restarts Chromium and verifies daemon reconnection.Written by Cursor Bugbot for commit b4be529. This will update automatically on new commits. Configure here.