
Conversation

@rgarcia rgarcia commented Jan 7, 2026

Summary

Replace per-request tsx subprocess spawning with a persistent Node.js daemon that maintains a warm CDP connection to the browser.

Performance improvement: ~94% faster for simple operations

  • Before: ~750ms per request (tsx startup + module load + CDP connect)
  • After: ~45ms per request (reuses daemon + warm CDP connection)

Changes

  • Add playwright-daemon.ts: persistent daemon that listens on Unix socket, maintains warm CDP connection, uses esbuild for fast TypeScript transformation
  • Update playwright.go: communicate with daemon via Unix socket instead of spawning tsx subprocess; lazy-start daemon on first request
  • Update Dockerfiles: install esbuild, pre-compile daemon to JavaScript during build

Architecture

HTTP Request → Go API → Unix Socket → Playwright Daemon → CDP → Chromium
                              ↑
                    (persistent, warm connection)
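
A minimal sketch of the daemon half of this picture, assuming a hypothetical request shape and handler (the real implementation is `server/runtime/playwright-daemon.ts` and differs in detail):

```ts
// Illustrative sketch only: a Unix-socket daemon with a warm CDP connection and
// esbuild-based TypeScript transformation. The real daemon handles framing,
// timeouts, and errors more carefully.
import net from 'node:net';
import { transform } from 'esbuild';
import { chromium, type Browser } from 'playwright-core';

const SOCKET_PATH = process.env.PLAYWRIGHT_DAEMON_SOCKET || '/tmp/playwright-daemon.sock';
const CDP_ENDPOINT = process.env.CDP_ENDPOINT || 'ws://127.0.0.1:9222';

let browser: Browser | null = null;

// Connect once and reuse; if Chromium restarted, the stale connection is dropped
// and a fresh one is established on the next request.
async function getBrowser(): Promise<Browser> {
  if (!browser || !browser.isConnected()) {
    browser = await chromium.connectOverCDP(CDP_ENDPOINT);
  }
  return browser;
}

net.createServer((socket) => {
  // Framing simplified here: assume one complete JSON request per 'data' event.
  socket.on('data', async (buf) => {
    try {
      const { code } = JSON.parse(buf.toString()); // hypothetical request shape
      const js = await transform(code, { loader: 'ts' }); // strip types, no type-check
      const b = await getBrowser();
      const context = b.contexts()[0] ?? (await b.newContext());
      const page = context.pages()[0] ?? (await context.newPage());
      const AsyncFunction = Object.getPrototypeOf(async () => {}).constructor;
      const result = await AsyncFunction('page', 'context', 'browser', js.code)(page, context, b);
      socket.write(JSON.stringify({ success: true, result }) + '\n');
    } catch (err) {
      socket.write(JSON.stringify({ success: false, error: String(err) }) + '\n');
    }
  });
}).listen(SOCKET_PATH);
```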

Test plan

  • Build headless Docker image
  • Run performance benchmarks comparing before/after
  • Test TypeScript syntax support (type annotations, interfaces, async/await)
  • Verify existing Go tests pass
  • Manual testing in production environment

Note

Introduces a persistent Playwright execution daemon and switches the API to use it over a Unix socket, removing per-request tsx subprocesses.

  • New daemon: Adds server/runtime/playwright-daemon.ts (compiled to /usr/local/lib/playwright-daemon.js) that listens on /tmp/playwright-daemon.sock, keeps a warm CDP connection, transforms TS via esbuild, and reconnects on browser restarts.
  • API changes: api/playwright.go now lazy-starts the daemon, communicates via Unix socket with request/response JSON, and returns structured errors; ApiService gains fields for daemon process control.
  • Dockerfiles: Install esbuild, remove tsx from globals, and compile+embed the daemon in both headful/headless images; update neko base image reference.
  • Tests: Move Playwright execute test into e2e_playwright_test.go and add a recovery test that restarts Chromium and verifies daemon reconnection.

Written by Cursor Bugbot for commit b4be529.

@@ -0,0 +1,266 @@
/**
* Persistent Playwright Executor Daemon

if i want to add tracing functionality as a first class citizen, is this the place to add that given the architecture?

Contributor Author

potentially! Currently there are two main ways users can execute playwright code against a Kernel browser:
(1) spin up a kernel browser, and then use the CDP url from a playwright script. Here we have no control over playwright tracing config since it's user code
(2) use the playwright execute API (https://www.kernel.sh/docs/browsers/playwright-execution), which is what this PR modifies. This is an API that takes TypeScript and executes it against the browser

i'd say most people use (1) right now so if we added a parameter to enable tracing in this endpoint it might not be useful in practice

Replace per-request tsx subprocess with a persistent Node.js daemon that
maintains a warm CDP connection and uses esbuild for fast TypeScript
transformation.

Performance improvement: ~94% faster for simple operations
- Before: ~750ms per request (tsx startup + module load + CDP connect)
- After: ~45ms per request (reuses daemon + warm connection)

The daemon is lazy-started on first request and communicates via Unix
socket using newline-delimited JSON.
@rgarcia rgarcia force-pushed the playwright-daemon-perf branch from 7ce91a2 to eba83f5 on January 7, 2026 at 23:02
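
The wire exchange could look roughly like this (client shown in TypeScript for illustration; the real client is `api/playwright.go`, and the field names are guesses, not the actual schema):

```ts
// Sketch of the newline-delimited JSON exchange over the Unix socket.
import net from 'node:net';
import readline from 'node:readline';

async function execute(code: string, timeoutMs: number): Promise<unknown> {
  const socket = net.createConnection('/tmp/playwright-daemon.sock');
  socket.write(JSON.stringify({ code, timeoutMs }) + '\n'); // one request per line
  const rl = readline.createInterface({ input: socket });
  for await (const line of rl) {
    socket.end();
    return JSON.parse(line); // one response per line, e.g. { success, result, error }
  }
  throw new Error('daemon closed the connection without responding');
}
```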

const SOCKET_PATH = process.env.PLAYWRIGHT_DAEMON_SOCKET || '/tmp/playwright-daemon.sock';
const CDP_ENDPOINT = process.env.CDP_ENDPOINT || 'ws://127.0.0.1:9222';
const USE_PATCHRIGHT = process.env.PLAYWRIGHT_ENGINE !== 'playwright-core';

Default Playwright engine changed from playwright-core to patchright

High Severity

The PLAYWRIGHT_ENGINE environment variable logic was inverted. The old executor checked === 'patchright' to use patchright, defaulting to playwright-core when unset. The new daemon checks !== 'playwright-core', which means when PLAYWRIGHT_ENGINE is undefined (the common case), patchright is now used instead of playwright-core. This silently changes the default browser automation library for all existing deployments.
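
Spelled out, the inversion the bot is describing (variable names here are illustrative):

```ts
// Old executor: opt in to patchright explicitly; playwright-core is the default.
const usePatchrightBefore = process.env.PLAYWRIGHT_ENGINE === 'patchright';

// New daemon: patchright unless playwright-core is explicitly requested.
const usePatchrightAfter = process.env.PLAYWRIGHT_ENGINE !== 'playwright-core';

// With PLAYWRIGHT_ENGINE unset: before is false (playwright-core), after is true (patchright).
```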


Contributor Author

intentional

const result = await Promise.race([
userFunction(page, context, browserInstance),
timeoutPromise,
]);

Timeout timer causes unhandled rejection and memory leak

Medium Severity

The timeoutPromise uses setTimeout without storing the timer reference. When the user function completes before the timeout, Promise.race returns but the timer continues running. When it fires, it rejects a promise that nothing is listening to, causing an unhandled rejection. In Node.js 22, this generates warnings in logs and can crash the daemon depending on configuration. The timer reference needs to be stored and cleared via clearTimeout after the race completes.
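
The usual shape of the fix, mirroring the snippet above (a sketch, not the committed code):

```ts
// userFunction, page, context, browserInstance, and timeoutMs come from the
// surrounding daemon code; only the timer handling changes.
let timer: NodeJS.Timeout | undefined;
const timeoutPromise = new Promise<never>((_, reject) => {
  timer = setTimeout(() => reject(new Error('execution timed out')), timeoutMs);
});

try {
  const result = await Promise.race([
    userFunction(page, context, browserInstance),
    timeoutPromise,
  ]);
  // ... use result
} finally {
  clearTimeout(timer); // the rejection can no longer fire unobserved
}
```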


const result = await Promise.race([
userFunction(page, context, browserInstance),
timeoutPromise,
]);

Timed-out user code continues running and corrupts page state

High Severity

When a timeout occurs, Promise.race returns immediately but the user function continues executing with access to the shared page object. Since the daemon reuses the same page for subsequent requests (lines 148-149 prefer existing pages), a timed-out request's code can still be running when the next request starts, causing race conditions and unpredictable behavior. The old subprocess-based executor was killed by the OS on timeout via exec.CommandContext, providing clean isolation. The daemon lacks any mechanism to abort user code when timeout fires.


Contributor Author

Acknowledged. This is an inherent tradeoff of the persistent daemon architecture. A few mitigating factors:

  1. The Go API serializes all requests with a mutex, so concurrent execution isn't possible
  2. Timed-out code is typically blocked on I/O (awaiting network), not CPU-bound, so it won't starve subsequent requests
  3. The practical risk of page state corruption is low - most timeouts occur during navigation/network issues where the page is already in a bad state

True isolation would require per-request subprocesses (the old model) or worker threads with their own CDP connections, both of which sacrifice the ~94% latency improvement this PR provides.

We accept this tradeoff for the performance gains. If it becomes a problem in practice, we can revisit with a "fresh page after timeout" mitigation.
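
If that mitigation ever becomes necessary, it could be as small as this sketch (function name is hypothetical):

```ts
import type { BrowserContext, Page } from 'playwright-core';

// "Fresh page after timeout": discard the page the timed-out code may still be
// touching and hand the next request a brand-new one.
async function resetPageAfterTimeout(context: BrowserContext, stalePage: Page): Promise<Page> {
  await stalePage.close().catch(() => {}); // best effort; the stale code loses its target
  return context.newPage();                // the next request starts from a clean page
}
```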

Contributor

should we just manage the daemon via supervisord instead? that way we can also automatically resume from failures, etc. wdyt?

Contributor Author

since CDP connections present some bot detection risk i figured we should only run the daemon if the user is using this playwright endpoint, so that's why it's not running by default

Contributor

sg! exciting stuff!


// If we can't parse the output, return a generic error
success := false
errorMsg := fmt.Sprintf("execution failed: %v", err)
Contributor

might be worth distinguishing daemon vs code execution errors to make debugging easier
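
One way to support that would be for the daemon to tag its responses so the Go side can propagate the distinction (sketch; the errorType field is hypothetical, not part of the current protocol):

```ts
// Hypothetical response shape that separates daemon failures from user-code failures.
type DaemonResponse =
  | { success: true; result: unknown }
  | { success: false; errorType: 'daemon' | 'execution'; error: string };

function executionError(err: unknown): DaemonResponse {
  // The user's script threw: surface it as an execution error.
  return { success: false, errorType: 'execution', error: String(err) };
}

function daemonError(err: unknown): DaemonResponse {
  // The daemon itself failed (socket, CDP connection, esbuild transform, ...).
  return { success: false, errorType: 'daemon', error: String(err) };
}
```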

Contributor

@Sayan- Sayan- left a comment

sweet! qq on daemon management, lgtm otherwise

@rgarcia rgarcia merged commit e30df23 into main Jan 8, 2026
5 checks passed
@rgarcia rgarcia deleted the playwright-daemon-perf branch January 8, 2026 00:02