From c8972e71c5cd52d1b54721d0c4856bf469329624 Mon Sep 17 00:00:00 2001 From: Peter Dedene Date: Thu, 19 Mar 2026 12:18:07 +0100 Subject: [PATCH 1/2] feat(browser): attach to running browsers via direct CDP --- CHANGELOG.md | 8 + README.md | 15 +- bin/oracle-cli.ts | 13 +- docs/browser-mode.md | 55 ++++- docs/configuration.md | 2 +- docs/manual-tests.md | 17 ++ src/browser/attachRunning.ts | 57 +++++ src/browser/chromeLifecycle.ts | 221 ++++++++++++++++++- src/browser/config.ts | 8 + src/browser/detect.ts | 304 +++++++++++++++++++++++--- src/browser/index.ts | 64 +++++- src/browser/reattach.ts | 105 +++++++-- src/browser/sessionRunner.ts | 5 + src/browser/types.ts | 11 + src/cli/browserConfig.ts | 41 ++++ src/cli/browserDefaults.ts | 4 + src/cli/sessionDisplay.ts | 6 +- src/config.ts | 1 + src/sessionManager.ts | 4 + tests/browser/attachRunning.test.ts | 107 +++++++++ tests/browser/chromeLifecycle.test.ts | 128 ++++++++++- tests/browser/detect.test.ts | 90 ++++++++ tests/browser/index.test.ts | 22 +- tests/browser/reattach.test.ts | 60 +++++ tests/browser/sessionRunner.test.ts | 67 ++++++ tests/cli/browserConfig.test.ts | 68 ++++++ tests/cli/browserDefaults.test.ts | 13 ++ 27 files changed, 1406 insertions(+), 90 deletions(-) create mode 100644 src/browser/attachRunning.ts create mode 100644 tests/browser/attachRunning.test.ts create mode 100644 tests/browser/detect.test.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index b35b95da5..fff838caa 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,14 @@ ## Unreleased +### Added + +- Browser: add `--browser-attach-running` to reuse a local already-running signed-in Chrome through Chrome’s local remote-debugging toggle. Oracle opens a dedicated tab, stores attach metadata for reattach, and leaves the browser itself untouched. + +### Docs + +- Browser: document the new attach-running workflow and add a manual smoke test for the direct attach path. + ## 0.9.0 — 2026-03-08 ### Changed diff --git a/README.md b/README.md index ff42a29e2..e03cdbdd7 100644 --- a/README.md +++ b/README.md @@ -211,9 +211,10 @@ oracle --engine browser \ | `--chatgpt-url ` | Target a ChatGPT workspace/folder (browser). | | `--browser-model-strategy ` | Control ChatGPT model selection in browser mode (current keeps the active model; ignore skips the picker). | | `--browser-manual-login` | Skip cookie copy; reuse a persistent automation profile and wait for manual ChatGPT login. | +| `--browser-attach-running` | Reuse your current local browser session through local `DevToolsActivePort` discovery; Oracle opens a dedicated tab instead of launching Chrome (defaults to `127.0.0.1:9222`, or combine with `--remote-chrome ` to hint a different local endpoint). | | `--browser-thinking-time ` | Set ChatGPT thinking-time intensity (browser; Thinking/Pro models only). | | `--browser-port ` | Pin the Chrome DevTools port (WSL/Windows firewall helper). | -| `--browser-inline-cookies[(-file)] ` | Supply cookies without Chrome/Keychain (browser). | +| `--browser-inline-cookies[(-file)] ` | Supply cookies without Chrome/Keychain (browser). | | `--browser-timeout`, `--browser-input-timeout` | Control overall/browser input timeouts (supports h/m/s/ms). | | `--browser-recheck-delay`, `--browser-recheck-timeout` | Delayed recheck for long Pro runs: wait then retry capture after timeout (supports h/m/s/ms). | | `--browser-reuse-wait` | Wait for a shared Chrome profile before launching (parallel browser runs). | @@ -229,7 +230,7 @@ oracle --engine browser \ | `--files-report` | Print per-file token usage. | | `--dry-run [summary\|json\|full]` | Preview without sending. | | `--remote-host`, `--remote-token` | Use a remote `oracle serve` host (browser). | -| `--remote-chrome ` | Attach to an existing remote Chrome session (browser). | +| `--remote-chrome ` | Attach to an existing remote Chrome session (browser), or when combined with `--browser-attach-running` use this host:port as the local attach hint. | | `--youtube ` | YouTube video URL to analyze (Gemini browser mode). | | `--generate-image ` | Generate image and save to file (Gemini browser mode). | | `--edit-image ` | Edit existing image with `--output` (Gemini browser mode). | @@ -255,11 +256,11 @@ See [docs/configuration.md](docs/configuration.md) for precedence and full schem Advanced flags -| Area | Flags | -| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Browser | `--browser-manual-login`, `--browser-thinking-time`, `--browser-timeout`, `--browser-input-timeout`, `--browser-recheck-delay`, `--browser-recheck-timeout`, `--browser-reuse-wait`, `--browser-profile-lock-timeout`, `--browser-auto-reattach-delay`, `--browser-auto-reattach-interval`, `--browser-auto-reattach-timeout`, `--browser-cookie-wait`, `--browser-inline-cookies[(-file)]`, `--browser-attachments`, `--browser-inline-files`, `--browser-bundle-files`, `--browser-keep-browser`, `--browser-headless`, `--browser-hide-window`, `--browser-no-cookie-sync`, `--browser-allow-cookie-errors`, `--browser-chrome-path`, `--browser-cookie-path`, `--chatgpt-url` | -| Run control | `--background`, `--no-background`, `--http-timeout`, `--zombie-timeout`, `--zombie-last-activity` | -| Azure/OpenAI | `--azure-endpoint`, `--azure-deployment`, `--azure-api-version`, `--base-url` | +| Area | Flags | +| ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Browser | `--browser-manual-login`, `--browser-attach-running`, `--browser-thinking-time`, `--browser-timeout`, `--browser-input-timeout`, `--browser-recheck-delay`, `--browser-recheck-timeout`, `--browser-reuse-wait`, `--browser-profile-lock-timeout`, `--browser-auto-reattach-delay`, `--browser-auto-reattach-interval`, `--browser-auto-reattach-timeout`, `--browser-cookie-wait`, `--browser-inline-cookies[(-file)]`, `--browser-attachments`, `--browser-inline-files`, `--browser-bundle-files`, `--browser-keep-browser`, `--browser-headless`, `--browser-hide-window`, `--browser-no-cookie-sync`, `--browser-allow-cookie-errors`, `--browser-chrome-path`, `--browser-cookie-path`, `--chatgpt-url` | +| Run control | `--background`, `--no-background`, `--http-timeout`, `--zombie-timeout`, `--zombie-last-activity` | +| Azure/OpenAI | `--azure-endpoint`, `--azure-deployment`, `--azure-api-version`, `--base-url` | Remote browser example diff --git a/bin/oracle-cli.ts b/bin/oracle-cli.ts index f8768328c..8a6690741 100755 --- a/bin/oracle-cli.ts +++ b/bin/oracle-cli.ts @@ -125,6 +125,7 @@ interface CliOptions extends OptionValues { browserChromeProfile?: string; browserChromePath?: string; browserCookiePath?: string; + browserAttachRunning?: boolean; chatgptUrl?: string; browserUrl?: string; browserTimeout?: string; @@ -460,6 +461,12 @@ program "Explicit Chrome/Chromium cookie DB path for session reuse.", ), ) + .addOption( + new Option( + "--browser-attach-running", + "Attach to a running local browser session instead of launching Chrome (defaults to 127.0.0.1:9222; combine with --remote-chrome to hint a different host:port).", + ), + ) .addOption( new Option( "--chatgpt-url ", @@ -609,7 +616,7 @@ program .addOption( new Option( "--remote-chrome ", - "Connect to remote Chrome DevTools Protocol (e.g., 192.168.1.10:9222 or [2001:db8::1]:9222 for IPv6).", + "Connect to remote Chrome DevTools Protocol, or when combined with --browser-attach-running use this host:port as the local attach hint.", ), ) .addOption( @@ -2091,6 +2098,10 @@ function printDebugHelp(cliName: string): void { ["--browser-chrome-profile ", "Reuse cookies from a specific Chrome profile."], ["--browser-chrome-path ", "Point to a custom Chrome/Chromium binary."], ["--browser-cookie-path ", "Use a specific Chrome/Chromium cookie store file."], + [ + "--browser-attach-running", + "Attach to your current Chrome session through its local remote debugging toggle.", + ], ["--browser-url ", "Alias for --chatgpt-url."], ["--browser-timeout ", "Cap total wait time for the assistant response."], ["--browser-input-timeout ", "Cap how long we wait for the composer textarea."], diff --git a/docs/browser-mode.md b/docs/browser-mode.md index c45c15e21..b9b815523 100644 --- a/docs/browser-mode.md +++ b/docs/browser-mode.md @@ -1,8 +1,9 @@ # Browser Mode -Oracle’s `--engine browser` supports two different execution paths: +Oracle’s `--engine browser` supports three different execution paths: -- **ChatGPT automation** (GPT-\* models): drives the ChatGPT web UI with Chrome automation. +- **ChatGPT launcher mode** (GPT-\* models): Oracle launches Chrome itself and drives the ChatGPT web UI over CDP. +- **ChatGPT attach-running mode** (GPT-\* models): Oracle attaches to your already-running local Chrome session through Chrome’s local remote-debugging toggle, opens a dedicated tab, and leaves the browser process/profile alone. - **Gemini web mode** (Gemini models): talks directly to `gemini.google.com` using your signed-in Chrome cookies (no ChatGPT automation). If you’re running Gemini, also see `docs/gemini.md`. @@ -41,17 +42,48 @@ oracle --engine browser \ You can pass the same payload inline (`--browser-inline-cookies ''`) or via env (`ORACLE_BROWSER_COOKIES_JSON`, `ORACLE_BROWSER_COOKIES_FILE`). Cloudflare cookies (`cf_clearance`, `__cf_bm`, etc.) are only needed when you hit a challenge. +## Quick example: attach to your running Chrome + +Use this when you already have a signed-in Chrome session running with DevTools access enabled and want Oracle to reuse that browser instead of launching its own copy. + +```bash +oracle --engine browser \ + --browser-attach-running \ + --model "GPT-5.4 Pro" \ + -p "Summarize the last assistant response in one paragraph" +``` + +Notes: + +- `--browser-attach-running` defaults to local attach discovery at `127.0.0.1:9222`. +- If the browser UI shows a different local endpoint, you can point Oracle at it explicitly: + ```bash + oracle --engine browser \ + --browser-attach-running \ + --remote-chrome 127.0.0.1:63332 \ + --model "GPT-5.4 Pro" \ + -p "Summarize the last assistant response in one paragraph" + ``` +- Oracle reads local `DevToolsActivePort` metadata, connects to the browser websocket directly, and then reuses the normal CDP automation flow. +- If Chrome shows a remote-debugging approval prompt on first attach, Oracle issues one attach request and waits briefly for you to allow it before failing. +- Attach mode always opens a fresh Oracle-owned tab and closes only that tab after a successful run. +- Cookie sync, Chrome launch flags, and profile lifecycle flags are skipped because the browser is already running. +- If Chrome is not exposing a classic `/json/version` endpoint, use `--browser-attach-running` instead of standalone `--remote-chrome`. + ## Current Pipeline 1. **Prompt assembly** – we reuse the normal prompt builder (`buildPrompt`) and the markdown renderer. Browser mode pastes the system + user text (no special markers) into the ChatGPT composer and, by default, pastes resolved file contents inline until the total pasted content reaches ~60k characters (then switches to uploads). -2. **Automation stack** – code lives in `src/browserMode.ts` and is a lightly refactored version of the `oraclecheap` utility: - - Launches Chrome via `chrome-launcher` and connects with `chrome-remote-interface`. - - (Optional) copies cookies from the requested browser profile via Oracle’s built-in cookie reader (Keychain/DPAPI aware) so you stay signed in. - - Navigates to `chatgpt.com`, switches the model to the requested GPT-5.4 / GPT-5.2 variant, pastes the prompt, waits for completion, and copies the markdown via the built-in “copy turn” button. - - Immediately probes `/backend-api/me` in the ChatGPT tab to verify the session is authenticated; if the endpoint returns 401/403 we abort early with a login-specific error instead of timing out waiting for the composer. - - When `--file` inputs would push the pasted composer content over ~60k characters, we switch to uploading attachments (optionally bundled) and wait for ChatGPT to re-enable the send button before submitting the combined system+user prompt. - - Cleans up the temporary profile unless `--browser-keep-browser` is passed. -3. **Session integration** – browser sessions use the normal log writer, add `mode: "browser"` plus `browser.config/runtime` metadata, and log the Chrome PID/port so `oracle session ` (or `oracle status `) shows a marker for the background Chrome process. +2. **Automation stack** – code lives under `src/browser/`: + - Launcher mode starts Chrome via `chrome-launcher` and connects with `chrome-remote-interface`. + +- Attach-running mode reads local `DevToolsActivePort` metadata for the selected local port, connects to the browser websocket, opens a dedicated tab, and reuses the same DOM automation/capture flow against that attached browser. +- Launcher mode can optionally copy cookies from the requested browser profile via Oracle’s built-in cookie reader (Keychain/DPAPI aware) so you stay signed in. +- Navigates to `chatgpt.com`, switches the model to the requested GPT-5.4 / GPT-5.2 variant, pastes the prompt, waits for completion, and copies the markdown via the built-in “copy turn” button. +- Immediately probes `/backend-api/me` in the ChatGPT tab to verify the session is authenticated; if the endpoint returns 401/403 we abort early with a login-specific error instead of timing out waiting for the composer. +- When `--file` inputs would push the pasted composer content over ~60k characters, we switch to uploading attachments (optionally bundled) and wait for ChatGPT to re-enable the send button before submitting the combined system+user prompt. +- Launcher mode cleans up the temporary profile unless `--browser-keep-browser` is passed. + +3. **Session integration** – browser sessions use the normal log writer, add `mode: "browser"` plus `browser.config/runtime` metadata, and persist Chrome pid/port or websocket attach metadata plus the Oracle-owned target/tab URL for reattach. 4. **Usage accounting** – we estimate input tokens with the same tokenizer used for API runs and estimate output tokens via `estimateTokenCount`. `oracle status` therefore shows comparable cost/timing info even though the call ran through the browser. ### CLI Options @@ -59,6 +91,7 @@ You can pass the same payload inline (`--browser-inline-cookies '` to use a different local attach hint. - `--chatgpt-url`: override the ChatGPT base URL. Works with the root homepage (`https://chatgpt.com/`) **or** a specific workspace/folder link such as `https://chatgpt.com/g/.../project`. `--browser-url` stays as a hidden alias. - `--browser-timeout`, `--browser-input-timeout`: `1200s (20m)`/`60s` defaults. Durations accept `ms`, `s`, `m`, or `h` and can be chained (`1h2m10s`). - `--browser-recheck-delay`, `--browser-recheck-timeout`: after an assistant timeout, wait the delay, revisit the conversation, and retry capture (default recheck timeout 120s). Useful for Pro runs that finish later. @@ -77,6 +110,7 @@ You can pass the same payload inline (`--browser-inline-cookies '