docs: Final spec for the Codex browser automation

zah · zah · commit be75c21e8beb · 2025-08-26T16:07:46.000+03:00
diff --git a/docs/browser-automation/README.md b/docs/browser-automation/README.md
@@ -0,0 +1,16 @@
+## Browser Automation
+
+Each document in this folder describes an automation targeting a specific site that agents‑workflow interacts with. Automations share the Agent Browser Profiles convention in `../agent-browsers/spec.md` for persistent, named profiles.
+
+### Structure
+
+- `<site>.md` — High‑level behavior of the automation (e.g., `codex.md`).
+- `<site>-testing.md` — Testing strategy and edge cases for the automation.
+
+### Common Principles
+
+- Use Playwright persistent contexts bound to a selected profile.
+- Prefer headless execution when the profile’s login expectations are met; otherwise, switch to headful and guide the user.
+- Detect UI drift and fail fast with actionable diagnostics. When possible, surface the browser window to help the user investigate.
+
+
diff --git a/docs/browser-automation/codex-testing.md b/docs/browser-automation/codex-testing.md
@@ -0,0 +1,85 @@
+## Testing Strategy — Codex Browser Automation
+
+Goal: validate Playwright-driven automation that navigates to `https://chatgpt.com/codex`, selects a workspace/branch, enters the task description, and presses "Code", while honoring Agent Browser Profiles visibility and login expectations.
+
+### Levels of Testing
+
+1) Unit-like checks (fast):
+- Validate profile path resolution across platforms given environment overrides.
+- Validate parsing and semantics of `meta.json` (visibility policy, login expectations, TTL/grace).
+- Validate selector maps/config fallbacks without launching a browser.
+
+2) Playwright integration tests (headless/headful):
+- Use persistent contexts tied to ephemeral copies of real profiles (or synthetic profiles) to avoid mutating a user’s primary profiles.
+- Mock or guard network calls as needed, but prefer real navigation to detect UI drift.
+
+3) OS‑level visibility assertions:
+- Verify that the browser starts hidden (headless) when login is known good.
+- Verify that the browser is displayed (headful) only when login is unknown/expired/failing or when UI drift is detected.
+
+### Headless vs Headful Verification
+
+- Headless: Assert Playwright is launched with `headless: true` and no windows are created at the OS level. For Linux/macOS/Windows, implement a platform helper that samples top‑level windows via:
+  - macOS: `CGWindowListCopyWindowInfo` via a small helper binary or `osascript -e 'tell app "System Events" to ...'` as fallback.
+  - Linux: `xprop`/`wmctrl` on X11; `gdbus`/`gsettings`/`swaymsg` on Wayland (best‑effort; may skip when unavailable).
+  - Windows: Win32 `EnumWindows` via a helper, or `powershell` COM query fallback.
+- Headful required: Assert at least one top‑level browser window becomes visible within a timeout after a failed login probe or drift detection.
+
+These helpers should be wrapped with feature detection and skipped when the environment cannot reliably report window state (e.g., headless CI without a virtual display).
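
A minimal sketch of the feature detection described above, assuming Python as the harness language. The function name `window_probe_available` and the choice of environment variables (`DISPLAY`, `WAYLAND_DISPLAY`) are illustrative assumptions, not part of the spec; only the tool names come from the list above.

```python
import os
import shutil
import sys


def window_probe_available(platform=sys.platform, env=None, which=shutil.which):
    """Return True when this environment can plausibly report window state.

    Hypothetical helper: visibility tests would be skipped when it returns False.
    """
    env = os.environ if env is None else env
    if platform.startswith("linux"):
        if env.get("WAYLAND_DISPLAY"):
            # Wayland is best-effort; this sketch only checks for swaymsg.
            return which("swaymsg") is not None
        if env.get("DISPLAY"):
            # On X11 either tool is enough to sample top-level windows.
            return which("xprop") is not None or which("wmctrl") is not None
        return False  # no display server, e.g. headless CI without Xvfb
    # Assumption: macOS and Windows expose their window APIs natively;
    # a real helper would still have to confirm a GUI session exists.
    return platform in ("darwin", "win32")
```

Passing `platform`, `env`, and `which` as parameters keeps the check itself testable without a display.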
+
+### CI Considerations
+
+- Use containerized jobs with a virtual display (Xvfb or Xwayland) and a minimal window manager to support headful tests.
+- For macOS runners, prefer native headless for most tests; restrict window‑visibility tests to self‑hosted runners capable of GUI automation.
+- For Windows, run in a session with desktop interaction enabled.
+
+### Login Expectation Scenarios
+
+Test cases should cover:
+- Known good login: `lastValidated` fresh and check passes → remain headless.
+- Stale login: `lastValidated` older than `graceSeconds` → perform probe; if the probe passes, remain headless; if it fails, switch to headful and wait for the user.
+- No expectations configured: proceed headless by default; do not block.
+- Cookie present but selector absent: treat as not logged in (conservative), switch to headful.
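
The four scenarios above can be sketched as one decision function. This is an illustration under stated assumptions, not the project's implementation: `LoginExpectations`, `probe_passes`, and `choose_mode` are hypothetical names, and the sketch assumes a fresh `lastValidated` skips the probe entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class LoginExpectations:
    last_validated: Optional[datetime]  # mirrors `lastValidated`
    grace_seconds: int                  # mirrors `graceSeconds`


def probe_passes(cookie_present: bool, selector_present: bool) -> bool:
    # Conservative rule: a cookie without the expected selector is "not logged in".
    return cookie_present and selector_present


def choose_mode(expectations: Optional[LoginExpectations],
                probe: Callable[[], bool],
                now: Optional[datetime] = None) -> str:
    """Return "headless" or "headful" for the next browser launch."""
    if expectations is None:
        return "headless"  # no expectations configured: do not block
    now = now or datetime.now(timezone.utc)
    fresh = (
        expectations.last_validated is not None
        and now - expectations.last_validated <= timedelta(seconds=expectations.grace_seconds)
    )
    if fresh:
        return "headless"  # known good login (assumption: no probe needed)
    # Stale or never validated: the probe decides.
    return "headless" if probe() else "headful"
```

Taking `probe` as a callable lets tests supply a stub instead of a live page check.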
+
+### UI Drift and Resilience
+
+Detection:
+- Missing critical selectors (workspace picker, branch selector, "Code" button) must fail fast with a machine‑readable error.
+- Automation should then show the browser (headful), optionally open DevTools, and present an inline banner/toast explaining what failed and how to proceed.
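
One possible shape for the machine-readable drift error, as a hedged sketch. The class name `UiDriftError`, the helper `require_unique`, and the JSON field names are assumptions made for illustration; the spec only requires that the failing selector keys and the outcome be reported.

```python
import json


class UiDriftError(Exception):
    """Raised when a critical selector does not match exactly one element."""

    def __init__(self, selector_key, selectors, match_count):
        self.selector_key = selector_key    # logical name, e.g. "branchSelector"
        self.selectors = list(selectors)    # concrete selectors that were tried
        self.match_count = match_count      # 0 = not found, >1 = ambiguous
        super().__init__(
            f"UI drift: {selector_key!r} matched {match_count} element(s), expected exactly 1"
        )

    def to_json(self):
        # Hypothetical wire format for logs and tooling.
        return json.dumps(
            {
                "error": "ui_drift",
                "selectorKey": self.selector_key,
                "selectors": self.selectors,
                "matchCount": self.match_count,
            },
            sort_keys=True,
        )


def require_unique(selector_key, selectors, match_count):
    """Fail fast unless the element was found exactly once."""
    if match_count != 1:
        raise UiDriftError(selector_key, selectors, match_count)
```

A caller would catch `UiDriftError`, log `to_json()`, and then switch the browser to headful for investigation.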
+
+Tests:
+- Simulate selector renames by injecting CSS/JS to remove/alter test ids via Playwright route interception or a local test proxy. Assert that:
+  - The automation raises a drift error quickly.
+  - The browser is brought to foreground (headful).
+  - A diagnostic message is visible to the user and logs include selector keys that failed.
+
+### Workspace/Branch Selection Edge Cases
+
+- Multiple workspaces; selection requires scrolling or dynamic loading.
+- Branch list too long; search/filter interaction required.
+- Permissions errors (workspace not accessible) — assert graceful message and headful fallback.
+
+### Rate Limits and Captcha Handling
+
+- If navigation returns a rate‑limit or captcha page, switch to headful, surface instructions, and pause. Tests simulate this by stubbing responses to return challenge pages and assert the fallback behavior.
+
+### Telemetry and Artifacts
+
+- Save Playwright traces, console logs, and screenshots on failure.
+- Update `lastValidated` on successful login checks; avoid writes in tests unless operating on disposable profile copies.
+
+### Fully Automated Local and CI Execution
+
+- Provide a test harness that:
+  - Creates a temporary profile directory seeded with synthetic cookies/selectors to emulate login.
+  - Runs headless success path and asserts no windows.
+  - Runs stale/failed login paths and asserts window visibility transitioned as expected.
+  - Runs UI drift scenarios using selector overrides.
+  - Cleans up all temporary artifacts.
+
+### Developer Ergonomics
+
+- `--update-selectors` test mode to record new stable selectors when UI drift is acknowledged by a developer.
+- `--show-browser` override to force headful during local debugging.
+
+
diff --git a/docs/browser-automation/codex.md b/docs/browser-automation/codex.md
@@ -0,0 +1,42 @@
+## Codex Browser Automation (Playwright)
+
+### Purpose
+
+Automate the Codex WebUI to initiate a coding session for a repository/branch using a shared agent browser profile. This is the first automation built on the Agent Browser Profiles convention.
+
+### Behavior (happy path)
+
+1. Determine ChatGPT username: accept optional `--chatgpt-username` (see `docs/cli-spec.md`).
+2. Discover profiles: list agent browser profiles whose `loginExpectations.origins` include `https://chatgpt.com`.
+3. Filter by username: if `--chatgpt-username` is provided, restrict to profiles whose `loginExpectations.username` matches.
+4. Select or create profile:
+   - If one or more profiles match, choose the best candidate (prompt if multiple).
+   - If none match, create a new profile named `chatgpt-<username>` when a username is provided, otherwise `chatgpt`.
+5. Override behavior: if `--browser-profile` is provided, skip discovery/creation and use that profile name directly (create fresh if missing).
+6. Launch Playwright with a persistent context in headless mode.
+7. If the expected login is not present, relaunch in visible mode to let the user authenticate, then continue.
+8. Navigate to Codex, select workspace and branch, enter the task description, and press "Code":
+   - Workspace comes from `--codex-workspace` or `config: codex.workspace` (see `docs/configuration.md`).
+   - Branch comes from the `aw task --branch` value.
+9. Record success.
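
Steps 2 to 5 can be sketched as a pure function. This is an illustration only: the profile dictionary layout (a `name` key next to `loginExpectations`), the function name `select_profile`, and the `("use" | "prompt" | "create", names)` return shape are assumptions rather than the documented profile schema.

```python
CHATGPT_ORIGIN = "https://chatgpt.com"


def select_profile(profiles, username=None, override=None):
    """Decide which agent browser profile to use.

    Returns (action, names) where action is "use", "prompt", or "create".
    """
    if override:
        # --browser-profile skips discovery; create the profile if it is missing.
        exists = any(p["name"] == override for p in profiles)
        return ("use" if exists else "create", [override])

    # Step 2: profiles whose login expectations cover the ChatGPT origin.
    matches = [
        p for p in profiles
        if CHATGPT_ORIGIN in p.get("loginExpectations", {}).get("origins", [])
    ]
    # Step 3: optional filter by --chatgpt-username.
    if username:
        matches = [p for p in matches if p["loginExpectations"].get("username") == username]

    # Step 4: select, prompt, or create.
    if len(matches) == 1:
        return ("use", [matches[0]["name"]])
    if matches:
        return ("prompt", [p["name"] for p in matches])
    return ("create", [f"chatgpt-{username}" if username else "chatgpt"])
```

Keeping the decision separate from prompting and profile creation makes each branch easy to cover in the unit-like checks.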
+
+If the automation fails because the Codex WebUI has changed, report detailed diagnostic information to the user: which UI element the automation was trying to locate, which selectors were used, and what happened (the expected element was not found, more than one element was found, etc.).
+
+### Visibility and Login Flow
+
+- Runs headless by default; when login is not present, restarts headful to allow the user to log in, then proceeds automatically.
+
+### Configuration
+
+Controlled via AW configuration (see `docs/cli-spec.md` and `docs/configuration.md`):
+
+- Enable/disable automation for `aw task`.
+- Select or override the agent browser profile name.
+- Set default Codex workspace: `codex.workspace`.
+
+### Notes
+
+- Playwright selectors should prefer role, ARIA, and test-id attributes over visible text so that they survive UI copy changes.
+- Use stable navigation points inside Codex (workspace and branch selectors) and fail fast with helpful error messages when not found; optionally open DevTools in headful mode for investigation.
+
+
diff --git a/docs/cli-spec.md b/docs/cli-spec.md
@@ -34,6 +34,8 @@ Configuration mapping examples:
 - `tui.defaultMode` ↔ `--mode`
 - `terminal.multiplexer` ↔ `--multiplexer <tmux|zellij|screen>`
 - `editor.default` ↔ `--editor`
+- `browserAutomation.enabled` ↔ `--browser-automation`, `AGENTS_WORKFLOW_BROWSER_AUTOMATION_ENABLED`
+- `browserAutomation.profile` ↔ `--browser-profile`, `AGENTS_WORKFLOW_BROWSER_PROFILE`
 
 ### Subcommands
 
@@ -56,13 +58,14 @@ Task launch behavior in TUI:
 
 #### 2) Tasks
 
-- `aw task [create] [--prompt <TEXT> | --prompt-file <FILE>] [--repo <PATH|URL>] [--branch <NAME>] [--agent <TYPE>[@VERSION]] [--instances <N>] [--runtime <devcontainer|local|unsandboxed>] [--devcontainer-path <PATH>] [--labels k=v ...] [--delivery <pr|branch|patch>] [--target-branch <NAME>] [--yes]`
+- `aw task [create] [--prompt <TEXT> | --prompt-file <FILE>] [--repo <PATH|URL>] [--branch <NAME>] [--agent <TYPE>[@VERSION]] [--instances <N>] [--runtime <devcontainer|local|unsandboxed>] [--devcontainer-path <PATH>] [--labels k=v ...] [--delivery <pr|branch|patch>] [--target-branch <NAME>] [--browser-automation <true|false>] [--browser-profile <NAME>] [--yes]`
 
 Behavior:
 
 - In local mode, prepares a per-task workspace using snapshot preference order (ZFS > Btrfs > Overlay > copy) and launches the agent.
 - In rest mode, calls `POST /api/v1/tasks` with the provided parameters.
 - Creates/updates a local PID-like session record when launching locally (see “Local Discovery”).
+- When `--browser-automation true` (default), launches site-specific browser automation (e.g., Codex) using the selected agent browser profile. When `false`, web automation is skipped.
 - Branch autocompletion uses standard git protocol:
   - Local mode: `git for-each-ref` on the repo; cached with debounce.
   - REST mode: server uses `git ls-remote`/refs against admin-configured URL to populate its cache; CLI/Web query capability endpoints for suggestions.
@@ -218,6 +221,12 @@ Create a task locally and immediately open TUI window/panes:
 aw task --prompt "Refactor checkout service for reliability" --repo . --agent openhands --runtime devcontainer --branch main --instances 2
 ```
 
+Specify a browser profile and disable automation explicitly:
+
+```bash
+aw task --prompt "Kick off Codex" --browser-profile work-codex --browser-automation false
+```
+
 List and tail logs for sessions:
 
 ```bash
diff --git a/docs/configuration.md b/docs/configuration.md