Commit be75c21

docs: Final spec for the Codex browser automation
1 parent 84e8951 commit be75c21

5 files changed

+159
-217
lines changed

docs/browser-automation/README.md

Lines changed: 16 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,16 @@
1+
## Browser Automation
2+
3+
Each document in this folder describes an automation targeting a specific site that agents‑workflow interacts with. Automations share the Agent Browser Profiles convention in `../agent-browsers/spec.md` for persistent, named profiles.
4+
5+
### Structure
6+
7+
- `<site>.md` — High‑level behavior of the automation (e.g., `codex.md`).
8+
- `<site>-testing.md` — Testing strategy and edge cases for the automation.
9+
10+
### Common Principles
11+
12+
- Use Playwright persistent contexts bound to a selected profile.
13+
- Prefer headless execution when the profile’s login expectations are met; otherwise, switch to headful and guide the user.
14+
- Detect UI drift and fail fast with actionable diagnostics. When possible, surface the browser window to help the user investigate.
15+
16+
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
1+
## Testing Strategy — Codex Browser Automation
2+
3+
Goal: validate Playwright-driven automation that navigates `https://chatgpt.com/codex`, selects a workspace/branch, enters "go", and starts coding — while honoring Agent Browser Profiles visibility and login expectations.
4+
5+
### Levels of Testing
6+
7+
1) Unit-like checks (fast):
8+
- Validate profile path resolution across platforms given environment overrides.
9+
- Validate parsing and semantics of `meta.json` (visibility policy, login expectations, TTL/grace).
10+
- Validate selector maps/config fallbacks without launching a browser.
11+
12+
2) Playwright integration tests (headless/headful):
13+
- Use persistent contexts tied to ephemeral copies of real profiles (or synthetic profiles) to avoid mutating a user’s primary profiles.
14+
- Mock or guard network calls as needed, but prefer real navigation to detect UI drift.
15+
16+
3) OS‑level visibility assertions:
17+
- Verify that the browser starts hidden (headless) when login is known good.
18+
- Verify that the browser is displayed (headful) only when login is unknown/expired/failing or when UI drift is detected.
19+
20+
### Headless vs Headful Verification
21+
22+
- Headless: Assert Playwright is launched with `headless: true` and no windows are created at the OS level. For Linux/macOS/Windows, implement a platform helper that samples top‑level windows via:
23+
- macOS: `CGWindowListCopyWindowInfo` via a small helper binary or `osascript -e 'tell app "System Events" to ...'` as fallback.
24+
- Linux: `xprop`/`wmctrl` on X11; `gdbus`/`gsettings`/`swaymsg` on Wayland (best‑effort; may skip when unavailable).
25+
- Windows: Win32 `EnumWindows` via a helper, or `powershell` COM query fallback.
26+
- Headful required: Assert at least one top‑level browser window becomes visible within a timeout after a failed login probe or drift detection.
27+
28+
These helpers should be wrapped with feature detection and skipped when the environment cannot reliably report window state (e.g., headless CI without a virtual display).
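
The feature detection described above could look roughly like the following sketch. The function name `pick_window_probe` and the choice of environment variables (`DISPLAY`, `WAYLAND_DISPLAY`) are assumptions for illustration; only the tool names come from the list above. Returning `None` is meant to signal that the window-visibility assertions should be skipped.

```python
import os
import shutil
import sys


def pick_window_probe(platform=sys.platform, env=None, which=shutil.which):
    """Return the name of a usable window-listing tool, or None to skip."""
    env = os.environ if env is None else env

    if platform == "darwin":
        # Fallback path from the spec; a native helper binary would be preferred.
        return "osascript" if which("osascript") else None

    if platform.startswith("linux"):
        # Wayland is best-effort: only sway-compatible compositors are covered here.
        if env.get("WAYLAND_DISPLAY") and which("swaymsg"):
            return "swaymsg"
        if env.get("DISPLAY"):
            for tool in ("wmctrl", "xprop"):
                if which(tool):
                    return tool
        # No display server or no tool: window state cannot be reported reliably.
        return None

    if platform == "win32":
        return "powershell" if which("powershell") else None

    return None
```

A test suite might call `pick_window_probe()` once at session start and mark OS-level visibility tests as skipped when it returns `None`.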
29+
30+
### CI Considerations
31+
32+
- Use containerized jobs with a virtual display (Xvfb or Xwayland) and a minimal window manager to support headful tests.
33+
- For macOS runners, prefer native headless for most tests; restrict window‑visibility tests to self‑hosted runners capable of GUI automation.
34+
- For Windows, run in a session with desktop interaction enabled.
35+
36+
### Login Expectation Scenarios
37+
38+
Test cases should cover:
39+
- Known good login: `lastValidated` fresh and check passes → remain headless.
40+
- Stale login: `lastValidated` older than `graceSeconds` → perform probe; if probe fails, switch to headful and wait for user.
41+
- No expectations configured: proceed headless by default; do not block.
42+
- Cookie present but selector absent: treat as not logged in (conservative), switch to headful.
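
A minimal sketch of the decision these scenarios describe is shown below. The dataclasses and the `choose_mode` function are hypothetical; only the `lastValidated` and `graceSeconds` field names come from the spec. The sketch also assumes that a fresh `lastValidated` is trusted without re-running the probe, which the spec does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class LoginExpectations:
    # Hypothetical in-memory form of the login expectations in `meta.json`.
    last_validated: Optional[datetime]
    grace_seconds: int


@dataclass
class ProbeResult:
    cookie_present: bool
    selector_present: bool

    @property
    def logged_in(self) -> bool:
        # Conservative rule: a cookie without the expected selector is not a login.
        return self.cookie_present and self.selector_present


def choose_mode(
    expectations: Optional[LoginExpectations],
    now: datetime,
    probe: Callable[[], ProbeResult],
) -> str:
    """Return "headless" or "headful" for the next browser launch."""
    if expectations is None:
        # No expectations configured: proceed headless and do not block.
        return "headless"

    fresh = (
        expectations.last_validated is not None
        and now - expectations.last_validated
        <= timedelta(seconds=expectations.grace_seconds)
    )
    if fresh:
        return "headless"

    # Stale or never validated: run the probe and fall back to headful on failure.
    return "headless" if probe().logged_in else "headful"
```

Passing the probe as a callable keeps the decision testable without launching a browser.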
43+
44+
### UI Drift and Resilience
45+
46+
Detection:
47+
- Missing critical selectors (workspace picker, branch selector, "Code" button) must fail fast with a machine‑readable error.
48+
- Automation should then show the browser (headful), optionally open DevTools, and present an inline banner/toast explaining what failed and how to proceed.
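
One possible shape for the machine-readable drift error is sketched below. The class name `SelectorDriftError`, the helper `require_single_match`, and the JSON field names are assumptions for illustration; the spec only requires that the error be machine-readable and identify the selectors that failed.

```python
import json
from typing import Sequence


class SelectorDriftError(Exception):
    """Raised when a critical selector does not resolve to exactly one element."""

    def __init__(self, selector_key: str, selectors_tried: Sequence[str], match_count: int):
        self.selector_key = selector_key
        self.selectors_tried = list(selectors_tried)
        self.match_count = match_count
        self.outcome = "not_found" if match_count == 0 else "multiple_matches"
        super().__init__(f"UI drift: {selector_key} ({self.outcome})")

    def to_json(self) -> str:
        # Stable key order keeps log output diff-friendly.
        return json.dumps(
            {
                "error": "ui_drift",
                "selectorKey": self.selector_key,
                "selectorsTried": self.selectors_tried,
                "matchCount": self.match_count,
                "outcome": self.outcome,
            },
            sort_keys=True,
        )


def require_single_match(selector_key: str, selectors_tried: Sequence[str], match_count: int) -> None:
    """Fail fast unless exactly one element matched."""
    if match_count != 1:
        raise SelectorDriftError(selector_key, selectors_tried, match_count)
```

The caller would obtain `match_count` from the browser (for example, by counting elements that match a locator) and then decide whether to switch to headful mode when the error is raised.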
49+
50+
Tests:
51+
- Simulate selector renames by injecting CSS/JS to remove/alter test ids via Playwright route interception or a local test proxy. Assert that:
52+
- The automation raises a drift error quickly.
53+
- The browser is brought to foreground (headful).
54+
- A diagnostic message is visible to the user and logs include selector keys that failed.
55+
56+
### Workspace/Branch Selection Edge Cases
57+
58+
- Multiple workspaces; selection requires scrolling or dynamic loading.
59+
- Branch list too long; search/filter interaction required.
60+
- Permissions errors (workspace not accessible) — assert graceful message and headful fallback.
61+
62+
### Rate Limits and Captcha Handling
63+
64+
- If navigation returns a rate‑limit or captcha page, switch to headful, surface instructions, and pause. Tests simulate this by stubbing responses to return challenge pages and assert the fallback behavior.
65+
66+
### Telemetry and Artifacts
67+
68+
- Save Playwright traces, console logs, and screenshots on failure.
69+
- Update `lastValidated` on successful login checks; avoid writes in tests unless operating on disposable profile copies.
70+
71+
### Fully Automated Local and CI Execution
72+
73+
- Provide a test harness that:
74+
- Creates a temporary profile directory seeded with synthetic cookies/selectors to emulate login.
75+
- Runs headless success path and asserts no windows.
76+
- Runs stale/failed login paths and asserts window visibility transitioned as expected.
77+
- Runs UI drift scenarios using selector overrides.
78+
- Cleans up all temporary artifacts.
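
The create-and-clean-up steps of such a harness could be sketched as a context manager. The name `disposable_profile`, the directory layout, and the contents passed as `meta` are hypothetical; the spec only names `meta.json` as the profile metadata file.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def disposable_profile(meta: dict):
    """Yield a temporary profile directory that is deleted on exit."""
    with tempfile.TemporaryDirectory(prefix="aw-profile-") as root:
        profile_dir = Path(root) / "profile"
        profile_dir.mkdir()
        # Seed synthetic metadata so the automation sees the desired login state.
        (profile_dir / "meta.json").write_text(json.dumps(meta), encoding="utf-8")
        yield profile_dir
    # TemporaryDirectory removes the whole tree here, even if the test failed.
```

A test would wrap each scenario in `with disposable_profile({...}) as profile_dir:` and point the persistent context at `profile_dir`, so that no run touches a user's primary profile.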
79+
80+
### Developer Ergonomics
81+
82+
- `--update-selectors` test mode to record new stable selectors when UI drift is acknowledged by a developer.
83+
- `--show-browser` override to force headful during local debugging.
84+
85+

docs/browser-automation/codex.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
## Codex Browser Automation (Playwright)
2+
3+
### Purpose
4+
5+
Automate the Codex WebUI to initiate a coding session for a repository/branch using a shared agent browser profile. This is the first automation built on the Agent Browser Profiles convention.
6+
7+
### Behavior (happy path)
8+
9+
1. Determine ChatGPT username: accept optional `--chatgpt-username` (see `docs/cli-spec.md`).
10+
2. Discover profiles: list agent browser profiles whose `loginExpectations.origins` include `https://chatgpt.com`.
11+
3. Filter by username: if `--chatgpt-username` is provided, restrict to profiles whose `loginExpectations.username` matches.
12+
4. Select or create profile:
13+
- If one or more profiles match, choose the best candidate (prompt if multiple).
14+
- If none match, create a new profile named `chatgpt-<username>` when a username is provided, otherwise `chatgpt`.
15+
5. Override behavior: if `--browser-profile` is provided, skip discovery/creation and use that profile name directly (create fresh if missing).
16+
6. Launch Playwright with a persistent context in headless mode.
17+
7. If the expected login is not present, relaunch in visible mode to let the user authenticate, then continue.
18+
8. Navigate to Codex, select workspace and branch, enter the task description, and press "Code":
19+
- Workspace comes from `--codex-workspace` or `config: codex.workspace` (see `docs/configuration.md`).
20+
- Branch comes from the `aw task --branch` value.
21+
9. Record success.
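
Steps 2 through 5 above can be sketched as a pure function. The function name `resolve_profile`, the `(action, names)` return shape, and the dictionary layout of a profile are assumptions for illustration; the `loginExpectations.origins` and `loginExpectations.username` keys and the `chatgpt-<username>` naming rule come from the steps themselves.

```python
from typing import List, Optional, Tuple

CHATGPT_ORIGIN = "https://chatgpt.com"


def resolve_profile(
    profiles: List[dict],
    username: Optional[str] = None,
    override: Optional[str] = None,
) -> Tuple[str, List[str]]:
    """Return (action, names) where action is "use", "prompt", or "create"."""
    if override is not None:
        # --browser-profile skips discovery; create the profile if it is missing.
        exists = any(p["name"] == override for p in profiles)
        return ("use" if exists else "create", [override])

    # Discover profiles that expect a login on the ChatGPT origin.
    matches = [
        p for p in profiles
        if CHATGPT_ORIGIN in p.get("loginExpectations", {}).get("origins", [])
    ]
    if username is not None:
        matches = [
            p for p in matches
            if p["loginExpectations"].get("username") == username
        ]

    names = [p["name"] for p in matches]
    if len(names) == 1:
        return ("use", names)
    if len(names) > 1:
        return ("prompt", names)
    return ("create", [f"chatgpt-{username}" if username else "chatgpt"])
```

Keeping the selection free of I/O means the prompt and the profile creation stay in the caller, and the rules can be checked without a browser.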
22+
23+
If the automation fails because the Codex WebUI may have changed, report detailed diagnostic information to the user: which UI element the automation was trying to locate, which selectors were used, and what happened (the expected element was not found, more than one element was found, etc.).
24+
25+
### Visibility and Login Flow
26+
27+
- Runs headless by default; when login is not present, restarts headful to allow the user to log in, then proceeds automatically.
28+
29+
### Configuration
30+
31+
Controlled via AW configuration (see `docs/cli-spec.md` and `docs/configuration.md`):
32+
33+
- Enable/disable automation for `aw task`.
34+
- Select or override the agent browser profile name.
35+
- Set default Codex workspace: `codex.workspace`.
36+
37+
### Notes
38+
39+
- Playwright selectors should prefer role/aria/test id attributes to resist UI text changes.
40+
- Use stable navigation points inside Codex (workspace and branch selectors) and fail fast with helpful error messages when not found; optionally open DevTools in headful mode for investigation.
41+
42+

docs/cli-spec.md

Lines changed: 10 additions & 1 deletion
@@ -34,6 +34,8 @@ Configuration mapping examples:
3434
- `tui.defaultMode` ↔ `--mode`
3535
- `terminal.multiplexer` ↔ `--multiplexer <tmux|zellij|screen>`
3636
- `editor.default` ↔ `--editor`
37+
- `browserAutomation.enabled` ↔ `--browser-automation`, `AGENTS_WORKFLOW_BROWSER_AUTOMATION_ENABLED`
38+
- `browserAutomation.profile` ↔ `--browser-profile`, `AGENTS_WORKFLOW_BROWSER_PROFILE`
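
As an illustration of how one of these mappings might be resolved, the sketch below assumes a precedence of CLI flag, then environment variable, then configuration file. That ordering, the accepted truthy strings, the flat dotted-key config lookup, and the function name `resolve_enabled` are all assumptions; this hunk only shows that the three names refer to the same setting. The default of `True` follows the "(default)" note on `--browser-automation true`.

```python
from typing import Mapping, Optional

ENV_VAR = "AGENTS_WORKFLOW_BROWSER_AUTOMATION_ENABLED"
TRUTHY = {"1", "true", "yes", "on"}


def resolve_enabled(
    flag: Optional[bool],
    env: Mapping[str, str],
    config: Mapping[str, object],
) -> bool:
    """Resolve browserAutomation.enabled from flag, environment, then config."""
    if flag is not None:
        # An explicit --browser-automation value wins (assumed precedence).
        return flag
    raw = env.get(ENV_VAR)
    if raw is not None:
        return raw.strip().lower() in TRUTHY
    # Fall back to the config key, defaulting to enabled.
    return bool(config.get("browserAutomation.enabled", True))
```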
3739

3840
### Subcommands
3941

@@ -56,13 +58,14 @@ Task launch behavior in TUI:
5658

5759
#### 2) Tasks
5860

59-
- `aw task [create] [--prompt <TEXT> | --prompt-file <FILE>] [--repo <PATH|URL>] [--branch <NAME>] [--agent <TYPE>[@VERSION]] [--instances <N>] [--runtime <devcontainer|local|unsandboxed>] [--devcontainer-path <PATH>] [--labels k=v ...] [--delivery <pr|branch|patch>] [--target-branch <NAME>] [--yes]`
61+
- `aw task [create] [--prompt <TEXT> | --prompt-file <FILE>] [--repo <PATH|URL>] [--branch <NAME>] [--agent <TYPE>[@VERSION]] [--instances <N>] [--runtime <devcontainer|local|unsandboxed>] [--devcontainer-path <PATH>] [--labels k=v ...] [--delivery <pr|branch|patch>] [--target-branch <NAME>] [--browser-automation <true|false>] [--browser-profile <NAME>] [--yes]`
6062

6163
Behavior:
6264

6365
- In local mode, prepares a per-task workspace using snapshot preference order (ZFS > Btrfs > Overlay > copy) and launches the agent.
6466
- In rest mode, calls `POST /api/v1/tasks` with the provided parameters.
6567
- Creates/updates a local PID-like session record when launching locally (see “Local Discovery”).
68+
- When `--browser-automation true` (default), launches site-specific browser automation (e.g., Codex) using the selected agent browser profile. When `false`, web automation is skipped.
6669
- Branch autocompletion uses standard git protocol:
6770
- Local mode: `git for-each-ref` on the repo; cached with debounce.
6871
- REST mode: server uses `git ls-remote`/refs against admin-configured URL to populate its cache; CLI/Web query capability endpoints for suggestions.
@@ -218,6 +221,12 @@ Create a task locally and immediately open TUI window/panes:
218221
aw task --prompt "Refactor checkout service for reliability" --repo . --agent openhands --runtime devcontainer --branch main --instances 2
219222
```
220223

224+
Specify a browser profile and disable automation explicitly:
225+
226+
```bash
227+
aw task --prompt "Kick off Codex" --browser-profile work-codex --browser-automation false
228+
```
229+
221230
List and tail logs for sessions:
222231

223232
```bash
