| 1 | +## Testing Strategy — Codex Browser Automation |
| 2 | + |
| 3 | +Goal: validate Playwright-driven automation that navigates `https://chatgpt.com/codex`, selects a workspace/branch, enters "go", and starts coding — while honoring Agent Browser Profiles visibility and login expectations. |
| 4 | + |
| 5 | +### Levels of Testing |
| 6 | + |
| 7 | +1) Unit-like checks (fast): |
| 8 | +- Validate profile path resolution across platforms given environment overrides. |
| 9 | +- Validate parsing and semantics of `meta.json` (visibility policy, login expectations, TTL/grace). |
| 10 | +- Validate selector maps/config fallbacks without launching a browser. |
| 11 | + |
| 12 | +2) Playwright integration tests (headless/headful): |
| 13 | +- Use persistent contexts tied to ephemeral copies of real profiles (or synthetic profiles) to avoid mutating a user’s primary profiles. |
| 14 | +- Mock or guard network calls as needed, but prefer real navigation to detect UI drift. |
| 15 | + |
| 16 | +3) OS‑level visibility assertions: |
| 17 | +- Verify that the browser starts hidden (headless) when login is known good. |
| 18 | +- Verify that the browser is displayed (headful) only when login is unknown/expired/failing or when UI drift is detected. |
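
The first unit-like check above (profile path resolution under environment overrides) is easiest to test against a pure function. The following is a minimal sketch, not the project's actual resolver: the override variable `AGENT_BROWSER_PROFILES_DIR`, the directory name `agent-browser-profiles`, and the per-platform defaults are all assumptions made for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Mapping, Union

# Hypothetical names; this document does not specify the real ones.
OVERRIDE_ENV = "AGENT_BROWSER_PROFILES_DIR"
DIR_NAME = "agent-browser-profiles"


def resolve_profiles_dir(
    platform: str, env: Mapping[str, str], home: str
) -> Union[PurePosixPath, PureWindowsPath]:
    """Return the profiles root for `platform` ("linux", "darwin", "win32").

    Pure paths are used so every platform branch can be tested on any host.
    """
    override = env.get(OVERRIDE_ENV)
    if platform == "win32":
        if override:
            return PureWindowsPath(override)
        base = env.get("APPDATA") or str(PureWindowsPath(home) / "AppData" / "Roaming")
        return PureWindowsPath(base) / DIR_NAME
    if override:
        return PurePosixPath(override)
    if platform == "darwin":
        return PurePosixPath(home) / "Library" / "Application Support" / DIR_NAME
    # Assumed Linux default: follow XDG_DATA_HOME when set.
    base = env.get("XDG_DATA_HOME") or str(PurePosixPath(home) / ".local" / "share")
    return PurePosixPath(base) / DIR_NAME
```

Passing `platform`, `env`, and `home` as arguments rather than reading `sys.platform` and `os.environ` directly keeps the check fast and lets one test run cover all three platforms.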
| 19 | + |
| 20 | +### Headless vs Headful Verification |
| 21 | + |
| 22 | +- Headless: Assert Playwright is launched with `headless: true` and no windows are created at the OS level. For Linux/macOS/Windows, implement a platform helper that samples top‑level windows via: |
| 23 | + - macOS: `CGWindowListCopyWindowInfo` via a small helper binary or `osascript -e 'tell app "System Events" to ...'` as fallback. |
| 24 | +  - Linux: `xprop`/`wmctrl` on X11; compositor‑specific tools on Wayland, such as `swaymsg -t get_tree` on Sway or `gdbus` calls where the compositor exposes a window list (best‑effort; skip when unavailable). |
| 25 | + - Windows: Win32 `EnumWindows` via a helper, or `powershell` COM query fallback. |
| 26 | +- Headful required: Assert at least one top‑level browser window becomes visible within a timeout after a failed login probe or drift detection. |
| 27 | + |
| 28 | +These helpers should be wrapped with feature detection and skipped when the environment cannot reliably report window state (e.g., headless CI without a virtual display). |
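
The feature detection described above could look roughly like the sketch below. It only decides which window-listing tool is usable and returns `None` when the test should be skipped; it does not enumerate windows itself. The function name and the precedence rules (Wayland checked before X11, only Sway handled on Wayland) are assumptions of this sketch.

```python
import shutil
from typing import Callable, Mapping, Optional


def pick_window_probe(
    platform: str,
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the name of a usable window-listing tool, or None to skip."""
    if platform.startswith("linux"):
        if env.get("WAYLAND_DISPLAY"):
            # Wayland is best-effort: this sketch only recognizes a Sway session.
            if env.get("SWAYSOCK") and which("swaymsg"):
                return "swaymsg"
            return None
        if env.get("DISPLAY"):
            for tool in ("wmctrl", "xprop"):
                if which(tool):
                    return tool
        return None  # no display at all, e.g. CI without Xvfb
    if platform == "darwin":
        return "osascript" if which("osascript") else None
    if platform == "win32":
        return "powershell" if which("powershell") else None
    return None
```

In a pytest suite, a visibility test might call `pick_window_probe(sys.platform, os.environ)` first and call `pytest.skip(...)` when the result is `None`, so unsupported environments report a skip rather than a false failure.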
| 29 | + |
| 30 | +### CI Considerations |
| 31 | + |
| 32 | +- Use containerized jobs with a virtual display (Xvfb or Xwayland) and a minimal window manager to support headful tests. |
| 33 | +- For macOS runners, prefer native headless for most tests; restrict window‑visibility tests to self‑hosted runners capable of GUI automation. |
| 34 | +- For Windows, run in a session with desktop interaction enabled. |
| 35 | + |
| 36 | +### Login Expectation Scenarios |
| 37 | + |
| 38 | +Test cases should cover: |
| 39 | +- Known good login: `lastValidated` is within `graceSeconds` and the login check passes → remain headless. |
| 40 | +- Stale login: `lastValidated` older than `graceSeconds` → perform probe; if the probe passes, remain headless; if it fails, switch to headful and wait for the user. |
| 41 | +- No expectations configured: proceed headless by default; do not block. |
| 42 | +- Cookie present but selector absent: treat as not logged in (conservative), switch to headful. |
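
The four scenarios above can be captured in one small decision function that tests drive with fake probes. This is a sketch under stated assumptions: the class and function names are hypothetical, timestamps are epoch seconds, and a fresh `lastValidated` is trusted without re-probing, which the list above leaves open.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoginExpectation:
    last_validated: float  # epoch seconds, from `lastValidated`
    grace_seconds: float   # from `graceSeconds`


@dataclass
class ProbeResult:
    cookie_present: bool
    selector_present: bool

    @property
    def logged_in(self) -> bool:
        # Conservative rule: a cookie without the logged-in selector does not count.
        return self.cookie_present and self.selector_present


def decide_mode(
    expectation: Optional[LoginExpectation],
    now: float,
    probe: Callable[[], ProbeResult],
) -> str:
    """Return "headless" or "headful" for the next browser launch."""
    if expectation is None:
        return "headless"  # no expectations configured: do not block
    if now - expectation.last_validated <= expectation.grace_seconds:
        return "headless"  # fresh validation: assumed trusted, probe not run
    return "headless" if probe().logged_in else "headful"
```

Because `now` and `probe` are injected, each scenario becomes a one-line assertion with no browser or clock involved.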
| 43 | + |
| 44 | +### UI Drift and Resilience |
| 45 | + |
| 46 | +Detection: |
| 47 | +- Missing critical selectors (workspace picker, branch selector, "Code" button) must fail fast with a machine‑readable error. |
| 48 | +- Automation should then show the browser (headful), optionally open DevTools, and present an inline banner/toast explaining what failed and how to proceed. |
| 49 | + |
| 50 | +Tests: |
| 51 | +- Simulate selector renames by removing or altering test ids, either by rewriting responses (Playwright route interception or a local test proxy) or by injecting CSS/JS into the page. Assert that: |
| 52 | + - The automation raises a drift error quickly. |
| 53 | + - The browser is brought to foreground (headful). |
| 54 | + - A diagnostic message is visible to the user and logs include selector keys that failed. |
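
One way to make the drift error machine-readable and to log the failing selector keys is sketched below. The exception name, the JSON shape, and the selector keys are hypothetical; the presence check is passed in as a callable so the logic can be tested without a browser.

```python
import json
from typing import Callable, Iterable, Mapping


class SelectorDriftError(Exception):
    """Raised when one or more critical selectors cannot be found."""

    def __init__(self, missing_keys: Iterable[str]):
        self.missing_keys = sorted(missing_keys)
        super().__init__(f"UI drift: missing selectors {self.missing_keys}")

    def to_json(self) -> str:
        # Assumed payload shape, for logs and for callers that parse errors.
        return json.dumps({"error": "selector_drift", "missing": self.missing_keys})


def assert_selectors(
    selectors: Mapping[str, str], is_present: Callable[[str], bool]
) -> None:
    """Check every critical selector once and fail fast with all missing keys."""
    missing = [key for key, css in selectors.items() if not is_present(css)]
    if missing:
        raise SelectorDriftError(missing)
```

With Playwright's Python API, `is_present` could be something like `lambda css: page.locator(css).count() > 0`; a drift test would then assert on `missing_keys` after altering the test ids.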
| 55 | + |
| 56 | +### Workspace/Branch Selection Edge Cases |
| 57 | + |
| 58 | +- Multiple workspaces; selection requires scrolling or dynamic loading. |
| 59 | +- Branch list too long; search/filter interaction required. |
| 60 | +- Permissions errors (workspace not accessible) — assert graceful message and headful fallback. |
| 61 | + |
| 62 | +### Rate Limits and Captcha Handling |
| 63 | + |
| 64 | +- If navigation returns a rate‑limit or captcha page, switch to headful, surface instructions, and pause. Tests simulate this by stubbing responses to return challenge pages and assert the fallback behavior. |
| 65 | + |
| 66 | +### Telemetry and Artifacts |
| 67 | + |
| 68 | +- Save Playwright traces, console logs, and screenshots on failure. |
| 69 | +- Update `lastValidated` on successful login checks; avoid writes in tests unless operating on disposable profile copies. |
| 70 | + |
| 71 | +### Fully Automated Local and CI Execution |
| 72 | + |
| 73 | +- Provide a test harness that: |
| 74 | + - Creates a temporary profile directory seeded with synthetic cookies/selectors to emulate login. |
| 75 | + - Runs headless success path and asserts no windows. |
| 76 | + - Runs stale/failed login paths and asserts window visibility transitioned as expected. |
| 77 | + - Runs UI drift scenarios using selector overrides. |
| 78 | + - Cleans up all temporary artifacts. |
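
The first and last harness steps above (seeding a temporary profile and cleaning it up) might be sketched as a context manager. The `meta.json` layout used here, with a `loginExpectations` object, is an assumption; only the `lastValidated` and `graceSeconds` field names come from this document.

```python
import json
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def synthetic_profile(
    last_validated_age: float = 0.0, grace_seconds: float = 3600.0
) -> Iterator[Path]:
    """Yield a disposable profile directory seeded with a synthetic meta.json."""
    with tempfile.TemporaryDirectory(prefix="codex-profile-") as tmp:
        root = Path(tmp)
        meta = {
            "loginExpectations": {  # hypothetical key name
                "lastValidated": time.time() - last_validated_age,
                "graceSeconds": grace_seconds,
            }
        }
        (root / "meta.json").write_text(json.dumps(meta), encoding="utf-8")
        yield root
    # TemporaryDirectory removes the directory and its contents on exit.
```

A test could pass the yielded path as the user data directory to Playwright's `launch_persistent_context`, with `last_validated_age` set above `grace_seconds` to exercise the stale-login path.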
| 79 | + |
| 80 | +### Developer Ergonomics |
| 81 | + |
| 82 | +- `--update-selectors` test mode to record new stable selectors when UI drift is acknowledged by a developer. |
| 83 | +- `--show-browser` override to force headful during local debugging. |
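
As a rough illustration of how the two flags above could be wired up, the sketch below uses `argparse`. The program name and the `effective_headless` helper are hypothetical, and a real suite might expose the same flags through its test runner's option mechanism instead.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="codex-automation-tests")  # hypothetical name
    parser.add_argument(
        "--update-selectors",
        action="store_true",
        help="record new stable selectors after a developer acknowledges UI drift",
    )
    parser.add_argument(
        "--show-browser",
        action="store_true",
        help="force headful mode during local debugging",
    )
    return parser


def effective_headless(args: argparse.Namespace, policy_headless: bool) -> bool:
    """Apply the --show-browser override on top of the policy decision."""
    return False if args.show_browser else policy_headless
```

Keeping the override in one helper means the visibility policy itself stays unchanged when a developer forces the browser to be shown.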
| 84 | + |
| 85 | + |