docs/agent-browsers/spec.md
Lines changed: 2 additions & 11 deletions
@@ -4,9 +4,9 @@
Defines a shared, cross‑platform convention for storing named browser profiles used by automated agents that require authenticated access to particular websites. A profile represents a persistent browser user data directory plus lightweight metadata that describes login expectations and provenance. This spec’s primary purpose is to make such profiles discoverable by applications while allowing users to transparently know which profile and authentication will be used by the application. The same profile name can be referenced by multiple applications. A default profile is used when none is specified.
-
#### Motivating example
+
#### Motivation
-
Multiple agentic applications (e.g., a research assistant, an issue triager, and an expense reporter) need to act on behalf of the user across several websites (e.g., `chatgpt.com`, `jira.example.com`, `expense.example.com`). Instead of each app asking the user to log in separately, they discover existing agent browser profiles by matching `loginExpectations` (site `id`/`origins`) and reuse the corresponding user data directories. Typically these apps run headless using a browser automation framework such as Playwright. When an expected login is not actually present, the app restarts the automation engine in a visible state so the user can complete the login, then resumes and finishes the task.
+
Multiple agentic applications (e.g., a research assistant, an issue triager, and an expense reporter) need to act on behalf of the user across several websites (e.g., `chatgpt.com`, `jira.example.com`, `expense.example.com`). Instead of each app asking the user to log in separately, they discover existing agent browser profiles by matching the site and username metadata that each profile provides. Typically these apps run headless using a browser automation framework such as Playwright. When an expected login is not actually present, the app restarts the automation engine in a visible state so the user can complete the login, then resumes and finishes the task.
If the app discovers multiple candidate profiles for the same website (for example, different `username` values), our guidance is to ask the user which profile to use for the current task. Applications should communicate profile names clearly and expose options to create new profiles or rename existing ones. Users are expected to become familiar with these profile names, which are reused across applications.
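
The headless-first flow and the multi-profile prompt described above can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the spec: `Session`, `launch`, `is_logged_in`, and `choose` are hypothetical stand-ins for whatever automation framework (e.g., Playwright) and user interface the application actually uses.

```python
from typing import Callable, Protocol, Sequence


class Session(Protocol):
    """Hypothetical handle for a running browser bound to one profile."""

    def wait_for_user_login(self) -> None: ...
    def close(self) -> None: ...


def pick_profile(candidates: Sequence[str],
                 choose: Callable[[Sequence[str]], str]) -> str:
    """Return the only candidate, or ask the user when several match."""
    if not candidates:
        raise LookupError("no matching agent browser profile")
    if len(candidates) == 1:
        return candidates[0]
    return choose(candidates)  # e.g., a prompt that lists profile names


def open_authenticated(profile: str,
                       launch: Callable[[str, bool], Session],
                       is_logged_in: Callable[[Session], bool]) -> Session:
    """Start headless; relaunch the same profile visibly if login is missing."""
    session = launch(profile, True)  # second argument: headless
    if is_logged_in(session):
        return session
    session.close()
    session = launch(profile, False)  # visible window so the user can log in
    session.wait_for_user_login()
    if not is_logged_in(session):
        session.close()
        raise PermissionError(f"login not completed for profile {profile!r}")
    return session
```

Passing `launch` and `is_logged_in` as callables keeps the sketch independent of any particular engine; an application would supply its own implementations.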
@@ -58,7 +58,6 @@ Format: JSON, UTF‑8. Unknown fields must be ignored for forward compatibility.
- `createdAt` / `updatedAt` (RFC3339 strings): For auditing.
- `createdBy` (array<string>): Application and version that created this profile, e.g., `["app-name", "v1.2.3"]`.
- `loginExpectations` (array): Zero or more per‑site discovery hints. Each entry:
-
  - `id` (string): Stable identifier for the site (e.g., `chatgpt-com`).
  - `origins` (array<string>): Allowed origins for the site (schemes required).
  - `username` (string): Account identifier expected to be logged in (email, handle, or user ID).
Applications MAY include additional, application‑specific keys inside `loginExpectations` entries to support their own check mechanisms; such keys are not standardized by this spec.
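
As an illustration of how an application might use these fields for discovery, the sketch below matches profiles by `origins` and, optionally, `username`. It assumes the metadata has already been loaded into a dictionary keyed by profile name and that origins are compared as exact strings; neither detail is fixed by the text above. The helper names and the example values (including the `x-custom` key, which stands for an unknown field that is simply ignored) are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def matches(meta: Dict[str, Any], origin: str,
            username: Optional[str] = None) -> bool:
    """True if any loginExpectations entry lists origin (and username, if given)."""
    for entry in meta.get("loginExpectations", []):
        if origin not in entry.get("origins", []):
            continue
        if username is None or entry.get("username") == username:
            return True
    return False


def find_candidates(metas: Dict[str, Dict[str, Any]], origin: str,
                    username: Optional[str] = None) -> List[str]:
    """Return sorted profile names whose metadata matches the requested site."""
    return sorted(name for name, meta in metas.items()
                  if matches(meta, origin, username))


# Illustrative metadata only; all values are made up for this example.
example = json.loads("""
{
  "createdAt": "2025-01-01T00:00:00Z",
  "createdBy": ["app-name", "v1.2.3"],
  "x-custom": true,
  "loginExpectations": [
    {"origins": ["https://chatgpt.com"], "username": "user@example.com"}
  ]
}
""")
```

If `find_candidates` returns more than one name, the application would fall back to asking the user which profile to use.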
-
Semantics:
-
- Applications MAY add engine‑specific data under `browsers/*` and MUST NOT modify fields they do not own.
-
- This spec does not define a login‑check format. Applications and libraries are expected to implement authentication checks in an application‑specific way and may publish reusable packages for popular sites.
-
- Recommended (non‑normative) UX guidance: start headless; if a check indicates login is required, relaunch the same user data directory headful to allow the user to complete login, then continue the task.
-
- Discoverability intent: when an application needs to act on a site (e.g., `chatgpt.com`), it can search for profiles with matching `loginExpectations.id`/`origins`. If multiple profiles exist with different `username` values, the application may select automatically per policy or prompt the user to choose which account to use for the task.
### Environment Variables
- `AGENT_BROWSER_PROFILES_DIR`: Absolute path override for the base directory.
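
A minimal sketch of how the override might be resolved is shown below. The function name is hypothetical, the default directory is supplied by the caller because it is not stated in this excerpt, and treating an empty value as unset and rejecting relative paths are assumptions rather than requirements of the spec.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_profiles_dir(default_dir: Path,
                         env: Optional[Mapping[str, str]] = None) -> Path:
    """Return AGENT_BROWSER_PROFILES_DIR if set, else the caller-supplied default."""
    env = os.environ if env is None else env
    override = env.get("AGENT_BROWSER_PROFILES_DIR", "")
    if not override:  # assumption: an empty value counts as unset
        return default_dir
    path = Path(override)
    if not path.is_absolute():  # assumption: relative overrides are rejected
        raise ValueError("AGENT_BROWSER_PROFILES_DIR must be an absolute path")
    return path
```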
@@ -93,6 +85,5 @@ Semantics:
- Profile contents may include cookies and tokens protected by OS keychains. Profiles generally do not port across different machines/OSes. Treat them as per‑user, per‑machine.
- Never commit profile directories to source control.
-
- Prefer role/aria selectors in `selector-present` checks to minimize locale‑specific fragility.
- In local mode, prepares a per-task workspace using snapshot preference order (ZFS > Btrfs > Overlay > copy) and launches the agent.
- In REST mode, calls `POST /api/v1/tasks` with the provided parameters.
- Creates/updates a local PID-like session record when launching locally (see “Local Discovery”).
- When `--browser-automation true` (default), launches site-specific browser automation (e.g., Codex) using the selected agent browser profile. When `false`, web automation is skipped.
+
- Codex integration: if `--browser-profile` is not specified, discovers or creates a ChatGPT profile per `docs/browser-automation/codex.md`, optionally filtered by `--chatgpt-username`. Workspace is taken from `--codex-workspace` or config; branch is taken from `--branch`.
- Branch autocompletion uses standard git protocol:
  - Local mode: `git for-each-ref` on the repo; cached with debounce.
  - REST mode: server uses `git ls-remote`/refs against admin-configured URL to populate its cache; CLI/Web query capability endpoints for suggestions.
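
The snapshot preference order used in local mode (ZFS > Btrfs > Overlay > copy) can be sketched as a simple first-match selection. The function name and the lowercase provider labels are hypothetical, and how availability is detected is left to the caller; the sketch also assumes a plain copy is always possible as the last resort.

```python
from typing import Iterable

# Preference order taken from the text above, most preferred first.
SNAPSHOT_PREFERENCE = ("zfs", "btrfs", "overlay", "copy")


def choose_snapshot_provider(available: Iterable[str]) -> str:
    """Pick the most preferred provider among those detected as available."""
    detected = {name.lower() for name in available}
    for provider in SNAPSHOT_PREFERENCE:
        if provider in detected:
            return provider
    return "copy"  # assumption: copying the workspace always works
```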
Thanks for the clarifications. I’ll revise the specification to include:
+
## AW Configuration
+
### Overview
* `aw config` subcommand with Git-like interface for reading and updating configuration.
* Schema validation on both config file loading and CLI-based modification.
* Precedence for `~/.config` over `%APPDATA%` on Windows only when both are present.
-
* Motivation and support for tracking the origin of each configuration value, with use cases such as: debug-level log reporting, enforced setting explanation, and editor pre-fill messages.
+
* Motivation and support for tracking the origin of each configuration value, with use cases such as: debug-level log reporting, enforced setting explanation, and editor pre-fill messages.
+
Layered configuration supports system, user, project, and project-user scopes. Values can also be supplied via environment variables and CLI flags. See `docs/cli-spec.md` for flag mappings.
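
One way to picture layered resolution together with origin tracking is the sketch below. It assumes flat dotted keys and a precedence order running from system (lowest) to CLI flags (highest); the text lists these sources without fixing their order, so treat `SOURCES` as an assumption. The `resolve` and `explain` helpers are hypothetical names.

```python
from typing import Any, Dict, Tuple

# Assumed precedence, lowest to highest; not confirmed by the text above.
SOURCES = ("system", "user", "project", "project-user", "env", "cli")


def resolve(layers: Dict[str, Dict[str, Any]]) -> Dict[str, Tuple[Any, str]]:
    """Merge flat key/value layers; each result is a (value, origin) pair."""
    resolved: Dict[str, Tuple[Any, str]] = {}
    for source in SOURCES:
        for key, value in layers.get(source, {}).items():
            resolved[key] = (value, source)  # later sources override earlier ones
    return resolved


def explain(resolved: Dict[str, Tuple[Any, str]], key: str) -> str:
    """Format a value with its origin, e.g., for debug-level log reporting."""
    value, origin = resolved[key]
    return f"{key} = {value!r} (from {origin})"
```

Storing the origin next to each value is what makes the listed use cases possible: a log line or an editor pre-fill message can state where a setting came from without re-reading every layer.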
+
### Keys
+
- browserAutomation.enabled: boolean — enable/disable site automation.
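
Schema validation for this key, applied both when a file is loaded and when a value is set from the CLI, might look like the following sketch. The `SCHEMA` table, the helper names, and the choice to reject unknown keys are assumptions made for illustration; only the key name and its boolean type come from the text above.

```python
from typing import Any, Dict

# Only the key documented above; a real schema would list every supported key.
SCHEMA: Dict[str, type] = {"browserAutomation.enabled": bool}


def validate(key: str, value: Any) -> None:
    """Raise if the key is unknown or the value has the wrong type."""
    expected = SCHEMA.get(key)
    if expected is None:
        raise KeyError(f"unknown configuration key: {key}")
    if not isinstance(value, expected):
        raise TypeError(
            f"{key} expects {expected.__name__}, got {type(value).__name__}"
        )


def parse_cli_bool(text: str) -> bool:
    """Parse a CLI string such as the true/false passed to a boolean flag."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"expected true or false, got {text!r}")
```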