Commit e38c13b

shrey150 and claude authored
[feat]: add browse CLI package (#1793)
## Summary

- Adds the `@browserbasehq/browse-cli` package (`packages/cli`) to the stagehand monorepo, open-sourcing browser automation for AI agents
- CLI provides stateful browser control via a daemon architecture: navigation, clicking, typing, screenshots, accessibility snapshots, multi-tab, network capture, and env switching (local/remote)
- Uses `@browserbasehq/stagehand` as a workspace dependency (bundled into the CLI binary via tsup)
- Includes full test suite and documentation

## Changes

- `packages/cli/`: all CLI source code, config, tests, and docs
- `pnpm-workspace.yaml`: added `packages/cli` to workspace
- `.github/workflows/ci.yml`: added CLI path filters and build artifact uploads
- `.changeset/open-source-browse-cli.md`: changeset for initial release
- `pnpm-lock.yaml`: updated lockfile

## Test plan

- [x] CLI builds successfully (`pnpm --filter @browserbasehq/browse-cli run build`)
- [x] Full monorepo build passes (`turbo run build`, 9/9 tasks)
- [x] `browse --help` and `browse --version` output correctly
- [x] `browse status` returns valid JSON
- [x] Lint passes clean (`pnpm --filter @browserbasehq/browse-cli run lint`)
- [x] Source verified identical to stagent-cli (only import path changed)
- [x] Empirically tested Browserbase credential requirements match core
- [ ] Run `pnpm --filter @browserbasehq/browse-cli run test` (requires Chrome/browser environment)

## Known issues (pre-existing from stagent-cli, not introduced by this PR)

- Network capture `response.json` always writes `status: 0`: response metadata from the `responseReceived` CDP event is not persisted to the `loadingFinished` handler
- Ref-based `click` command silently ignores `--button`/`--count`/`--force` flags (coordinate-based `click_xy` handles them correctly)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent e36ddd6 commit e38c13b

15 files changed: +4486 -0 lines changed

.changeset/open-source-browse-cli.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
---
"@browserbasehq/browse-cli": minor
---

Initial release of browse CLI - browser automation for AI agents

.github/workflows/ci.yml

Lines changed: 7 additions & 0 deletions
@@ -40,6 +40,7 @@ jobs:
 runs-on: ubuntu-latest
 outputs:
   core: ${{ steps.filter.outputs.core }}
+  cli: ${{ steps.filter.outputs.cli }}
   evals: ${{ steps.filter.outputs.evals }}
   server: ${{ steps.filter.outputs.server }}
   docs-only: ${{ steps.filter.outputs.docs-only }}
@@ -83,6 +84,11 @@ jobs:
   - 'package.json'
   - 'pnpm-lock.yaml'
   - 'turbo.json'
+cli:
+  - 'packages/cli/**'
+  - 'packages/core/**'
+  - 'package.json'
+  - 'pnpm-lock.yaml'
 evals:
   - 'packages/evals/**'
   - 'package.json'
@@ -243,6 +249,7 @@ jobs:
 packages/core/lib/version.ts
 packages/core/lib/dom/build/**
 packages/core/lib/v3/dom/build/**
+packages/cli/dist/**
 packages/evals/dist/**
 packages/server-v3/dist/**
 packages/server-v3/openapi.v3.yaml
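
The `cli` filter added in this diff lists four path patterns. As a rough illustration of what that filter implies, here is a minimal TypeScript sketch that decides whether a set of changed files would trigger the CLI jobs. The helper names (`matchesPattern`, `matchesCliFilter`) are hypothetical, and the matching rule is a deliberate simplification, not the glob engine the workflow actually uses.

```typescript
// Patterns copied from the `cli` filter in the workflow diff.
const cliPatterns: string[] = [
  "packages/cli/**",
  "packages/core/**",
  "package.json",
  "pnpm-lock.yaml",
];

// Simplified matcher (an assumption, not the CI action's real glob engine):
// "dir/**" matches anything under dir/, any other pattern must match exactly.
function matchesPattern(file: string, pattern: string): boolean {
  if (pattern.endsWith("/**")) {
    const prefix = pattern.slice(0, -2); // "packages/cli/**" -> "packages/cli/"
    return file.startsWith(prefix);
  }
  return file === pattern;
}

// Hypothetical helper: true when any changed file matches any CLI pattern.
function matchesCliFilter(changedFiles: string[]): boolean {
  return changedFiles.some((file) =>
    cliPatterns.some((pattern) => matchesPattern(file, pattern)),
  );
}
```

Because `packages/core/**` is part of the filter, a change to the core package alone would also count as a CLI change under this reading, which fits the CLI bundling Stagehand as a workspace dependency.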

packages/cli/.prettierignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
dist/
node_modules/
*.json

packages/cli/.prettierrc

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
{}

packages/cli/README.md

Lines changed: 341 additions & 0 deletions
@@ -0,0 +1,341 @@
# Browse CLI

Browser automation CLI for AI agents. Built on [Stagehand](https://github.com/browserbase/stagehand), providing raw browser control without requiring LLM integration.

## Installation

```bash
npm install -g @browserbasehq/browse-cli
```

Requires Chrome or Chromium to be installed on the system.

## Quick Start

```bash
# Navigate to a URL (auto-starts browser daemon)
browse open https://example.com

# Take a snapshot to get element refs
browse snapshot -c

# Click an element by ref
browse click @0-5

# Type text
browse type "Hello, world!"

# Take a screenshot
browse screenshot ./page.png

# Stop the browser
browse stop
```

## How It Works

Browse uses a daemon architecture for fast, stateful interactions:

1. **First command** auto-starts a Chrome browser daemon
2. **Subsequent commands** reuse the same browser session
3. **State persists** between commands (cookies, refs, etc.)
4. **Multiple sessions** supported via `--session` or `BROWSE_SESSION` env var

### Self-Healing Sessions

The CLI automatically recovers from stale sessions. If the daemon or Chrome crashes, the CLI:

1. Detects the failure
2. Cleans up stale processes and files
3. Restarts the daemon
4. Retries the command

Agents do not need to handle recovery themselves; commands "just work".

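
The four recovery steps listed under Self-Healing Sessions can be sketched as a small retry wrapper. This is a minimal TypeScript sketch under stated assumptions, not the CLI's actual implementation: `DaemonHooks`, `withRecovery`, and the three callbacks are hypothetical names, and a single retry is assumed.

```typescript
// Hypothetical hooks; the CLI's real internals are not shown in the README.
interface DaemonHooks<T> {
  run: () => T; // send the command to the daemon
  cleanup: () => void; // remove stale processes and files
  restart: () => void; // start a fresh daemon
}

// Run a command once; on failure, clean up, restart, and retry one time.
function withRecovery<T>(hooks: DaemonHooks<T>): T {
  try {
    return hooks.run();
  } catch {
    // Step 1 (detect) is the thrown error; steps 2-4 follow in order.
    hooks.cleanup();
    hooks.restart();
    return hooks.run(); // a second failure propagates to the caller
  }
}
```

Retrying only once keeps a persistent fault, such as a missing Chrome binary, from turning into an endless restart loop.
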
## Commands

### Navigation

```bash
browse open <url> [--wait load|domcontentloaded|networkidle] [-t|--timeout ms]
browse reload
browse back
browse forward
```

The `--timeout` flag (default: 30000 ms) controls how long `browse open` waits for the requested page load state. Use longer timeouts for slow-loading pages:

```bash
browse open https://slow-site.com --timeout 60000
```

### Click Actions

```bash
browse click <ref> [-b left|right|middle] [-c count]  # Click by ref (e.g., @0-5)
browse click_xy <x> <y> [--button] [--xpath]          # Click at coordinates
```

Known issue noted in the commit message: ref-based `click` currently ignores `--button`, `--count`, and `--force`, while coordinate-based `click_xy` handles them correctly.

### Coordinate Actions

```bash
browse hover <x> <y> [--xpath]
browse scroll <x> <y> <deltaX> <deltaY> [--xpath]
browse drag <fromX> <fromY> <toX> <toY> [--steps n] [--xpath]
```

### Keyboard

```bash
browse type <text> [-d delay] [--mistakes]
browse press <key>  # e.g., Enter, Tab, Cmd+A
```

### Forms

```bash
browse fill <selector> <value> [--no-press-enter]
browse select <selector> <values...>
browse highlight <selector> [-d duration]
```

### Page Info

```bash
browse get url
browse get title
browse get text <selector>
browse get html <selector>
browse get value <selector>
browse get box <selector>  # Returns center coordinates

browse snapshot [-c|--compact]  # Accessibility tree with refs
browse screenshot [path] [-f|--full-page] [-t png|jpeg]
```

### Waiting

```bash
browse wait load [state]
browse wait selector <selector> [-t timeout] [-s visible|hidden|attached|detached]
browse wait timeout <ms>
```

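
`browse wait selector` takes a timeout and a target state. The general pattern behind such a command is to poll a condition until it holds or a deadline passes. The TypeScript sketch below illustrates that pattern only; `waitFor`, `WaitOptions`, and the 100 ms default interval are assumptions for illustration, not the CLI's implementation.

```typescript
interface WaitOptions {
  timeoutMs: number; // give up after this long
  intervalMs?: number; // delay between checks (assumed default: 100)
  now?: () => number; // injectable clock, useful in tests
  sleep?: (ms: number) => Promise<void>; // injectable delay, useful in tests
}

// Poll `check` until it returns true (resolve true) or the deadline
// passes (resolve false).
async function waitFor(
  check: () => boolean | Promise<boolean>,
  options: WaitOptions,
): Promise<boolean> {
  const intervalMs = options.intervalMs ?? 100;
  const now = options.now ?? (() => Date.now());
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const deadline = now() + options.timeoutMs;
  while (true) {
    if (await check()) return true;
    if (now() >= deadline) return false;
    await sleep(intervalMs);
  }
}
```

Injecting the clock and the delay keeps the loop deterministic under test, since no real time has to pass.
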
### Multi-Tab

```bash
browse pages           # List all tabs
browse newpage [url]   # Open new tab
browse tab_switch <n>  # Switch to tab by index
browse tab_close [n]   # Close tab (default: last)
```

### Network Capture

Capture HTTP requests to the filesystem for inspection:

```bash
browse network on     # Start capturing requests
browse network off    # Stop capturing
browse network path   # Get capture directory path
browse network clear  # Clear captured requests
```

Each captured request is saved in its own directory. Known issue noted in the commit message: `response.json` currently always records `status: 0`.

```
/tmp/browse-default-network/
  001-GET-api.github.com-repos/
    request.json   # method, url, headers, body
    response.json  # status, headers, body, duration
```

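
To make the capture layout concrete, here is a minimal TypeScript sketch that turns the parsed contents of one `request.json` / `response.json` pair into a one-line summary. The field names come from the layout comments in the README; the field types, the millisecond unit for `duration`, and the names `CapturedRequest`, `CapturedResponse`, and `summarizeCapture` are assumptions.

```typescript
// Field names follow the README's layout comments; the types are assumptions.
interface CapturedRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
  body: string | null;
}

interface CapturedResponse {
  status: number;
  headers: Record<string, string>;
  body: string | null;
  duration: number; // assumed to be milliseconds
}

// Build a one-line summary of a captured exchange. A status of 0 is shown
// as "unknown" because the commit message lists "always writes status: 0"
// as a known issue.
function summarizeCapture(req: CapturedRequest, res: CapturedResponse): string {
  const status = res.status === 0 ? "unknown" : String(res.status);
  return `${req.method} ${req.url} -> ${status} (${res.duration} ms)`;
}
```

In practice the two arguments would come from `JSON.parse` on the files inside a directory under the path printed by `browse network path`.
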
### Daemon Control

```bash
browse start           # Explicitly start daemon
browse stop [--force]  # Stop daemon
browse status          # Check daemon status
browse env [target]    # Show or switch environment: local | remote
```

### Environment Switching (Local vs Remote)

Use environment switching when an agent should keep the same command flow, but the
browser runtime needs to change:

- `local` runs Chrome on your machine (best for local debugging/dev loops)
- `remote` runs a Browserbase session (best for anti-bot hardening and cloud runs)

```bash
# Show active environment (if running) and desired environment for next start
browse env

# Switch current session to Browserbase (restarts daemon if needed)
browse env remote

# Switch back to local Chrome
browse env local
```

Behavior details:

- Environment is scoped per `--session`
- `browse env <target>` persists an override and restarts the daemon
- `browse stop` clears the override so next start falls back to env-var-based auto detection
- Auto detection defaults to:
  - `remote` when `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` are set
  - `local` otherwise

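
The behavior rules above amount to a short decision function: a persisted override wins, otherwise the two Browserbase variables decide. The TypeScript sketch below restates that logic; `BrowseEnv` and `resolveEnv` are hypothetical names, and treating an empty string as "not set" is an assumption.

```typescript
type BrowseEnv = "local" | "remote";

// Resolve which environment the next daemon start would use.
// `override` stands for the value persisted by `browse env <target>`;
// it is undefined after `browse stop` clears it.
function resolveEnv(
  env: Record<string, string | undefined>,
  override?: BrowseEnv,
): BrowseEnv {
  if (override) return override;
  // Assumption: an empty string counts as unset.
  const hasCredentials =
    Boolean(env.BROWSERBASE_API_KEY) && Boolean(env.BROWSERBASE_PROJECT_ID);
  return hasCredentials ? "remote" : "local";
}
```

Passing the environment in as a plain record (for example `process.env` in Node) keeps the function easy to test. Note that the sketch does not check credentials when the override is `remote`, although the README states that `browse env remote` requires them.
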
## Global Options

| Option | Description |
|--------|-------------|
| `--session <name>` | Session name for multiple browsers (default: "default") |
| `--headless` | Run Chrome in headless mode |
| `--headed` | Run Chrome with visible window (default) |
| `--ws <url>` | Connect to existing Chrome via CDP WebSocket |
| `--json` | Output as JSON |

## Environment Variables

| Variable | Description |
|----------|-------------|
| `BROWSE_SESSION` | Default session name (alternative to `--session`) |
| `BROWSERBASE_API_KEY` | Browserbase API key (required for `browse env remote`) |
| `BROWSERBASE_PROJECT_ID` | Browserbase project ID (required for `browse env remote`) |

## Element References

After running `browse snapshot`, you can reference elements by their ref ID:

```bash
# Get snapshot with refs
browse snapshot -c

# Output includes refs like [0-5], [1-2], etc.
# RootWebArea "Example" url="https://example.com"
# [0-0] link "Home"
# [0-1] link "About"
# [0-2] button "Sign In"

# Click using ref (multiple formats supported)
browse click @0-2     # @ prefix
browse click 0-2      # Plain ref
browse click ref=0-2  # Explicit prefix
```

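
The three accepted ref spellings (`@0-2`, `0-2`, `ref=0-2`) all name the same element. A minimal TypeScript sketch of normalizing them to one form follows. `normalizeRef` is a hypothetical helper, and the assumption that a ref is always two numbers joined by a hyphen is inferred from the examples only.

```typescript
// Accepts "@0-2", "0-2", or "ref=0-2" and returns the bare ref "0-2".
// Returns null for anything that is not of the form <digits>-<digits>
// (an assumed shape, based on the refs shown in the snapshot example).
function normalizeRef(input: string): string | null {
  let ref = input.trim();
  if (ref.startsWith("@")) {
    ref = ref.slice(1);
  } else if (ref.startsWith("ref=")) {
    ref = ref.slice(4);
  }
  return /^\d+-\d+$/.test(ref) ? ref : null;
}
```

Returning `null` instead of throwing lets a caller fall back to treating the argument as a regular selector.
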
The full snapshot output includes mappings:
- **xpathMap**: Cross-frame XPath selectors
- **cssMap**: Fast CSS selectors when available
- **urlMap**: Extracted URLs from links

## Multiple Sessions

Run multiple browser instances simultaneously:

```bash
# Terminal 1
BROWSE_SESSION=session1 browse open https://google.com

# Terminal 2
BROWSE_SESSION=session2 browse open https://github.com

# Or use --session flag
browse --session work open https://slack.com
browse --session personal open https://twitter.com
```

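
A session name can come from the `--session` flag, the `BROWSE_SESSION` variable, or the documented default. The TypeScript sketch below shows one plausible resolution order. `resolveSession` is a hypothetical name, and giving the flag precedence over the environment variable is an assumption: the README states only that both exist and that the default is `"default"`.

```typescript
// Resolve the session name for a command.
// Assumed precedence: --session flag, then BROWSE_SESSION, then "default".
function resolveSession(
  flagValue: string | undefined,
  env: Record<string, string | undefined>,
): string {
  // Empty strings are treated as unset (an assumption).
  return flagValue || env.BROWSE_SESSION || "default";
}
```
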
## Direct CDP Connection

Connect to an existing Chrome instance:

```bash
# Start Chrome with remote debugging
google-chrome --remote-debugging-port=9222

# Connect via WebSocket
browse --ws ws://localhost:9222/devtools/browser/... open https://example.com
```

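
The browser-level WebSocket URL passed to `--ws` is not fixed. Chrome reports it in the `webSocketDebuggerUrl` field of the JSON served at `http://localhost:9222/json/version` when remote debugging is enabled. The TypeScript sketch below extracts that field from the response text; `extractWsUrl` is a hypothetical helper, and fetching the endpoint is left to the caller.

```typescript
// Extract the browser-level WebSocket URL from the JSON text returned by
// Chrome's /json/version endpoint. Throws if the field is missing.
function extractWsUrl(versionJson: string): string {
  const parsed: unknown = JSON.parse(versionJson);
  if (typeof parsed === "object" && parsed !== null) {
    const url = (parsed as Record<string, unknown>).webSocketDebuggerUrl;
    if (typeof url === "string" && url.startsWith("ws")) {
      return url;
    }
  }
  throw new Error("webSocketDebuggerUrl not found in /json/version response");
}
```

The returned string can then be passed as the value of `--ws`.
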
## Optimal AI Workflow

1. **Navigate** to target page (browser auto-starts)
2. **Snapshot** to get the accessibility tree with refs
3. **Click/Fill** using refs directly (e.g., `@0-5`)
4. **Re-snapshot** after actions to verify state changes
5. **Stop** when done

```bash
browse open https://example.com
browse snapshot -c
# [0-5] textbox: Search
# [0-8] button: Submit
browse fill @0-5 "my query"
browse click @0-8
browse snapshot -c  # Verify result
browse stop
```

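
An agent would typically drive these five steps from code rather than a shell. The TypeScript sketch below mirrors the workflow with an injected runner so it stays independent of how the CLI is invoked. `RunBrowse`, `Refs`, `runSearchWorkflow`, and `pickRefs` are hypothetical names; `run` is assumed to execute `browse` with the given arguments and return its stdout.

```typescript
// Hypothetical runner: executes `browse <args>` and returns stdout, for
// example a thin wrapper around Node's child_process.execFileSync.
type RunBrowse = (args: string[]) => string;

interface Refs {
  inputRef: string; // ref of the text box, chosen from the snapshot
  submitRef: string; // ref of the submit button, chosen from the snapshot
}

// Navigate, snapshot, fill, click, re-snapshot, and always stop.
// Returns the final snapshot text so the caller can verify the result.
function runSearchWorkflow(
  run: RunBrowse,
  url: string,
  query: string,
  pickRefs: (snapshot: string) => Refs,
): string {
  try {
    run(["open", url]);
    const refs = pickRefs(run(["snapshot", "-c"]));
    run(["fill", refs.inputRef, query]);
    run(["click", refs.submitRef]);
    return run(["snapshot", "-c"]);
  } finally {
    run(["stop"]); // runs even if an earlier step throws
  }
}
```

Choosing the refs is delegated to `pickRefs` because that is where an agent's own reasoning over the accessibility tree would plug in.
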
## Troubleshooting

### Chrome not found

The CLI uses your system Chrome/Chromium. If it is not found:

```bash
# macOS - Install Chrome or set path
export CHROME_PATH=/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome

# Linux - Install chromium
sudo apt install chromium-browser
```

### Stale daemon

If the daemon becomes unresponsive:

```bash
browse stop --force
```

### Permission denied on socket

```bash
# Clean up stale socket files
rm /tmp/browse-*.sock /tmp/browse-*.pid
```

## Platform Support

- macOS (Intel and Apple Silicon)
- Linux (x64 and arm64)

Windows is not supported natively; it requires WSL or a TCP socket implementation.

## Development

```bash
# Clone and setup (in monorepo)
cd packages/cli
pnpm install    # Install dependencies first!
pnpm run build  # Build the CLI

# Run without building (for development)
pnpm run dev -- <command>

# Or with tsx directly
npx tsx src/index.ts <command>

# Run linting and formatting
pnpm run lint
pnpm run format
```

## License

MIT - see [LICENSE](./LICENSE)

## Related

- [Stagehand](https://github.com/browserbase/stagehand) - AI web browser automation framework
- [Browserbase](https://browserbase.com) - Cloud browser infrastructure
