WebPilot

Go browser automation library built on WebDriver BiDi for real-time, bidirectional communication with browsers, plus direct Chrome DevTools Protocol (CDP) access for profiling and emulation. Designed for AI-assisted automation.

Overview

This project provides:

Component	Description
Go Client SDK	Programmatic browser control
MCP Server	159 tools across 20 namespaces for AI assistants
CLI	Command-line browser automation
Script Runner	Deterministic test execution
Session Recording	Capture actions as replayable scripts

Architecture

WebPilot uses a dual-protocol architecture connecting to a single Chrome browser via both WebDriver BiDi (through VibiumDev clicker) and Chrome DevTools Protocol (CDP):

┌────────────────────────────────────────────────────────────────┐
│                         webpilot                               │
├─────────────┬─────────────┬─────────────┬──────────────────────┤
│  Go Client  │ MCP Server  │    CLI      │   Script Runner      │
│    SDK      │ (159 tools) │  (webpilot) │   (webpilot run)     │
├─────────────┴─────────────┴─────────────┴──────────────────────┤
│                       Pilot Core                               │
│     ┌─────────────────────┐    ┌─────────────────────┐         │
│     │    BiDi Client      │    │     CDP Client      │         │
│     │  (page automation)  │    │ (profiling/network) │         │
│     └──────────┬──────────┘    └──────────┬──────────┘         │
│                │                          │                    │
├────────────────┼──────────────────────────┼────────────────────┤
│                ▼                          ▼                    │
│         VibiumDev Clicker          Chrome DevTools             │
│         (WebDriver BiDi)           (CDP WebSocket)             │
├────────────────────────────────────────────────────────────────┤
│                    Chrome / Chromium                           │
└────────────────────────────────────────────────────────────────┘

Why Dual-Protocol?

WebPilot combines two complementary protocols for complete browser control:

Protocol	Purpose	Strengths
WebDriver BiDi	Automation & Testing	Semantic selectors, real-time events, cross-browser potential, future-proof standard
Chrome DevTools Protocol	Inspection & Profiling	Heap profiling, network response bodies, CPU/network emulation, coverage analysis

BiDi (via VibiumDev clicker) excels at:

Page automation (navigation, clicks, typing)
Semantic element finding (by role, label, text, testid)
Screenshots and accessibility trees
Tracing and session recording
Human-in-the-loop workflows (CAPTCHA, SSO)

CDP (direct connection) excels at:

Memory profiling (heap snapshots)
Network response body capture
Performance emulation (Slow 3G, CPU throttling)
Code coverage analysis
Low-level debugging

Both protocols connect to the same Chrome browser instance, allowing you to automate with BiDi while profiling with CDP simultaneously.
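Both clients need to reach the same browser. A Chrome instance started with `--remote-debugging-port` serves an HTTP endpoint, `/json/version`, whose `webSocketDebuggerUrl` field is the browser-level CDP WebSocket. The sketch below shows how a second client could discover that URL for an already-running browser. It is an illustration under assumptions: whether WebPilot locates its CDP endpoint this way is not stated here, and `cdpWebSocketURL` and `parseDebuggerURL` are hypothetical helpers, not part of the webpilot API.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// versionInfo holds the subset of Chrome's /json/version response used here.
type versionInfo struct {
	Browser              string `json:"Browser"`
	WebSocketDebuggerURL string `json:"webSocketDebuggerUrl"`
}

// parseDebuggerURL extracts the browser-level CDP WebSocket URL from a
// /json/version response body (hypothetical helper).
func parseDebuggerURL(body []byte) (string, error) {
	var info versionInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return "", err
	}
	if info.WebSocketDebuggerURL == "" {
		return "", errors.New("response has no webSocketDebuggerUrl")
	}
	return info.WebSocketDebuggerURL, nil
}

// cdpWebSocketURL queries a Chrome instance started with
// --remote-debugging-port for its CDP endpoint (hypothetical helper).
func cdpWebSocketURL(baseURL string) (string, error) {
	resp, err := http.Get(baseURL + "/json/version")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseDebuggerURL(body)
}

func main() {
	// Assumes Chrome was started with --remote-debugging-port=9222.
	url, err := cdpWebSocketURL("http://127.0.0.1:9222")
	if err != nil {
		fmt.Println("no browser found:", err)
		return
	}
	fmt.Println("CDP endpoint:", url)
}
```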

Installation

go get github.com/plexusone/webpilot

Quick Start

Go Client SDK

package main

import (
    "context"
    "log"

    "github.com/plexusone/webpilot"
)

func main() {
    ctx := context.Background()

    // Launch browser
    pilot, err := webpilot.Launch(ctx)
    if err != nil {
        log.Fatal(err)
    }
    defer pilot.Quit(ctx)

    // Navigate and interact
    pilot.Go(ctx, "https://example.com")

    link, err := pilot.Find(ctx, "a", nil)
    if err != nil {
        log.Fatal(err)
    }
    link.Click(ctx, nil)
}

MCP Server

Start the MCP server for AI assistant integration:

webpilot mcp --headless

Configure in Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "webpilot": {
      "command": "webpilot",
      "args": ["mcp", "--headless"]
    }
  }
}

CLI Commands

# Launch browser and run commands
webpilot launch --headless
webpilot go https://example.com
webpilot fill "#email" "user@example.com"
webpilot click "#submit"
webpilot screenshot result.png
webpilot quit

Script Runner

Execute deterministic test scripts:

webpilot run test.json

Script format (JSON or YAML):

{
  "name": "Login Test",
  "steps": [
    {"action": "navigate", "url": "https://example.com/login"},
    {"action": "fill", "selector": "#email", "value": "user@example.com"},
    {"action": "fill", "selector": "#password", "value": "secret"},
    {"action": "click", "selector": "#submit"},
    {"action": "assertUrl", "expected": "https://example.com/dashboard"}
  ]
}
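The script format above maps naturally onto Go structs. This is a minimal sketch, assuming the field names seen in the example are the whole schema (the project's JSON Schema is the authority). `Script`, `Step`, `parseScript`, and the per-action validation rules are hypothetical, and only JSON is handled because the Go standard library has no YAML decoder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Step mirrors one entry of the "steps" array in the example script.
// Hypothetical: the real schema may define more fields and actions.
type Step struct {
	Action   string `json:"action"`
	URL      string `json:"url,omitempty"`
	Selector string `json:"selector,omitempty"`
	Value    string `json:"value,omitempty"`
	Expected string `json:"expected,omitempty"`
}

// Script is a named, ordered list of steps.
type Script struct {
	Name  string `json:"name"`
	Steps []Step `json:"steps"`
}

// parseScript decodes a JSON script and checks that every step has the
// field its action needs (hypothetical rules for the example actions).
func parseScript(data []byte) (Script, error) {
	var s Script
	if err := json.Unmarshal(data, &s); err != nil {
		return Script{}, err
	}
	for i, st := range s.Steps {
		var ok bool
		switch st.Action {
		case "navigate":
			ok = st.URL != ""
		case "fill", "click":
			ok = st.Selector != ""
		case "assertUrl":
			ok = st.Expected != ""
		default:
			return Script{}, fmt.Errorf("step %d: unknown action %q", i+1, st.Action)
		}
		if !ok {
			return Script{}, fmt.Errorf("step %d: %q is missing a required field", i+1, st.Action)
		}
	}
	return s, nil
}

func main() {
	data := []byte(`{
  "name": "Login Test",
  "steps": [
    {"action": "navigate", "url": "https://example.com/login"},
    {"action": "fill", "selector": "#email", "value": "user@example.com"},
    {"action": "click", "selector": "#submit"},
    {"action": "assertUrl", "expected": "https://example.com/dashboard"}
  ]
}`)
	s, err := parseScript(data)
	if err != nil {
		fmt.Println("invalid script:", err)
		return
	}
	for i, st := range s.Steps {
		fmt.Printf("%d. %s\n", i+1, st.Action)
	}
}
```

Validating every step before launching a browser means a typo in a script fails fast instead of halfway through a run.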

Feature Comparison

Client SDK

Feature	Status
Browser launch/quit	✅
Navigation (go, back, forward, reload)	✅
Element finding (CSS selectors)	✅
Click, type, fill	✅
Screenshots	✅
JavaScript evaluation	✅
Keyboard/mouse controllers	✅
Browser context management	✅
Network interception	✅
Tracing	✅
Clock control	✅

CDP Features (via Chrome DevTools Protocol)

Feature	Status
Heap snapshots	✅
Network emulation (Slow 3G, Fast 3G, 4G)	✅
CPU throttling	✅
Direct CDP command access	✅

Additional Features

Feature	Description
MCP Server	159 tools across 20 namespaces for AI-assisted automation
CLI	`webpilot` command with subcommands
Script Runner	Execute JSON/YAML test scripts
Session Recording	Capture MCP actions as replayable scripts
JSON Schema	Validated script format
Test Reporting	Structured test results with diagnostics

MCP Server Tools

The MCP server provides 159 tools across 20 namespaces. Export the full list as JSON with webpilot mcp --list-tools.

Namespaces:

Namespace	Tools	Examples
`accessibility_`	1	`accessibility_snapshot`
`browser_`	2	`browser_launch`, `browser_quit`
`cdp_`	20	`cdp_take_heap_snapshot`, `cdp_run_lighthouse`, `cdp_start_coverage`
`config_`	1	`config_get`
`console_`	2	`console_get_messages`, `console_clear`
`dialog_`	2	`dialog_handle`, `dialog_get`
`element_`	33	`element_click`, `element_fill`, `element_get_text`, `element_is_visible`
`frame_`	2	`frame_select`, `frame_select_main`
`human_`	1	`human_pause`
`input_`	12	`input_keyboard_press`, `input_mouse_click`, `input_touch_tap`
`js_`	4	`js_evaluate`, `js_add_script`, `js_add_style`, `js_init_script`
`network_`	6	`network_get_requests`, `network_route`, `network_set_offline`
`page_`	19	`page_navigate`, `page_go_back`, `page_screenshot`, `page_emulate_media`
`record_`	5	`record_start`, `record_stop`, `record_export`
`storage_`	17	`storage_get_cookies`, `storage_local_get`, `storage_session_set`
`tab_`	3	`tab_list`, `tab_select`, `tab_close`
`test_`	15	`test_assert_text`, `test_verify_value`, `test_generate_locator`
`trace_`	6	`trace_start`, `trace_stop`, `trace_chunk_start`
`video_`	2	`video_start`, `video_stop`
`wait_`	6	`wait_for_state`, `wait_for_url`, `wait_for_load`, `wait_for_text`

See docs/reference/mcp-tools.md for the complete reference.

Session Recording Workflow

Convert natural language test plans into deterministic scripts:

┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  Markdown Test   │     │   LLM + MCP      │     │   JSON Script    │
│  Plan (English)  │ ──▶ │   (exploration)  │ ──▶ │ (deterministic)  │
└──────────────────┘     └──────────────────┘     └──────────────────┘

1. Write the test plan in Markdown.
2. The LLM executes it through MCP, starting with record_start.
3. The LLM explores the page, finds selectors, and handles edge cases.
4. Export the recording with record_export to get a JSON script.
5. Run the script deterministically with webpilot run.

API Reference

See pkg.go.dev for full API documentation.

Key Types

// Launch browser
pilot, err := webpilot.Launch(ctx)
pilot, err := webpilot.LaunchHeadless(ctx)

// Navigation
pilot.Go(ctx, url)
pilot.Back(ctx)
pilot.Forward(ctx)
pilot.Reload(ctx)

// Finding elements by CSS selector
elem, err := pilot.Find(ctx, selector, nil)
elems, err := pilot.FindAll(ctx, selector, nil)

// Element interactions
elem.Click(ctx, nil)
elem.Fill(ctx, value, nil)
elem.Type(ctx, text, nil)

// Input controllers
pilot.Keyboard().Press(ctx, "Enter")
pilot.Mouse().Click(ctx, x, y)

// Capture
data, err := pilot.Screenshot(ctx)

Semantic Selectors

Find elements by accessibility attributes instead of brittle CSS selectors. This is especially useful for AI-assisted automation where element structure may change but semantics remain stable.

SDK Usage

// Find by ARIA role and text content
elem, err := pilot.Find(ctx, "", &webpilot.FindOptions{
    Role: "button",
    Text: "Submit",
})

// Find by label (for form inputs)
elem, err := pilot.Find(ctx, "", &webpilot.FindOptions{
    Label: "Email address",
})

// Find by placeholder
elem, err := pilot.Find(ctx, "", &webpilot.FindOptions{
    Placeholder: "Enter your email",
})

// Find by data-testid (recommended for testing)
elem, err := pilot.Find(ctx, "", &webpilot.FindOptions{
    TestID: "login-button",
})

// Combine CSS selector with semantic filtering
elem, err := pilot.Find(ctx, "form", &webpilot.FindOptions{
    Role: "textbox",
    Label: "Password",
})

// Find all buttons
buttons, err := pilot.FindAll(ctx, "", &webpilot.FindOptions{Role: "button"})

// Find element near another element
elem, err := pilot.Find(ctx, "", &webpilot.FindOptions{
    Role: "button",
    Near: "#username-input",
})

MCP Tool Usage

Semantic selectors work with element_click, element_type, element_fill, and element_press tools:

// Click a button by role and text
{"name": "element_click", "arguments": {"role": "button", "text": "Sign In"}}

// Fill input by label
{"name": "element_fill", "arguments": {"label": "Email", "value": "user@example.com"}}

// Type in input by placeholder
{"name": "element_type", "arguments": {"placeholder": "Search...", "text": "query"}}

// Click by data-testid
{"name": "element_click", "arguments": {"testid": "submit-btn"}}
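The `{"name": ..., "arguments": ...}` objects above are the `params` of an MCP `tools/call` request, which travels as JSON-RPC 2.0. This sketch builds such a request in Go with only `encoding/json`; `newToolCall` is a hypothetical helper, and in practice an MCP client library would manage request IDs and the transport.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCallRequest is a JSON-RPC 2.0 request for the MCP "tools/call" method.
type toolCallRequest struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

// toolCallParams carries the tool name and its arguments.
type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments,omitempty"`
}

// newToolCall wraps a tool name and arguments in a tools/call request.
func newToolCall(id int, name string, args map[string]any) ([]byte, error) {
	return json.Marshal(toolCallRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: name, Arguments: args},
	})
}

func main() {
	msg, err := newToolCall(1, "element_click", map[string]any{"role": "button", "text": "Sign In"})
	if err != nil {
		panic(err)
	}
	// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"element_click","arguments":{"role":"button","text":"Sign In"}}}
	fmt.Println(string(msg))
}
```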

Available Selectors

Selector	Description	Example
`role`	ARIA role	`button`, `textbox`, `link`, `checkbox`
`text`	Visible text content	`"Submit"`, `"Learn more"`
`label`	Associated label text	`"Email address"`, `"Password"`
`placeholder`	Input placeholder	`"Enter email"`
`testid`	`data-testid` attribute	`"login-btn"`
`alt`	Image alt text	`"Company logo"`
`title`	Element title attribute	`"Close dialog"`
`xpath`	XPath expression	`"//button[@type='submit']"`
`near`	CSS selector of nearby element	`"#username"`
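Several selectors in the table are plain attribute matches, so they can be expressed as CSS; `role`, `text`, and `label` cannot, because they depend on the accessibility tree or computed text. The sketch below builds CSS for the attribute-based fields only. `SemanticQuery` is a hypothetical type for illustration, not `webpilot.FindOptions`, and it says nothing about how WebPilot resolves these selectors internally.

```go
package main

import (
	"fmt"
	"strings"
)

// SemanticQuery holds the attribute-based selector fields from the table.
// Hypothetical type for illustration; it is not webpilot.FindOptions.
type SemanticQuery struct {
	TestID      string
	Placeholder string
	Alt         string
	Title       string
}

// cssEscaper escapes backslashes and double quotes so a value can sit inside
// a double-quoted CSS string (newlines are not handled in this sketch).
var cssEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`)

// attr renders one [name="value"] attribute selector.
func attr(name, value string) string {
	return "[" + name + `="` + cssEscaper.Replace(value) + `"]`
}

// ToCSS joins the non-empty fields into one compound selector, so every
// condition must match the same element. An empty query matches anything.
func (q SemanticQuery) ToCSS() string {
	var b strings.Builder
	if q.TestID != "" {
		b.WriteString(attr("data-testid", q.TestID))
	}
	if q.Placeholder != "" {
		b.WriteString(attr("placeholder", q.Placeholder))
	}
	if q.Alt != "" {
		b.WriteString(attr("alt", q.Alt))
	}
	if q.Title != "" {
		b.WriteString(attr("title", q.Title))
	}
	if b.Len() == 0 {
		return "*"
	}
	return b.String()
}

func main() {
	q := SemanticQuery{TestID: "login-btn", Title: "Close dialog"}
	fmt.Println(q.ToCSS()) // [data-testid="login-btn"][title="Close dialog"]
}
```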

Init Scripts

Inject JavaScript that runs before any page scripts on every navigation. Useful for mocking APIs, injecting test helpers, or setting up authentication.

SDK Usage

// Add init script to inject before page scripts
err := pilot.AddInitScript(ctx, `window.testMode = true;`)

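// Mock an API: keep the real fetch so every other request still works
err = pilot.AddInitScript(ctx, `
    const originalFetch = window.fetch.bind(window);
    window.fetch = async (input, opts) => {
        const url = input instanceof Request ? input.url : String(input);
        if (url.includes('/api/user')) {
            return new Response(JSON.stringify({ id: 1, name: 'Test User' }), {
                headers: { 'Content-Type': 'application/json' },
            });
        }
        return originalFetch(input, opts);
    };
`)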

CLI Usage

# Inject scripts when launching
webpilot mcp --init-script=./mock-api.js --init-script=./test-helpers.js

# Or with the standalone binary
webpilot-mcp -init-script=./mock-api.js

MCP Tool Usage

{"name": "js_init_script", "arguments": {"script": "window.testMode = true;"}}
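The `--init-script` flag is given once per file. One way a Go CLI can accept a repeatable flag is a `flag.Value` that appends each occurrence; Go's `flag` package also accepts both `-init-script` and `--init-script`, matching the two spellings shown in CLI Usage. This is a sketch under that assumption: `stringList` and `parseInitScripts` are hypothetical and not taken from the webpilot source.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// stringList implements flag.Value so a flag can be repeated.
type stringList []string

func (s *stringList) String() string { return strings.Join(*s, ",") }

// Set is called once per occurrence of the flag, in command-line order.
func (s *stringList) Set(v string) error {
	*s = append(*s, v)
	return nil
}

// parseInitScripts collects every -init-script / --init-script value.
func parseInitScripts(args []string) ([]string, error) {
	fs := flag.NewFlagSet("mcp", flag.ContinueOnError)
	var scripts stringList
	fs.Var(&scripts, "init-script", "JavaScript file to inject before page scripts (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return scripts, nil
}

func main() {
	scripts, err := parseInitScripts(os.Args[1:])
	if err != nil {
		os.Exit(2) // the FlagSet has already printed the error
	}
	for _, path := range scripts {
		src, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, "init script:", err)
			os.Exit(1)
		}
		fmt.Printf("would inject %s (%d bytes)\n", path, len(src))
	}
}
```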

Storage State

Save and restore complete browser state including cookies, localStorage, and sessionStorage. Essential for maintaining login sessions across browser restarts.

SDK Usage

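// Get complete storage state
state, err := pilot.StorageState(ctx)
if err != nil {
    log.Fatal(err)
}

// Save to file (0600: the file contains session cookies)
jsonBytes, err := json.Marshal(state)
if err != nil {
    log.Fatal(err)
}
if err := os.WriteFile("auth-state.json", jsonBytes, 0600); err != nil {
    log.Fatal(err)
}

// Restore from file (e.g. in a later run)
saved, err := os.ReadFile("auth-state.json")
if err != nil {
    log.Fatal(err)
}
var savedState webpilot.StorageState
if err := json.Unmarshal(saved, &savedState); err != nil {
    log.Fatal(err)
}
err = pilot.SetStorageState(ctx, &savedState)

// Clear all storage
err = pilot.ClearStorage(ctx)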

MCP Tool Usage

// Save session
{"name": "storage_get_state"}

// Restore session
{"name": "storage_set_state", "arguments": {"state": "<json from storage_get_state>"}}

// Clear all storage
{"name": "storage_clear_all"}

Tracing

Record browser actions with screenshots and DOM snapshots for debugging and test creation.

SDK Usage

// Start tracing
tracing := pilot.Tracing()
err := tracing.Start(ctx, &webpilot.TracingStartOptions{
    Screenshots: true,
    Snapshots:   true,
    Title:       "Login Flow Test",
})

// Perform actions...
pilot.Go(ctx, "https://example.com")
elem, _ := pilot.Find(ctx, "button", nil)
elem.Click(ctx, nil)

// Stop and save trace
data, err := tracing.Stop(ctx, nil)
os.WriteFile("trace.zip", data, 0600)

MCP Tool Usage

// Start trace
{"name": "trace_start", "arguments": {"screenshots": true, "title": "My Test"}}

// Stop and get trace data
{"name": "trace_stop", "arguments": {"path": "/tmp/trace.zip"}}

CDP Features (Chrome DevTools Protocol)

WebPilot provides direct CDP access for advanced profiling and emulation that isn't available through WebDriver BiDi.

Heap Snapshots

Capture V8 heap snapshots for memory profiling:

// Take heap snapshot
snapshot, err := pilot.TakeHeapSnapshot(ctx, "/tmp/snapshot.heapsnapshot")
fmt.Printf("Snapshot: %s (%d bytes)\n", snapshot.Path, snapshot.Size)

// Load in Chrome DevTools: Memory tab → Load
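A `.heapsnapshot` file is JSON whose top-level `snapshot` object records `node_count` and `edge_count`, followed by the `nodes`, `edges`, and `strings` arrays. The sketch below reads just those counts, a cheap way to compare two snapshots before loading them in DevTools. `summarize` is a hypothetical helper; `json.Unmarshal` still scans the whole file, so very large snapshots would call for a streaming decoder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// heapSnapshotHeader decodes only the "snapshot" metadata object; the large
// nodes/edges/strings arrays are scanned but not stored.
type heapSnapshotHeader struct {
	Snapshot struct {
		NodeCount int `json:"node_count"`
		EdgeCount int `json:"edge_count"`
	} `json:"snapshot"`
}

// summarize returns the node and edge counts recorded in a heap snapshot.
func summarize(data []byte) (int, int, error) {
	var h heapSnapshotHeader
	if err := json.Unmarshal(data, &h); err != nil {
		return 0, 0, err
	}
	return h.Snapshot.NodeCount, h.Snapshot.EdgeCount, nil
}

func main() {
	data, err := os.ReadFile("/tmp/snapshot.heapsnapshot")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	nodes, edges, err := summarize(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("nodes: %d, edges: %d\n", nodes, edges)
}
```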

Network Emulation

Simulate various network conditions:

import "github.com/plexusone/webpilot/cdp"

// Throttle to Slow 3G
err := pilot.EmulateNetwork(ctx, cdp.NetworkSlow3G)

// Other presets
err = pilot.EmulateNetwork(ctx, cdp.NetworkFast3G)
err = pilot.EmulateNetwork(ctx, cdp.Network4G)

// Custom conditions (throughput in bytes per second)
err = pilot.EmulateNetwork(ctx, cdp.NetworkConditions{
    Latency:            100,        // ms
    DownloadThroughput: 500 * 1024, // 500 KiB/s
    UploadThroughput:   250 * 1024, // 250 KiB/s
})

// Clear emulation
err = pilot.ClearNetworkEmulation(ctx)

CPU Emulation

Simulate slower CPUs for performance testing:

import "github.com/plexusone/webpilot/cdp"

// 4x CPU slowdown (mid-tier mobile)
err := pilot.EmulateCPU(ctx, cdp.CPU4xSlowdown)

// Other presets
err = pilot.EmulateCPU(ctx, cdp.CPU2xSlowdown)
err = pilot.EmulateCPU(ctx, cdp.CPU6xSlowdown)

// Clear emulation
err = pilot.ClearCPUEmulation(ctx)
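CPU throttling in Chrome is exposed through CDP's `Emulation.setCPUThrottlingRate`, whose `rate` parameter is a slowdown factor (1 means no throttling), so the presets are most likely built on it. The sketch below constructs that raw CDP request; the assumption that `cdp.CPU4xSlowdown` maps to `rate: 4` is inferred from its name, and `cpuThrottle` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cdpRequest is the generic CDP message envelope: a client-chosen id,
// a "Domain.method" name, and method-specific params.
type cdpRequest struct {
	ID     int64          `json:"id"`
	Method string         `json:"method"`
	Params map[string]any `json:"params,omitempty"`
}

// cpuThrottle builds the request that slows page CPU by the given factor.
func cpuThrottle(id int64, rate float64) ([]byte, error) {
	if rate < 1 {
		return nil, fmt.Errorf("rate must be >= 1, got %v", rate)
	}
	return json.Marshal(cdpRequest{
		ID:     id,
		Method: "Emulation.setCPUThrottlingRate",
		Params: map[string]any{"rate": rate},
	})
}

func main() {
	msg, err := cpuThrottle(1, 4)
	if err != nil {
		panic(err)
	}
	// {"id":1,"method":"Emulation.setCPUThrottlingRate","params":{"rate":4}}
	fmt.Println(string(msg))
}
```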

Direct CDP Access

For advanced use cases, access the CDP client directly:

if pilot.HasCDP() {
    cdpClient := pilot.CDP()

    // Send any CDP command by method name
    result, err := cdpClient.Send(ctx, "Performance.getMetrics", nil)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(result)
}
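`Performance.getMetrics` replies with a `metrics` array of `{name, value}` pairs. What `Send` returns is not specified here, so this sketch decodes the full CDP wire reply (`id` plus either `result` or `error`) into a name-to-value map; `parseMetrics` is a hypothetical helper. CDP typically expects `Performance.enable` to be sent before metrics are collected.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// cdpResponse is the CDP reply envelope: the id of the request it answers
// and either a result or an error object.
type cdpResponse struct {
	ID     int64           `json:"id"`
	Result json.RawMessage `json:"result"`
	Error  *struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// metricsResult is the result shape of Performance.getMetrics.
type metricsResult struct {
	Metrics []struct {
		Name  string  `json:"name"`
		Value float64 `json:"value"`
	} `json:"metrics"`
}

// parseMetrics turns a raw getMetrics reply into a name -> value map.
func parseMetrics(raw []byte) (map[string]float64, error) {
	var resp cdpResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	if resp.Error != nil {
		return nil, fmt.Errorf("cdp error %d: %s", resp.Error.Code, resp.Error.Message)
	}
	if resp.Result == nil {
		return nil, errors.New("response has no result")
	}
	var res metricsResult
	if err := json.Unmarshal(resp.Result, &res); err != nil {
		return nil, err
	}
	out := make(map[string]float64, len(res.Metrics))
	for _, m := range res.Metrics {
		out[m.Name] = m.Value
	}
	return out, nil
}

func main() {
	reply := []byte(`{"id":3,"result":{"metrics":[{"name":"Nodes","value":42},{"name":"JSHeapUsedSize","value":1048576}]}}`)
	metrics, err := parseMetrics(reply)
	if err != nil {
		panic(err)
	}
	fmt.Printf("JS heap used: %.0f bytes\n", metrics["JSHeapUsedSize"])
}
```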

Testing

# Unit tests
go test -v ./...

# Integration tests
go test -tags=integration -v ./integration/...

# Headless mode
WEBPILOT_HEADLESS=1 go test -tags=integration -v ./integration/...

Debug Logging

WEBPILOT_DEBUG=1 webpilot mcp

Related Projects

WebDriver BiDi - Protocol specification

License

MIT
