Skip to content

Commit 5cf0aad

Browse files
committed
Add browser agent example with session reuse
Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com>
1 parent 4507820 commit 5cf0aad

File tree

6 files changed

+692
-0
lines changed

6 files changed

+692
-0
lines changed

example/browser-agent/Dockerfile

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
# Browser Agent Image
2+
# Build context: repository root
3+
#
4+
# Build:
5+
# docker build -t browser-agent:latest -f example/browser-agent/Dockerfile .
6+
7+
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim
8+
9+
WORKDIR /app
10+
11+
COPY example/browser-agent/requirements.txt ./
12+
RUN uv venv && uv pip install -r requirements.txt
13+
14+
COPY example/browser-agent/browser_agent.py ./
15+
16+
ENV PYTHONPATH="/app" \
17+
PYTHONDONTWRITEBYTECODE=1 \
18+
PYTHONUNBUFFERED=1
19+
20+
EXPOSE 8000
21+
22+
CMD ["uv", "run", "browser_agent.py"]

example/browser-agent/README.md

Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
# Browser Agent with Playwright MCP Tool
2+
3+
> An AI-powered browser agent that handles web search and analysis requests,
4+
> using the official [Playwright MCP](https://github.com/microsoft/playwright-mcp)
5+
> tool running in an isolated AgentCube sandbox.
6+
7+
## Architecture
8+
9+
```
10+
┌───────────────┐ ┌────────────────┐ ┌───────────────────────────────┐
11+
│ Client │──HTTP──▶ Browser Agent │──HTTP──▶ Router (AgentCube) │
12+
│ (curl/SDK) │ │ (Deployment) │ │ session mgmt + JWT + proxy │
13+
└───────────────┘ └────────────────┘ └───────────────┬───────────────┘
14+
│ reverse proxy
15+
┌───────────────▼───────────────┐
16+
│ Playwright MCP Tool (sandbox) │
17+
│ AgentRuntime microVM pod │
18+
│ official MCP browser service │
19+
└───────────────────────────────┘
20+
```
21+
22+
### Components
23+
24+
| Component | Type | Image | Description |
25+
|-----------|------|-------|-------------|
26+
| **Playwright MCP Tool** | `AgentRuntime` CRD | `mcr.microsoft.com/playwright/mcp:latest` | Official Playwright MCP container from Microsoft. Runs as a real browser tool server in the sandbox, not as a custom in-repo agent. |
27+
| **Browser Agent** | `Deployment` | `browser-agent:latest` | LLM-powered orchestrator that receives user requests, plans browser tasks, and calls the Playwright MCP tool via the AgentCube Router. |
28+
29+
### How It Works
30+
31+
1. **User sends a request** (e.g., "Search for the latest Kubernetes release notes")
32+
2. **Browser Agent** uses an LLM to plan a concrete browser task
33+
3. **Browser Agent** connects to the Playwright MCP tool via the AgentCube Router
34+
4. **Router** provisions a sandbox pod (or reuses an existing session), signs a JWT, and proxies the request
35+
5. **Playwright MCP Tool** inside the sandbox exposes browser automation tools over MCP
36+
6. **Browser Agent** summarizes the result using the LLM and returns it to the user
37+
38+
Session reuse: the `session_id` returned in the first response can be passed in subsequent requests to reuse the same browser sandbox. The MCP server is started with `--shared-browser-context`, so repeated requests can keep the same browser state inside that sandbox.
39+
40+
## Prerequisites
41+
42+
- AgentCube deployed in a Kubernetes cluster (Router + Workload Manager running)
43+
- An OpenAI-compatible LLM API key
44+
- `kubectl` configured to access the cluster
45+
46+
## Quick Start
47+
48+
### 1. Create the API key secret
49+
50+
```bash
51+
kubectl create secret generic browser-agent-secrets \
52+
--from-literal=openai-api-key=<YOUR_API_KEY>
53+
```
54+
55+
### 2. Deploy the Playwright MCP Tool (AgentRuntime)
56+
57+
```bash
58+
# Create the AgentRuntime CRD using the official Microsoft image
59+
kubectl apply -f example/browser-agent/browser-use-tool.yaml
60+
```
61+
62+
### 3. Deploy the Browser Agent
63+
64+
```bash
65+
# Build the agent image (from repo root)
66+
docker build -t browser-agent:latest \
67+
-f example/browser-agent/Dockerfile .
68+
69+
# Deploy
70+
kubectl apply -f example/browser-agent/deployment.yaml
71+
```
72+
73+
### 4. Test
74+
75+
```bash
76+
# Port-forward to the agent
77+
kubectl port-forward deploy/browser-agent 8000:8000
78+
79+
# Send a search request
80+
curl -s http://localhost:8000/chat \
81+
-H 'Content-Type: application/json' \
82+
-d '{"message": "Search for the latest news about Kubernetes 1.33 release"}' \
83+
| python -m json.tool
84+
85+
# Reuse the same browser session (pass session_id from previous response)
86+
curl -s http://localhost:8000/chat \
87+
-H 'Content-Type: application/json' \
88+
-d '{"message": "Now find the deprecation list from the same release", "session_id": "<SESSION_ID>"}' \
89+
| python -m json.tool
90+
```
91+
92+
## Configuration
93+
94+
### Browser Agent (Deployment)
95+
96+
| Env Var | Default | Description |
97+
|---------|---------|-------------|
98+
| `OPENAI_API_KEY` | (required) | LLM API key |
99+
| `OPENAI_API_BASE` | `https://api.openai.com/v1` | LLM API base URL |
100+
| `OPENAI_MODEL` | `gpt-4o` | LLM model name |
101+
| `ROUTER_URL` | `http://router.agentcube.svc.cluster.local:8080` | AgentCube Router URL |
102+
| `PLAYWRIGHT_MCP_NAME` | `browser-use-tool` | Name of the Playwright MCP AgentRuntime CRD |
103+
| `PLAYWRIGHT_MCP_NAMESPACE` | `default` | Namespace of the AgentRuntime |
104+
| `BROWSER_TASK_TIMEOUT` | `300` | Timeout (seconds) for browser task execution |
105+
| `MAX_TOOL_ROUNDS` | `10` | Maximum LLM-to-tool interaction rounds |
106+
107+
### Playwright MCP Tool (AgentRuntime)
108+
109+
| Env Var | Default | Description |
110+
|---------|---------|-------------|
111+
| `--port` | `8931` | MCP HTTP endpoint port |
112+
| `--host` | `0.0.0.0` | Bind address |
113+
| `--shared-browser-context` | enabled | Reuse the same browser context for repeat clients in the same sandbox |
114+
| `--caps=vision` | enabled | Coordinate-based actions and screenshots |
115+
116+
## Files
117+
118+
```
119+
example/browser-agent/
120+
├── README.md # This file
121+
├── browser_agent.py # Browser Agent: LLM planner + MCP client
122+
├── browser-use-tool.yaml # AgentRuntime CRD for the Playwright MCP tool
123+
├── deployment.yaml # K8s Deployment for the browser agent
124+
├── Dockerfile # Dockerfile for browser agent
125+
├── requirements.txt # Python deps for browser agent
126+
```
127+
128+
## Why This Design
129+
130+
- `playwright-python` is a library, not a tool server. By itself it does not give AgentCube an MCP or HTTP endpoint to proxy.
131+
- `microsoft/playwright-mcp` is already a real browser tool server with official Docker packaging and HTTP transport support.
132+
- This removes the custom in-repo tool wrapper and keeps the sandboxed browser component as a pure tool.
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
apiVersion: runtime.agentcube.volcano.sh/v1alpha1
2+
kind: AgentRuntime
3+
metadata:
4+
name: browser-use-tool
5+
namespace: default
6+
spec:
7+
targetPort:
8+
- pathPrefix: "/"
9+
port: 8931
10+
protocol: "HTTP"
11+
podTemplate:
12+
labels:
13+
app: browser-use-tool
14+
spec:
15+
containers:
16+
- name: playwright-mcp
17+
image: mcr.microsoft.com/playwright/mcp:latest
18+
imagePullPolicy: IfNotPresent
19+
args:
20+
- "--port"
21+
- "8931"
22+
- "--host"
23+
- "0.0.0.0"
24+
- "--allowed-hosts"
25+
- "*"
26+
- "--shared-browser-context"
27+
- "--caps=vision"
28+
ports:
29+
- containerPort: 8931
30+
protocol: TCP
31+
resources:
32+
requests:
33+
cpu: "500m"
34+
memory: "512Mi"
35+
limits:
36+
cpu: "1"
37+
memory: "1Gi"
38+
sessionTimeout: "30m"
39+
maxSessionDuration: "8h"

0 commit comments

Comments
 (0)