Warning
This is simply a proof of concept. Browserbase aims not to compete with web agents, but rather to provide all the necessary tools for anybody to build their own web agent. We strongly recommend you check out both Browserbase and our open source project Stagehand to build your own web agent.
First, install the dependencies for this repository. This requires npm:
npm install
Next, copy the example environment variables:
cp .env.example .env.local
You'll need to set up your API keys:
- Get your OpenAI API key from OpenAI's dashboard
- Get your Browserbase API key and project ID from Browserbase
Update .env.local with your API keys:
- OPENAI_API_KEY: Your OpenAI API key
- BROWSERBASE_API_KEY: Your Browserbase API key
- BROWSERBASE_PROJECT_ID: Your Browserbase project ID
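A missing key usually surfaces later as an opaque API error, so it can help to check the three variables up front. The sketch below is one way to do that; `findMissingEnvVars` and `REQUIRED_ENV_VARS` are hypothetical names introduced for illustration and are not part of this repository.

```typescript
// Names of the variables the setup steps ask for.
const REQUIRED_ENV_VARS: readonly string[] = [
  "OPENAI_API_KEY",
  "BROWSERBASE_API_KEY",
  "BROWSERBASE_PROJECT_ID",
];

// Hypothetical helper (not part of this repository): returns the names of
// required variables that are unset or contain only whitespace.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

Calling `findMissingEnvVars(process.env)` at startup and stopping when the returned list is non-empty gives a clear message naming exactly which keys still need to be filled in.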
Then, run the development server:
npm run dev
Open http://localhost:3000 with your browser to see CUA Browser in action.
- Browserbase: Powers the core browser automation and interaction capabilities
- Stagehand: Handles precise DOM manipulation and state management
- Next.js: Provides the modern web framework foundation
- OpenAI: Enables natural language understanding and decision making
We welcome contributions! Whether it's:
- Adding new features
- Improving documentation
- Reporting bugs
- Suggesting enhancements
Please feel free to open issues and pull requests.
CUA Browser is open source software licensed under the MIT license.
This project is inspired by OpenAI's CUA feature and builds upon various open source technologies including Next.js, React, Browserbase, and Stagehand.
This is a TypeScript implementation of the Browserbase agent, which allows you to control browsers programmatically using the OpenAI API.
- Install dependencies:
  npm install
- Create a .env file with your API keys:
  OPENAI_API_KEY=your_openai_api_key
  # Optional
  OPENAI_ORG=your_openai_org_id
  BROWSERBASE_API_KEY=your_browserbase_api_key
  BROWSERBASE_PROJECT_ID=your_browserbase_project_id
- Compile TypeScript:
  npx tsc
Here's a basic example of how to use the agent:
import { Agent } from './app/api/agent/agent';
import { BrowserbaseBrowser } from './app/api/agent/browserbase';

async function main() {
  // Initialize the browser
  const browser = new BrowserbaseBrowser(1024, 768);
  await browser.connect();

  // Initialize the agent
  const agent = new Agent(
    "computer-use-preview-2025-02-04",
    browser,
    [],
    (message) => {
      console.log(`Safety check: ${message}`);
      return true; // Acknowledge all safety checks
    }
  );

  // Run the agent with a prompt
  const result = await agent.runFullTurn([
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Go to google.com and search for 'Browserbase'"
        }
      ]
    }
  ], true, false, true);

  // Disconnect the browser
  await browser.disconnect();
}

main().catch(console.error);
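The example acknowledges every safety check, which is convenient for a demo but may be too permissive elsewhere. The following is a minimal sketch of a stricter callback that acknowledges a check only when its message contains an allowed phrase. `makeSafetyCheckCallback` and `SafetyCheckCallback` are hypothetical names, and the callback signature (a string message in, a boolean out) is an assumption inferred from the example. How the agent reacts to a `false` return is not shown here and should be confirmed in agent.ts.

```typescript
// Assumed from the example: the callback receives the safety-check message
// and returns true to acknowledge it.
type SafetyCheckCallback = (message: string) => boolean;

// Hypothetical helper (not part of this repository): builds a callback that
// acknowledges a safety check only if the message contains one of the
// allowed phrases (case-insensitive). Empty phrases are ignored so they
// cannot match every message.
function makeSafetyCheckCallback(
  allowedPhrases: readonly string[],
  log: (line: string) => void = console.log,
): SafetyCheckCallback {
  const allowed = allowedPhrases
    .map((phrase) => phrase.trim().toLowerCase())
    .filter((phrase) => phrase.length > 0);

  return (message) => {
    const text = message.toLowerCase();
    const acknowledged = allowed.some((phrase) => text.includes(phrase));
    log(`Safety check (${acknowledged ? "acknowledged" : "rejected"}): ${message}`);
    return acknowledged;
  };
}
```

Under these assumptions, the result of `makeSafetyCheckCallback([...])` could be passed as the fourth argument to `new Agent(...)` in place of the inline arrow function. The phrases to allow depend on the messages the API actually emits, so they are left to the caller.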
The agent code is split across the following files:
- agent.ts: The main Agent class that handles interactions with the OpenAI API
- base_playwright.ts: Base class for Playwright-based browser automation
- browserbase.ts: Implementation of the Browserbase browser
- utils.ts: Utility functions for API calls and image handling
The agent depends on:
- playwright: For browser automation
- axios: For making HTTP requests
- dotenv: For loading environment variables