Commit 929a7ee

Merge branch 'development'
2 parents d73f49e + e4be819 commit 929a7ee

24 files changed, +708 -46 lines changed

README.md

Lines changed: 27 additions & 6 deletions
@@ -46,7 +46,10 @@ _For an in-depth guide see [here](https://docs.bytebot.ai/deployment/railway)._
 
 - Docker ≥ 20.10
 - Docker Compose
-- Anthropic API key ([get one here](https://console.anthropic.com))
+- AI API key from one of these providers:
+- Anthropic ([get one here](https://console.anthropic.com)) - Claude models
+- OpenAI ([get one here](https://platform.openai.com/api-keys)) - GPT models
+- Google ([get one here](https://makersuite.google.com/app/apikey)) - Gemini models
 
 ### Start Your Desktop Agent (2 minutes)

@@ -55,7 +58,11 @@ _For an in-depth guide see [here](https://docs.bytebot.ai/deployment/railway)._
 ```bash
 git clone https://github.com/bytebot-ai/bytebot.git
 cd bytebot
-echo "ANTHROPIC_API_KEY=your_api_key_here" > docker/.env
+
+# Configure your AI provider (choose one):
+echo "ANTHROPIC_API_KEY=your_api_key_here" > docker/.env # For Claude
+# echo "OPENAI_API_KEY=your_api_key_here" > docker/.env # For OpenAI
+# echo "GOOGLE_API_KEY=your_api_key_here" > docker/.env # For Gemini
 ```
 
 2. **Start the agent stack:**
@@ -80,6 +87,16 @@ That's it! Start chatting with your AI desktop agent. Watch it work in real-time
 - "Download all PDFs from this website and organize them by date"
 - "Monitor this webpage and alert me when the price drops below $50"
 
+## 🤖 Supported AI Models
+
+Bytebot supports multiple AI providers to power your desktop agent:
+
+- **Anthropic Claude**: Claude 3.5 Sonnet (default) - Best for complex reasoning and visual tasks
+- **OpenAI**: GPT-4, GPT-4o - Excellent for general automation tasks
+- **Google Gemini**: Gemini 1.5 Pro, Flash - Fast and efficient for routine tasks
+
+Choose the model that best fits your needs and budget. Simply set the appropriate API key in your environment configuration.
+
 ## 🏗️ Architecture Overview
 
 Bytebot consists of four main components working together:
@@ -99,7 +116,7 @@ Bytebot consists of four main components working together:
 │ WebSocket
 ┌─────────────────────▼───────────────────────────────────────┐
 │ Bytebot Agent (NestJS) │
-│ • LLM integration
+│ • Multi-LLM integration (Claude/GPT/Gemini)
 │ • Task orchestration │
 │ • Action planning │
 └─────────────────────┬───────────────────────────────────────┘
@@ -158,8 +175,10 @@ Bytebot consists of four main components working together:
 Create `docker/.env`:
 
 ```bash
-# Required
-ANTHROPIC_API_KEY=sk-ant-...
+# Required - Choose one of these AI providers:
+ANTHROPIC_API_KEY=sk-ant-... # For Claude models
+# OPENAI_API_KEY=sk-... # For OpenAI models
+# GOOGLE_API_KEY=... # For Google Gemini models
 ```
 
 ### Desktop Customization
@@ -182,7 +201,7 @@ COPY configs/.config /home/user/.config
 
 ## 🔒 Security Considerations
 
-- **API Key**: Keep your Anthropic API key secure and never commit it
+- **API Keys**: Keep your AI provider API keys secure and never commit them
 - **Network**: By default, services are only accessible from localhost
 - **VNC**: Change the default VNC password for production use
 - **Updates**: Regularly update the container images for security patches
@@ -325,6 +344,8 @@ Built with amazing open source projects:
 
 - [nutjs](https://github.com/nut-tree/nut.js) - Desktop automation framework
 - [Anthropic Claude](https://www.anthropic.com) - AI reasoning engine
+- [OpenAI](https://openai.com) - GPT models for automation
+- [Google AI](https://ai.google.dev) - Gemini models for efficient tasks
 - [noVNC](https://novnc.com) - Browser-based VNC client
 - Inspired by Anthropic's [computer-use demo](https://github.com/anthropics/anthropic-quickstarts)

docker/docker-compose.yml

Lines changed: 2 additions & 0 deletions
@@ -49,6 +49,8 @@ services:
 - DATABASE_URL=${DATABASE_URL:-postgresql://postgres:postgres@postgres:5432/bytebotdb}
 - BYTEBOT_DESKTOP_BASE_URL=${BYTEBOT_DESKTOP_BASE_URL:-http://bytebot-desktop:9990}
 - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
+- OPENAI_API_KEY=${OPENAI_API_KEY}
+- GOOGLE_API_KEY=${GOOGLE_API_KEY}
 depends_on:
 - postgres
 networks:
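
Review note: the compose change passes all three provider keys through with no default, so any key missing from the environment is interpolated as an empty string. A quick pre-flight check can catch a `docker/.env` with no usable key before the stack starts. The sketch below is only an illustration: the function name `configured_providers` and the rule that at least one key must be non-empty are assumptions, not something this commit defines.

```shell
# Hypothetical pre-flight helper (not part of Bytebot): prints the name of
# each provider key that has a non-empty value in the given env file.
configured_providers() {
  local env_file="$1" name value
  for name in ANTHROPIC_API_KEY OPENAI_API_KEY GOOGLE_API_KEY; do
    # Take the last uncommented assignment, then strip an inline
    # " # comment" and any trailing whitespace from the value.
    value="$(grep -E "^${name}=" "$env_file" 2>/dev/null \
      | tail -n 1 \
      | cut -d= -f2- \
      | sed -E 's/[[:space:]]+#.*$//; s/[[:space:]]+$//')"
    if [ -n "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Example use before starting the stack:
#   [ -n "$(configured_providers docker/.env)" ] || echo "no AI provider key set" >&2
```

Commented-out lines such as `# OPENAI_API_KEY=sk-...` are ignored because only lines that start with the variable name are matched.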

docs/core-concepts/agent-system.mdx

Lines changed: 26 additions & 8 deletions
@@ -15,9 +15,26 @@ The Bytebot Agent System transforms a simple desktop container into an intellige
 
 ## How the AI Agent Works
 
-### The Brain: Claude AI Integration
+### The Brain: Multi-Model AI Integration
 
-At the heart of Bytebot is Claude, Anthropic's advanced AI assistant. The agent:
+At the heart of Bytebot is a flexible AI integration that supports multiple models. Choose the AI that best fits your needs:
+
+**Anthropic Claude** (Default):
+- Best for complex reasoning and visual understanding
+- Excellent at following detailed instructions
+- Superior performance on desktop automation tasks
+
+**OpenAI GPT Models**:
+- Fast and reliable for general automation
+- Strong code understanding and generation
+- Cost-effective for routine tasks
+
+**Google Gemini**:
+- Efficient for high-volume tasks
+- Good balance of speed and capability
+- Excellent multilingual support
+
+The agent with any model:
 
 1. **Understands Context**: Processes your natural language requests with full conversation history
 2. **Plans Actions**: Breaks down complex tasks into executable computer actions
@@ -31,7 +48,7 @@ At the heart of Bytebot is Claude, Anthropic's advanced AI assistant. The agent:
 "Research competitors for my SaaS product and create a comparison table"
 </Step>
 <Step title="AI Plans the Approach">
-Claude understands the request and plans: open browser → search → visit sites → extract data → create document
+The AI model understands the request and plans: open browser → search → visit sites → extract data → create document
 </Step>
 <Step title="Executes Actions">
 The agent controls the desktop: clicking, typing, taking screenshots, reading content
@@ -119,7 +136,7 @@ The agent processes tasks intelligently:
 ### Core Components
 
 1. **NestJS Agent Service**
-- Integrates with Anthropic API
+- Integrates with multiple AI provider APIs (Anthropic, OpenAI, Google)
 - Handles WebSocket connections
 - Coordinates with desktop API

@@ -194,7 +211,7 @@ The web interface provides:
 
 ### Data Isolation
 - All processing happens in your infrastructure
-- No data sent to external services (except Claude API)
+- No data sent to external services (except your chosen AI provider API)
 - Conversations stored locally
 - Complete audit trail

@@ -223,17 +240,18 @@ The web interface provides:
 
 <AccordionGroup>
 <Accordion title="Agent not responding">
-- Check Anthropic API key is valid
+- Check your AI provider API key is valid
 - Verify agent service is running
 - Review logs for errors
-- Ensure sufficient API credits
+- Ensure sufficient API credits/quota with your provider
 </Accordion>
 
 <Accordion title="Slow task execution">
 - Monitor system resources
 - Check network latency
 - Reduce screenshot frequency
-- Optimize Claude prompts
+- Optimize AI prompts for your chosen model
+- Consider switching to a faster model (e.g., Gemini Flash)
 </Accordion>
 </AccordionGroup>

docs/core-concepts/architecture.mdx

Lines changed: 1 addition & 2 deletions
@@ -24,7 +24,7 @@ The foundation of the system - a containerized Linux desktop that provides:
 - **XFCE4 Desktop** for a lightweight, responsive UI
 - **bytebotd Daemon** - The automation service built on nutjs that executes computer actions
 - **Pre-installed Applications**: Firefox ESR, Thunderbird, text editors, and development tools
-- **VNC & noVNC** for remote desktop access
+- **noVNC** for remote desktop access
 
 **Key Features:**
 - Runs completely isolated from your host system
@@ -139,7 +139,6 @@ graph LR
 - **Desktop API**: No authentication by default (localhost only). Supports REST and MCP.
 - **Agent API**: Can be secured with API keys
 - **Database**: Password protected, not exposed externally
-- **VNC Access**: Optional password protection
 
 <Warning>
 Default configuration is for development. For production:

docs/core-concepts/desktop-environment.mdx

Lines changed: 5 additions & 13 deletions
@@ -39,27 +39,24 @@ The Bytebot Desktop Environment is a fully functional Linux desktop running insi
 - **Ubuntu 22.04 LTS**: Stable, well-supported Linux distribution
 - **XFCE4 Desktop**: Lightweight, responsive desktop environment
 - **X11 Display Server**: Standard Linux graphics system
-- **SystemD**: Modern service management
+- **supervisord**: Service management
 
 ### Pre-installed Software
 
 <CardGroup cols={2}>
 <Card title="Web Browsers" icon="globe">
 - Firefox ESR (Extended Support Release)
-- Chrome/Chromium available
 - Pre-configured for automation
 - Ad blocker extensions
 </Card>
 <Card title="Office Tools" icon="file-lines">
 - Text editors (nano, vim, gedit)
-- LibreOffice suite (optional)
-- PDF viewers
+- PDF viewer
 - File managers
 </Card>
 <Card title="Communication" icon="envelope">
 - Thunderbird email client
 - Chat applications
-- Video conferencing tools
 - Terminal emulators
 </Card>
 <Card title="Development" icon="code">
@@ -78,17 +75,12 @@ The Bytebot Desktop Environment is a fully functional Linux desktop running insi
 - Built on nutjs framework
 - Provides REST API
 
-2. **VNC Server**
-- TigerVNC for remote access
-- Configurable resolution
-- Multiple connection support
-
-3. **noVNC Web Client**
+2. **noVNC Web Client**
 - Browser-based desktop access
 - No client installation needed
 - WebSocket proxy included
 
-4. **Supervisor**
+3. **Supervisor**
 - Process management
 - Service monitoring
 - Automatic restarts
@@ -164,7 +156,7 @@ curl -X POST http://localhost:9990/api/computer \
 # Move mouse
 curl -X POST http://localhost:9990/api/computer \
 -H "Content-Type: application/json" \
--d '{"action": "move_mouse", "coordinate": [500, 300]}'
+-d '{"action": "move_mouse", "coordinate": {"x": 500, "y": 300}}'
 ```
 
 ## Customization
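
Review note: the last hunk changes the documented `move_mouse` payload from a two-element array to an object with `x` and `y` fields. Callers that build the body by hand may find a small helper useful. The following is a minimal sketch under that assumption. `move_mouse_payload` is a hypothetical name, and the payload shape is taken only from the updated example, not from the daemon's source.

```shell
# Hypothetical helper: prints the JSON body for a move_mouse action using the
# object-style coordinate shown in the updated docs example.
move_mouse_payload() {
  local x="$1" y="$2" n
  for n in "$x" "$y"; do
    case "$n" in
      # Reject empty input, anything with a non-digit, and leading zeros,
      # so the printed numbers are always valid JSON integers.
      ''|*[!0-9]*|0?*)
        echo "move_mouse_payload: x and y must be non-negative integers" >&2
        return 1
        ;;
    esac
  done
  printf '{"action": "move_mouse", "coordinate": {"x": %s, "y": %s}}\n' "$x" "$y"
}

# Example use with the endpoint from the docs:
#   curl -X POST http://localhost:9990/api/computer \
#     -H "Content-Type: application/json" \
#     -d "$(move_mouse_payload 500 300)"
```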

docs/deployment/railway.mdx

Lines changed: 8 additions & 3 deletions
@@ -3,7 +3,7 @@ title: "Deploying Bytebot on Railway"
 description: "Comprehensive guide to deploying the full Bytebot stack on Railway using the official 1-click template"
 ---
 
-> **TL;DR –** Click the button below, add your Anthropic key, and your personal Bytebot instance will be live in ~2 minutes.
+> **TL;DR –** Click the button below, add your AI API key (Anthropic, OpenAI, or Google), and your personal Bytebot instance will be live in ~2 minutes.
 
 [![Deploy on Railway](https://railway.com/button.svg)](https://railway.com/deploy/bytebot?referralCode=L9lKXQ)

@@ -35,7 +35,12 @@ All internal traffic flows through Railway’s [private networking](https://docs
 Click the **Deploy on Railway** button above or visit [https://railway.com/deploy/bytebot?referralCode=L9lKXQ](https://railway.com/deploy/bytebot?referralCode=L9lKXQ).
 </Step>
 <Step title="2. Configure Environment">
-Railway automatically detects required variables. Paste your **Anthropic API key** into `ANTHROPIC_API_KEY` and keep other defaults.
+Railway automatically detects required variables. Add your AI API key (choose one):
+- **Anthropic**: Paste into `ANTHROPIC_API_KEY` for Claude models
+- **OpenAI**: Paste into `OPENAI_API_KEY` for GPT models
+- **Google**: Paste into `GOOGLE_API_KEY` for Gemini models
+
+Keep other defaults as is.
 </Step>
 <Step title="3. Kick off the Deployment">
 Press **Deploy**. Railway will pull the pre-built images, create the Postgres database and link all services on a private network.
@@ -73,7 +78,7 @@ All internal traffic flows through Railway’s [private networking](https://docs
 | Symptom | Likely Cause | Fix |
 | ------- | ------------ | ---- |
 | Web UI shows “connecting…” | Desktop not ready or private networking mis-config | Wait for `bytebot-desktop` container to finish starting, or restart service |
-| Agent errors `401 Anthropic` | Missing/invalid API key | Re-enter `ANTHROPIC_API_KEY` in Railway variables |
+| Agent errors `401` or `403` | Missing/invalid API key | Re-enter your AI provider's API key in Railway variables |
 | Slow desktop video | Free Railway plan throttling | Upgrade plan or reduce screen resolution in desktop settings |
 
 ---

docs/images/agent-architecture.png

-818 Bytes

docs/images/core-container.png

2.34 KB

docs/introduction.mdx

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ Bytebot consists of four integrated components working together:
 Ubuntu 22.04 with XFCE4, pre-installed apps, and the automation daemon
 </Card>
 <Card title="AI Agent" icon="brain" href="/core-concepts/agent-system">
-NestJS service that uses an LLM to plan and execute tasks
+NestJS service that uses an LLM (Claude, GPT, or Gemini) to plan and execute tasks
 </Card>
 <Card
 title="Task Interface"

docs/quickstart.mdx

Lines changed: 12 additions & 2 deletions
@@ -34,7 +34,10 @@ description: "Get your AI desktop agent running in 2 minutes"
 - Docker ≥ 20.10
 - Docker Compose
 - 4GB+ RAM available
-- Anthropic API key ([get one here](https://console.anthropic.com))
+- AI API key from one of these providers:
+- Anthropic ([get one here](https://console.anthropic.com)) - Claude models
+- OpenAI ([get one here](https://platform.openai.com/api-keys)) - GPT models
+- Google ([get one here](https://makersuite.google.com/app/apikey)) - Gemini models
 
 ## 🚀 2-Minute Setup

@@ -43,7 +46,13 @@ Get your self-hosted AI desktop agent running with just three commands:
 <Steps>
 <Step title="Clone and Configure">
 ```bash
-git clone https://github.com/bytebot-ai/bytebot.git cd bytebot echo "ANTHROPIC_API_KEY=your_api_key_here" > docker/.env
+git clone https://github.com/bytebot-ai/bytebot.git
+cd bytebot
+
+# Configure your AI provider (choose one):
+echo "ANTHROPIC_API_KEY=your_api_key_here" > docker/.env # For Claude
+# echo "OPENAI_API_KEY=your_api_key_here" > docker/.env # For OpenAI
+# echo "GOOGLE_API_KEY=your_api_key_here" > docker/.env # For Gemini
 ```
 </Step>

@@ -224,6 +233,7 @@ curl -X POST http://localhost:9990/api/computer \
 cat docker/.env
 docker-compose -f docker/docker-compose.yml logs bytebot-agent
 ```
+Ensure you're using a valid API key from Anthropic, OpenAI, or Google.
 </Accordion>
 </AccordionGroup>
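
Review note: each `echo ... > docker/.env` line in the "Clone and Configure" step uses `>`, which truncates the file. Uncommenting a second line therefore replaces the first key instead of adding to it, and it also drops any other settings already in the file. A sketch of a helper that updates one key in place, assuming a plain `NAME=value` file, is shown below. `set_env_key` is a hypothetical name and is not part of the repository.

```shell
# Hypothetical helper: sets NAME=VALUE in an env file, replacing any existing
# assignment of NAME and leaving every other line untouched.
set_env_key() {
  local env_file="$1" name="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  if [ -f "$env_file" ]; then
    # Copy everything except current assignments of NAME; grep exits
    # non-zero when nothing is left to copy, which is fine here.
    grep -v -E "^${name}=" "$env_file" > "$tmp" || true
  fi
  printf '%s=%s\n' "$name" "$value" >> "$tmp"
  mv "$tmp" "$env_file"
}

# Example use:
#   set_env_key docker/.env OPENAI_API_KEY your_api_key_here
```

Writing to a temporary file and moving it into place means the original is never left half-written if the command is interrupted.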
