Commit 53a1e07

Cloud agents PoC
1 parent e1c48f0 commit 53a1e07

25 files changed (+1804 / -15 lines)

apps/cloud-agents/.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
```
.next

docker/postgres-data
docker/redis-data
```

apps/cloud-agents/ARCHITECTURE.md

Lines changed: 299 additions & 0 deletions
@@ -0,0 +1,299 @@
# Cloud Agents Architecture

## Overview

Cloud Agents is a Next.js application that exposes an API for enqueueing Roo Code tasks via BullMQ, processes them in Docker containers, and integrates with GitHub webhooks to automatically fix issues.

## System Architecture

```mermaid
flowchart TB
    GH[GitHub Webhook] --> API[Next.js API Routes]
    API --> BQ[BullMQ Queue]
    BQ --> JH[Job Handler]
    JH --> DC[Docker Container]
    DC --> RC[Roo Code Task Runner]

    subgraph Infrastructure
        PG[(PostgreSQL)]
        RD[(Redis)]
    end

    API --> PG
    BQ --> RD
    JH --> PG
```

## Directory Structure

```
apps/cloud-agents/
├── src/
│   ├── app/
│   │   ├── api/
│   │   │   ├── webhooks/
│   │   │   │   └── github/
│   │   │   │       └── route.ts      # GitHub webhook handler
│   │   │   ├── jobs/
│   │   │   │   ├── route.ts          # Create job endpoint
│   │   │   │   └── [id]/
│   │   │   │       └── route.ts      # Get job status
│   │   │   └── health/
│   │   │       └── route.ts          # Health check
│   │   ├── layout.tsx
│   │   └── page.tsx                  # Simple dashboard
│   ├── lib/
│   │   ├── db/
│   │   │   ├── schema.ts             # Cloud agents schema
│   │   │   └── index.ts
│   │   ├── queue/
│   │   │   ├── processor.ts          # BullMQ processor
│   │   │   ├── jobs.ts               # Job definitions
│   │   │   └── index.ts
│   │   ├── docker/
│   │   │   ├── runner.ts             # Docker container runner
│   │   │   └── config.ts
│   │   └── github/
│   │       ├── webhooks.ts           # Webhook handlers
│   │       └── types.ts
│   └── types/
│       └── index.ts
├── docker/
│   ├── Dockerfile.agent              # Simplified runner
│   └── docker-compose.yml
├── package.json
├── tsconfig.json
├── next.config.ts
└── .env.example
```

## Key Components

### 1. Database Schema (Drizzle ORM)

The cloud agents database extends the existing evals database with two additional tables:

```typescript
// Cloud agent specific tables, shown as the row shape of each table
// (SQL column types noted in the comments)

// cloudJobs: tracks job requests
interface CloudJob {
	id: number // integer, primary key
	type: string // text, e.g. "github.issue.fix", "task.execute"
	status: "pending" | "processing" | "completed" | "failed" // text
	payload: unknown // jsonb, job-specific data
	result: unknown // jsonb, job output
	error: string | null // text, error message if the job failed
	createdAt: Date // timestamp
	startedAt: Date | null // timestamp
	completedAt: Date | null // timestamp
}

// cloudTasks: links cloud jobs to Roo Code tasks
interface CloudTask {
	id: number // integer, primary key
	jobId: number // integer, references cloudJobs
	taskId: number // integer, references tasks from the evals schema
	containerId: string // text, Docker container ID
	createdAt: Date // timestamp
}
```
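
The `status`, `startedAt`, and `completedAt` columns together encode a job's lifecycle. A minimal sketch of how a processor might advance a `cloudJobs` row, assuming `startedAt` is stamped on the move to `processing` and `completedAt` on the move to a terminal state; `startJob` and `finishJob` are hypothetical helpers, not part of the planned code:

```typescript
type JobStatus = "pending" | "processing" | "completed" | "failed"

// Lifecycle-related fields of a cloudJobs row
interface JobRow {
	status: JobStatus
	result: unknown
	error: string | null
	startedAt: Date | null
	completedAt: Date | null
}

// Hypothetical helper: pending -> processing, stamping startedAt
function startJob(job: JobRow, now: Date): JobRow {
	if (job.status !== "pending") {
		throw new Error(`cannot start a job in status "${job.status}"`)
	}
	return { ...job, status: "processing", startedAt: now }
}

// Hypothetical helper: processing -> completed | failed, stamping completedAt
function finishJob(job: JobRow, now: Date, outcome: { result: unknown } | { error: string }): JobRow {
	if (job.status !== "processing") {
		throw new Error(`cannot finish a job in status "${job.status}"`)
	}
	return "error" in outcome
		? { ...job, status: "failed", error: outcome.error, completedAt: now }
		: { ...job, status: "completed", result: outcome.result, completedAt: now }
}
```

Returning a new object for each transition keeps every step easy to persist as a single row update.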

### 2. BullMQ Job Types

```typescript
interface JobTypes {
	"github.issue.fix": {
		repo: string // e.g., "RooCodeInc/Roo-Code"
		issue: number // Issue number
		title: string // Issue title
		body: string // Issue description
		labels?: string[] // Issue labels
	}

	"task.execute": {
		prompt: string // Task prompt
		workspace?: string // Optional workspace path
		settings?: RooCodeSettings // Optional Roo Code settings override
	}
}
```

### 3. Simplified Docker Runner

The cloud agents Docker image is based on the existing `Dockerfile.runner`, with the following simplifications.

**Remove:**

- Language-specific VS Code extensions (Go, Java, Python, Rust)
- Eval-specific dependencies and test infrastructure
- `uv`/Python sync steps

**Keep:**

- Base Node.js environment
- VS Code installation
- Roo Code extension build and installation
- Basic utilities (git, curl, etc.)
- Docker CLI for nested container support

### 4. API Endpoints

#### `POST /api/webhooks/github`

Handles GitHub webhook events, specifically issue events:

- Verifies the webhook signature
- Parses the issue data
- Creates the corresponding job in the queue
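
A minimal sketch of the first two steps, assuming the handler reads the raw body (for example with `await request.text()` in a Next.js route handler) before parsing it, since the signature is computed over the exact bytes GitHub sent. GitHub puts the signature in the `X-Hub-Signature-256` header as `sha256=` followed by the hex HMAC-SHA256 of the body. The function names, acting only on `opened` issues, and the empty-string fallback for an issue with no body are assumptions for illustration:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto"

// Compares the X-Hub-Signature-256 header against our own HMAC of the raw body
function verifySignature(rawBody: string, signatureHeader: string | null, secret: string): boolean {
	if (!signatureHeader?.startsWith("sha256=")) return false
	const expected = Buffer.from("sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex"))
	const received = Buffer.from(signatureHeader)
	// timingSafeEqual throws on a length mismatch, so check lengths first
	return expected.length === received.length && timingSafeEqual(expected, received)
}

// Subset of the fields GitHub sends with an `issues` event
interface IssuesEvent {
	action: string
	issue: { number: number; title: string; body: string | null; labels: { name: string }[] }
	repository: { full_name: string }
}

// Payload of the "github.issue.fix" job type
interface GitHubIssueFixPayload {
	repo: string
	issue: number
	title: string
	body: string
	labels?: string[]
}

// Hypothetical mapping: only newly opened issues produce a job
function toIssueFixPayload(event: IssuesEvent): GitHubIssueFixPayload | null {
	if (event.action !== "opened") return null
	return {
		repo: event.repository.full_name,
		issue: event.issue.number,
		title: event.issue.title,
		body: event.issue.body ?? "",
		labels: event.issue.labels.map((label) => label.name),
	}
}
```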

#### `POST /api/jobs`

Creates a new job in the queue.

```typescript
// Request body: a { type, payload } pair for one of the entries in JobTypes
type CreateJobRequest = {
	[K in keyof JobTypes]: { type: K; payload: JobTypes[K] }
}[keyof JobTypes]

// Response body
interface CreateJobResponse {
	id: string
	status: string
}
```
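
Because the route receives untrusted JSON, it needs a runtime check before enqueueing. A hand-rolled sketch, assuming a 400 response is returned on failure; `parseCreateJobRequest` is a hypothetical name, and `settings` is typed as `Record<string, unknown>` as a stand-in for `RooCodeSettings`, which this sketch does not reproduce. A schema library such as Zod would be the more idiomatic choice in practice.

```typescript
// Local copy of the job types; settings is a stand-in for RooCodeSettings
interface JobTypes {
	"github.issue.fix": { repo: string; issue: number; title: string; body: string; labels?: string[] }
	"task.execute": { prompt: string; workspace?: string; settings?: Record<string, unknown> }
}

type CreateJobRequest = {
	[K in keyof JobTypes]: { type: K; payload: JobTypes[K] }
}[keyof JobTypes]

const isObject = (v: unknown): v is Record<string, unknown> =>
	typeof v === "object" && v !== null && !Array.isArray(v)
const isString = (v: unknown): v is string => typeof v === "string"
const isOptional = (v: unknown, check: (x: unknown) => boolean) => v === undefined || check(v)

// Returns a typed request, or an error message suitable for a 400 response
function parseCreateJobRequest(body: unknown): { ok: true; value: CreateJobRequest } | { ok: false; error: string } {
	if (!isObject(body) || !isObject(body.payload)) return { ok: false, error: "expected { type, payload }" }
	const p = body.payload
	switch (body.type) {
		case "github.issue.fix":
			if (
				isString(p.repo) &&
				Number.isInteger(p.issue) &&
				isString(p.title) &&
				isString(p.body) &&
				isOptional(p.labels, (l) => Array.isArray(l) && l.every(isString))
			) {
				return { ok: true, value: { type: "github.issue.fix", payload: p as JobTypes["github.issue.fix"] } }
			}
			return { ok: false, error: "invalid github.issue.fix payload" }
		case "task.execute":
			if (isString(p.prompt) && isOptional(p.workspace, isString) && isOptional(p.settings, isObject)) {
				return { ok: true, value: { type: "task.execute", payload: p as JobTypes["task.execute"] } }
			}
			return { ok: false, error: "invalid task.execute payload" }
		default:
			return { ok: false, error: `unknown job type: ${String(body.type)}` }
	}
}
```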

#### `GET /api/jobs/:id`

Retrieves a job's status and results.

```typescript
// Response body
interface JobStatusResponse {
	id: string
	type: string
	status: string
	payload: object
	result?: object
	error?: string
	createdAt: string
	startedAt?: string
	completedAt?: string
}
```
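
The `cloudJobs` row and this response differ in shape: the row has an integer `id`, `Date` timestamps, and `null` for unset columns. A minimal mapping sketch, assuming timestamps are serialized with `Date.prototype.toISOString()` and unset fields are omitted rather than sent as `null`; `toJobStatusResponse` is a hypothetical helper:

```typescript
// Row shape read from the cloudJobs table
interface CloudJobRow {
	id: number
	type: string
	status: string
	payload: object
	result: object | null
	error: string | null
	createdAt: Date
	startedAt: Date | null
	completedAt: Date | null
}

interface JobStatusResponse {
	id: string
	type: string
	status: string
	payload: object
	result?: object
	error?: string
	createdAt: string
	startedAt?: string
	completedAt?: string
}

// Converts a database row into the API response, omitting columns that are still null
function toJobStatusResponse(row: CloudJobRow): JobStatusResponse {
	const response: JobStatusResponse = {
		id: String(row.id),
		type: row.type,
		status: row.status,
		payload: row.payload,
		createdAt: row.createdAt.toISOString(),
	}
	if (row.result !== null) response.result = row.result
	if (row.error !== null) response.error = row.error
	if (row.startedAt !== null) response.startedAt = row.startedAt.toISOString()
	if (row.completedAt !== null) response.completedAt = row.completedAt.toISOString()
	return response
}
```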

#### `GET /api/health`

Health check endpoint for monitoring.

```typescript
// Response body
interface HealthResponse {
	status: "ok" | "error"
	services: {
		database: boolean
		redis: boolean
		docker: boolean
	}
}
```
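
A sketch of how the endpoint might compute this response, assuming each dependency check is injected as an async function (for example a trivial database query, a Redis ping, and a Docker daemon ping) and bounded by a timeout so one hung dependency cannot stall the endpoint. The 2-second default and the names `probe` and `checkHealth` are assumptions:

```typescript
interface HealthResponse {
	status: "ok" | "error"
	services: { database: boolean; redis: boolean; docker: boolean }
}

type Check = () => Promise<unknown>

// Resolves to true if the check succeeds within timeoutMs, false if it throws or times out
async function probe(check: Check, timeoutMs: number): Promise<boolean> {
	let timer: ReturnType<typeof setTimeout> | undefined
	const timeout = new Promise<never>((_, reject) => {
		timer = setTimeout(() => reject(new Error("timed out")), timeoutMs)
	})
	try {
		await Promise.race([check(), timeout])
		return true
	} catch {
		return false
	} finally {
		clearTimeout(timer)
	}
}

// Runs all checks in parallel; overall status is "ok" only if every service is healthy
async function checkHealth(
	checks: Record<keyof HealthResponse["services"], Check>,
	timeoutMs = 2000,
): Promise<HealthResponse> {
	const [database, redis, docker] = await Promise.all([
		probe(checks.database, timeoutMs),
		probe(checks.redis, timeoutMs),
		probe(checks.docker, timeoutMs),
	])
	return {
		status: database && redis && docker ? "ok" : "error",
		services: { database, redis, docker },
	}
}
```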

## Implementation Phases

### Phase 1: Core Infrastructure

1. Create the Next.js app structure in `apps/cloud-agents`
2. Set up the database schema using Drizzle ORM
3. Configure Docker Compose with PostgreSQL & Redis
4. Create the simplified `Dockerfile.agent`

### Phase 2: Job Queue

1. Implement BullMQ setup and configuration
2. Create the job processor with proper error handling
3. Add Docker container spawning logic
4. Implement job status tracking in the database

### Phase 3: API & Webhooks

1. Create all API route handlers
2. Implement GitHub webhook signature verification
3. Add issue parsing and automatic job creation
4. Create a simple status dashboard

### Phase 4: Testing & Deployment

1. Add integration tests for the API endpoints
2. Create a production docker-compose configuration
3. Add monitoring and structured logging
4. Write comprehensive documentation

## Configuration

### Environment Variables

```env
# Database
DATABASE_URL=postgresql://postgres:password@localhost:5432/cloud_agents

# Redis
REDIS_URL=redis://localhost:6379

# GitHub Integration
GITHUB_WEBHOOK_SECRET=your-webhook-secret
GITHUB_APP_ID=your-app-id
GITHUB_PRIVATE_KEY=your-private-key-base64

# API Keys for Roo Code
OPENROUTER_API_KEY=your-openrouter-key
ANTHROPIC_API_KEY=your-anthropic-key

# Docker Configuration
DOCKER_NETWORK=cloud-agents_default
DOCKER_IMAGE=cloud-agents-runner:latest
MAX_CONCURRENT_CONTAINERS=5

# Application
PORT=3001
NODE_ENV=development
```
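
Reading these variables once at startup and failing fast on bad values keeps misconfiguration out of the job processor. A minimal sketch, assuming `DATABASE_URL`, `REDIS_URL`, and `GITHUB_WEBHOOK_SECRET` are required and that the example values above double as defaults for the Docker and port settings; the GitHub App credentials and provider API keys are omitted for brevity, and `loadConfig` is a hypothetical name:

```typescript
interface CloudAgentsConfig {
	databaseUrl: string
	redisUrl: string
	githubWebhookSecret: string
	dockerNetwork: string
	dockerImage: string
	maxConcurrentContainers: number
	port: number
}

type Env = Record<string, string | undefined>

// Throws if a required variable is missing or empty
function required(env: Env, name: string): string {
	const value = env[name]
	if (!value) throw new Error(`Missing required environment variable ${name}`)
	return value
}

// Parses a positive integer, falling back to a default when the variable is unset
function positiveInt(env: Env, name: string, fallback: number): number {
	const raw = env[name]
	if (raw === undefined || raw === "") return fallback
	const value = Number(raw)
	if (!Number.isInteger(value) || value <= 0) throw new Error(`${name} must be a positive integer, got "${raw}"`)
	return value
}

function loadConfig(env: Env = process.env): CloudAgentsConfig {
	return {
		databaseUrl: required(env, "DATABASE_URL"),
		redisUrl: required(env, "REDIS_URL"),
		githubWebhookSecret: required(env, "GITHUB_WEBHOOK_SECRET"),
		dockerNetwork: env.DOCKER_NETWORK ?? "cloud-agents_default",
		dockerImage: env.DOCKER_IMAGE ?? "cloud-agents-runner:latest",
		maxConcurrentContainers: positiveInt(env, "MAX_CONCURRENT_CONTAINERS", 5),
		port: positiveInt(env, "PORT", 3001),
	}
}
```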

### Docker Compose Services

```yaml
services:
  app:
    build: .
    ports:
      - "3001:3001"
    environment:
      - DATABASE_URL
      - REDIS_URL
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - db
      - redis

  db:
    image: postgres:17
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=cloud_agents

  redis:
    image: redis:7-alpine
```

## Error Handling & Retry Logic

1. **Job Retries**: Failed jobs are retried up to 3 times with exponential backoff.
2. **Container Timeouts**: Tasks time out after 30 minutes by default.
3. **Resource Cleanup**: Containers are always cleaned up, even when a task fails.
4. **Dead Letter Queue**: Jobs that still fail after all retries are moved to a dead letter queue (DLQ) for manual review.
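
A dependency-free sketch of the policy above, assuming a 1-second base delay (the document does not give one) and that `attemptsMade` counts the runs that have failed so far. If this is wired to BullMQ's `attempts` and `backoff` job options, note that `attempts` counts the first run as well, so three retries corresponds to `attempts: 4`. `onJobFailure` and `runWithCleanup` are hypothetical helpers:

```typescript
const MAX_RETRIES = 3 // "retried up to 3 times"
const BASE_DELAY_MS = 1_000 // assumed; not specified in this document
const CONTAINER_TIMEOUT_MS = 30 * 60 * 1_000 // 30-minute default task timeout

type FailureAction = { kind: "retry"; delayMs: number } | { kind: "dead-letter" }

// attemptsMade = number of runs that have failed so far (1 after the first failure)
function onJobFailure(attemptsMade: number, maxRetries = MAX_RETRIES, baseDelayMs = BASE_DELAY_MS): FailureAction {
	// Retries exhausted: park the job for manual review
	if (attemptsMade > maxRetries) return { kind: "dead-letter" }
	// Delay doubles with each failure: 1s, 2s, 4s with the defaults
	return { kind: "retry", delayMs: baseDelayMs * 2 ** (attemptsMade - 1) }
}

// Runs a task with a timeout and always runs cleanup (e.g. removing the container),
// whether the task succeeds, fails, or times out. On timeout the task's promise is
// abandoned rather than cancelled, so cleanup is what actually stops the work.
async function runWithCleanup<T>(
	task: () => Promise<T>,
	cleanup: () => Promise<void>,
	timeoutMs = CONTAINER_TIMEOUT_MS,
): Promise<T> {
	let timer: ReturnType<typeof setTimeout> | undefined
	const timeout = new Promise<never>((_, reject) => {
		timer = setTimeout(() => reject(new Error(`task timed out after ${timeoutMs} ms`)), timeoutMs)
	})
	try {
		return await Promise.race([task(), timeout])
	} finally {
		clearTimeout(timer)
		await cleanup()
	}
}
```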

## Security Considerations

1. **Webhook Verification**: All GitHub webhook deliveries are verified using their HMAC signature.
2. **Container Isolation**: Each task runs in its own isolated container.
3. **Resource Limits**: CPU and memory limits are applied to every container.
4. **API Authentication**: Consider adding API key authentication for job creation.
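
Isolation and resource limits can both be expressed as `docker run` flags. A sketch of a hypothetical argument builder using standard flags (`--rm`, `--name`, `--network`, `--cpus`, `--memory`, and `-e NAME` to forward a variable from the host environment); the limit values in the check are illustrative, not recommendations. The resulting array could be passed to `spawn("docker", args)` from `node:child_process`.

```typescript
interface ContainerLimits {
	cpus: number // value for --cpus, e.g. 2
	memory: string // value for --memory, e.g. "4g"
}

// Hypothetical helper: builds the argument list for `docker run`,
// applying per-container CPU and memory limits
function buildDockerRunArgs(opts: {
	image: string
	network: string
	name: string
	limits: ContainerLimits
	envNames: string[] // variables forwarded by name from the host environment
	command: string[]
}): string[] {
	// --rm removes the container when it exits
	const args = ["run", "--rm", "--name", opts.name, "--network", opts.network]
	args.push("--cpus", String(opts.limits.cpus), "--memory", opts.limits.memory)
	for (const name of opts.envNames) args.push("-e", name)
	return [...args, opts.image, ...opts.command]
}
```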

## Monitoring & Observability

1. **Metrics**: Job queue depth, processing time, success rate
2. **Logging**: Structured logs for all job processing steps
3. **Health Checks**: Regular checks on all dependent services
4. **Alerts**: Notifications for failed jobs and system issues
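
A sketch of the metrics listed above computed from `cloudJobs`-style records, plus a one-line JSON logger for structured logs. It assumes processing time means `completedAt - startedAt` and that success rate counts only finished jobs; in practice the queue itself could report depth more directly. `computeMetrics` and `logEvent` are hypothetical:

```typescript
type JobStatus = "pending" | "processing" | "completed" | "failed"

interface JobRecord {
	status: JobStatus
	startedAt: Date | null
	completedAt: Date | null
}

interface QueueMetrics {
	queueDepth: number // jobs still waiting to be picked up
	avgProcessingMs: number | null // mean of completedAt - startedAt over finished jobs
	successRate: number | null // completed / (completed + failed)
}

function computeMetrics(jobs: JobRecord[]): QueueMetrics {
	const queueDepth = jobs.filter((j) => j.status === "pending").length
	const finished = jobs.filter((j) => j.status === "completed" || j.status === "failed")
	const durations = finished
		.filter((j) => j.startedAt && j.completedAt)
		.map((j) => j.completedAt!.getTime() - j.startedAt!.getTime())
	const completed = finished.filter((j) => j.status === "completed").length
	return {
		queueDepth,
		avgProcessingMs: durations.length ? durations.reduce((a, b) => a + b, 0) / durations.length : null,
		successRate: finished.length ? completed / finished.length : null,
	}
}

// One JSON object per line keeps logs machine-parseable
function logEvent(event: string, fields: Record<string, unknown>): void {
	console.log(JSON.stringify({ ts: new Date().toISOString(), event, ...fields }))
}
```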
