| 1 | +# Cloud Agents Architecture |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Cloud Agents is a Next.js application that exposes an API for enqueueing Roo Code tasks via BullMQ, processes them in Docker containers, and integrates with GitHub webhooks to automatically fix issues. |
| 6 | + |
| 7 | +## System Architecture |
| 8 | + |
| 9 | +```mermaid |
| 10 | +flowchart TB |
| 11 | +    GH[GitHub Webhook] --> API[Next.js API Routes] |
| 12 | +    API --> BQ[BullMQ Queue] |
| 13 | +    BQ --> JH[Job Handler] |
| 14 | +    JH --> DC[Docker Container] |
| 15 | +    DC --> RC[Roo Code Task Runner] |
| 16 | + |
| 17 | +    subgraph Infrastructure |
| 18 | +        PG[(PostgreSQL)] |
| 19 | +        RD[(Redis)] |
| 20 | +    end |
| 21 | + |
| 22 | +    API --> PG |
| 23 | +    BQ --> RD |
| 24 | +    JH --> PG |
| 25 | +``` |
| 26 | + |
| 27 | +## Directory Structure |
| 28 | + |
| 29 | +``` |
| 30 | +apps/cloud-agents/ |
| 31 | +├── src/ |
| 32 | +│   ├── app/ |
| 33 | +│   │   ├── api/ |
| 34 | +│   │   │   ├── webhooks/ |
| 35 | +│   │   │   │   └── github/ |
| 36 | +│   │   │   │       └── route.ts  # GitHub webhook handler |
| 37 | +│   │   │   ├── jobs/ |
| 38 | +│   │   │   │   ├── route.ts      # Create job endpoint |
| 39 | +│   │   │   │   └── [id]/ |
| 40 | +│   │   │   │       └── route.ts  # Get job status |
| 41 | +│   │   │   └── health/ |
| 42 | +│   │   │       └── route.ts      # Health check |
| 43 | +│   │   ├── layout.tsx |
| 44 | +│   │   └── page.tsx              # Simple dashboard |
| 45 | +│   ├── lib/ |
| 46 | +│   │   ├── db/ |
| 47 | +│   │   │   ├── schema.ts         # Cloud agents schema |
| 48 | +│   │   │   └── index.ts |
| 49 | +│   │   ├── queue/ |
| 50 | +│   │   │   ├── processor.ts      # BullMQ processor |
| 51 | +│   │   │   ├── jobs.ts           # Job definitions |
| 52 | +│   │   │   └── index.ts |
| 53 | +│   │   ├── docker/ |
| 54 | +│   │   │   ├── runner.ts         # Docker container runner |
| 55 | +│   │   │   └── config.ts |
| 56 | +│   │   └── github/ |
| 57 | +│   │       ├── webhooks.ts       # Webhook handlers |
| 58 | +│   │       └── types.ts |
| 59 | +│   └── types/ |
| 60 | +│       └── index.ts |
| 61 | +├── docker/ |
| 62 | +│   ├── Dockerfile.agent          # Simplified runner |
| 63 | +│   └── docker-compose.yml |
| 64 | +├── package.json |
| 65 | +├── tsconfig.json |
| 66 | +├── next.config.ts |
| 67 | +└── .env.example |
| 68 | +``` |
| 69 | + |
| 70 | +## Key Components |
| 71 | + |
| 72 | +### 1. Database Schema (Drizzle ORM) |
| 73 | + |
| 74 | +The cloud agents database extends the existing evals database with additional tables: |
| 75 | + |
| 76 | +```text |
| 77 | +// Cloud agent specific tables |
| 78 | +- cloudJobs: Track job requests |
| 79 | +  - id: integer (primary key) |
| 80 | +  - type: text (e.g., 'github.issue.fix', 'task.execute') |
| 81 | +  - status: text ('pending', 'processing', 'completed', 'failed') |
| 82 | +  - payload: jsonb (job-specific data) |
| 83 | +  - result: jsonb (job output) |
| 84 | +  - error: text (error message if failed) |
| 85 | +  - createdAt: timestamp |
| 86 | +  - startedAt: timestamp |
| 87 | +  - completedAt: timestamp |
| 88 | + |
| 89 | +- cloudTasks: Link cloud jobs to Roo Code tasks |
| 90 | +  - id: integer (primary key) |
| 91 | +  - jobId: integer (references cloudJobs) |
| 92 | +  - taskId: integer (references tasks from evals) |
| 93 | +  - containerId: text (Docker container ID) |
| 94 | +  - createdAt: timestamp |
| 95 | +``` |
| 96 | + |
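The `status` column and the three timestamps above imply a small job lifecycle. The following is a minimal sketch of one way to enforce it in application code, assuming jobs only move from `pending` to `processing` and then to `completed` or `failed`. The transition table and the `advance` helper are illustrative names, not part of the schema.

```typescript
type JobStatus = "pending" | "processing" | "completed" | "failed"

interface JobRow {
  status: JobStatus
  createdAt: Date
  startedAt?: Date
  completedAt?: Date
  error?: string
}

// Assumed transition table: terminal states have no successors.
const NEXT: Record<JobStatus, JobStatus[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  completed: [],
  failed: [],
}

// Hypothetical helper: returns an updated copy and stamps the matching timestamp.
function advance(job: JobRow, to: JobStatus, now: Date = new Date(), error?: string): JobRow {
  if (!NEXT[job.status].includes(to)) {
    throw new Error(`Invalid transition: ${job.status} -> ${to}`)
  }
  if (to === "processing") {
    return { ...job, status: to, startedAt: now }
  }
  // Both terminal states set completedAt; only failures carry an error message.
  return { ...job, status: to, completedAt: now, ...(to === "failed" ? { error } : {}) }
}
```

If retried jobs should return to `pending` in the database, the table would need a `failed -> pending` entry; the document does not say which behavior is intended.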
| 97 | +### 2. BullMQ Job Types |
| 98 | + |
| 99 | +```typescript |
| 100 | +interface JobTypes { |
| 101 | +  "github.issue.fix": { |
| 102 | +    repo: string // e.g., "RooCodeInc/Roo-Code" |
| 103 | +    issue: number // Issue number |
| 104 | +    title: string // Issue title |
| 105 | +    body: string // Issue description |
| 106 | +    labels?: string[] // Issue labels |
| 107 | +  } |
| 108 | + |
| 109 | +  "task.execute": { |
| 110 | +    prompt: string // Task prompt |
| 111 | +    workspace?: string // Optional workspace path |
| 112 | +    settings?: RooCodeSettings // Optional Roo Code settings override |
| 113 | +  } |
| 114 | +} |
| 115 | +``` |
| 116 | + |
| 117 | +### 3. Simplified Docker Runner |
| 118 | + |
| 119 | +The cloud agents Docker image is based on the existing `Dockerfile.runner` but simplified: |
| 120 | + |
| 121 | +**Remove:** |
| 122 | + |
| 123 | +- Language-specific VS Code extensions (Go, Java, Python, Rust) |
| 124 | +- Eval-specific dependencies and test infrastructure |
| 125 | +- UV/Python sync steps |
| 126 | + |
| 127 | +**Keep:** |
| 128 | + |
| 129 | +- Base Node.js environment |
| 130 | +- VS Code installation |
| 131 | +- Roo Code extension build and installation |
| 132 | +- Basic utilities (git, curl, etc.) |
| 133 | +- Docker CLI for nested container support |
| 134 | + |
| 135 | +### 4. API Endpoints |
| 136 | + |
| 137 | +#### `POST /api/webhooks/github` |
| 138 | + |
| 139 | +Handles GitHub webhook events, specifically issue events. |
| 140 | + |
| 141 | +- Verifies the webhook signature |
| 142 | +- Parses the issue data |
| 143 | +- Enqueues the corresponding job (for example, `github.issue.fix`) |
| 144 | + |
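The signature check in the first step can be sketched as follows. GitHub sends an HMAC-SHA256 of the raw request body in the `X-Hub-Signature-256` header, prefixed with `sha256=`. The function name `verifyGitHubSignature` is an illustrative choice, and the secret is assumed to come from `GITHUB_WEBHOOK_SECRET`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto"

// Returns true when `signatureHeader` (the X-Hub-Signature-256 value) matches
// the HMAC-SHA256 of the raw request body under the shared secret.
function verifyGitHubSignature(rawBody: string, signatureHeader: string | null, secret: string): boolean {
  if (!signatureHeader || !signatureHeader.startsWith("sha256=")) return false
  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex")
  const received = Buffer.from(signatureHeader, "utf8")
  const computed = Buffer.from(expected, "utf8")
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === computed.length && timingSafeEqual(received, computed)
}
```

In a route handler, the body should be read as text (for example with `await request.text()`) before any JSON parsing, because the HMAC covers the exact bytes GitHub sent; re-serializing parsed JSON can change them.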
| 145 | +#### `POST /api/jobs` |
| 146 | + |
| 147 | +Creates a new job in the queue. |
| 148 | + |
| 149 | +```typescript |
| 150 | +Request: { |
| 151 | +  type: keyof JobTypes |
| 152 | +  payload: JobTypes[type] |
| 153 | +} |
| 154 | +Response: { |
| 155 | +  id: string |
| 156 | +  status: string |
| 157 | +} |
| 158 | +``` |
| 159 | + |
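The request shape above pairs each `type` with its own payload. A minimal sketch of how that pairing could be typed and checked at runtime before enqueueing is shown here. `parseCreateJobRequest`, `CreateJobRequest`, and the `RooCodeSettings` stand-in are assumptions for illustration; the document does not define them.

```typescript
// Stand-in for the real settings type, which this document does not define.
type RooCodeSettings = Record<string, unknown>

interface JobTypes {
  "github.issue.fix": { repo: string; issue: number; title: string; body: string; labels?: string[] }
  "task.execute": { prompt: string; workspace?: string; settings?: RooCodeSettings }
}

// Pairs each `type` with its own payload shape.
type CreateJobRequest = { [T in keyof JobTypes]: { type: T; payload: JobTypes[T] } }[keyof JobTypes]

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v)
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string")

// Hypothetical validator: returns a typed request, or null when the body is malformed.
function parseCreateJobRequest(input: unknown): CreateJobRequest | null {
  if (!isRecord(input)) return null
  const payload = input.payload
  if (!isRecord(payload)) return null
  const { repo, issue, title, body, labels, prompt, workspace, settings } = payload

  if (input.type === "github.issue.fix") {
    if (typeof repo !== "string" || typeof title !== "string" || typeof body !== "string") return null
    if (typeof issue !== "number" || !Number.isInteger(issue) || issue <= 0) return null
    if (labels !== undefined && !isStringArray(labels)) return null
    return { type: "github.issue.fix", payload: { repo, issue, title, body, labels } }
  }
  if (input.type === "task.execute") {
    if (typeof prompt !== "string" || prompt.length === 0) return null
    if (workspace !== undefined && typeof workspace !== "string") return null
    if (settings !== undefined && !isRecord(settings)) return null
    return { type: "task.execute", payload: { prompt, workspace, settings } }
  }
  return null // Unknown job type
}
```

A handler could respond with HTTP 400 when the function returns `null` and enqueue the job otherwise. A schema library would do the same job with less code; this version only avoids extra dependencies.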
| 160 | +#### `GET /api/jobs/:id` |
| 161 | + |
| 162 | +Retrieves job status and results. |
| 163 | + |
| 164 | +```typescript |
| 165 | +Response: { |
| 166 | +  id: string |
| 167 | +  type: string |
| 168 | +  status: string |
| 169 | +  payload: object |
| 170 | +  result?: object |
| 171 | +  error?: string |
| 172 | +  createdAt: string |
| 173 | +  startedAt?: string |
| 174 | +  completedAt?: string |
| 175 | +} |
| 176 | +``` |
| 177 | + |
| 178 | +#### `GET /api/health` |
| 179 | + |
| 180 | +Health check endpoint for monitoring. |
| 181 | + |
| 182 | +```typescript |
| 183 | +Response: { |
| 184 | +  status: "ok" | "error" |
| 185 | +  services: { |
| 186 | +    database: boolean |
| 187 | +    redis: boolean |
| 188 | +    docker: boolean |
| 189 | +  } |
| 190 | +} |
| 191 | +``` |
| 192 | + |
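One way to assemble the health response is to run one probe per service and report `"ok"` only when all of them succeed. This is a sketch under that assumption; the `Probe` type and `getHealth` function are hypothetical, and the actual probes (a database query, a Redis ping, a Docker daemon call) are left to the caller.

```typescript
interface HealthResponse {
  status: "ok" | "error"
  services: { database: boolean; redis: boolean; docker: boolean }
}

// A probe resolves when its service is reachable and rejects (or throws) otherwise.
type Probe = () => Promise<unknown>

// Hypothetical aggregator: probes are injected so the logic can be tested without real services.
async function getHealth(probes: Record<keyof HealthResponse["services"], Probe>): Promise<HealthResponse> {
  const names = ["database", "redis", "docker"] as const
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(names.map(async (name) => probes[name]()))
  const up = (index: number): boolean => settled[index]?.status === "fulfilled"
  const services = { database: up(0), redis: up(1), docker: up(2) }
  const allUp = services.database && services.redis && services.docker
  return { status: allUp ? "ok" : "error", services }
}
```

Because the probes run concurrently, one slow service does not delay the checks of the others, although the response still waits for the slowest probe to settle.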
| 193 | +## Implementation Phases |
| 194 | + |
| 195 | +### Phase 1: Core Infrastructure |
| 196 | + |
| 197 | +1. Create Next.js app structure in `apps/cloud-agents` |
| 198 | +2. Set up database schema using Drizzle ORM |
| 199 | +3. Configure Docker Compose with PostgreSQL and Redis |
| 200 | +4. Create the simplified `Dockerfile.agent` |
| 201 | + |
| 202 | +### Phase 2: Job Queue |
| 203 | + |
| 204 | +1. Implement BullMQ setup and configuration |
| 205 | +2. Create job processor with proper error handling |
| 206 | +3. Add Docker container spawning logic |
| 207 | +4. Implement job status tracking in database |
| 208 | + |
| 209 | +### Phase 3: API & Webhooks |
| 210 | + |
| 211 | +1. Create all API route handlers |
| 212 | +2. Implement GitHub webhook signature verification |
| 213 | +3. Add issue parsing and automatic job creation |
| 214 | +4. Create simple status dashboard |
| 215 | + |
| 216 | +### Phase 4: Testing & Deployment |
| 217 | + |
| 218 | +1. Add integration tests for API endpoints |
| 219 | +2. Create production docker-compose configuration |
| 220 | +3. Add monitoring and structured logging |
| 221 | +4. Write comprehensive documentation |
| 222 | + |
| 223 | +## Configuration |
| 224 | + |
| 225 | +### Environment Variables |
| 226 | + |
| 227 | +```env |
| 228 | +# Database |
| 229 | +DATABASE_URL=postgresql://postgres:password@localhost:5432/cloud_agents |
| 230 | + |
| 231 | +# Redis |
| 232 | +REDIS_URL=redis://localhost:6379 |
| 233 | + |
| 234 | +# GitHub Integration |
| 235 | +GITHUB_WEBHOOK_SECRET=your-webhook-secret |
| 236 | +GITHUB_APP_ID=your-app-id |
| 237 | +GITHUB_PRIVATE_KEY=your-private-key-base64 |
| 238 | + |
| 239 | +# API Keys for Roo Code |
| 240 | +OPENROUTER_API_KEY=your-openrouter-key |
| 241 | +ANTHROPIC_API_KEY=your-anthropic-key |
| 242 | + |
| 243 | +# Docker Configuration |
| 244 | +DOCKER_NETWORK=cloud-agents_default |
| 245 | +DOCKER_IMAGE=cloud-agents-runner:latest |
| 246 | +MAX_CONCURRENT_CONTAINERS=5 |
| 247 | + |
| 248 | +# Application |
| 249 | +PORT=3001 |
| 250 | +NODE_ENV=development |
| 251 | +``` |
| 252 | + |
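These variables are easiest to work with when they are parsed and validated once at startup. The sketch below assumes that `DATABASE_URL` and `REDIS_URL` are required and that the other values fall back to the examples shown above; `loadConfig` and `AgentConfig` are illustrative names, and only a subset of the variables is covered.

```typescript
type Env = Record<string, string | undefined>

interface AgentConfig {
  databaseUrl: string
  redisUrl: string
  dockerNetwork: string
  dockerImage: string
  maxConcurrentContainers: number
  port: number
}

function required(env: Env, key: string): string {
  const value = env[key]
  if (!value) throw new Error(`Missing required environment variable: ${key}`)
  return value
}

function positiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key]
  if (raw === undefined || raw === "") return fallback
  const parsed = Number(raw)
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`)
  }
  return parsed
}

// Hypothetical loader: defaults mirror the example values in this document.
function loadConfig(env: Env): AgentConfig {
  return {
    databaseUrl: required(env, "DATABASE_URL"),
    redisUrl: required(env, "REDIS_URL"),
    dockerNetwork: env.DOCKER_NETWORK || "cloud-agents_default",
    dockerImage: env.DOCKER_IMAGE || "cloud-agents-runner:latest",
    maxConcurrentContainers: positiveInt(env, "MAX_CONCURRENT_CONTAINERS", 5),
    port: positiveInt(env, "PORT", 3001),
  }
}
```

Calling `loadConfig(process.env)` at startup makes a missing or malformed value fail immediately instead of surfacing later inside a job.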
| 253 | +### Docker Compose Services |
| 254 | + |
| 255 | +```yaml |
| 256 | +services: |
| 257 | +  app: |
| 258 | +    build: . |
| 259 | +    ports: |
| 260 | +      - "3001:3001" |
| 261 | +    environment: |
| 262 | +      - DATABASE_URL |
| 263 | +      - REDIS_URL |
| 264 | +    volumes: |
| 265 | +      - /var/run/docker.sock:/var/run/docker.sock |
| 266 | +    depends_on: |
| 267 | +      - db |
| 268 | +      - redis |
| 269 | + |
| 270 | +  db: |
| 271 | +    image: postgres:17 |
| 272 | +    environment: |
| 273 | +      - POSTGRES_PASSWORD=password |
| 274 | +      - POSTGRES_DB=cloud_agents |
| 275 | + |
| 276 | +  redis: |
| 277 | +    image: redis:7-alpine |
| 278 | +``` |
| 279 | + |
| 280 | +## Error Handling & Retry Logic |
| 281 | + |
| 282 | +1. **Job Retries**: Failed jobs will be retried up to 3 times with exponential backoff |
| 283 | +2. **Container Timeouts**: Tasks have a 30-minute timeout by default |
| 284 | +3. **Resource Cleanup**: Containers are always cleaned up, even on failure |
| 285 | +4. **Dead Letter Queue**: Jobs that still fail after all retries are moved to a dead letter queue (DLQ) for manual review |
| 286 | + |
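The retry, timeout, and cleanup rules above can be sketched as follows. BullMQ accepts `attempts` and `backoff` as job options, and `attempts` counts the first run as well, so three retries corresponds to `attempts: 4`. The 5-second base delay, the `backoffDelayMs` schedule, and the `runWithTimeout` wrapper are assumptions for illustration, not the project's actual settings.

```typescript
// Job options in the shape BullMQ accepts. The base delay is a placeholder.
const retryOptions = {
  attempts: 4, // 1 initial run + 3 retries
  backoff: { type: "exponential", delay: 5000 },
}

// Illustrative doubling schedule: base, 2x base, 4x base, ...
function backoffDelayMs(retry: number, baseMs: number): number {
  return baseMs * 2 ** (retry - 1)
}

const DEFAULT_TIMEOUT_MS = 30 * 60 * 1000 // 30 minutes

// Hypothetical wrapper: `run` executes the task in its container and `cleanup`
// removes the container. `cleanup` runs on success, failure, and timeout alike.
async function runWithTimeout<T>(
  run: () => Promise<T>,
  cleanup: () => Promise<void>,
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Task timed out after ${timeoutMs} ms`)), timeoutMs)
  })
  try {
    return await Promise.race([run(), timeout])
  } finally {
    clearTimeout(timer)
    await cleanup()
  }
}
```

Note that losing the race does not stop `run` by itself; in this sketch it is `cleanup` that is expected to stop and remove the container. Rethrowing the timeout error from the processor lets the queue count it as a failed attempt.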
| 287 | +## Security Considerations |
| 288 | + |
| 289 | +1. **Webhook Verification**: All GitHub webhooks are verified with an HMAC signature computed from `GITHUB_WEBHOOK_SECRET` |
| 290 | +2. **Container Isolation**: Each task runs in an isolated container |
| 291 | +3. **Resource Limits**: Each container runs with CPU and memory limits |
| 292 | +4. **API Authentication**: Consider adding API key authentication for job creation |
| 293 | + |
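As a sketch of how the isolation and resource-limit points could translate into a container launch, the helper below builds the argument list for `docker run`. The flags (`--rm`, `--name`, `--network`, `--cpus`, `--memory`, `--pids-limit`, `--env`) are standard Docker CLI options; the helper name, the option shape, and the limit values in the test are assumptions.

```typescript
interface RunOptions {
  image: string
  network: string
  name: string
  limits: { cpus: number; memoryMb: number; pidsLimit: number }
  // Names of variables to forward. `--env NAME` without a value makes Docker
  // copy the value from the calling process, keeping secrets out of argv.
  passEnv: string[]
}

// Hypothetical helper: returns argv for `docker run`, one container per task.
function buildDockerRunArgs(options: RunOptions): string[] {
  const { image, network, name, limits, passEnv } = options
  const args = [
    "run",
    "--rm", // Remove the container when it exits
    "--name", name,
    "--network", network,
    "--cpus", String(limits.cpus),
    "--memory", `${limits.memoryMb}m`,
    "--pids-limit", String(limits.pidsLimit),
  ]
  for (const key of passEnv) {
    args.push("--env", key)
  }
  args.push(image) // The image must come after all options
  return args
}
```

Passing the result as an argument array to a process-spawning call such as `execFile("docker", args)` avoids shell interpolation of job-supplied values.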
| 294 | +## Monitoring & Observability |
| 295 | + |
| 296 | +1. **Metrics**: Job queue depth, processing time, success rate |
| 297 | +2. **Logging**: Structured logs for all job processing steps |
| 298 | +3. **Health Checks**: Regular checks on all dependent services |
| 299 | +4. **Alerts**: Notifications for failed jobs and system issues |