sheikhcoders/Computer-Using-Agent

Computer-Using Agent (CUA)

A general-purpose AI agent for completing complex, long-horizon tasks. It is powered by a multi-agent system with specialized agents for code, research, presentations, and multimedia processing.

🎯 Overview

CUA is a sophisticated multi-agent system that:

  • Decomposes complex tasks into manageable sub-tasks
  • Routes work to specialized expert agents
  • Coordinates execution with dependency management
  • Aggregates results into cohesive outputs
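
As a rough illustration of the dependency management mentioned above, the sketch below orders sub-tasks so that each one runs only after the tasks it depends on. The `SubTask` shape, the `AgentKind` names, and `planWaves` are assumptions made for this example; the project's actual types live in src/lib/agents/types.ts and may look different.

```typescript
// Hypothetical sub-task shape, for illustration only.
type AgentKind = "code" | "research" | "ppt" | "multimodal";

interface SubTask {
  id: string;
  agent: AgentKind;     // specialist that would handle the task
  dependsOn: string[];  // ids of tasks that must finish first
}

// Group sub-tasks into ordered "waves": every task in a wave has all of
// its dependencies satisfied by earlier waves.
function planWaves(tasks: SubTask[]): SubTask[][] {
  const done = new Set<string>();
  let pending = [...tasks];
  const waves: SubTask[][] = [];

  while (pending.length > 0) {
    const ready = pending.filter((t) => t.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) {
      // Nothing can run: a dependency cycle or a reference to an unknown id.
      throw new Error(
        "Unresolvable dependencies: " + pending.map((t) => t.id).join(", ")
      );
    }
    waves.push(ready);
    for (const t of ready) done.add(t.id);
    pending = pending.filter((t) => !done.has(t.id));
  }
  return waves;
}

// Example: slides need research and charts; charts need research.
const examplePlan = planWaves([
  { id: "slides", agent: "ppt", dependsOn: ["research", "charts"] },
  { id: "research", agent: "research", dependsOn: [] },
  { id: "charts", agent: "code", dependsOn: ["research"] },
]);
```

Here `examplePlan` contains three waves: research, then charts, then slides. The sketch only decides the order; how each wave is executed is left open.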

🚀 Agent Capabilities

Agent             Capabilities
Code Agent        Full-stack web apps, Auth, Database, Stripe, E2E Testing
Research Agent    Web search, Deep analysis, Browser automation, Charts
PPT Agent         Beautiful presentations, Flexible layouts, PPTX export
Multimodal Agent  Image/Audio/Video input, Generation, OCR

💻 Code Agent

✅ Full-Stack Development
   - Next.js, React, Vue, APIs
   - Server Actions, tRPC, REST

✅ Authentication
   - NextAuth.js, Clerk, Auth0
   - OAuth, JWT, RBAC

✅ Database
   - Prisma, PostgreSQL, MongoDB
   - Migrations, Relations

✅ Payments
   - Stripe Checkout
   - Subscriptions, Webhooks

✅ Testing
   - Playwright E2E
   - Jest, React Testing Library

🔬 Research Agent

✅ Comprehensive Research
   - Multi-source search
   - API integration
   - Browser automation

✅ In-Depth Analysis
   - Data analysis with code
   - Chart generation
   - Report compilation

📊 PPT Agent

✅ Aesthetics
   - Flexible layouts (not just templates)
   - Professional design
   - Data visualizations

✅ Export Quality
   - High-fidelity PPTX
   - HTML to PowerPoint
   - Animations support

🎨 Multimodal Agent

✅ Input
   - Long-text files
   - Video, Audio, Images
   - OCR and document processing

✅ Output
   - Image generation
   - Audio synthesis (TTS)
   - Video analysis

🔌 MCP Ecosystem

Pre-built MCPs

  • GitHub/GitLab - Repository management, issues, PRs
  • Slack - Messaging and notifications
  • Google Maps - Places search, directions
  • Figma - Design file access, components

Custom MCPs

Create any custom MCP from scratch or by wrapping existing tools:

import { createCustomMCP } from "@/lib/mcp";

createCustomMCP(
  "my-api",
  "My Custom API",
  "Access my custom service",
  [
    {
      name: "my_tool",
      description: "Does something useful",
      inputSchema: { /* JSON Schema */ }
    }
  ]
);
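
The `inputSchema` placeholder above is left empty. As a hedged sketch of what such a schema might contain, the example below fills it in for `my_tool` and runs a very small check against it. The `ToolDefinition` type, the `query` and `limit` fields, and `checkToolInput` are all assumptions for illustration, not part of `@/lib/mcp`.

```typescript
// Assumed tool shape, mirroring the fields passed to createCustomMCP above.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<
      string,
      { type: "string" | "number" | "boolean"; description?: string }
    >;
    required?: string[];
  };
}

// Hypothetical schema: a required text query and an optional result limit.
const myTool: ToolDefinition = {
  name: "my_tool",
  description: "Does something useful",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Text to look up" },
      limit: { type: "number", description: "Maximum number of results" },
    },
    required: ["query"],
  },
};

// Minimal check of required fields and primitive types. A real project
// would more likely rely on a full JSON Schema validator.
function checkToolInput(
  tool: ToolDefinition,
  input: Record<string, unknown>
): string[] {
  const problems: string[] = [];
  for (const key of tool.inputSchema.required ?? []) {
    if (!(key in input)) problems.push(`missing required field "${key}"`);
  }
  for (const [key, value] of Object.entries(input)) {
    const spec = tool.inputSchema.properties[key];
    if (!spec) problems.push(`unknown field "${key}"`);
    else if (typeof value !== spec.type)
      problems.push(`field "${key}" should be ${spec.type}`);
  }
  return problems;
}
```

An empty array from `checkToolInput` means the input passed these basic checks; each string otherwise names one problem.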

🏗️ Architecture

┌─────────────────────────────────────────────────────────┐
│                    User Request                         │
└─────────────────────┬───────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────┐
│                   Orchestrator                          │
│  • Task Analysis    • Decomposition    • Coordination   │
└─────────────────────┬───────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┬─────────────┐
        ▼             ▼             ▼             ▼
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│   Code    │ │ Research  │ │    PPT    │ │Multimodal │
│   Agent   │ │   Agent   │ │   Agent   │ │   Agent   │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
      │             │             │             │
      ▼             ▼             ▼             ▼
┌─────────────────────────────────────────────────────────┐
│                    Tools & MCPs                         │
│  • Code Execution   • Web Search    • Image Gen         │
│  • File Operations  • Browser Use   • Audio/Video       │
│  • Testing          • Charts        • PPTX Export       │
└─────────────────────────────────────────────────────────┘
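
To make the routing step in the diagram concrete, here is a deliberately simple sketch that picks an agent by matching keywords in the task description. This is a stand-in: the real orchestrator may well use a model for this decision. The keyword table and the `routeTask` function are hypothetical, with keywords borrowed from the capability lists in this README.

```typescript
type AgentKind = "code" | "research" | "ppt" | "multimodal";

// Illustrative keyword table; not taken from the project's configuration.
const ROUTING_TABLE: Array<{ agent: AgentKind; keywords: string[] }> = [
  { agent: "ppt", keywords: ["presentation", "slides", "pptx"] },
  { agent: "multimodal", keywords: ["image", "audio", "video", "ocr"] },
  { agent: "research", keywords: ["research", "search", "report", "chart"] },
  { agent: "code", keywords: ["app", "api", "database", "auth", "test"] },
];

// Return the agent whose keywords match the most words in the description.
// Ties go to the earlier table row; no match returns the fallback.
function routeTask(
  description: string,
  fallback: AgentKind = "research"
): AgentKind {
  const words = description.toLowerCase().split(/[^a-z0-9]+/);
  let best: AgentKind = fallback;
  let bestScore = 0;
  for (const row of ROUTING_TABLE) {
    const score = row.keywords.filter((k) => words.includes(k)).length;
    if (score > bestScore) {
      best = row.agent;
      bestScore = score;
    }
  }
  return best;
}
```

For example, `routeTask("Build a presentation with slides")` selects the PPT agent, while a description with no known keywords falls back to the research agent.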

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • API keys (all available on free tiers): Groq and Google AI Studio are required, E2B is optional (see Environment Variables)

Installation

# Clone
git clone https://github.com/sheikhcoders/Computer-Using-Agent.git
cd Computer-Using-Agent

# Install
npm install

# Configure
cp .env.example .env.local
# Add your API keys to .env.local

# Run
npm run dev

Pages

  • / - Chat interface (Lightning/Pro modes)
  • /agent - Multi-agent task execution

📁 Project Structure

src/
├── app/
│   ├── api/
│   │   ├── chat/route.ts      # Chat API (Lightning/Pro)
│   │   └── agent/route.ts     # Multi-agent API (SSE)
│   ├── agent/page.tsx         # Multi-agent UI
│   └── page.tsx               # Chat UI
├── components/
│   ├── agent/                 # Agent UI components
│   ├── chat/                  # Chat components
│   └── ui/                    # shadcn/ui
├── hooks/
│   └── use-agent.ts           # Agent state management
└── lib/
    ├── agents/
    │   ├── types.ts           # Type definitions
    │   ├── config.ts          # Agent configurations
    │   ├── orchestrator.ts    # Task orchestration
    │   ├── code-agent.ts      # Code specialist
    │   ├── research-agent.ts  # Research specialist
    │   ├── ppt-agent.ts       # Presentation specialist
    │   └── multimodal-agent.ts # Multimodal specialist
    ├── mcp/
    │   └── index.ts           # MCP integration
    ├── e2b/
    │   └── sandbox.ts         # Code execution
    └── ai/
        ├── registry.ts        # AI provider registry
        ├── modes.ts           # Lightning/Pro modes
        └── prompts.ts         # System prompts

🔧 Environment Variables

# AI Models (Required - Both have FREE tiers)
GROQ_API_KEY=gsk_...              # Groq Console
GOOGLE_GENERATIVE_AI_API_KEY=...  # Google AI Studio

# Code Execution (Optional - FREE tier)
E2B_API_KEY=e2b_...               # E2B Dashboard

# MCPs (Optional)
GITHUB_TOKEN=ghp_...              # GitHub API
SLACK_BOT_TOKEN=xoxb-...          # Slack API
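
A startup check can catch a missing key before the first request fails. The sketch below is one possible way to do that, using the variable names listed above; `checkEnv` and `EnvReport` are hypothetical helpers, not part of the repository.

```typescript
type Env = Record<string, string | undefined>;

// Variable names as listed in this README.
const REQUIRED_KEYS = ["GROQ_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"];
const OPTIONAL_KEYS = ["E2B_API_KEY", "GITHUB_TOKEN", "SLACK_BOT_TOKEN"];

interface EnvReport {
  missingRequired: string[];   // the app cannot call its models without these
  disabledOptional: string[];  // features that stay off until a key is set
}

// Treat unset and whitespace-only values the same way: as missing.
function checkEnv(env: Env): EnvReport {
  const isUnset = (key: string): boolean => (env[key] ?? "").trim() === "";
  return {
    missingRequired: REQUIRED_KEYS.filter(isUnset),
    disabledOptional: OPTIONAL_KEYS.filter(isUnset),
  };
}
```

In a Node.js or Next.js server context this could be called as `checkEnv(process.env)`, failing fast when `missingRequired` is not empty.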

🛠️ Tech Stack

Category        Technology
Framework       Next.js 15 (App Router)
AI SDK          Vercel AI SDK v4
Models          Groq (Llama, Gemma), Google (Gemini)
Code Execution  E2B Sandboxed Runtime
UI              shadcn/ui, Radix UI, Tailwind CSS
Streaming       Server-Sent Events (SSE)
Icons           Lucide React (SVG only)
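
Since the multi-agent API streams over Server-Sent Events, it may help to see the wire format. The sketch below serializes and parses frames in the standard `text/event-stream` layout (`event:` and `data:` fields, with a blank line ending each frame). The `SseFrame` type and the event names used with it are assumptions; the events the project actually emits are not documented here.

```typescript
interface SseFrame {
  event: string; // event type, e.g. a hypothetical "task_started"
  data: string;  // payload; may span several lines
}

// Serialize one frame: one "data:" line per payload line, then a blank
// line to terminate the frame.
function formatSseFrame(frame: SseFrame): string {
  const dataLines = frame.data.split("\n").map((line) => `data: ${line}`);
  return [`event: ${frame.event}`, ...dataLines, "", ""].join("\n");
}

// Parse a buffer holding zero or more complete frames.
function parseSseFrames(buffer: string): SseFrame[] {
  const frames: SseFrame[] = [];
  for (const block of buffer.split("\n\n")) {
    if (block.trim() === "") continue;
    let event = "message"; // default event type when none is given
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        // A single space after the colon is part of the framing, not the data.
        data.push(line.slice(5).replace(/^ /, ""));
      }
    }
    frames.push({ event, data: data.join("\n") });
  }
  return frames;
}
```

A multi-line payload round-trips because each payload line gets its own `data:` prefix and the parser joins them back with newlines. This parser assumes complete frames with `\n` line endings; a streaming client would also need to buffer partial chunks.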

♿ Accessibility

Following strict WCAG 2.1 guidelines:

  • ✅ Full keyboard navigation (WAI-ARIA APG)
  • ✅ Visible focus indicators
  • ✅ Minimum hit targets (24px desktop, 44px mobile)
  • ✅ prefers-reduced-motion support
  • ✅ Proper ARIA labels and roles
  • ✅ Skip to content link
  • ✅ Color contrast (APCA compliant)

🗺️ Roadmap

  • Vision capabilities (screenshot analysis)
  • E2B Desktop sandbox (full browser control)
  • Voice input (Whisper integration)
  • Multi-agent collaboration (parallel execution)
  • Task memory and replay
  • Custom agent creation

🤝 Contributing

  1. Follow accessibility guidelines (MUST/SHOULD/NEVER)
  2. Use SVG icons only (never emoji in UI)
  3. Test keyboard navigation
  4. Ensure prefers-reduced-motion support
  5. Write tests for new features

📄 License

MIT License - see LICENSE

👤 Author

Built by @sheikhcoders


Note: This project uses free API tiers. Please respect rate limits and usage policies.
