CUA (Computer-Using Agent) is a general-purpose AI agent for completing complex, long-horizon tasks. It is powered by a multi-agent system with specialized agents for code, research, presentations, and multimedia processing.
CUA is a multi-agent system that:
- Decomposes complex tasks into manageable sub-tasks
- Routes work to specialized expert agents
- Coordinates execution with dependency management
- Aggregates results into cohesive outputs
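The dependency management mentioned above can be illustrated with a small scheduling sketch. This is not taken from the CUA source: the `SubTask` shape and the `planWaves` function are hypothetical names, and the real orchestrator may plan tasks quite differently. The idea is only to show how sub-tasks with declared dependencies can be grouped into ordered "waves", where every task in a wave is free to run in parallel.

```typescript
// Hypothetical types for illustration; none of these names come from the CUA codebase.
type AgentKind = "code" | "research" | "ppt" | "multimodal";

interface SubTask {
  id: string;
  agent: AgentKind;
  dependsOn: string[]; // ids of sub-tasks that must finish first
}

// Group sub-tasks into waves: every task in a wave has all of its
// dependencies satisfied by earlier waves, so a wave could run in parallel.
function planWaves(tasks: SubTask[]): string[][] {
  const remaining = new Map<string, SubTask>();
  for (const t of tasks) remaining.set(t.id, t);
  const done = new Set<string>();
  const waves: string[][] = [];

  while (remaining.size > 0) {
    const ready = Array.from(remaining.values())
      .filter((t) => t.dependsOn.every((d) => done.has(d)))
      .map((t) => t.id);
    if (ready.length === 0) {
      // Nothing can start: a cycle or a dependency on an unknown id.
      throw new Error("Cyclic or missing dependency among sub-tasks");
    }
    for (const id of ready) {
      remaining.delete(id);
      done.add(id);
    }
    waves.push(ready);
  }
  return waves;
}

const plan = planWaves([
  { id: "research", agent: "research", dependsOn: [] },
  { id: "charts", agent: "research", dependsOn: ["research"] },
  { id: "slides", agent: "ppt", dependsOn: ["research", "charts"] },
  { id: "landing-page", agent: "code", dependsOn: [] },
]);
console.log(plan);
```

Here the research and landing-page tasks land in the first wave, charts in the second, and slides in the third. Aggregation would then collect the results of each wave before starting the next.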
| Agent | Capabilities |
|---|---|
| Code Agent | Full-stack web apps, Auth, Database, Stripe, E2E Testing |
| Research Agent | Web search, Deep analysis, Browser automation, Charts |
| PPT Agent | Beautiful presentations, Flexible layouts, PPTX export |
| Multimodal Agent | Image/Audio/Video input, Generation, OCR |
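As a rough illustration of routing work to the agents in this table, the sketch below scores a sub-task description against a keyword list per agent. Everything here is an assumption made for the example: the keyword lists loosely paraphrase the table, `routeSubTask` is a hypothetical name, and the fallback choice is arbitrary. The actual orchestrator may well use a model rather than keywords to make this decision.

```typescript
type AgentName = "Code Agent" | "Research Agent" | "PPT Agent" | "Multimodal Agent";

// Illustrative keywords only, loosely based on the capabilities table.
const KEYWORDS: Array<[AgentName, string[]]> = [
  ["Code Agent", ["web app", "auth", "database", "stripe", "test"]],
  ["Research Agent", ["search", "analysis", "browser", "chart"]],
  ["PPT Agent", ["presentation", "slide", "pptx"]],
  ["Multimodal Agent", ["image", "audio", "video", "ocr"]],
];

// Pick the agent whose keywords match the description most often.
// Ties keep the earlier agent; no match falls back to the Research Agent (assumed default).
function routeSubTask(description: string): AgentName {
  const text = description.toLowerCase();
  let best: AgentName = "Research Agent";
  let bestScore = 0;
  for (const [agent, words] of KEYWORDS) {
    const score = words.filter((w) => text.indexOf(w) !== -1).length;
    if (score > bestScore) {
      best = agent;
      bestScore = score;
    }
  }
  return best;
}

console.log(routeSubTask("Export the slides as PPTX"));
console.log(routeSubTask("Add Stripe checkout backed by a database"));
```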
Full-Stack Development
- Next.js, React, Vue, APIs
- Server Actions, tRPC, REST
Authentication
- NextAuth.js, Clerk, Auth0
- OAuth, JWT, RBAC
Database
- Prisma, PostgreSQL, MongoDB
- Migrations, Relations
Payments
- Stripe Checkout
- Subscriptions, Webhooks
Testing
- Playwright E2E
- Jest, React Testing Library
Comprehensive Research
- Multi-source search
- API integration
- Browser automation
In-Depth Analysis
- Data analysis with code
- Chart generation
- Report compilation
Aesthetics
- Flexible layouts (not just templates)
- Professional design
- Data visualizations
Export Quality
- High-fidelity PPTX
- HTML to PowerPoint
- Animations support
Input
- Long-text files
- Video, Audio, Images
- OCR and document processing
Output
- Image generation
- Audio synthesis (TTS)
- Video analysis
- GitHub/GitLab - Repository management, issues, PRs
- Slack - Messaging and notifications
- Google Maps - Places search, directions
- Figma - Design file access, components
Create any custom MCP from scratch or by wrapping existing tools:
import { createCustomMCP } from "@/lib/mcp";
createCustomMCP(
"my-api",
"My Custom API",
"Access my custom service",
[
{
name: "my_tool",
description: "Does something useful",
inputSchema: { /* JSON Schema */ }
}
]
);

┌─────────────────────────────────────────────────────────┐
│                      User Request                       │
└───────────────────┬─────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────┐
│                      Orchestrator                       │
│  • Task Analysis  • Decomposition  • Coordination       │
└───────────────────┬─────────────────────────────────────┘
                    │
      ┌─────────────┼─────────────┬─────────────┐
      ▼             ▼             ▼             ▼
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│   Code    │ │ Research  │ │    PPT    │ │Multimodal │
│   Agent   │ │   Agent   │ │   Agent   │ │   Agent   │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
      │             │             │             │
      ▼             ▼             ▼             ▼
┌─────────────────────────────────────────────────────────┐
│                      Tools & MCPs                       │
│  • Code Execution   • Web Search   • Image Gen          │
│  • File Operations  • Browser Use  • Audio/Video        │
│  • Testing          • Charts       • PPTX Export        │
└─────────────────────────────────────────────────────────┘
- Node.js 18+
- API Keys (all FREE tiers):
  - Groq - Fast inference
  - Google AI Studio - Gemini
  - E2B - Code sandbox (optional)
# Clone
git clone https://github.com/sheikhcoders/Computer-Using-Agent.git
cd Computer-Using-Agent
# Install
npm install
# Configure
cp .env.example .env.local
# Add your API keys to .env.local
# Run
npm run dev

- `/` - Chat interface (Lightning/Pro modes)
- `/agent` - Multi-agent task execution
src/
├── app/
│   ├── api/
│   │   ├── chat/route.ts       # Chat API (Lightning/Pro)
│   │   └── agent/route.ts      # Multi-agent API (SSE)
│   ├── agent/page.tsx          # Multi-agent UI
│   └── page.tsx                # Chat UI
├── components/
│   ├── agent/                  # Agent UI components
│   ├── chat/                   # Chat components
│   └── ui/                     # shadcn/ui
├── hooks/
│   └── use-agent.ts            # Agent state management
└── lib/
    ├── agents/
    │   ├── types.ts            # Type definitions
    │   ├── config.ts           # Agent configurations
    │   ├── orchestrator.ts     # Task orchestration
    │   ├── code-agent.ts       # Code specialist
    │   ├── research-agent.ts   # Research specialist
    │   ├── ppt-agent.ts        # Presentation specialist
    │   └── multimodal-agent.ts # Multimodal specialist
    ├── mcp/
    │   └── index.ts            # MCP integration
    ├── e2b/
    │   └── sandbox.ts          # Code execution
    └── ai/
        ├── registry.ts         # AI provider registry
        ├── modes.ts            # Lightning/Pro modes
        └── prompts.ts          # System prompts
# AI Models (Required - Both have FREE tiers)
GROQ_API_KEY=gsk_... # Groq Console
GOOGLE_GENERATIVE_AI_API_KEY=... # Google AI Studio
# Code Execution (Optional - FREE tier)
E2B_API_KEY=e2b_... # E2B Dashboard
# MCPs (Optional)
GITHUB_TOKEN=ghp_... # GitHub API
SLACK_BOT_TOKEN=xoxb-... # Slack API

| Category | Technology |
|---|---|
| Framework | Next.js 15 (App Router) |
| AI SDK | Vercel AI SDK v4 |
| Models | Groq (Llama, Gemma), Google (Gemini) |
| Code Execution | E2B Sandboxed Runtime |
| UI | shadcn/ui, Radix UI, Tailwind CSS |
| Streaming | Server-Sent Events (SSE) |
| Icons | Lucide React (SVG only) |
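Since the multi-agent API streams over Server-Sent Events, it may help to see what a single SSE frame looks like on the wire. The sketch below is an assumption-laden illustration: the `AgentEvent` shape and the `task_started` event name are invented for the example and are not the documented payloads of `/api/agent`. Only the framing itself (an `event:` line, a `data:` line, then a blank line) comes from the SSE format.

```typescript
// Hypothetical event shape; the real /api/agent payloads are not documented here.
interface AgentEvent {
  type: string; // e.g. "task_started" (made-up name)
  payload: unknown;
}

// Serialize one event in the text/event-stream wire format:
// an "event:" line, a "data:" line, then a blank line that ends the frame.
// JSON.stringify output contains no raw newlines, so one "data:" line is enough.
function toSSE(event: AgentEvent): string {
  const data = JSON.stringify(event.payload ?? null);
  return `event: ${event.type}\ndata: ${data}\n\n`;
}

const frame = toSSE({ type: "task_started", payload: { id: "t1" } });
console.log(JSON.stringify(frame));
```

A route handler would write frames like this to a response with the `Content-Type: text/event-stream` header, and the browser side would read them with `EventSource` or a streaming `fetch`.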
Following strict WCAG 2.1 guidelines:
- Full keyboard navigation (WAI-ARIA APG)
- Visible focus indicators
- Minimum hit targets (24px desktop, 44px mobile)
- `prefers-reduced-motion` support
- Proper ARIA labels and roles
- Skip to content link
- Color contrast (APCA compliant)
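One way to honor the reduced-motion preference in component code is to check the `(prefers-reduced-motion: reduce)` media query before animating. The helper below is a minimal sketch, not code from this project: `animationDurationMs` is a hypothetical name, and the `matchMedia` function is passed in as a parameter so the example does not depend on DOM typings and can be exercised outside a browser.

```typescript
// Minimal structural type so the sketch does not depend on DOM typings.
type MatchMediaFn = (query: string) => { matches: boolean };

// Returns 0 when the user asks for reduced motion, otherwise the requested duration.
function animationDurationMs(matchMedia: MatchMediaFn, normalMs: number): number {
  const reduce = matchMedia("(prefers-reduced-motion: reduce)").matches;
  return reduce ? 0 : normalMs;
}

// Stand-ins for window.matchMedia, used here only to demonstrate both branches.
const userPrefersReduce: MatchMediaFn = () => ({ matches: true });
const userHasNoPreference: MatchMediaFn = () => ({ matches: false });

console.log(animationDurationMs(userPrefersReduce, 200));
console.log(animationDurationMs(userHasNoPreference, 200));
```

In a browser, the call would look like `animationDurationMs((q) => window.matchMedia(q), 200)`. The same effect can often be achieved in CSS alone with a `@media (prefers-reduced-motion: reduce)` rule.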
- Vision capabilities (screenshot analysis)
- E2B Desktop sandbox (full browser control)
- Voice input (Whisper integration)
- Multi-agent collaboration (parallel execution)
- Task memory and replay
- Custom agent creation
- Follow accessibility guidelines (MUST/SHOULD/NEVER)
- Use SVG icons only (never emoji in UI)
- Test keyboard navigation
- Ensure `prefers-reduced-motion` support
- Write tests for new features
MIT License - see LICENSE
Built by @sheikhcoders
Note: This project uses free API tiers. Please respect rate limits and usage policies.