Teach OASIS OS your workflows once, and watch it handle your repetitive tasks forever.
- ✨ Features
- 🏗️ Architecture
- 🛠️ Prerequisites
- 🚀 Quick Start
- 📖 Usage Guide
- 🔌 API Documentation
- 🎯 Use Cases
- 🤝 Contributing
- 📄 License
Record your workflows once and OASIS OS learns to replicate them perfectly. Supports:
- 🖱️ Mouse tracking and click recording
- ⌨️ Keyboard input capture
- 🎤 Voice commands with speech-to-text (Groq API)
- 📸 Screen capture at regular intervals
- 🔄 Automatic workflow generation from recordings
- Execute complex workflows with simple commands
- Cross-platform support (Windows, macOS, Linux)
- AI-powered task execution with multiple LLM backends
- Visual debugging with screenshots and logs
- Beautiful Next.js frontend with dark/light mode
- Real-time workflow visualization
- Interactive workspace management
- Smooth animations and transitions
- FastAPI backend with async support
- Modular plugin system
- REST API for easy integration
- Support for multiple AI models (GPT-4, Claude, Groq, Ollama)
OASIS-Final/
├── 📁 Oasis-OS/
│   ├── 📁 backend/              # FastAPI backend server
│   │   ├── 📁 teach_mode/       # Workflow recording & building
│   │   ├── 📁 routers/          # API endpoints
│   │   └── 📁 services/         # Business logic
│   │
│   ├── 📁 frontend/             # Next.js frontend app
│   │   ├── 📁 src/
│   │   │   ├── 📁 app/          # App routes
│   │   │   └── 📁 components/   # UI components
│   │   └── 📁 public/           # Static assets
│   │
│   └── 📁 gui_agents/           # Desktop automation agents
│       └── 📁 s1/               # Agent implementation
│           ├── 📁 aci/          # Platform-specific interfaces
│           └── 📁 core/         # Core agent logic
- Python 3.8+ (3.11 recommended)
- Node.js 18+ and npm/yarn
- Git
- Operating System: Windows 10+, macOS 11+, or Linux (Ubuntu 20.04+)
- GROQ_API_KEY: For speech-to-text in teach mode
- OPENAI_API_KEY: For GPT models
- ANTHROPIC_API_KEY: For Claude models
git clone https://github.com/yourusername/Oasis-Final.git
cd Oasis-Final
# Navigate to backend directory
cd Oasis-OS/backend
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create .env file
echo "GROQ_API_KEY=your_groq_api_key_here" > .env
# Run the backend server
python start.py
The backend will start on http://localhost:8000
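Once it is up, you can sanity-check the server from Python. A minimal sketch using the requests library; the root endpoint returns API information, as listed under API Documentation below:

```python
import requests

# Quick sanity check: GET / returns API information and the list of
# available endpoints (see the API Documentation section below).
response = requests.get("http://localhost:8000/")
response.raise_for_status()
print(response.json())
```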
Open a new terminal:
# Navigate to frontend directory
cd Oasis-OS/frontend
# Install dependencies
npm install
# or
yarn install
# Run development server
npm run dev
# or
yarn dev
The frontend will start on http://localhost:3000
Open another terminal:
# Navigate to root directory
cd Oasis-Final
# Install GUI agents dependencies
pip install -r requirements.txt
# Test the installation
python -m gui_agents.s1.cli_app --help
Teach Mode allows you to record your workflows and have OASIS OS learn from them.
- Via API (curl/Postman):
# Start recording
curl -X POST http://localhost:8000/start_recording/my_workflow
# Stop recording and build workflow
curl -X POST http://localhost:8000/stop_recording
# Execute the recorded workflow
curl -X POST http://localhost:8000/run_workflow/my_workflow
- Via Python Script:
import requests
# Start recording
response = requests.post("http://localhost:8000/start_recording/file_organization")
print("Recording started!")
# Perform your tasks...
# - Click on folders
# - Type commands
# - Speak instructions (if GROQ_API_KEY is set)
# Stop recording
response = requests.post("http://localhost:8000/stop_recording")
print("Workflow built!")
- Mouse Events: Clicks, movements, scrolls
- Keyboard Input: All keystrokes and shortcuts
- Voice Commands: Spoken instructions (requires GROQ API)
- Screenshots: Visual context every 0.5 seconds
- Timing: Precise timestamps for replay
Execute your recorded workflows programmatically:
import requests

# Execute a previously recorded workflow by its session name
response = requests.post("http://localhost:8000/run_workflow/file_organization")
print(response.json())
Use the GUI agents for advanced automation:
# Interactive mode with GPT-4
python -m gui_agents.s1.cli_app --model gpt-4o
# Use local models with Ollama
python -m gui_agents.s1.cli_app --model llama3.2 --engine-type ollama
# Teaching mode - record a demonstration
python -m gui_agents.s1.cli_app --teach
- "Open Chrome and navigate to GitHub"
- "Create a new folder called 'Projects' on the desktop"
- "Take a screenshot and save it to Downloads"
- "Find all PDF files in Documents and move them to a new folder"
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | / | API information and available endpoints |
| POST | /start_recording/{session_name} | Start a new teach mode recording |
| POST | /stop_recording | Stop recording and build workflow |
| POST | /run_workflow/{session_name} | Execute a recorded workflow |
| GET | /api/v1/workflow/{workflow_id}/status | Get workflow execution status |
| DELETE | /api/v1/workflow/{workflow_id} | Cancel a running workflow |
| GET | /api/v1/workflow/active | List all active workflows |
// Frontend integration example
async function startTeaching() {
const response = await fetch('http://localhost:8000/start_recording/email_automation', {
method: 'POST'
});
const data = await response.json();
console.log('Recording started:', data);
}
async function executeWorkflow() {
const response = await fetch('http://localhost:8000/run_workflow/email_automation', {
method: 'POST'
});
const data = await response.json();
console.log('Workflow executed:', data);
}
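The workflow-management endpoints can be driven the same way from Python. A sketch for listing and cancelling workflows; the response shape is an assumption, so check your backend's actual output:

```python
import requests

BASE_URL = "http://localhost:8000"

# List all active workflows.
active = requests.get(f"{BASE_URL}/api/v1/workflow/active").json()
print("Active workflows:", active)

# Cancel a running workflow by ID (replace with a real workflow_id
# returned by your backend).
workflow_id = "example-workflow-id"
response = requests.delete(f"{BASE_URL}/api/v1/workflow/{workflow_id}")
print("Cancel request returned:", response.status_code)
```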
# Record organizing downloads folder
# OASIS OS learns your file sorting patterns
# Automatically sorts new downloads based on your rules
# Teach OASIS OS to:
# - Filter and categorize emails
# - Auto-respond to common queries
# - Archive old conversations
# Record your Excel/CSV workflows
# - Data cleaning and formatting
# - Report generation
# - Chart creation
# Automate browser tasks:
# - Form filling
# - Data scraping
# - Social media posting
Create a .env file in the backend directory:
# Speech-to-Text (Optional)
GROQ_API_KEY=your_groq_api_key
# AI Models (Choose one or more)
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
AZURE_OPENAI_API_KEY=your_azure_key
# Server Configuration
HOST=0.0.0.0
PORT=8000
Edit backend/teach_mode/teach_mode.py to tune the following (an illustrative sketch follows this list):
- VAD threshold adjustment
- Frame capture intervals
- Audio recording settings
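These typically live as module-level constants. The sketch below is illustrative only; apart from VAD_THRESHOLD, which the Troubleshooting section references, the names and defaults are hypothetical, so check the actual file for the real values:

```python
# Illustrative values only -- see backend/teach_mode/teach_mode.py for the
# real names and defaults in your copy of the code.
VAD_THRESHOLD = 0.5            # voice-activity detection sensitivity
FRAME_INTERVAL_SECONDS = 0.5   # hypothetical name: screenshot capture interval
AUDIO_SAMPLE_RATE = 16000      # hypothetical name: audio recording sample rate
```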
- "No audio detected" in Teach Mode
  - Check microphone permissions
  - Verify GROQ_API_KEY is set correctly
  - Adjust VAD_THRESHOLD in teach_mode.py
- "Module not found" errors
  - Ensure the virtual environment is activated
  - Run pip install -r requirements.txt again
  - Check the Python version (3.8+ required)
- Frontend connection issues
  - Verify the backend is running on port 8000
  - Check CORS settings in server.py
  - Clear the browser cache
- GUI agents not working
  - Install platform-specific dependencies
  - Grant accessibility permissions (macOS)
  - Run as administrator (Windows)
We welcome contributions! Please see our Contributing Guidelines for details.
# Install development dependencies
pip install -r requirements-dev.txt
npm install --save-dev
# Run tests
pytest
npm test
# Format code
black .
prettier --write .
This project is licensed under the MIT License - see the LICENSE file for details.