# Computer Use Assistant (CUA)

Important: You must apply for access in order to use the Computer Use model. Apply here: https://aka.ms/oai/cuaaccess

This is a sample repository demonstrating how to use the Computer Use model, an AI model capable of interacting with graphical user interfaces (GUIs) through natural language instructions. The Computer Use model can understand visual interfaces, take actions, and complete tasks by controlling a computer just like a human would.

This framework provides a bridge between the Computer Use model and computer control, allowing for automated task execution while maintaining safety checks and user consent. It serves as a practical example of how to integrate the Computer Use model into applications that require GUI interaction.

## Features

- Natural language computer control through AI models
- Screenshot capture and analysis
- Mouse and keyboard control
- Safety checks and user consent mechanisms
- Support for both OpenAI and Azure OpenAI endpoints
- Cross-platform compatibility (Windows, macOS, Linux)
- Screen resolution scaling for consistent AI model input
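
To illustrate the screen resolution scaling feature: the model sees screenshots at one fixed size, so coordinates it returns have to be mapped back to the physical screen. The sketch below is not the repository's code. The 1024x768 model resolution and the function names `scale_to_screen` and `scale_to_model` are assumptions chosen for illustration.

```python
from typing import Tuple

# Assumed size of the screenshots sent to the model (hypothetical value).
MODEL_SIZE: Tuple[int, int] = (1024, 768)


def scale_to_screen(x: int, y: int, screen_size: Tuple[int, int],
                    model_size: Tuple[int, int] = MODEL_SIZE) -> Tuple[int, int]:
    """Map a point from model-image coordinates to physical screen coordinates."""
    return (round(x * screen_size[0] / model_size[0]),
            round(y * screen_size[1] / model_size[1]))


def scale_to_model(x: int, y: int, screen_size: Tuple[int, int],
                   model_size: Tuple[int, int] = MODEL_SIZE) -> Tuple[int, int]:
    """Map a point from physical screen coordinates to model-image coordinates."""
    return (round(x * model_size[0] / screen_size[0]),
            round(y * model_size[1] / screen_size[1]))
```

With these assumptions, a click the model places at the center of its 1024x768 image lands at the center of a 1920x1080 screen, and the reverse mapping returns the original point.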

## Getting Started

### Prerequisites

- Python 3.7 or higher
- Operating System: Windows, macOS, or Linux
- OpenAI API key or Azure OpenAI credentials

### Installation

Clone the repository:

git clone [repository-url]
cd computer-use

Install the required packages:

pip install -r requirements.txt

Set up your environment variables. For Azure OpenAI, create a .env file and set the following values:

AZURE_OPENAI_ENDPOINT="your-azure-endpoint"
AZURE_OPENAI_API_KEY="your-azure-api-key"
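
How the application reads these values is not shown in this README. As a rough illustration, the following standard-library sketch parses a .env file in the simple `KEY="value"` form shown above and reports which required variables are still missing. The helper names `parse_env_text`, `load_env_file`, and `missing_azure_settings` are hypothetical and are not part of the repository.

```python
import os
from pathlib import Path
from typing import Dict, List

# Variable names taken from the README; everything else here is illustrative.
REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY="value" lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy values from the file into os.environ without overriding existing ones."""
    env_path = Path(path)
    if env_path.is_file():
        for key, value in parse_env_text(env_path.read_text(encoding="utf-8")).items():
            os.environ.setdefault(key, value)


def missing_azure_settings() -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not os.environ.get(name)]
```

Calling `load_env_file()` followed by `missing_azure_settings()` at startup gives an early, readable error instead of a failed API call later.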


## Usage

### Local Computer Control

The framework is designed to work directly with your local computer. Here's how to use it:

1. Run the example application:
```bash
python main.py --instructions "Open web browser and go to microsoft.com"
```

The AI model will:
- Take screenshots of your screen
- Analyze the visual information
- Execute appropriate actions to complete the task
- Request user consent for safety-critical actions
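
The steps above can be pictured as a loop: capture the screen, ask the model for the next action, check consent if needed, then execute. The sketch below shows only that control flow under stated assumptions. The `Action` class, the `run_task` function, and the four callables passed to it are hypothetical stand-ins, not the repository's API or the model's actual response format.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """Hypothetical description of one step proposed by the model."""
    description: str
    safety_critical: bool = False


def run_task(
    take_screenshot: Callable[[], bytes],
    next_action: Callable[[bytes], Optional[Action]],
    execute: Callable[[Action], None],
    ask_consent: Callable[[Action], bool],
    max_steps: int = 20,
) -> List[str]:
    """Run the screenshot -> decide -> consent -> act loop and return a log."""
    log: List[str] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = next_action(screenshot)  # None means the model is finished
        if action is None:
            log.append("done")
            break
        # Safety-critical actions run only after the user agrees.
        if action.safety_critical and not ask_consent(action):
            log.append(f"declined: {action.description}")
            break
        execute(action)
        log.append(f"executed: {action.description}")
    return log
```

Passing the model call and the input-control functions in as parameters keeps the loop testable without touching a real screen; `max_steps` is an illustrative guard against a task that never finishes.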

### Command Line Arguments

- --instructions: The task to perform (default: "Open web browser and go to microsoft.com")
- --model: The AI model to use (default: "computer-use-preview")
- --endpoint: The API endpoint to use ("azure" or "openai", default: "azure")
- --autoplay: Automatically execute actions without confirmation (default: true)
### VM/Remote Control

For scenarios requiring remote computer control or VM automation, we recommend using Playwright. Playwright provides robust browser automation capabilities and is well-suited for VM-based testing and automation scenarios.

For more information on VM automation with Playwright, please refer to:

## Demo

The included demo application (main.py) demonstrates how to use the CUA framework:

Start the demo:

python main.py

Enter your instructions when prompted, or use the --instructions parameter to provide them directly.
Watch as the AI model:
- Captures and analyzes your screen
- Performs mouse and keyboard actions
- Requests consent for safety-critical operations
- Provides reasoning for its actions

sanjeevkumar761/example-CUA-Agent
