Binary file removed -
Binary file not shown.
211 changes: 211 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,211 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

macOS-use is an AI agent framework that enables AI models to control macOS applications through accessibility APIs. The project uses Python with macOS-specific libraries like PyObjC and Cocoa to interact with UI elements.

## Development Setup

### Environment Setup

This project uses a conda environment named `macos-use`:

```bash
# Activate the conda environment
conda activate macos-use

# Install project in editable mode
pip install --editable .

# Install dev dependencies
pip install -e ".[dev]"
```

#### Alternative Setup with uv (if preferred)
```bash
# Set up development environment with uv
brew install uv && uv venv && source .venv/bin/activate

# Install project in editable mode
uv pip install --editable .

# Install dev dependencies
uv pip install -e ".[dev]"
```

### Environment Variables
Copy `.env.example` to `.env` and configure API keys:
- `OPENAI_API_KEY` - OpenAI API key (recommended)
- `ANTHROPIC_API_KEY` - Anthropic API key (recommended)
- `GEMINI_API_KEY` - Google Gemini API key (works but less reliable)
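
A script can check which of these keys is configured before it builds a model. The sketch below is a hypothetical helper, not part of `mlx_use`: the name `pick_provider` and the preference order (the two recommended providers first) are assumptions for illustration, and it assumes the `.env` values have already been loaded into the process environment.

```python
import os

# Environment variable names from the list above, in an assumed preference
# order: the two recommended providers first, Gemini last.
PROVIDER_KEYS = [
    ('openai', 'OPENAI_API_KEY'),
    ('anthropic', 'ANTHROPIC_API_KEY'),
    ('gemini', 'GEMINI_API_KEY'),
]


def pick_provider(env=None):
    """Return the name of the first provider whose API key is set, or None."""
    env = os.environ if env is None else env
    for provider, key_name in PROVIDER_KEYS:
        # Treat empty or whitespace-only values as "not configured".
        if env.get(key_name, '').strip():
            return provider
    return None
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching real credentials.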

### Running Examples
```bash
# Basic interaction test
python examples/try.py

# Calculator demo
python examples/calculate.py

# Other examples
python examples/check_time_online.py
python examples/login_to_auth0.py
```

## Testing

### Test Commands
```bash
# Run all tests
pytest

# Run specific test markers
pytest -m "not slow" # Skip slow tests
pytest -m integration # Run integration tests only
pytest -m unit # Run unit tests only

# Run with verbose output
pytest -v

# Run tests in specific directory
pytest tests/
```

### Test Configuration
- Tests are configured in `pytest.ini`
- Test discovery looks for `test_*.py` and `*_test.py` files
- Async tests are supported with `asyncio_mode = auto`

## Code Quality

### Linting and Formatting
```bash
# The project uses ruff for linting and formatting
# Configuration is in pyproject.toml under [tool.ruff]
# - Line length: 130 characters
# - Quote style: single quotes
# - Indentation: tabs
# - Auto-fix enabled

# Run ruff (if available)
ruff check .
ruff format .
```

## Architecture

### Core Components Structure
The codebase follows a service-oriented architecture inspired by Netflix's Dispatch:

```
mlx_use/
├── agent/                  # Core AI agent logic
│   ├── service.py          # Main Agent class - orchestrates UI interaction
│   ├── prompts.py          # System and agent prompts
│   ├── views.py            # Data models for agent operations
│   └── message_manager/    # Manages conversation history and context
├── controller/             # Action execution system
│   ├── service.py          # Controller class - manages action registry
│   ├── registry/           # Action registration and management
│   └── views.py            # Action parameter models
├── mac/                    # macOS-specific functionality
│   ├── actions.py          # Core UI actions (click, type, scroll)
│   ├── element.py          # UI element representation
│   ├── tree.py             # UI tree building and caching
│   └── context.py          # UI context management
└── telemetry/              # Usage analytics and monitoring

### Key Classes

#### Agent (`mlx_use/agent/service.py`)
- Main orchestration class that runs AI agent tasks
- Manages conversation history, state, and action execution
- Handles retries, failures, and telemetry
- Supports multiple LLM providers (OpenAI, Anthropic, Google)

#### Controller (`mlx_use/controller/service.py`)
- Executes actions received from the agent
- Manages action registry and validation
- Handles macOS app launching and UI interaction
- Supports custom action registration via decorators

#### MacUITreeBuilder (`mlx_use/mac/tree.py`)
- Builds accessibility tree from macOS applications
- Caches UI elements for efficient access
- Provides element discovery and interaction capabilities
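
The idea of building a tree and caching its interactive elements can be sketched as follows. This is an illustration under assumptions, not the actual `MacUITreeBuilder`: `UIElement`, its fields, and `index_clickable` are hypothetical names, and the real builder reads nodes from the macOS Accessibility API rather than from hand-built objects.

```python
from dataclasses import dataclass, field


@dataclass
class UIElement:
    """Hypothetical stand-in for an accessibility node; not the real mlx_use class."""
    role: str
    title: str = ''
    clickable: bool = False
    children: list = field(default_factory=list)


def index_clickable(root):
    """Walk the tree depth-first and cache clickable elements by a running index."""
    cache = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.clickable:
            cache[len(cache)] = node
        # Push children in reverse so they are visited in their original order.
        stack.extend(reversed(node.children))
    return cache
```

An index-to-element cache of this kind lets a later action refer to "element 3" without walking the tree again.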

### Action System
Actions are registered in the Controller's registry:
- `done` - Complete task with result text
- `input_text` - Type text into UI elements
- `click_element` - Click UI elements with specific actions
- `right_click_element` - Right-click UI elements
- `scroll_element` - Scroll elements in specified directions
- `open_app` - Launch macOS applications
- `run_apple_script` - Execute AppleScript commands
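
Decorator-based registration of actions like these can be sketched in a few lines. The class below is a hypothetical minimal registry written for illustration; the real Controller registry in `mlx_use/controller/registry/` may use different names, signatures, and parameter validation.

```python
import asyncio


class ActionRegistry:
    """Hypothetical minimal registry; the real Controller registry may differ."""

    def __init__(self):
        self._actions = {}

    def action(self, description):
        """Decorator that registers an async function under its own name."""
        def decorator(func):
            self._actions[func.__name__] = (description, func)
            return func
        return decorator

    async def execute(self, name, **params):
        """Look up an action by name and await it with the given parameters."""
        if name not in self._actions:
            raise KeyError(f'Unknown action: {name}')
        _, func = self._actions[name]
        return await func(**params)


registry = ActionRegistry()


@registry.action('Complete task with result text')
async def done(text):
    # The returned dictionary shape is an assumption for this sketch.
    return {'is_done': True, 'text': text}


result = asyncio.run(registry.execute('done', text='Folder created'))
```

Keeping the registry as a plain name-to-function mapping makes it straightforward to list the available actions in a prompt and to reject unknown action names early.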

### LLM Integration
The system supports multiple LLM providers:
- **OpenAI**: Recommended, uses function calling
- **Anthropic**: Recommended, uses function calling
- **Google Gemini**: Works but less reliable, uses structured output

## Development Guidelines

### Code Organization
- Each service follows the pattern: `models.py`, `service.py`, `views.py`, `prompts.py`
- Services > 500 lines should be split into subservices
- Views should be organized as: All models, Request models, Response models
- Single `prompts.py` file per service (split if too long)
- Never split `routers.py` into multiple files

### Error Handling
- All actions should return `ActionResult` objects
- Include helpful error messages for debugging
- Use appropriate logging levels (DEBUG, INFO, WARNING, ERROR)
- Handle macOS accessibility permission issues gracefully
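
A wrapper that turns failures into result objects is one way to follow these guidelines. The sketch below is hedged: the `ActionResult` dataclass here is a hypothetical stand-in with assumed fields, `safe_action` is an invented helper, and mapping `PermissionError` to an accessibility problem is an assumption rather than documented project behavior.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class ActionResult:
    """Hypothetical shape; check the project's own ActionResult for the real fields."""
    extracted_content: Optional[str] = None
    error: Optional[str] = None


def safe_action(func, *args, **kwargs):
    """Run an action and convert any exception into an ActionResult with an error."""
    try:
        return ActionResult(extracted_content=str(func(*args, **kwargs)))
    except PermissionError as exc:
        # Assumed mapping: treat permission errors as missing accessibility access.
        logger.warning('Accessibility permission problem: %s', exc)
        return ActionResult(error=f'Permission denied: {exc}. Check the Accessibility permission for this app.')
    except Exception as exc:
        logger.error('Action %s failed: %s', getattr(func, '__name__', func), exc)
        return ActionResult(error=f'{type(exc).__name__}: {exc}')
```

Returning the error as data rather than raising lets the agent loop decide whether to retry, while the log line keeps the failure visible during debugging.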

### Testing Patterns
- Use pytest fixtures for common setup
- Mock external dependencies (LLM calls, system APIs)
- Test both success and failure scenarios
- Use async test patterns for async functions
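
These patterns can be illustrated with a mocked LLM and two async tests. In this sketch `ask_llm` is a hypothetical function under test and the `ainvoke` method on the mock is an assumed interface; with `asyncio_mode = auto`, pytest should collect the `async def test_*` functions directly, though the functions below use only the standard library so they can also be run by hand.

```python
import asyncio
from unittest.mock import AsyncMock


async def ask_llm(llm, prompt):
    """Hypothetical function under test: forwards a prompt and trims the reply."""
    reply = await llm.ainvoke(prompt)
    return reply.strip()


async def test_ask_llm_success():
    llm = AsyncMock()  # stands in for a real LLM client, so no network call is made
    llm.ainvoke.return_value = '  click button 3  '
    assert await ask_llm(llm, 'next step?') == 'click button 3'
    llm.ainvoke.assert_awaited_once_with('next step?')


async def test_ask_llm_failure():
    llm = AsyncMock()
    llm.ainvoke.side_effect = RuntimeError('rate limited')
    try:
        await ask_llm(llm, 'next step?')
    except RuntimeError as exc:
        assert 'rate limited' in str(exc)
    else:
        raise AssertionError('expected RuntimeError')
```

In a pytest suite the mock would typically move into a fixture and the `try`/`except` would become `pytest.raises(RuntimeError)`.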

## Package Management

### Dependencies
Core dependencies include:
- `langchain` and provider-specific packages for LLM integration
- `pyobjc` and `pycocoa` for macOS system integration
- `pydantic` for data validation
- `gradio` for web UI components
- `playwright` for browser automation (if needed)

### Build System
- Uses `hatchling` as build backend
- Version managed in `pyproject.toml`
- Package distributed as `mlx-use` on PyPI

## Gradio Application

The project includes a Gradio web interface in `gradio_app/`:
- Provides web-based interaction with the agent
- Separate requirements file: `gradio_app/requirements.txt`
- Run with: `python gradio_app/app.py`

## Platform Considerations

### macOS Specific
- Requires macOS for full functionality
- Uses Accessibility APIs that may need user permissions
- Some features require specific macOS versions
- PIDs are used to track running applications

### Security
- Be cautious with AppleScript execution
- Never commit API keys to the repository
- The agent can interact with ANY macOS application
- Use appropriate access controls in production environments
78 changes: 78 additions & 0 deletions COMO-USAR-MLX-USE.md
@@ -0,0 +1,78 @@
The behavior you are seeing is completely normal. The message shows that:

1. The previous version was uninstalled: mlx-use-0.0.3 was removed
2. The new version was installed in editable mode: mlx-use-0.0.3 was reinstalled, this time in development mode

The command mlx-use --version does not work because this project does not define a command-line entry point named mlx-use. If you check pyproject.toml, there is no [project.scripts] section defining CLI commands.

How the project works

The project is used by importing the module in Python, not as a CLI command. These are the correct ways to use it:

1. Run the examples directly:

# Activate the environment
conda activate macos-use

# Run the examples
python examples/try.py
python examples/calculate.py
python examples/check_time_online.py

2. Use it in Python code:

from mlx_use import Agent
from mlx_use.controller.service import Controller

# Create the agent
agent = Agent(
    task="your task here",
    llm=your_llm,
    controller=Controller()
)

# Run it (this line must be inside an async function)
await agent.run()
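
A bare `await` like the one above only works inside a coroutine, so a standalone script needs an entry point such as `asyncio.run`. The sketch below shows that wrapping with a hypothetical `StubAgent` standing in for `mlx_use.Agent`, so it runs without macOS, an LLM, or the package installed; the stub's constructor and return value are assumptions.

```python
import asyncio


class StubAgent:
    """Stand-in for mlx_use.Agent so the sketch runs without macOS or an LLM."""

    def __init__(self, task):
        self.task = task

    async def run(self):
        await asyncio.sleep(0)  # placeholder for the real agent loop
        return f'finished: {self.task}'


async def main():
    agent = StubAgent(task='your task here')
    return await agent.run()


# asyncio.run creates the event loop, runs main() to completion, and closes the loop.
outcome = asyncio.run(main())
```

With the real package, the body of `main` would construct `Agent` and `Controller` as in the snippet above instead of the stub.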

3. Run the Gradio application:

python gradio_app/app.py

Verifying that the installation worked

To verify that everything is in order:

# Check that the module can be imported
python -c "import mlx_use; print('✅ mlx-use installed correctly')"

# Check the version
python -c "import mlx_use; print('Version:', getattr(mlx_use, '__version__', 'installed in development mode'))"

Everything is working correctly. Installing in editable mode (-e) means that changes you make to the code take effect immediately, without reinstalling the package.

# Updating from GitHub

Your version is up to date. You are on the latest commit (7bb5003), which matches the remote.

To update in the future:

# Download the latest changes
git fetch origin

# Update your local branch
git pull origin main

If you have local changes, as you do now, do this before pulling:

# Save the changes temporarily
git stash

# Update
git pull origin main

# Restore the changes
git stash pop

Your project is on the latest available version.
74 changes: 74 additions & 0 deletions check_folder.py
@@ -0,0 +1,74 @@
#!/usr/bin/env python3
"""Quick script to check if Ofir folder was created in Notes app"""

import asyncio

import Cocoa

from mlx_use.mac.optimized_tree import OptimizedTreeManager

NOTES_BUNDLE_ID = 'com.apple.Notes'
FOLDER_NAME = 'Ofir folder'


async def check_folder_created():
    """Check if the folder was created in Notes app"""
    workspace = Cocoa.NSWorkspace.sharedWorkspace()

    # Find Notes app
    notes_app = None
    for app in workspace.runningApplications():
        if app.bundleIdentifier() and NOTES_BUNDLE_ID.lower() in app.bundleIdentifier().lower():
            notes_app = app
            break

    if not notes_app:
        print("❌ Notes app not found")
        return False

    print(f"📱 Found Notes app, PID: {notes_app.processIdentifier()}")

    # Build UI tree
    tree_manager = OptimizedTreeManager()
    pid = notes_app.processIdentifier()

    try:
        root = await tree_manager.build_tree(pid)
        if not root:
            print("❌ Failed to build UI tree")
            return False

        ui_tree_string = root.get_clickable_elements_string()

        # Check if folder exists in the outline/folder list
        lines = ui_tree_string.split('\n')
        for i, line in enumerate(lines):
            if FOLDER_NAME in line:
                # Get context around the folder name
                start = max(0, i-2)
                end = min(len(lines), i+3)
                context = '\n'.join(lines[start:end])

                print(f"🔍 Found '{FOLDER_NAME}' in UI tree:")
                print(f"Context:\n{context}")

                # Check if it's in a real folder location (outline view)
                if 'outline' in context.lower() or 'axstatictext' in context.lower():
                    if 'axtextfield' not in context.lower():
                        print(f"✅ '{FOLDER_NAME}' found in folder list - folder created successfully!")
                        return True
                    else:
                        print(f"🔍 '{FOLDER_NAME}' found in text field - not a real folder")
                else:
                    print(f"🔍 '{FOLDER_NAME}' found but not in folder list context")

        print(f"❌ '{FOLDER_NAME}' not found in folder list")
        return False

    except Exception as e:
        print(f"❌ Error checking folder: {e}")
        return False
    finally:
        tree_manager.cleanup(pid)


if __name__ == '__main__':
    asyncio.run(check_folder_created())