browser-use · angeltsalazar · Jul 10, 2025
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,211 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+macOS-use is an AI agent framework that enables AI models to control macOS applications through accessibility APIs. The project uses Python with macOS-specific libraries like PyObjC and Cocoa to interact with UI elements.
+
+## Development Setup
+
+### Environment Setup
+
+This project uses a conda environment named `macos-use`:
+
+```bash
+# Activate the conda environment
+conda activate macos-use
+
+# Install project in editable mode
+pip install --editable .
+
+# Install dev dependencies
+pip install -e ".[dev]"
+```
+
+#### Alternative Setup with uv (if preferred)
+```bash
+# Set up development environment with uv
+brew install uv && uv venv && source .venv/bin/activate
+
+# Install project in editable mode
+uv pip install --editable .
+
+# Install dev dependencies
+uv pip install -e ".[dev]"
+```
+
+### Environment Variables
+Copy `.env.example` to `.env` and configure API keys:
+- `OPENAI_API_KEY` - OpenAI API key (recommended)
+- `ANTHROPIC_API_KEY` - Anthropic API key (recommended)
+- `GEMINI_API_KEY` - Google Gemini API key (works but less reliable)
+
+### Running Examples
+```bash
+# Basic interaction test
+python examples/try.py
+
+# Calculator demo
+python examples/calculate.py
+
+# Other examples
+python examples/check_time_online.py
+python examples/login_to_auth0.py
+```
+
+## Testing
+
+### Test Commands
+```bash
+# Run all tests
+pytest
+
+# Run specific test markers
+pytest -m "not slow"    # Skip slow tests
+pytest -m integration   # Run integration tests only
+pytest -m unit         # Run unit tests only
+
+# Run with verbose output
+pytest -v
+
+# Run tests in specific directory
+pytest tests/
+```
+
+### Test Configuration
+- Tests are configured in `pytest.ini`
+- Test discovery looks for `test_*.py` and `*_test.py` files
+- Async tests are supported with `asyncio_mode = auto`
+
+## Code Quality
+
+### Linting and Formatting
+```bash
+# The project uses ruff for linting and formatting
+# Configuration is in pyproject.toml under [tool.ruff]
+# - Line length: 130 characters
+# - Quote style: single quotes
+# - Indentation: tabs
+# - Auto-fix enabled
+
+# Run ruff (if available)
+ruff check .
+ruff format .
+```
+
+## Architecture
+
+### Core Components Structure
+The codebase follows a service-oriented architecture inspired by Netflix's Dispatch:
+
+```
+mlx_use/
+├── agent/                 # Core AI agent logic
+│   ├── service.py        # Main Agent class - orchestrates UI interaction
+│   ├── prompts.py        # System and agent prompts
+│   ├── views.py          # Data models for agent operations
+│   └── message_manager/  # Manages conversation history and context
+├── controller/           # Action execution system
+│   ├── service.py        # Controller class - manages action registry
+│   ├── registry/         # Action registration and management
+│   └── views.py          # Action parameter models
+├── mac/                  # macOS-specific functionality
+│   ├── actions.py        # Core UI actions (click, type, scroll)
+│   ├── element.py        # UI element representation
+│   ├── tree.py           # UI tree building and caching
+│   └── context.py        # UI context management
+└── telemetry/           # Usage analytics and monitoring
+```
+
+### Key Classes
+
+#### Agent (`mlx_use/agent/service.py`)
+- Main orchestration class that runs AI agent tasks
+- Manages conversation history, state, and action execution
+- Handles retries, failures, and telemetry
+- Supports multiple LLM providers (OpenAI, Anthropic, Google)
+
+#### Controller (`mlx_use/controller/service.py`)
+- Executes actions received from the agent
+- Manages action registry and validation
+- Handles macOS app launching and UI interaction
+- Supports custom action registration via decorators
+
+#### MacUITreeBuilder (`mlx_use/mac/tree.py`)
+- Builds accessibility tree from macOS applications
+- Caches UI elements for efficient access
+- Provides element discovery and interaction capabilities
+
+### Action System
+Actions are registered in the Controller's registry:
+- `done` - Complete task with result text
+- `input_text` - Type text into UI elements
+- `click_element` - Click UI elements with specific actions
+- `right_click_element` - Right-click UI elements
+- `scroll_element` - Scroll elements in specified directions
+- `open_app` - Launch macOS applications
+- `run_apple_script` - Execute AppleScript commands
+
+### LLM Integration
+The system supports multiple LLM providers:
+- **OpenAI**: Recommended, uses function calling
+- **Anthropic**: Recommended, uses function calling
+- **Google Gemini**: Works but less reliable, uses structured output
+
+## Development Guidelines
+
+### Code Organization
+- Each service follows the pattern: `models.py`, `service.py`, `views.py`, `prompts.py`
+- Services longer than 500 lines should be split into subservices
+- Views should be organized as: All models, Request models, Response models
+- Single `prompts.py` file per service (split if too long)
+- Never split `routers.py` into multiple files
+
+### Error Handling
+- All actions should return `ActionResult` objects
+- Include helpful error messages for debugging
+- Use appropriate logging levels (DEBUG, INFO, WARNING, ERROR)
+- Handle macOS accessibility permission issues gracefully
+
+### Testing Patterns
+- Use pytest fixtures for common setup
+- Mock external dependencies (LLM calls, system APIs)
+- Test both success and failure scenarios
+- Use async test patterns for async functions
+
+## Package Management
+
+### Dependencies
+Core dependencies include:
+- `langchain` and provider-specific packages for LLM integration
+- `pyobjc` and `pycocoa` for macOS system integration
+- `pydantic` for data validation
+- `gradio` for web UI components
+- `playwright` for browser automation (if needed)
+
+### Build System
+- Uses `hatchling` as build backend
+- Version managed in `pyproject.toml`
+- Package distributed as `mlx-use` on PyPI
+
+## Gradio Application
+
+The project includes a Gradio web interface in `gradio_app/`:
+- Provides web-based interaction with the agent
+- Separate requirements file: `gradio_app/requirements.txt`
+- Run with: `python gradio_app/app.py`
+
+## Platform Considerations
+
+### macOS Specific
+- Requires macOS for full functionality
+- Uses Accessibility APIs that may need user permissions
+- Some features require specific macOS versions
+- PIDs are used to track running applications
+
+### Security
+- Be cautious with AppleScript execution
+- Never commit API keys to the repository
+- The agent can interact with ANY macOS application
+- Use appropriate access controls in production environments
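Note on CLAUDE.md (commentary, not part of the patch): the Controller section says custom actions are registered via decorators, and the Error Handling section says every action should return an `ActionResult`. The sketch below illustrates that pattern in isolation. It does not import `mlx_use`. The `Registry` class, its `action` decorator, and the fields of this `ActionResult` are hypothetical stand-ins, because the diff does not show the real signatures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ActionResult:
    """Hypothetical stand-in for the project's ActionResult."""
    extracted_content: Optional[str] = None
    error: Optional[str] = None
    is_done: bool = False


class Registry:
    """Hypothetical action registry keyed by action name."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[..., ActionResult]] = {}

    def action(self, name: str) -> Callable:
        """Decorator that registers a function under `name`."""
        def decorator(func: Callable[..., ActionResult]) -> Callable[..., ActionResult]:
            self._actions[name] = func
            return func
        return decorator

    def execute(self, name: str, **params) -> ActionResult:
        """Run a registered action; report failures as ActionResult errors."""
        func = self._actions.get(name)
        if func is None:
            return ActionResult(error=f"Unknown action: {name}")
        try:
            return func(**params)
        except Exception as exc:  # return a helpful message instead of crashing
            return ActionResult(error=f"{name} failed: {exc}")


registry = Registry()


@registry.action("done")
def done(text: str) -> ActionResult:
    # Mirrors the listed `done` action: complete the task with result text
    return ActionResult(extracted_content=text, is_done=True)
```

Calling `registry.execute("done", text="finished")` returns a result with `is_done` set, while an unknown name or a missing parameter comes back as an `ActionResult` whose `error` field explains what went wrong, which is the behavior the Error Handling guidelines ask for.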
diff --git a/COMO-USAR-MLX-USE.md b/COMO-USAR-MLX-USE.md
@@ -0,0 +1,78 @@
+The behavior you are seeing is completely normal. The message shows that:
+
+  1. The previous version was uninstalled: mlx-use-0.0.3 was removed
+  2. The new version was installed in editable mode: mlx-use-0.0.3 was reinstalled, now in development mode
+
+  The command mlx-use --version does not work because this project does not define a command-line command named mlx-use. If you check pyproject.toml, there is no
+  [project.scripts] section defining CLI commands.
+
+  How the project works
+
+  The project is used by importing the module in Python, not as a CLI command. These are the correct ways to use it:
+
+  1. Run the examples directly:
+
+  # Activate the environment
+  conda activate macos-use
+
+  # Run the examples
+  python examples/try.py
+  python examples/calculate.py
+  python examples/check_time_online.py
+
+  2. Use it from Python code:
+
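+  import asyncio
+
+  from mlx_use import Agent
+  from mlx_use.controller.service import Controller
+
+  async def main():
+      # Create the agent (tu_llm is a placeholder for your own LLM instance)
+      agent = Agent(
+          task="your task here",
+          llm=tu_llm,
+          controller=Controller()
+      )
+
+      # Run it; await is only valid inside an async function
+      await agent.run()
+
+  asyncio.run(main())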
+
+  3. Run the Gradio application:
+
+  python gradio_app/app.py
+
+  Verifying that the installation worked
+
+  To verify that everything is in order:
+
+  # Check that the module can be imported
+  python -c "import mlx_use; print('✅ mlx-use installed correctly')"
+
+  # Check the version
+  python -c "import mlx_use; print(f'Version: {mlx_use.__version__ if hasattr(mlx_use, \"__version__\") else \"Installed in development mode\"}')"
+
+  Everything is working correctly. The editable (-e) install means that changes you make to the code take effect immediately, without
+  having to reinstall the package.
+
+  # Updating from GitHub
+
+  Your version is up to date. You are on the latest commit (7bb5003), which matches the remote.
+
+  To update in the future:
+
+  # Fetch the latest changes
+  git fetch origin
+
+  # Update your local branch
+  git pull origin main
+
+  If you have local changes, as you do now, do this before pulling:
+
+  # Stash the changes temporarily
+  git stash
+
+  # Update
+  git pull origin main
+
+  # Restore the changes
+  git stash pop
+
+  Your project is on the latest available version.
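Note on COMO-USAR-MLX-USE.md (commentary, not part of the patch): the document checks the version through `mlx_use.__version__` and falls back to a fixed message when that attribute is missing. A possible alternative, sketched below, is to ask the installed package metadata through the standard library's `importlib.metadata`, which also works for editable installs. The helper name `installed_version` is made up for this example, and `mlx-use` is the distribution name given in CLAUDE.md.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "mlx-use" is the distribution name stated in CLAUDE.md
    found = installed_version("mlx-use")
    print(f"mlx-use version: {found}" if found else "mlx-use is not installed")
```

This reads the version recorded at install time rather than relying on the package exposing a `__version__` attribute.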
diff --git a/check_folder.py b/check_folder.py
@@ -0,0 +1,74 @@
+#!/usr/bin/env python3
+"""Quick script to check whether 'Ofir folder' was created in the Notes app."""
+
+import asyncio
+
+import Cocoa
+
+from mlx_use.mac.optimized_tree import OptimizedTreeManager
+
+NOTES_BUNDLE_ID = 'com.apple.Notes'
+FOLDER_NAME = 'Ofir folder'
+
+async def check_folder_created():
+    """Check if the folder was created in Notes app"""
+    workspace = Cocoa.NSWorkspace.sharedWorkspace()
+
+    # Find Notes app
+    notes_app = None
+    for app in workspace.runningApplications():
+        if app.bundleIdentifier() and NOTES_BUNDLE_ID.lower() in app.bundleIdentifier().lower():
+            notes_app = app
+            break
+
+    if not notes_app:
+        print("❌ Notes app not found")
+        return False
+
+    pid = notes_app.processIdentifier()
+    print(f"📱 Found Notes app, PID: {pid}")
+
+    # Build the UI tree for the Notes process
+    tree_manager = OptimizedTreeManager()
+
+    try:
+        root = await tree_manager.build_tree(pid)
+        if not root:
+            print("❌ Failed to build UI tree")
+            return False
+
+        ui_tree_string = root.get_clickable_elements_string()
+
+        # Check if folder exists in the outline/folder list
+        lines = ui_tree_string.split('\n')
+        for i, line in enumerate(lines):
+            if FOLDER_NAME in line:
+                # Get context around the folder name
+                start = max(0, i-2)
+                end = min(len(lines), i+3)
+                context = '\n'.join(lines[start:end])
+
+                print(f"🔍 Found '{FOLDER_NAME}' in UI tree:")
+                print(f"Context:\n{context}")
+
+                # Check if it's in a real folder location (outline view)
+                if 'outline' in context.lower() or 'axstatictext' in context.lower():
+                    if 'axtextfield' not in context.lower():
+                        print(f"✅ '{FOLDER_NAME}' found in folder list - folder created successfully!")
+                        return True
+                    else:
+                        print(f"🔍 '{FOLDER_NAME}' found in text field - not a real folder")
+                else:
+                    print(f"🔍 '{FOLDER_NAME}' found but not in folder list context")
+
+        print(f"❌ '{FOLDER_NAME}' not found in folder list")
+        return False
+
+    except Exception as e:
+        print(f"❌ Error checking folder: {e}")
+        return False
+    finally:
+        tree_manager.cleanup(pid)
+
+if __name__ == '__main__':
+    asyncio.run(check_folder_created())
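Note on check_folder.py (commentary, not part of the patch): the script decides whether a match is a real folder by looking at a five-line window around it, accepting windows that mention an outline or static text and rejecting any that mention a text field. The sketch below pulls that heuristic into a pure function so it can be exercised without macOS or the Notes app. The function name `folder_in_tree` and both sample tree dumps are invented for illustration, and the real output format of `get_clickable_elements_string()` may differ.

```python
def folder_in_tree(ui_tree: str, folder_name: str, before: int = 2, after: int = 2) -> bool:
    """Return True if folder_name appears in a folder-list context."""
    lines = ui_tree.split("\n")
    for i, line in enumerate(lines):
        if folder_name not in line:
            continue
        # Same window as the script: two lines before and two lines after the match
        context = "\n".join(lines[max(0, i - before): i + after + 1]).lower()
        in_list = "outline" in context or "axstatictext" in context
        if in_list and "axtextfield" not in context:
            return True
    return False


# Made-up tree dumps, for illustration only
sidebar = "AXOutline\n  AXRow\n    AXStaticText 'Ofir folder'\n  AXRow"
editor = "AXWindow\n  AXTextField 'Ofir folder'\n  AXButton 'OK'"
```

With these samples, `folder_in_tree(sidebar, "Ofir folder")` is true and `folder_in_tree(editor, "Ofir folder")` is false, matching the script's distinction between a folder row and a name that is still being typed into a text field.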