diff --git a/.cursor/rules/agent.mdc b/.cursor/rules/agent.mdc index aac7882..cb3b3e9 100644 --- a/.cursor/rules/agent.mdc +++ b/.cursor/rules/agent.mdc @@ -1,3 +1,8 @@ +--- +description: +globs: +alwaysApply: false +--- # Agent Instructions This file provides guidance to AI assistants when working with code in this repository. @@ -22,6 +27,44 @@ The knowledge base (`knowledge_base/` directory) contains numerous Markdown file - The actual script code is contained in the Markdown body in a fenced code block - Scripts can use placeholders like `--MCP_INPUT:keyName` and `--MCP_ARG_N` for parameter substitution +## Generally useful for debug loops: +- The Claude Code tool is helpful for analyzing large logs or doing complex file edits. +- Pipe scripts that output lots of logs into a file for easier reading. +- Use AppleScript to execute apps such as Claude Code to test the mcp. +(This requires that the mcp is correctly set up) +- Whenever you want to ask the user something, ask Claude Code first instead. +- Use AppleScript to find the bundle identifier for an app name + +- To run any terminal command, use `osascript .cursor/scripts/terminator.scpt`. + Call it without arguments to understand syntax. + Call it with just your tag and it will return the log. + +- read_file, write_file, move_file all need absolute paths! + +- To run tests for AXorcist reliably, use `run_tests.sh`. + +- To test the stdin feature of `axorc`, you MUST use `axorc/axorc_runner.sh`. + +## AXorcist Binary Management + +The `axorc/` directory contains: +- `axorc`: The main AXorcist binary (tracked in git) +- `AXorcist`: Symlink to `/Users/steipete/Projects/CodeLooper/AXorcist` (DO NOT REMOVE - needed for builds) +- `axorc_runner.sh`: Wrapper script that tries multiple binary locations + +To rebuild the AXorcist binary: +1. Ensure the `AXorcist` symlink exists (points to CodeLooper project) +2. 
Copy the latest binary from CodeLooper: `cp /Users/steipete/Projects/CodeLooper/AXorcist/.build/debug/axorc axorc/axorc` +3. Also copy to expected location: `cp axorc/axorc axorc/AXorcist/.build/debug/axorc` + +- Long-term, this project simply ships with a compiled binary of `axorc`. + During development, we have a symlink of the `AXorcist` directory from another folder to simplify development across projects. + +The symlink is ESSENTIAL for: +- Development workflow integration +- Build process access to source code +- Testing and debugging + ## Common Development Commands ```bash diff --git a/.cursor/rules/axorc.mdc b/.cursor/rules/axorc.mdc new file mode 100644 index 0000000..2f09c50 --- /dev/null +++ b/.cursor/rules/axorc.mdc @@ -0,0 +1,138 @@ +--- +description: +globs: +alwaysApply: false +--- +# macOS Accessibility (`axorc`) Command-Line Tool + +This document outlines the functionality, build process, testing procedures, and technical details of the `axorc` Swift command-line utility, designed for interacting with the macOS Accessibility framework. + +## 1. `axorc` Overview + +* **Purpose**: Provides a JSON-based interface to query UI elements and perform actions using the macOS Accessibility API. It's intended to be called by other processes. The core Swift library `AXorcist` handles the accessibility interactions. +* **Communication**: `axorc` reads JSON commands (via direct argument, stdin, or file) and writes JSON responses (or errors) to `stdout`. Debug information often goes to `stderr`. +* **Core `AXorcist` Library Commands (exposed by `axorc` via `CommandType` enum in `ax/AXorcist/Sources/AXorcist/Core/Models.swift`)**: + * `ping`: Checks if `axorc` is responsive. + * `getFocusedElement`: Retrieves information about the currently focused UI element in a target application. + * `query`: Retrieves information about specific UI element(s) matching locator criteria. + * `getAttributes`: Retrieves specific attributes for element(s) matching locator criteria. 
+ * `describeElement`: Retrieves a comprehensive list of attributes for element(s) matching locator criteria. + * `collectAll`: Retrieves information about all UI elements matching criteria within a scope. + * `performAction`: Executes an action on a specified UI element. + * `extractText`: Extracts textual content from specified UI element(s). + * `batch`: Executes a sequence of sub-commands. +* **Key Input Fields (JSON - see `CommandEnvelope` in `ax/AXorcist/Sources/AXorcist/Core/Models.swift`)**: + * `command_id` (string): A unique identifier for the command, echoed in the response. + * `command` (string enum: `CommandType`): e.g., "ping", "getFocusedElement", "query", "getAttributes", "describeElement", "collectAll", "performAction", "extractText", "batch". + * `application` (string, optional): Bundle ID (e.g., "com.apple.TextEdit") or localized name of the target application. If omitted, behavior might depend on the command (e.g., `getFocusedElement` might try the system-wide focused app). + * `locator` (object, optional - see `Locator` in `ax/AXorcist/Sources/AXorcist/Core/Models.swift`): Specifies the target element(s) for commands like `query`, `getAttributes`, `describeElement`, `performAction`, `extractText`, `collectAll`. + * `criteria` (object `[String: String]`): Key-value pairs of attributes to match (e.g., `{"AXRole": "AXWindow", "AXTitle":"My Window"}`). + * `match_all` (boolean, optional): If true, all criteria must match. If false or omitted, any criterion matching is sufficient (behavior might vary by implementation). + * `root_element_path_hint` (array of strings, optional): A pathHint to find a container element from which the locator criteria will be applied. + * `requireAction` (string, optional): Filters results to elements supporting a specific action (e.g., "AXPress"). + * `computed_name_contains` (string, optional): Filters elements whose computed name (derived from title, value, etc.) contains the given string. 
+ * `attributes` (array of strings, optional): For commands like `getFocusedElement`, `query`, `getAttributes`, `collectAll`, specifies which attributes to retrieve. Defaults to a common set if omitted. + * `path_hint` (array of strings, optional): A path to navigate the UI tree (e.g., `["window[0]", "button[AXTitle=OK]"]`) to find a target element or a base for the `locator`. (Exact path syntax may evolve). + * `action_name` (string, optional): For `performAction` command, the action to execute (e.g., "AXPress", "AXSetValue"). + * `action_value` (any, optional, via `AnyCodable`): For `performAction` with actions like "AXSetValue", this is the value to set (e.g., a string, number, boolean). + * `sub_commands` (array of `CommandEnvelope` objects, optional): For the `batch` command, contains the sequence of commands to execute. + * `max_elements` (int, optional): For `collectAll`, can limit the number of elements returned. Also used as max depth in some search operations. + * `output_format` (string enum `OutputFormat`, optional): For attribute retrieval, can be "smart", "verbose", "text_content", "json_string". From `ax/AXorcist/Sources/AXorcist/Core/Models.swift`. + * `debug_logging` (boolean, optional): If `true`, `axorc` and `AXorcist` include detailed internal debug logs in the response and/or stderr. + * `payload` (object `[String: String]`, optional): Legacy field, primarily for `ping` compatibility to echo back simple data. +* **Key Output Fields (JSON - see response structs in `ax/AXorcist/Sources/axorc/axorc.swift` which wrap `AXorcist.HandlerResponse`)**: + * All responses generally include `command_id` (string), `success` (boolean), and `debug_logs` (array of strings, optional). + * `SimpleSuccessResponse` (for `ping`): Contains `status`, `message`, `details`. + * `QueryResponse` (for `getFocusedElement`, `query`, `getAttributes`, `describeElement`, `collectAll`, `performAction`, `extractText`): + * `command` (string): The original command type. 
+ * `data` (object `AXElementForEncoding`, optional): Contains the primary accessibility element data. + * `attributes` (object `[String: AnyCodable]`): Dictionary of element attributes. + * `path` (array of strings, optional): Path to the element. + * `error` (object `ErrorDetail`, optional): Contains an error `message` if `success` is false. + * `BatchOperationResponse` (for `batch`): + * `results` (array of `QueryResponse` objects): One for each sub-command. + * `ErrorResponse` (for input errors, decoding errors, or unhandled command types): + * `error` (object `ErrorDetail`): Contains an error `message`. + +## 2. Functionality - How it Works + +The `axorc` binary (`ax/AXorcist/Sources/axorc/main.swift`) is the command-line entry point. It parses input, decodes the JSON `CommandEnvelope`, and then calls methods on an instance of the `AXorcist` class (from `ax/AXorcist/Sources/AXorcist/AXorcist.swift`). The `AXorcist` library handles the core accessibility interactions. + +* **`AXorcist` Library**: + * Located in `ax/AXorcist/Sources/AXorcist/`. + * `AXorcist.swift`: Contains the main class and handler methods for each command type (e.g., `handleGetFocusedElement`, `handleQuery`, `handlePerformAction`). + * `Core/Models.swift`: Defines `CommandEnvelope`, `Locator`, `HandlerResponse`, `AXElement` (for data representation), `AnyCodable`, `OutputFormat`, etc. + * `Core/Element.swift`: Defines `AXorcist.AXElement` which is a wrapper around `AXUIElement` and is used internally by `AXorcist` and in `HandlerResponse.data`. + * `Search/ElementSearch.swift`: Contains logic for finding UI elements based on locators, path hints, and criteria (e.g., depth-first search, attribute matching). + * `Core/AccessibilityPermissions.swift`: Handles checking for necessary permissions. + * `Core/ProcessUtils.swift`: Utilities for finding application PIDs. + * Many functions interacting with `AXUIElement` are marked `@MainActor`. 
+ +* **Application Targeting**: + * `AXorcist` uses `ProcessUtils.swift` to find the `pid_t` for a given application bundle ID or name. + * `AXUIElementCreateApplication(pid)` gets the root application `AXUIElement`. + +* **Element Location**: + * Typically handled by methods in `AXorcist.swift` or `Search/ElementSearch.swift`. + * Uses locators (`criteria`, `requireAction`, etc.) and `path_hint`. + * Involves traversing the accessibility tree (e.g., an element's `kAXChildrenAttribute`). + +* **Attribute Retrieval**: + * `AXorcist`'s `getElementAttributes` (internal helper) fetches attributes for an `AXUIElement`. + * Converts `CFTypeRef` values to Swift types, often using `AnyCodable` for the `attributes` dictionary in `AXorcist.AXElement`. + * Handles `AXValue` types (like position/size). + * May generate synthetic attributes like "ComputedName" or "AXActionNames". + +* **Action Performing**: + * `AXorcist` checks if an action is supported (e.g., via `kAXActionNamesAttribute`). + * Uses `AXUIElementPerformAction` or `AXUIElementSetAttributeValue` (for "AXSetValue"). + +* **Error Handling**: + * `AXorcist` handler methods return a `HandlerResponse` which includes an optional error string. + * `axorc` wraps this into its JSON error structures. + +* **Threading**: + * Core Accessibility API calls are dispatched to the `@MainActor` by `AXorcist`. + +* **Debugging**: + * The `debug_logging: true` in the input JSON enables verbose logging. + * Logs are collected by `AXorcist` and passed back in `HandlerResponse.debug_logs`. + * `axorc` includes these in its final JSON output's `debug_logs` field and may also print to `stderr` using `fputs`. + +## 3. Build Process + +* **Swift Package Manager**: `axorc` is built using SPM from the package in `ax/AXorcist/`. + * `ax/AXorcist/Package.swift` defines the "axorc" executable product and the "AXorcist" library product. 
+* **Output**: The executable is typically found in `ax/AXorcist/.build/debug/axorc` or `ax/AXorcist/.build/release/axorc`. + +## 4. Running & Testing + +* **Direct Execution**: + ```bash + cd /path/to/your/project/ax/AXorcist/ + swift build # if not already built + ./.build/debug/axorc '{ "command_id":"ping1", "command":"ping" }' + ``` +* **Via `terminator.scpt` (Example for consistency)**: + It is recommended to use a consistent tag (e.g., "axorc_ops") when using `terminator.scpt` to reuse the same terminal window/tab. + ```bash + # First command with a new tag (establishes session, cds, runs command) + osascript /path/to/.cursor/scripts/terminator.scpt "axorc_ops" "cd /Users/steipete/Projects/macos-automator-mcp/ax/AXorcist/ && ./.build/debug/axorc --debug '{ \"command_id\": \"claude-ping\", \"command\": \"ping\" }'" + + # Subsequent commands with the same tag + osascript /path/to/.cursor/scripts/terminator.scpt "axorc_ops" ".build/debug/axorc '{ \"command_id\": \"claude-getfocused\", \"command\": \"getFocusedElement\", \"application\": \"com.anthropic.claudefordesktop\" }'" + ``` +* **Input Methods for `axorc`**: + * Direct argument (last argument on the command line, must be valid JSON). + * `--stdin`: Reads JSON from standard input. + * `--file /path/to/file.json`: Reads JSON from a specified file. +* **Permissions**: The process executing `axorc` (e.g., Terminal, or your calling application) **must** have "Accessibility" permissions in "System Settings > Privacy & Security > Accessibility". `AXorcist` calls `AccessibilityPermissions.checkAccessibilityPermissions()` on startup. + +## 5. macOS Accessibility (AX) Intricacies + +* **Frameworks**: `ApplicationServices` (for C APIs like `AXUIElement...`), `AppKit` (for `NSRunningApplication`). +* **`AXUIElement`**: The core C type representing an accessible UI element. +* **Attributes & `CFTypeRef`**: Values are `CFTypeRef`. Handled by `AXorcist.AnyCodable` for JSON serialization. 
+* **Tooling**: **Accessibility Inspector** (Xcode > Open Developer Tool) is vital for inspecting UI elements and their properties. + +This document reflects the structure and functionality of the `axorc` tool and its underlying `AXorcist` library. diff --git a/.cursor/rules/claude-desktop.mdc b/.cursor/rules/claude-desktop.mdc new file mode 100644 index 0000000..96f86b6 --- /dev/null +++ b/.cursor/rules/claude-desktop.mdc @@ -0,0 +1,174 @@ +--- +description: +globs: +alwaysApply: false +--- +## Automating Claude Desktop (com.anthropic.claudefordesktop) + +This document outlines key findings and strategies for automating the Claude desktop application using AppleScript and the `ax` accessibility helper tool. + +**Bundle Identifier:** `com.anthropic.claudefordesktop` + +### 1. Ensuring a Clean Application State + +It's recommended to quit and relaunch the Claude application before starting an automation sequence to ensure it's in a known, default state. + +**AppleScript to Relaunch Claude:** +```applescript +set claudeBundleID to "com.anthropic.claudefordesktop" + +-- Quit the application +try + tell application id claudeBundleID + if it is running then + quit + delay 1 -- Give it a moment to quit + end if + end tell +on error errMsg + log "Notice: Error during quit (or app not running): " & errMsg +end try + +-- Ensure it's fully quit +set maxAttempts to 5 +repeat maxAttempts times + if application id claudeBundleID is running then + delay 0.5 + else + exit repeat + end if +end repeat + +-- Relaunch the application +try + tell application id claudeBundleID + launch + delay 1 -- Give it a moment to launch + activate -- Bring to front + end tell + log "Claude app relaunched successfully." +on error errMsg + log "Error: Could not relaunch Claude: " & errMsg +end try +``` + +### 2. Text Input + +Directly setting the `AXValue` of the main text input area via `ax` or simple AppleScript `set value of text area ...` has proven unreliable. 
The most robust method found is using AppleScript to simulate keystrokes. + +**AppleScript for Text Input (after app is active and window is frontmost):** +```applescript +tell application id "com.anthropic.claudefordesktop" + activate +end tell +delay 0.5 -- Give time for activation + +tell application "System Events" + tell process "Claude" -- Using name, but app should be frontmost by bundle ID + set frontmost to true + delay 0.2 + if not (exists window 1) then + error "Claude window 1 not found after activation." + end if + + -- Optional: Try to explicitly focus the window + try + perform action "AXRaise" of window 1 + set focused of window 1 to true + delay 0.2 + on error + -- Non-critical if this fails + end try + + try + keystroke "Your text to input here." + log "Keystroke successful." + on error errMsg number errNum + error "Error during keystroke: " & errMsg & " (Num: " & errNum & ")" + end try + end tell +end tell +``` + +### 3. Identifying UI Elements with `ax` + +The Claude desktop application appears to be Electron-based, meaning UI elements are often nested within web areas. + +**a. Main Text Input Area:** +* **Role:** `AXTextArea` +* **Location:** Typically within the first window. The full accessibility path can be quite deep (e.g., `window[1]/group[1]/group[1]/group[1]/group[1]/webArea[1]/group[1]/textArea[1]`). +* **`ax` Locator (Query):** + ```json + { + "command_id": "claude_desktop_query_textarea_001", + "command": "query", + "application": "com.anthropic.claudefordesktop", + "locator": { + "root_element_path_hint": ["window[1]"], + "criteria": { + "AXRole": "AXTextArea" + } + }, + "attributes": ["AXValue", "AXFocused", "AXPathHint", "ComputedName", "ComputedPath"], + "debug_logging": true, + "output_format": "verbose" + } + ``` + Querying this after text input can verify the content of `AXValue`. + +**b. Send Message Button:** +* **Role:** `AXButton` +* **Identifying Attribute:** `AXTitle` is typically "Send message". 
+* **Location:** Also within `window[1]`. +* **`ax` Locator (Query to find):** + ```json + { + "command_id": "claude_desktop_query_sendbutton_001", + "command": "query", + "application": "com.anthropic.claudefordesktop", + "locator": { + "root_element_path_hint": ["window[1]"], + "criteria": { + "AXRole": "AXButton", + "AXTitle": "Send message" + } + }, + "attributes": ["AXTitle", "AXIdentifier", "AXRoleDescription", "AXPathHint", "AXEnabled", "AXActionNames", "IsClickable", "ComputedName", "ComputedPath"], + "debug_logging": true, + "output_format": "verbose" + } + ``` +* **`ax` Locator (Perform Action):** + ```json + { + "command_id": "claude_desktop_perform_sendbutton_001", + "command": "perform_action", + "application": "com.anthropic.claudefordesktop", + "locator": { + "root_element_path_hint": ["window[1]"], + "criteria": { + "AXRole": "AXButton", + "AXTitle": "Send message" + } + }, + "action": "AXPress", + "debug_logging": true + } + ``` + +### 4. Performing Actions with `ax` + +* **`AXPress`:** The "Send message" button supports the `AXPress` action. This can be reliably triggered using the `perform_action` command with the locator described above. +* **`AXActionNames` Attribute:** While `AXPress` works, the `AXActionNames` attribute might appear as `kAXNotAvailableString` or a more detailed "n/a (no specific actions found...)" message if no actions are discoverable via `kAXActionNamesAttribute`, `kAXActionsAttribute`, or if `AXPress` is the only one found through direct check. However, the `ax` tool's element location (with `locator.require_action`) and `perform_action` command correctly determine if an action is supported and can execute it. + +### 5. General Observations & Debugging Tips + +* **Electron App:** The UI structure suggests an Electron application. This means standard AppKit/Cocoa control identification via simple AppleScript can be challenging, and accessibility often relies on traversing web areas. 
+* **`ax` Tool Debugging:** + * Enable `debug_logging: true` in your `ax` commands. + * The `stderr` output from `ax` provides detailed traversal and matching information if `GLOBAL_DEBUG_ENABLED` is true in `Logging.swift`. + * The `debug_logs` array in the JSON response provides command-specific logs. +* **Focus:** Ensuring the application and target window are active and frontmost is crucial, especially before sending keystrokes. The AppleScript snippets include `activate` and attempts to set `frontmost to true`. +* **`locator.root_element_path_hint`:** Using `locator.root_element_path_hint: ["window[1]"]` as a starting point for `ax` locators helps narrow down the search scope significantly when applying criteria. + +This summary should provide a good foundation for further automation of the Claude desktop application. diff --git a/.cursor/rules/file-editing.mdc b/.cursor/rules/file-editing.mdc new file mode 100644 index 0000000..f39fa98 --- /dev/null +++ b/.cursor/rules/file-editing.mdc @@ -0,0 +1,167 @@ +--- +description: text edit +globs: +alwaysApply: false +--- +**Core Philosophy for LLM:** +* **Perl is the Workhorse:** Prioritize Perl for its powerful regex (PCRE), in-place editing with backups (`-pi.bak`), multi-line handling (especially `-0777` slurp mode), and range operators (`/START/../END/`). +* **`ripgrep` (`rg`) for Extraction (if needed):** Use `rg` if its speed or specific regex features are beneficial for *finding and extracting* a block, which can then be piped to Perl or used to inform a Perl command. +* **One-Liners:** Aim for concise, powerful one-liners. +* **Backups are Non-Negotiable:** Always include `.bak` with Perl's `-i` or instruct manual `cp` if another tool is (exceptionally) used. +* **Regex Quoting:** Single quotes (`'...'`) are the default for Perl code/regex on the command line. If a regex comes from an LLM-generated variable, it needs careful handling. 
+* **Specificity & Non-Greedy:** LLM must generate specific start/end regexes. Non-greedy matching (`.*?`) within blocks is crucial. The `s` flag (dotall) in regex makes `.` match newlines. +* **Variable Interpolation:** If the LLM generates Perl code that uses shell variables for regexes or replacement text, it must be aware of how Perl will interpolate these. It's often safer to pass these as arguments or construct the Perl string carefully. + +--- + +**Perl & `ripgrep` One-Liner Guide for LLM-Driven File Editing** + +**Placeholders (for LLM to fill):** +* `source_file.txt` +* `target_file.txt` +* `'START_REGEX'`: Perl regex string for the start of a block. +* `'END_REGEX'`: Perl regex string for the end of a block. +* `'FULL_BLOCK_REGEX_PCRE'`: A complete PCRE regex (often `(?s)START_REGEX.*?END_REGEX`) to match the entire block. +* `'REGEX_WITHIN_BLOCK'`: Perl regex for content to change *inside* a block. +* `'NEW_TEXT_CONTENT'`: The replacement text (can be multi-line). LLM must ensure newlines are actual newlines or properly escaped for Perl strings. +* `'TARGET_INSERTION_MARKER_REGEX'`: Perl regex matching where to insert content in `target_file.txt`. + +--- + +**1. Editing Text *within* a Defined Block (In-Place)** + +* **Goal:** Modify specific text found *between* `START_REGEX` and `END_REGEX`. +* **Tool:** Perl +* **One-Liner:** + ```bash + perl -pi.bak -e 'if (/START_REGEX/../END_REGEX/) { s/REGEX_WITHIN_BLOCK/NEW_TEXT_CONTENT/g }' source_file.txt + ``` +* **LLM Notes:** + * `NEW_TEXT_CONTENT` is inserted literally. If it comes from a shell variable, the LLM needs to ensure it's correctly expanded and quoted if it contains special Perl characters or shell metacharacters. E.g., using environment variables: `REPL_VAR="$NEW_TEXT_CONTENT" perl -pi.bak -e 'if (/START_REGEX/../END_REGEX/) { s/REGEX_WITHIN_BLOCK/$ENV{REPL_VAR}/g }' source_file.txt` + * Ensure `START_REGEX`, `END_REGEX`, and `REGEX_WITHIN_BLOCK` are valid Perl regexes. + +--- + +**2. 
Replacing an *Entire* Block of Text (In-Place)** + +* **Goal:** Replace the whole block (from `START_REGEX` to `END_REGEX`) with `NEW_TEXT_CONTENT`. +* **Tool:** Perl (slurp mode is excellent here) +* **One-Liner:** + ```bash + perl -0777 -pi.bak -e 's/FULL_BLOCK_REGEX_PCRE/NEW_TEXT_CONTENT/sg' source_file.txt + ``` +* **LLM Notes:** + * `-0777`: Slurps the entire file into one string. + * `FULL_BLOCK_REGEX_PCRE`: Should be like `(?s)START_REGEX.*?END_REGEX`. The `(?s)` makes `.` match newlines. The `s` flag on the `s///sg` also ensures `.` matches newlines *within this specific regex*. The `g` flag is for global replacement if the block could appear multiple times. + * `NEW_TEXT_CONTENT`: Can be multi-line. If it's from a shell variable, ensure proper quoting and expansion for the shell, then ensure it's a valid Perl string literal. Example with env var: + `NEW_BLOCK_VAR="$NEW_TEXT_CONTENT" perl -0777 -pi.bak -e 's/FULL_BLOCK_REGEX_PCRE/$ENV{NEW_BLOCK_VAR}/sg' source_file.txt` + +--- + +**3. Deleting a Block of Text (In-Place)** + +* **Goal:** Remove everything from `START_REGEX` to `END_REGEX`. +* **Tool:** Perl +* **Option A (Range Operator - often simplest):** + ```bash + perl -ni.bak -e 'print unless /START_REGEX/../END_REGEX/' source_file.txt + ``` +* **Option B (Slurp mode - good if block definition is complex or spans tricky boundaries):** + ```bash + perl -0777 -pi.bak -e 's/FULL_BLOCK_REGEX_PCRE//sg' source_file.txt + ``` +* **LLM Notes:** + * Option A (`-n` and `print unless`): Processes line by line. `START_REGEX` and `END_REGEX` mark the boundaries. + * Option B (`-0777`): Treats file as one string, replaces the matched block with nothing. + * Choose Option A if start/end markers are clear line-based patterns. Choose B if the block's structure is more complex and better captured by a single regex over the whole file. + +--- + +**4. 
Adding/Inserting a New Block of Text (In-Place)** + +* **Goal:** Insert `NEW_TEXT_CONTENT` after a line matching `TARGET_INSERTION_MARKER_REGEX`. +* **Tool:** Perl +* **One-Liner (Insert *after* marker):** + ```bash + perl -pi.bak -e 'if (s/TARGET_INSERTION_MARKER_REGEX/$&\nNEW_TEXT_CONTENT/) {}' source_file.txt + ``` + * Or, if `NEW_TEXT_CONTENT` might contain `$&` or other special vars: + ```bash + NEW_BLOCK_VAR="$NEW_TEXT_CONTENT" perl -pi.bak -e 'if (s/TARGET_INSERTION_MARKER_REGEX/$&\n$ENV{NEW_BLOCK_VAR}/) {}' source_file.txt + ``` +* **One-Liner (Insert *before* marker - more complex to do robustly in one pass without temp vars in pure one-liner):** + It's often simpler to replace the marker with the new block *and* the marker: + ```bash + NEW_BLOCK_VAR="$NEW_TEXT_CONTENT" perl -pi.bak -e 'if (s/TARGET_INSERTION_MARKER_REGEX/$ENV{NEW_BLOCK_VAR}\n$&/) {}' source_file.txt + ``` +* **LLM Notes:** + * `$&` in the replacement part of `s///` refers to the entire matched string (the marker). + * `NEW_TEXT_CONTENT` is appended after the marker and a newline. + * The `if` and empty `{}` ensure the substitution happens and Perl continues. + * If `TARGET_INSERTION_MARKER_REGEX` should be *replaced* by the new block: + `NEW_BLOCK_VAR="$NEW_TEXT_CONTENT" perl -pi.bak -e 's/TARGET_INSERTION_MARKER_REGEX/$ENV{NEW_BLOCK_VAR}/g' source_file.txt` + +--- + +**5. Moving a Block of Text from `source_file.txt` to `target_file.txt`** + +* **Goal:** Extract block from source, insert into target, delete from source. +* **Tools:** Perl for all steps. (`rg` could be used for extraction if its specific regex capabilities are needed, then pipe to Perl for insertion). +* **One-Liner Sequence (conceptual, hard to make a true single shell one-liner without temp files or complex shell quoting for the block content):** + + This task inherently involves state (the extracted block). True one-liners that pass this state without a temp file are tricky and less readable. 
A clear sequence is better for LLM reliability. + + **Recommended Approach (using a shell variable to hold the block - LLM must handle quoting for this variable carefully):** + + ```bash + # Step 1: Extract block from source_file.txt into a shell variable + # Use rg (very fast for extraction) or Perl. Using Perl here for consistency: + EXTRACTED_BLOCK=$(perl -0777 -ne 'print $& if /FULL_BLOCK_REGEX_PCRE/sg' source_file.txt) + + # Check if block was extracted + if [ -z "$EXTRACTED_BLOCK" ]; then + echo "Error: Block not found in source_file.txt or is empty. Aborting move." + # exit 1 # LLM could add this + else + # Step 2: Insert block into target_file.txt (e.g., after TARGET_INSERTION_MARKER_REGEX) + # Pass EXTRACTED_BLOCK via an environment variable to avoid quoting hell with Perl -e + BLOCK_TO_INSERT="$EXTRACTED_BLOCK" \ + perl -pi.bak_target -e 's/(TARGET_INSERTION_MARKER_REGEX)/$1\n$ENV{BLOCK_TO_INSERT}/' target_file.txt && \ + \ + # Step 3: Delete block from source_file.txt (only if insertion seemed to succeed) + perl -0777 -pi.bak_source -e 's/FULL_BLOCK_REGEX_PCRE//sg' source_file.txt && \ + echo "Block moved successfully." + fi + ``` +* **LLM Notes for Move:** + * **Atomicity:** This is NOT atomic. LLM must warn that if a step fails, files can be in an inconsistent state. + * **Shell Variable for Block:** The `EXTRACTED_BLOCK=$(...)` captures the output. + * **Passing to Perl:** Using an environment variable (`BLOCK_TO_INSERT="$EXTRACTED_BLOCK" perl ... $ENV{BLOCK_TO_INSERT}`) is generally the most robust way to pass multi-line, potentially complex strings to a Perl `-e` script from the shell. + * **Insertion Logic:** The example inserts the block *after* the marker and a newline. The LLM can adapt this (e.g., replace marker, insert before). + * **Error Check:** The `if [ -z "$EXTRACTED_BLOCK" ]` is a basic check. + * **Backup Suffixes:** Using distinct backup suffixes like `.bak_target` and `.bak_source` is good practice. 
+ +**Simplifying "Move" for a "Stricter" One-Liner (using a subshell and output redirection if target insertion is simple):** + +If inserting into the target is simple (e.g., appending, or simple marker replacement not requiring the original marker), you *could* redirect the extracted block straight into the target: + +```bash +# Back up the target, append the extracted block to it, then delete the block from source +(cp target_file.txt target_file.txt.bak_target && \ + perl -0777 -ne 'print $& if /FULL_BLOCK_REGEX_PCRE/sg' source_file.txt >> target_file.txt) && \ +perl -0777 -pi.bak_source -e 's/FULL_BLOCK_REGEX_PCRE//sg' source_file.txt +``` +This is less flexible for targeted insertion within `target_file.txt` without more complex `perl` on the receiving end. The previous multi-step approach with an intermediate shell variable is often more robust for an LLM to generate correctly for various insertion scenarios. + +--- + +**General "Fuzzy Knowledge" Mitigation for LLM:** + +1. **Request Regex Flavor Explicitly:** "Generate a Perl Compatible Regular Expression (PCRE)..." +2. **Emphasize Non-Greedy:** "Ensure the regex for the block content uses non-greedy matching (e.g., `.*?` with the `s` flag)." +3. **Ask for Backup Command:** "Always include a backup mechanism, like Perl's `-pi.bak`." +4. **Specify Multi-line Handling:** + * "If operating on a whole block as one unit, use Perl's `-0777` slurp mode." + * "If processing line-by-line but needing to act on a range, use Perl's `/START/../END/` range operator." +5. **Newline in Replacement:** "When providing `NEW_TEXT_CONTENT` for Perl, ensure newlines are actual newlines if it's a direct string in the `-e` script, or that they are correctly handled if coming from a shell variable." +6. 
**Quoting for LLM-Generated Variables:** If the LLM is told to use a shell variable for `START_REGEX` or `NEW_TEXT_CONTENT`, it must be reminded about shell quoting for the variable assignment and then how Perl will see that variable (e.g., via `$ENV{VAR_NAME}` to avoid Perl trying to interpret it as a Perl variable directly). \ No newline at end of file diff --git a/.cursor/scripts/terminator.scpt b/.cursor/scripts/terminator.scpt new file mode 100755 index 0000000..60bce85 --- /dev/null +++ b/.cursor/scripts/terminator.scpt @@ -0,0 +1,771 @@ +#!/usr/bin/osascript +-------------------------------------------------------------------------------- +-- terminator_v0.6.0_safe_enhanced.scpt - Safe Enhanced v0.6.0 +-- Conservative enhancement of proven v0.6.0 baseline with only minimal safe improvements +-- Features: Enhanced error reporting, improved timing, better output formatting +-------------------------------------------------------------------------------- + +--#region Configuration Properties +property maxCommandWaitTime : 15.0 -- Increased from 10.0 for better reliability +property pollIntervalForBusyCheck : 0.1 +property startupDelayForTerminal : 0.7 +property minTailLinesOnWrite : 100 -- Increased from 15 for better build log visibility +property defaultTailLines : 100 -- Increased from 30 for better build log visibility +property tabTitlePrefix : "Terminator 🤖💥 " -- For the window/tab title itself +property scriptInfoPrefix : "Terminator 🤖💥: " -- For messages generated by this script +property projectIdentifierInTitle : "Project: " +property taskIdentifierInTitle : " - Task: " +property enableFuzzyTagGrouping : true +property fuzzyGroupingMinPrefixLength : 4 + +-- Safe enhanced properties (minimal additions) +property enhancedErrorReporting : true +property verboseLogging : false +--#endregion Configuration Properties + +--#region Helper Functions +on isValidPath(thePath) + if thePath is not "" and (thePath starts with "/") then + if not (thePath 
contains " -") then -- Basic heuristic + return true + end if + end if + return false +end isValidPath + +on getPathComponent(thePath, componentIndex) + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to "/" + set pathParts to text items of thePath + set AppleScript's text item delimiters to oldDelims + set nonEmptyParts to {} + repeat with aPart in pathParts + if aPart is not "" then set end of nonEmptyParts to aPart + end repeat + if (count nonEmptyParts) = 0 then return "" + try + if componentIndex is -1 then + return item -1 of nonEmptyParts + else if componentIndex > 0 and componentIndex ≤ (count nonEmptyParts) then + return item componentIndex of nonEmptyParts + end if + on error + return "" + end try + return "" +end getPathComponent + +on generateWindowTitle(taskTag as text, projectGroup as text) + if projectGroup is not "" then + return tabTitlePrefix & projectIdentifierInTitle & projectGroup & taskIdentifierInTitle & taskTag + else + return tabTitlePrefix & taskTag + end if +end generateWindowTitle + +on bufferContainsMeaningfulContentAS(multiLineText, knownInfoPrefix as text, commonShellPrompts as list) + if multiLineText is "" then return false + + -- Simple approach: if the trimmed content is substantial and not just our info messages, consider it meaningful + set trimmedText to my trimWhitespace(multiLineText) + if (length of trimmedText) < 3 then return false + + -- Check if it's only our script info messages + if trimmedText starts with knownInfoPrefix then + -- If it's ONLY our message and nothing else meaningful, return false + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to linefeed + set textLines to text items of multiLineText + set AppleScript's text item delimiters to oldDelims + + set nonInfoLines to 0 + repeat with aLine in textLines + set currentLine to my trimWhitespace(aLine as text) + if currentLine is not "" and not (currentLine starts with
knownInfoPrefix) then + set nonInfoLines to nonInfoLines + 1 + end if + end repeat + + -- If we have substantial non-info content, consider it meaningful + return (nonInfoLines > 2) + end if + + -- If content doesn't start with our info prefix, likely contains command output + return true +end bufferContainsMeaningfulContentAS + +-- Enhanced error reporting helper +on formatErrorMessage(errorType, errorMsg, context) + if enhancedErrorReporting then + set formattedMsg to scriptInfoPrefix & errorType & ": " & errorMsg + if context is not "" then + set formattedMsg to formattedMsg & " (Context: " & context & ")" + end if + return formattedMsg + else + return scriptInfoPrefix & errorMsg + end if +end formatErrorMessage + +-- Enhanced logging helper +on logVerbose(message) + if verboseLogging then + log "🔍 " & message + end if +end logVerbose +--#endregion Helper Functions + +--#region Main Script Logic (on run) +on run argv + set appSpecificErrorOccurred to false + try + my logVerbose("Starting Terminator v0.6.0 Safe Enhanced") + + tell application "System Events" + if not (exists process "Terminal") then + launch application id "com.apple.Terminal" + delay startupDelayForTerminal + end if + end tell + + set originalArgCount to count argv + if originalArgCount < 1 then return my usageText() + + set projectPathArg to "" + set actualArgsForParsing to argv + if originalArgCount > 0 then + set potentialPath to item 1 of argv + if my isValidPath(potentialPath) then + set projectPathArg to potentialPath + my logVerbose("Detected project path: " & projectPathArg) + if originalArgCount > 1 then + set actualArgsForParsing to items 2 thru -1 of argv + else + return my formatErrorMessage("Argument Error", "Project path \"" & projectPathArg & "\" provided, but no task tag or command specified."
& linefeed & linefeed & my usageText(), "") + end if + end if + end if + + if (count actualArgsForParsing) < 1 then return my usageText() + + set taskTagName to item 1 of actualArgsForParsing + my logVerbose("Task tag: " & taskTagName) + + if (length of taskTagName) > 40 or (not my tagOK(taskTagName)) then + set errorMsg to "Task Tag missing or invalid: \"" & taskTagName & "\"." & linefeed & linefeed & ¬ + "A 'task tag' (e.g., 'build', 'tests') is a short name (1-40 letters, digits, -, _) " & ¬ + "to identify a specific task, optionally within a project session." & linefeed & linefeed + return my formatErrorMessage("Validation Error", errorMsg & my usageText(), "tag validation") + end if + + set doWrite to false + set shellCmd to "" + set originalUserShellCmd to "" + set currentTailLines to defaultTailLines + set explicitLinesProvided to false + set argCountAfterTagOrPath to count actualArgsForParsing + + if argCountAfterTagOrPath > 1 then + set commandParts to items 2 thru -1 of actualArgsForParsing + if (count commandParts) > 0 then + set lastOfCmdParts to item -1 of commandParts + if my isInteger(lastOfCmdParts) then + set currentTailLines to (lastOfCmdParts as integer) + set explicitLinesProvided to true + my logVerbose("Explicit lines requested: " & currentTailLines) + if (count commandParts) > 1 then + set commandParts to items 1 thru -2 of commandParts + else + set commandParts to {} + end if + end if + end if + if (count commandParts) > 0 then + set originalUserShellCmd to my joinList(commandParts, " ") + my logVerbose("Command detected: " & originalUserShellCmd) + end if + else if argCountAfterTagOrPath = 1 then + -- Only taskTagName was provided after potential projectPathArg + -- This is a read operation by default.
+ my logVerbose("Read-only operation detected") + end if + + if originalUserShellCmd is not "" and (my trimWhitespace(originalUserShellCmd) is not "") then + set doWrite to true + set shellCmd to originalUserShellCmd + else if projectPathArg is not "" and originalUserShellCmd is "" then + -- Path provided, task tag, and empty command string "" OR no command string but lines_to_read was there + set doWrite to true + set shellCmd to "" -- will become 'cd path' + my logVerbose("CD-only operation for path: " & projectPathArg) + else + set doWrite to false + set shellCmd to "" + end if + + if currentTailLines < 1 then set currentTailLines to 1 + if doWrite and (shellCmd is not "" or projectPathArg is not "") and currentTailLines < minTailLinesOnWrite then + set currentTailLines to minTailLinesOnWrite + my logVerbose("Increased tail lines for write operation: " & currentTailLines) + end if + + if projectPathArg is not "" and doWrite then + set quotedProjectPath to quoted form of projectPathArg + if shellCmd is not "" then + set shellCmd to "cd " & quotedProjectPath & " && " & shellCmd + else + set shellCmd to "cd " & quotedProjectPath + end if + my logVerbose("Final command: " & shellCmd) + end if + + set derivedProjectGroup to "" + if projectPathArg is not "" then + set derivedProjectGroup to my getPathComponent(projectPathArg, -1) + if derivedProjectGroup is "" then set derivedProjectGroup to "DefaultProject" + my logVerbose("Project group: " & derivedProjectGroup) + end if + + set allowCreation to false + if doWrite then + set allowCreation to true + else if explicitLinesProvided then + set allowCreation to true + end if + + set effectiveTabTitleForLookup to my generateWindowTitle(taskTagName, derivedProjectGroup) + my logVerbose("Tab title: " & effectiveTabTitleForLookup) + + set tabInfo to my ensureTabAndWindow(taskTagName, derivedProjectGroup, allowCreation, effectiveTabTitleForLookup) + + if tabInfo is missing value then + if not allowCreation then + set errorMsg 
to "Terminal session \"" & effectiveTabTitleForLookup & "\" not found." & linefeed & ¬ + "To create this session, provide a command (even an empty string \"\" if only 'cd'-ing to a project path), " & ¬ + "or specify lines to read (e.g., ... \"" & taskTagName & "\" 1)." & linefeed + if projectPathArg is not "" then + set errorMsg to errorMsg & "Project path was specified as: \"" & projectPathArg & "\"." & linefeed + else + set errorMsg to errorMsg & "If this is for a new project, provide the absolute project path as the first argument." & linefeed + end if + return my formatErrorMessage("Session Error", errorMsg & linefeed & my usageText(), "session lookup") + else + return my formatErrorMessage("Creation Error", "Could not find or create Terminal tab for \"" & effectiveTabTitleForLookup & "\". Check permissions/Terminal state.", "tab creation") + end if + end if + + set targetTab to targetTab of tabInfo + set parentWindow to parentWindow of tabInfo + set wasNewlyCreated to wasNewlyCreated of tabInfo + set createdInExistingViaFuzzy to createdInExistingWindowViaFuzzy of tabInfo + + my logVerbose("Tab info - new: " & wasNewlyCreated & ", fuzzy: " & createdInExistingViaFuzzy) + + set bufferText to "" + set commandTimedOut to false + set tabWasBusyOnRead to false + set previousCommandActuallyStopped to true + set attemptMadeToStopPreviousCommand to false + set identifiedBusyProcessName to "" + set theTTYForInfo to "" + + if not doWrite and wasNewlyCreated then + if createdInExistingViaFuzzy then + return scriptInfoPrefix & "New tab \"" & effectiveTabTitleForLookup & "\" created in existing project window and ready." + else + return scriptInfoPrefix & "New tab \"" & effectiveTabTitleForLookup & "\" (in new window) created and ready."
+ end if + end if + + tell application id "com.apple.Terminal" + try + set index of parentWindow to 1 + set selected tab of parentWindow to targetTab + if wasNewlyCreated and doWrite then + delay 0.4 + else + delay 0.1 + end if + + if doWrite and shellCmd is not "" then + my logVerbose("Executing command: " & shellCmd) + set canProceedWithWrite to true + if busy of targetTab then + if not wasNewlyCreated or createdInExistingViaFuzzy then + set attemptMadeToStopPreviousCommand to true + set previousCommandActuallyStopped to false + try + set theTTYForInfo to my trimWhitespace(tty of targetTab) + end try + set processesBefore to {} + try + set processesBefore to processes of targetTab + end try + set commonShells to {"login", "bash", "zsh", "sh", "tcsh", "ksh", "-bash", "-zsh", "-sh", "-tcsh", "-ksh", "dtterm", "fish"} + set identifiedBusyProcessName to "" + if (count of processesBefore) > 0 then + repeat with i from (count of processesBefore) to 1 by -1 + set aProcessName to item i of processesBefore + if aProcessName is not in commonShells then + set identifiedBusyProcessName to aProcessName + exit repeat + end if + end repeat + end if + my logVerbose("Busy process identified: " & identifiedBusyProcessName) + set processToTargetForKill to identifiedBusyProcessName + set killedViaPID to false + if theTTYForInfo is not "" and processToTargetForKill is not "" then + set shortTTY to text 6 thru -1 of theTTYForInfo + set pidsToKillText to "" + try + set psCommand to "ps -t " & shortTTY & " -o pid,comm | awk '$2 == \"" & processToTargetForKill & "\" {print $1}'" + set pidsToKillText to do shell script psCommand + end try + if pidsToKillText is not "" then + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to linefeed + set pidList to text items of pidsToKillText + set AppleScript's text item delimiters to oldDelims + repeat with aPID in pidList + set aPID to my trimWhitespace(aPID) + if aPID is not "" then + try + do shell 
script "kill -INT " & aPID + delay 0.3 + do shell script "kill -0 " & aPID + try + do shell script "kill -KILL " & aPID + delay 0.2 + try + do shell script "kill -0 " & aPID + on error + set previousCommandActuallyStopped to true + end try + end try + on error + set previousCommandActuallyStopped to true + end try + end if + if previousCommandActuallyStopped then + set killedViaPID to true + exit repeat + end if + end repeat + end if + end if + if not previousCommandActuallyStopped and busy of targetTab then + activate + delay 0.5 + tell application "System Events" to keystroke "c" using control down + delay 0.6 + if not (busy of targetTab) then + set previousCommandActuallyStopped to true + if identifiedBusyProcessName is not "" and (identifiedBusyProcessName is in (processes of targetTab)) then + set previousCommandActuallyStopped to false + end if + end if + else if not busy of targetTab then + set previousCommandActuallyStopped to true + end if + if not previousCommandActuallyStopped then + set canProceedWithWrite to false + end if + else if wasNewlyCreated and not createdInExistingViaFuzzy and busy of targetTab then + delay 0.4 + if busy of targetTab then + set attemptMadeToStopPreviousCommand to true + set previousCommandActuallyStopped to false + set identifiedBusyProcessName to "extended initialization" + set canProceedWithWrite to false + else + set previousCommandActuallyStopped to true + end if + end if + end if + + if canProceedWithWrite then + -- MAINTAINED: No automatic clear command to prevent interrupting build processes + do script shellCmd in targetTab + set commandStartTime to current date + set commandFinished to false + repeat while ((current date) - commandStartTime) < maxCommandWaitTime + if not (busy of targetTab) then + set commandFinished to true + exit repeat + end if + delay pollIntervalForBusyCheck + end repeat + if not commandFinished then set commandTimedOut to true + if commandFinished then delay 0.2 -- Increased from 0.1 for better 
output settling + my logVerbose("Command execution completed, timeout: " & commandTimedOut) + end if + else if not doWrite then + if busy of targetTab then + set tabWasBusyOnRead to true + try + set theTTYForInfo to my trimWhitespace(tty of targetTab) + end try + set processesReading to processes of targetTab + set commonShells to {"login", "bash", "zsh", "sh", "tcsh", "ksh", "-bash", "-zsh", "-sh", "-tcsh", "-ksh", "dtterm", "fish"} + set identifiedBusyProcessName to "" + if (count of processesReading) > 0 then + repeat with i from (count of processesReading) to 1 by -1 + set aProcessName to item i of processesReading + if aProcessName is not in commonShells then + set identifiedBusyProcessName to aProcessName + exit repeat + end if + end repeat + end if + my logVerbose("Tab busy during read with: " & identifiedBusyProcessName) + end if + end if + + set bufferText to history of targetTab + on error errMsg number errNum + set appSpecificErrorOccurred to true + return my formatErrorMessage("Terminal Error", errMsg, "error " & errNum) + end try + end tell + + set appendedMessage to "" + set ttyInfoStringForMessage to "" + if theTTYForInfo is not "" then set ttyInfoStringForMessage to " (TTY " & theTTYForInfo & ")" + if attemptMadeToStopPreviousCommand then + set processNameToReport to "process" + if identifiedBusyProcessName is not "" and identifiedBusyProcessName is not "extended initialization" then + set processNameToReport to "'" & identifiedBusyProcessName & "'" + else if identifiedBusyProcessName is "extended initialization" then + set processNameToReport to "tab's extended initialization" + end if + if previousCommandActuallyStopped then + set appendedMessage to linefeed & scriptInfoPrefix & "Previous " & processNameToReport & ttyInfoStringForMessage & " was interrupted. ---" + else + set appendedMessage to linefeed & scriptInfoPrefix & "Attempted to interrupt previous " & processNameToReport & ttyInfoStringForMessage & ", but it may still be running. 
New command NOT executed. ---" + end if + end if + if commandTimedOut then + set cmdForMsg to originalUserShellCmd + if projectPathArg is not "" and originalUserShellCmd is not "" then set cmdForMsg to originalUserShellCmd & " (in " & projectPathArg & ")" + if projectPathArg is not "" and originalUserShellCmd is "" then set cmdForMsg to "(cd " & projectPathArg & ")" + set appendedMessage to appendedMessage & linefeed & scriptInfoPrefix & "Command '" & cmdForMsg & "' may still be running. Returned after " & maxCommandWaitTime & "s timeout. ---" + else if tabWasBusyOnRead then + set processNameToReportOnRead to "process" + if identifiedBusyProcessName is not "" then set processNameToReportOnRead to "'" & identifiedBusyProcessName & "'" + set busyProcessInfoString to "" + if identifiedBusyProcessName is not "" then set busyProcessInfoString to " with " & processNameToReportOnRead + set appendedMessage to appendedMessage & linefeed & scriptInfoPrefix & "Tab" & ttyInfoStringForMessage & " was busy" & busyProcessInfoString & " during read. Output may be from an ongoing process. ---" + end if + + if appendedMessage is not "" then + if bufferText is "" then + set bufferText to my trimWhitespace(appendedMessage) + else + set bufferText to bufferText & appendedMessage + end if + end if + + set tailedOutput to my tailBufferAS(bufferText, currentTailLines) + set finalResult to my trimBlankLinesAS(tailedOutput) + + if finalResult is "" then + set effectiveOriginalCmdForMsg to originalUserShellCmd + if projectPathArg is not "" and originalUserShellCmd is "" then + set effectiveOriginalCmdForMsg to "(cd " & projectPathArg & ")" + else if projectPathArg is not "" and originalUserShellCmd is not "" then + set effectiveOriginalCmdForMsg to originalUserShellCmd & " (in " & projectPathArg & ")" + end if + + set baseMsgInfo to "Session \"" & effectiveTabTitleForLookup & "\", requested " & currentTailLines & " lines." 
+ set specificAppendedInfo to my trimWhitespace(appendedMessage) + set suffixForReturn to "" + if specificAppendedInfo is not "" then set suffixForReturn to linefeed & specificAppendedInfo + + if attemptMadeToStopPreviousCommand and not previousCommandActuallyStopped then + return my formatErrorMessage("Process Error", "Previous command/initialization in session \"" & effectiveTabTitleForLookup & "\"" & ttyInfoStringForMessage & " may not have terminated. New command '" & effectiveOriginalCmdForMsg & "' NOT executed." & suffixForReturn, "process termination") + else if commandTimedOut then + return my formatErrorMessage("Timeout Error", "Command '" & effectiveOriginalCmdForMsg & "' timed out after " & maxCommandWaitTime & "s. No other output. " & baseMsgInfo & suffixForReturn, "command timeout") + else if tabWasBusyOnRead then + return my formatErrorMessage("Busy Error", "Tab for session \"" & effectiveTabTitleForLookup & "\" was busy during read. No other output. " & baseMsgInfo & suffixForReturn, "read busy") + else if doWrite and shellCmd is not "" then + return scriptInfoPrefix & "Command '" & effectiveOriginalCmdForMsg & "' executed in session \"" & effectiveTabTitleForLookup & "\". No output captured." + else + return scriptInfoPrefix & "No meaningful content found in session \"" & effectiveTabTitleForLookup & "\"." 
+ end if + end if + + my logVerbose("Returning " & (length of finalResult) & " characters of output") + return finalResult + + on error generalErrorMsg number generalErrorNum + if appSpecificErrorOccurred then error generalErrorMsg number generalErrorNum + return my formatErrorMessage("Execution Error", generalErrorMsg, "error " & generalErrorNum) + end try +end run +--#endregion Main Script Logic (on run) + +--#region Helper Functions (Unchanged from baseline for maximum compatibility) +on ensureTabAndWindow(taskTagName as text, projectGroupName as text, allowCreate as boolean, desiredFullTitle as text) + set wasActuallyCreated to false + set createdInExistingViaFuzzy to false + + tell application id "com.apple.Terminal" + try + repeat with w in windows + repeat with tb in tabs of w + try + if custom title of tb is desiredFullTitle then + set selected tab of w to tb + return {targetTab:tb, parentWindow:w, wasNewlyCreated:false, createdInExistingWindowViaFuzzy:false} + end if + end try + end repeat + end repeat + end try + + if allowCreate and enableFuzzyTagGrouping and projectGroupName is not "" then + set projectGroupSearchPatternForWindowName to tabTitlePrefix & projectIdentifierInTitle & projectGroupName + try + repeat with w in windows + try + -- More aggressive grouping: look for any window that contains our project name + if name of w starts with projectGroupSearchPatternForWindowName or name of w contains (projectIdentifierInTitle & projectGroupName) then + if not frontmost then activate + delay 0.2 + set newTabInGroup to do script "" in w -- MAINTAINED: No clear command + delay 0.3 + set custom title of newTabInGroup to desiredFullTitle + delay 0.2 + set selected tab of w to newTabInGroup + return {targetTab:newTabInGroup, parentWindow:w, wasNewlyCreated:true, createdInExistingWindowViaFuzzy:true} + end if + end try + end repeat + end try + end if + + -- Enhanced fallback: if no project-specific window found, try to use any existing Terminator window + if 
allowCreate and enableFuzzyTagGrouping then + try + repeat with w in windows + try + if name of w starts with tabTitlePrefix then + -- Found an existing Terminator window, use it for grouping + if not frontmost then activate + delay 0.2 + set newTabInGroup to do script "" in w -- MAINTAINED: No clear command + delay 0.3 + set custom title of newTabInGroup to desiredFullTitle + delay 0.2 + set selected tab of w to newTabInGroup + return {targetTab:newTabInGroup, parentWindow:w, wasNewlyCreated:true, createdInExistingWindowViaFuzzy:true} + end if + end try + end repeat + end try + end if + + if allowCreate then + try + if not frontmost then activate + delay 0.3 + set newTabInNewWindow to do script "" -- MAINTAINED: No clear command + set wasActuallyCreated to true + delay 0.4 + set custom title of newTabInNewWindow to desiredFullTitle + delay 0.2 + set parentWinOfNew to missing value + try + set parentWinOfNew to window of newTabInNewWindow + on error + if (count of windows) > 0 then set parentWinOfNew to front window + end try + if parentWinOfNew is not missing value then + if custom title of newTabInNewWindow is desiredFullTitle then + set selected tab of parentWinOfNew to newTabInNewWindow + return {targetTab:newTabInNewWindow, parentWindow:parentWinOfNew, wasNewlyCreated:wasActuallyCreated, createdInExistingWindowViaFuzzy:false} + end if + end if + repeat with w_final_scan in windows + repeat with tb_final_scan in tabs of w_final_scan + try + if custom title of tb_final_scan is desiredFullTitle then + set selected tab of w_final_scan to tb_final_scan + return {targetTab:tb_final_scan, parentWindow:w_final_scan, wasNewlyCreated:wasActuallyCreated, createdInExistingWindowViaFuzzy:false} + end if + end try + end repeat + end repeat + return missing value + on error + return missing value + end try + else + return missing value + end if + end tell +end ensureTabAndWindow + +on tailBufferAS(txt, n) + set AppleScript's text item delimiters to linefeed + set lst to text 
items of txt + if (count lst) = 0 then return "" + set startN to (count lst) - (n - 1) + if startN < 1 then set startN to 1 + set slice to items startN thru -1 of lst + set outText to slice as text + set AppleScript's text item delimiters to "" + return outText +end tailBufferAS + +on lineIsEffectivelyEmptyAS(aLine) + if aLine is "" then return true + set trimmedLine to my trimWhitespace(aLine) + return (trimmedLine is "") +end lineIsEffectivelyEmptyAS + +on trimBlankLinesAS(txt) + if txt is "" then return "" + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to {linefeed} + set originalLines to text items of txt + set linesToProcess to {} + repeat with aLineRef in originalLines + set aLine to contents of aLineRef + if my lineIsEffectivelyEmptyAS(aLine) then + set end of linesToProcess to "" + else + set end of linesToProcess to aLine + end if + end repeat + set firstContentLine to 1 + repeat while firstContentLine ≤ (count linesToProcess) and (item firstContentLine of linesToProcess is "") + set firstContentLine to firstContentLine + 1 + end repeat + set lastContentLine to count linesToProcess + repeat while lastContentLine ≥ firstContentLine and (item lastContentLine of linesToProcess is "") + set lastContentLine to lastContentLine - 1 + end repeat + if firstContentLine > lastContentLine then + set AppleScript's text item delimiters to oldDelims + return "" + end if + set resultLines to items firstContentLine thru lastContentLine of linesToProcess + set AppleScript's text item delimiters to linefeed + set trimmedTxt to resultLines as text + set AppleScript's text item delimiters to oldDelims + return trimmedTxt +end trimBlankLinesAS + +on trimWhitespace(theText) + set whitespaceChars to {" ", tab} + set newText to theText + repeat while (newText is not "") and (character 1 of newText is in whitespaceChars) + if (length of newText) > 1 then + set newText to text 2 thru -1 of newText + else + set newText to "" + end
if + end repeat + repeat while (newText is not "") and (character -1 of newText is in whitespaceChars) + if (length of newText) > 1 then + set newText to text 1 thru -2 of newText + else + set newText to "" + end if + end repeat + return newText +end trimWhitespace + +on isInteger(v) + try + v as integer + return true + on error + return false + end try +end isInteger + +on tagOK(t) + try + do shell script "/bin/echo " & quoted form of t & " | /usr/bin/grep -E -q '^[A-Za-z0-9_-]+$'" + return true + on error + return false + end try +end tagOK + +on joinList(theList, theDelimiter) + set oldDelims to AppleScript's text item delimiters + set AppleScript's text item delimiters to theDelimiter + set theText to theList as text + set AppleScript's text item delimiters to oldDelims + return theText +end joinList + +on usageText() + set LF to linefeed + set scriptName to "terminator_v0.6.0_safe_enhanced.scpt" + set exampleProject to "/Users/name/Projects/FancyApp" + set exampleProjectNameForTitle to my getPathComponent(exampleProject, -1) + if exampleProjectNameForTitle is "" then set exampleProjectNameForTitle to "DefaultProject" + set exampleTaskTag to "build_frontend" + set exampleFullCommand to "npm run build" + + set generatedExampleTitle to my generateWindowTitle(exampleTaskTag, exampleProjectNameForTitle) + + set outText to scriptName & " - v0.6.0 Safe Enhanced \"T-1000\" – AppleScript Terminal helper" & LF & LF + set outText to outText & "Safe enhancements: Enhanced error reporting, verbose logging (optional)" & LF & LF + set outText to outText & "Manages dedicated, tagged Terminal sessions, grouped by project path." & LF & LF + + set outText to outText & "Core Concept:" & LF + set outText to outText & " 1.
For a NEW project, provide the absolute project path FIRST, then task tag, then command:" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"" & exampleTaskTag & "\" \"" & exampleFullCommand & "\"" & LF + set outText to outText & " The script will 'cd' into the project path and run the command." & LF + set outText to outText & " The tab will be titled like: \"" & generatedExampleTitle & "\"" & LF + set outText to outText & " 2. For SUBSEQUENT commands for THE SAME PROJECT, use the project path and task tag:" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"" & exampleTaskTag & "\" \"another_command\"" & LF + set outText to outText & " 3. To simply READ from an existing session (path & tag must identify an existing session):" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"" & exampleTaskTag & "\"" & LF + set outText to outText & " A READ operation on a non-existent tag (without path/command to create) will error." & LF & LF + + set outText to outText & "Title Format: \"" & tabTitlePrefix & projectIdentifierInTitle & "" & taskIdentifierInTitle & "\"" & LF + set outText to outText & "Or if no project path provided: \"" & tabTitlePrefix & "\"" & LF & LF + + set outText to outText & "Safe Enhanced Features:" & LF + set outText to outText & " • Enhanced error reporting with context information" & LF + set outText to outText & " • Optional verbose logging for debugging" & LF + set outText to outText & " • Improved timing and reliability (same as v0.6.0)" & LF + set outText to outText & " • No automatic clearing to prevent interrupting builds" & LF + set outText to outText & " • 100-line default output for better build log visibility" & LF + set outText to outText & " • Automatically 'cd's into project path if provided with a command."
& LF + set outText to outText & " • Groups new task tabs into existing project windows if fuzzy grouping enabled." & LF + set outText to outText & " • Interrupts busy processes in reused tabs." & LF & LF + + set outText to outText & "Usage Examples:" & LF + set outText to outText & " # Start new project session, cd, run command, get 100 lines:" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"frontend_build\" \"npm run build\" 100" & LF + set outText to outText & " # Create/use 'backend_tests' task tab in the 'FancyApp' project window:" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"backend_tests\" \"pytest\"" & LF + set outText to outText & " # Prepare/create a new session by just cd'ing into project path (empty command):" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"dev_shell\" \"\" 1" & LF + set outText to outText & " # Read from an existing session:" & LF + set outText to outText & " osascript " & scriptName & " \"" & exampleProject & "\" \"frontend_build\" 50" & LF & LF + + set outText to outText & "Parameters:" & LF + set outText to outText & " [\"/absolute/project/path\"]: (Optional First Arg) Base path for project. Enables 'cd' and grouping." & LF + set outText to outText & " \"\": Required. Specific task name for the tab (e.g., 'build', 'tests')." & LF + set outText to outText & " [\"\"]: (Optional) Command. If path provided, 'cd path &&' is prepended." & LF + set outText to outText & " Use \"\" for no command (will just 'cd' if path given)." & LF + set outText to outText & " [[lines_to_read]]: (Optional Last Arg) Number of history lines. Default: " & defaultTailLines & "." & LF & LF + + set outText to outText & "Notes:" & LF + set outText to outText & " • Safe enhanced version with improved error reporting and logging."
& LF + set outText to outText & " • Provide project path on first use for a project for best window grouping and auto 'cd'." & LF + set outText to outText & " • Ensure Automation permissions for Terminal.app & System Events.app." & LF + set outText to outText & " • v0.6.0 Safe Enhanced: Better errors, optional verbose logging, 100% baseline compatibility." & LF + + return outText +end usageText +--#endregion Helper Functions \ No newline at end of file diff --git a/.gitignore b/.gitignore index 5314170..492be0f 100644 --- a/.gitignore +++ b/.gitignore @@ -71,4 +71,10 @@ scripts/*.js # Validation output file validation-output.txt -test_output.txt \ No newline at end of file +test_output.txt + +# Swift Package Manager +Package.resolved + +# AXorcist binary +axorc/axorc \ No newline at end of file diff --git a/CHANGELOG.md b/CHANGELOG.md index f896ff2..6194fc6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,8 +1,11 @@ # Changelog +## [0.5.0] - 2025-05-20 +- Added new accessibility runner for improved accessibility API access. + ## [0.4.1] - 2025-05-20 -- Fixed version reporting to only occur on tool calls, not MCP initialization handshake -- Removed unnecessary server ready log message that was causing MCP client connection issues +- Fixed version reporting to only occur on tool calls, not MCP initialization handshake. +- Removed unnecessary server ready log message that was causing MCP client connection issues. ## [0.4.0] - 2025-05-20 - Replaced the `use_script_friendly_output` boolean parameter with a more versatile `output_format_mode` string enum parameter for the `execute_script` tool. This provides finer-grained control over `osascript` output formatting flags.
diff --git a/README.md b/README.md index 05067dc..6a705bf 100644 --- a/README.md +++ b/README.md @@ -1,21 +1,23 @@ -# macOS Automator MCP Server +# 🤖 macOS Automator MCP Server: Your Friendly Neighborhood RoboScripter™ ![macOS Automator MCP Server](assets/logo.png) -## Overview -This project provides a Model Context Protocol (MCP) server, `macos_automator`, that allows execution of AppleScript and JavaScript for Automation (JXA) scripts on macOS. It features a knowledge base of pre-defined scripts accessible by ID and supports inline scripts, script files, and argument passing. -The knowledge base is loaded lazily on first use for fast server startup. +## 🎯 Mission Control: Teaching Robots to Click Buttons Since 2024 -## Benefits -- Execute AppleScript/JXA scripts remotely via MCP. -- Utilize a rich, extensible knowledge base of common macOS automation tasks. -- Control macOS applications and system functions programmatically. -- Integrate macOS automation into larger AI-driven workflows. +Welcome to the automated future where your Mac finally does what you tell it to! This Model Context Protocol (MCP) server transforms your AI assistant into a silicon-based intern who actually knows AppleScript and JavaScript for Automation (JXA). -## Prerequisites -- Node.js (version >=18.0.0 recommended, see `package.json` engines). -- macOS. -- **CRITICAL PERMISSIONS SETUP:** +No more copy-pasting scripts like a caveman - let the robots handle the robot work! Our knowledge base contains over 200 pre-programmed automation sequences, loaded faster than you can say "Hey Siri, why don't you work like this?" + +## 🚀 Why Let Robots Run Your Mac? +- **Remote Control Reality**: Execute AppleScript/JXA scripts via MCP - it's like having a tiny robot inside your Mac! +- **Knowledge Base of Power**: 200+ pre-built automation recipes. From "toggle dark mode" to "extract all URLs from Safari" - we've got your robot needs covered.
+- **App Whisperer**: Control any macOS application programmatically. Make Finder dance, Safari sing, and Terminal... well, terminate things. +- **AI Workflow Integration**: Connect your Mac to the AI revolution. Your LLM can now actually DO things instead of just talking about them! + +## 🔧 Robot Requirements (Prerequisites) +- **Node.js** (version >=18.0.0) - Because even robots need a runtime +- **macOS** - Sorry Windows users, this is an Apple-only party 🍎 +- **⚠️ CRITICAL: Permission to Automate (Your Mac's Trust Issues):** - The application running THIS MCP server (e.g., Terminal, your Node.js application) requires explicit user permissions on the macOS machine where the server is running. - **Automation Permissions:** To control other applications (Finder, Safari, Mail, etc.). - Go to: System Settings > Privacy & Security > Automation. @@ -27,11 +29,11 @@ The knowledge base is loaded lazily on first use for fast server startup. - Add the application running the server (e.g., Terminal) to the list and ensure its checkbox is ticked. - First-time attempts to control a new application or use accessibility features may still trigger a macOS confirmation prompt, even if pre-authorized. The server itself cannot grant these permissions. -## Installation & Usage +## 🏃‍♂️ Quick Start: Release the Robots! -The primary way to run this server is via `npx`. This ensures you're using the latest version without needing a global install. +The easiest way to deploy your automation army is via `npx`. No installation needed - just pure robot magic!
-Add the following configuration to your MCP client's `mcp.json` (or equivalent configuration): +Add this to your MCP client's `mcp.json` and watch the automation begin: ```json { @@ -47,9 +49,9 @@ Add the following configuration to your MCP client's `mcp.json` (or equivalent c } ``` -### Running Locally (for Development or Direct Use) +### 🛠️ Robot Workshop Mode (Local Development) -Alternatively, for development or if you prefer to run the server directly from a cloned repository, you can use the provided `start.sh` script. This is useful if you want to make local modifications or run a specific version. +Want to tinker with the robot's brain? Clone the repo and become a robot surgeon! 1. **Clone the repository:** ```bash @@ -80,11 +82,11 @@ Alternatively, for development or if you prefer to run the server directly from **Note for Developers:** The `start.sh` script, particularly if modified to remove any pre-existing compiled `dist/server.js` before execution (e.g., by adding `rm -f dist/server.js`), is designed to ensure you are always running the latest TypeScript code from the `src/` directory via `tsx`. This is ideal for development to prevent issues with stale builds. For production deployment (e.g., when published to npm), a build process would typically create a definitive `dist/server.js` which would then be the entry point for the published package. -## Tools Provided +## 🤖 Robot Toolbox -### 1. `execute_script` +### 1. `execute_script` - The Script Launcher 9000 -Executes an AppleScript or JavaScript for Automation (JXA) script on macOS. +Your robot's primary weapon for macOS domination. Feed it AppleScript or JXA, and watch the magic happen! Scripts can be provided as inline content (`script_content`), an absolute file path (`script_path`), or by referencing a script from the built-in knowledge base using its unique `kb_script_id`.
**Script Sources (mutually exclusive):** @@ -186,9 +188,9 @@ The `execute_script` tool returns a response in the following format: } ``` -### 2. `get_scripting_tips` +### 2. `get_scripting_tips` - The Robot's Encyclopedia -Retrieves AppleScript/JXA tips, examples, and runnable script details from the server's knowledge base. Useful for discovering available scripts, their functionalities, and how to use them with `execute_script` (especially `kb_script_id`). +Your personal automation librarian! Searches through 200+ pre-built scripts faster than you can Google "how to AppleScript". Perfect for when your robot needs inspiration. **Arguments:** - `list_categories` (boolean, optional, default: false): If true, returns only the list of available knowledge base categories and their descriptions. Overrides other parameters. @@ -208,62 +210,138 @@ Retrieves AppleScript/JXA tips, examples, and runnable script details from the s - Search for tips related to "clipboard": `{ "toolName": "get_scripting_tips", "input": { "search_term": "clipboard" } }` -## Key Use Cases & Examples +### 3. `accessibility_query` - The UI X-Ray Vision + +Give your robot superhero powers to see and click ANY button in ANY app! This tool peers into the soul of macOS applications using the accessibility framework. Powered by the mystical `ax` binary, it's like having X-ray vision for user interfaces. + +The `ax` binary, and therefore this tool, can accept its JSON command input in multiple ways: +1. **Direct JSON String Argument:** If `ax` is invoked with a single command-line argument that is not a valid file path, it will attempt to parse this argument as a complete JSON string. +2. **File Path Argument:** If `ax` is invoked with a single command-line argument that is a valid file path, it will read the complete JSON command from this file. +3. **STDIN:** If `ax` is invoked with no command-line arguments, it will read the complete JSON command (which can be multi-line) from standard input. 
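
The three input modes listed above can be sketched as a small dispatch function. This is only an illustration of the precedence the list describes, not the actual `ax` implementation (which is a Swift binary). The names `InputMode` and `resolveInputMode`, the injected `isFile` predicate, and the error raised for more than one argument are all hypothetical.

```typescript
// Hypothetical sketch of the input-mode precedence described above.
// None of these names come from the real `ax` binary.
type InputMode =
  | { kind: "stdin" }
  | { kind: "file"; path: string }
  | { kind: "json"; text: string };

function resolveInputMode(
  args: string[],
  isFile: (candidate: string) => boolean,
): InputMode {
  // No arguments: the JSON command is read from standard input.
  if (args.length === 0) {
    return { kind: "stdin" };
  }
  // The list above only covers the single-argument case; rejecting
  // anything else is an assumption of this sketch.
  if (args.length > 1) {
    throw new Error("expected at most one argument");
  }
  const only = args[0] as string;
  // A valid file path takes precedence over treating the argument as inline JSON.
  return isFile(only)
    ? { kind: "file", path: only }
    : { kind: "json", text: only };
}
```

Passing the file check in as a predicate keeps the sketch free of filesystem access; in Node.js it could plausibly be backed by `fs.existsSync`.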
+ +This tool exposes the complete macOS accessibility API capabilities, allowing detailed inspection of UI elements and their properties. It's particularly useful for automating interactions with applications that don't have robust AppleScript support or when you need to inspect the UI structure in detail. + +**Input Parameters:** + +* `command` (enum: 'query' | 'perform', required): The operation to perform. + * `query`: Retrieves information about UI elements. + * `perform`: Executes an action on a UI element (like clicking a button). + +* `locator` (object, required): Specifications to find the target element(s). + * `app` (string, required): The application to target, specified by either bundle ID or display name (e.g., "Safari", "com.apple.Safari"). + * `role` (string, required): The accessibility role of the target element (e.g., "AXButton", "AXStaticText"). + * `match` (object, required): Key-value pairs of attributes to match. Can be empty (`{}`) if not needed. + * `navigation_path_hint` (array of strings, optional): Path to navigate within the application hierarchy (e.g., `["window[1]", "toolbar[1]"]`). + +* `return_all_matches` (boolean, optional): When `true`, returns all matching elements rather than just the first match. Default is `false`. + +* `attributes_to_query` (array of strings, optional): Specific attributes to query for matched elements. If not provided, common attributes will be included. Examples: `["AXRole", "AXTitle", "AXValue"]` + +* `required_action_name` (string, optional): Filter elements to only those supporting a specific action (e.g., "AXPress" for clickable elements). + +* `action_to_perform` (string, optional, required when `command="perform"`): The accessibility action to perform on the matched element (e.g., "AXPress" to click a button). + +* `report_execution_time` (boolean, optional): If true, the tool will return an additional message containing the formatted script execution time. Defaults to false. 
+ +* `limit` (integer, optional): Maximum number of lines to return in the output. Defaults to 500. Output will be truncated if it exceeds this limit. + +* `max_elements` (integer, optional): For `return_all_matches: true` queries, this specifies the maximum number of UI elements the `ax` binary will fully process and return attributes for. If omitted, an internal default (e.g., 200) is used. This helps manage performance when querying UIs with a very large number of matching elements (like numerous text fields on a complex web page). This is different from `limit`, which truncates the final text output based on lines. + +* `debug_logging` (boolean, optional): If true, enables detailed debug logging from the underlying `ax` binary. This diagnostic information will be included in the response, which can be helpful for troubleshooting complex queries or unexpected behavior. Defaults to false. + +* `output_format` (enum: 'smart' | 'verbose' | 'text_content', optional, default: 'smart'): Controls the format and verbosity of the attribute output from the `ax` binary. + * `'smart'`: (Default) Optimized for readability. Omits attributes with empty or placeholder values. Returns key-value pairs. + * `'verbose'`: Maximum detail. Includes all attributes, even empty/placeholders. Key-value pairs. Best for debugging element properties. + * `'text_content'`: Highly compact for text extraction. Returns only concatenated text values of common textual attributes (e.g., AXValue, AXTitle). No keys are returned. Ideal for quickly getting all text from elements; the `attributes_to_query` parameter is ignored in this mode. + +**Example Queries (Note: key names have changed to snake_case):** + +1. **Find all text elements in the front Safari window:** + ```json + { + "command": "query", + "return_all_matches": true, + "locator": { + "app": "Safari", + "role": "AXStaticText", + "match": {}, + "navigation_path_hint": ["window[1]"] + } + } + ``` + +2. 
**Find and click a button with a specific title:** + ```json + { + "command": "perform", + "locator": { + "app": "System Settings", + "role": "AXButton", + "match": {"AXTitle": "General"} + }, + "action_to_perform": "AXPress" + } + ``` + +3. **Get detailed information about the focused UI element:** + ```json + { + "command": "query", + "locator": { + "app": "Mail", + "role": "AXTextField", + "match": {"AXFocused": "true"} + }, + "attributes_to_query": ["AXRole", "AXTitle", "AXValue", "AXDescription", "AXHelp", "AXPosition", "AXSize"] + } + ``` + +**Note:** Using this tool requires that the application running this server has the necessary Accessibility permissions in macOS System Settings > Privacy & Security > Accessibility. -- **Application Control:** +## 🎮 Robot Playground: Cool Things Your New Robot Friend Can Do + +- **Application Control (Teaching Apps Who's Boss):** - Get the current URL from Safari: `{ "input": { "script_content": "tell application \"Safari\" to get URL of front document" } }` - Get subjects of unread emails in Mail: `{ "input": { "script_content": "tell application \"Mail\" to get subject of messages of inbox whose read status is false" } }` -- **File System Operations:** +- **File System Operations (Digital Housekeeping):** - List files on the Desktop: `{ "input": { "script_content": "tell application \"Finder\" to get name of every item of desktop" } }` - - Create a new folder: `{ "input": { "script_content": "tell application \"Finder\" to make new folder at desktop with properties {name:\"My New Folder\"}" } }` -- **System Interactions:** - - Display a system notification: `{ "input": { "script_content": "display notification \"Important Update!\" with title \"System Alert\"" } }` + - Create a new folder: `{ "input": { "script_content": "tell application \"Finder\" to make new folder at desktop with properties {name:\"Robot's Secret Stash\"}" } }` +- **System Interactions (Mac Mind Control):** + - Display a system notification: `{
"input": { "script_content": "display notification \"🤖 Beep boop! Task complete!\" with title \"Robot Report\"" } }` - Set system volume: `{ "input": { "script_content": "set volume output volume 50" } }` (0-100) - Get current clipboard content: `{ "input": { "script_content": "the clipboard" } }` -## Troubleshooting +## 🔧 When Robots Rebel (Troubleshooting) -- **Permissions Errors:** If scripts fail to control apps or perform UI actions, double-check Automation and Accessibility permissions in System Settings for the application running the MCP server (e.g., Terminal). -- **Script Syntax Errors:** `osascript` errors will be returned in the `stderr` or error message. Test complex scripts locally using Script Editor (for AppleScript) or a JXA runner first. -- **Timeouts:** If a script takes longer than `timeout_seconds` (default 60s), it will be terminated. Increase the timeout for long-running scripts. -- **File Not Found:** Ensure `script_path` is an absolute POSIX path accessible by the user running the MCP server. -- **Incorrect Output/JXA Issues:** For JXA scripts, especially those using Objective-C bridging, ensure `output_format_mode` is set to `'direct'` or `'auto'` (default). Using AppleScript-specific formatting flags like `human_readable` with JXA can cause errors. If AppleScript output is not parsing correctly, try `structured_output_and_error` or `structured_error`. +- **"Access Denied" Drama:** Your robot lacks permissions! Check System Settings > Privacy & Security. Give your Terminal the keys to the kingdom. +- **Script Syntax Sadness:** Even robots make typos. Test scripts in Script Editor first - it's like spell-check for automation. +- **Timeout Tantrums:** Some tasks take time. Increase `timeout_seconds` if your robot needs more than 60 seconds to complete its mission. +- **File Not Found Fiasco:** Robots need absolute paths, not relative ones. No shortcuts in robot land! +- **JXA Output Oddities:** JavaScript robots are picky.
Use `output_format_mode: 'direct'` or let `'auto'` mode handle it. -## Configuration via Environment Variables +## 🎛️ Robot Control Panel (Configuration) -- `LOG_LEVEL`: Set the logging level for the server. - - Values: `DEBUG`, `INFO`, `WARN`, `ERROR` - - Example: `LOG_LEVEL=DEBUG npx @steipete/macos-automator-mcp@latest` +Fine-tune your robot's behavior with these environment variables: -- `KB_PARSING`: Controls when the knowledge base (script tips) is parsed. - - Values: - - `lazy` (default): The knowledge base is parsed on the first request to `get_scripting_tips` or when a `kb_script_id` is used in `execute_script`. This allows for faster server startup. - - `eager`: The knowledge base is parsed when the server starts up. This may slightly increase startup time but ensures the KB is immediately available and any parsing errors are caught early. - - Example (when running via `start.sh` or similar): - ```bash - KB_PARSING=eager ./start.sh - ``` - - Example (when configuring via an MCP runner that supports `env`, like `mcp-agentify`): - ```json - { - "env": { - "LOG_LEVEL": "INFO", - "KB_PARSING": "eager" - } - } - ``` - -## For Developers +- **`LOG_LEVEL`**: How chatty should your robot be? + - `DEBUG`: Robot tells you EVERYTHING (TMI mode) + - `INFO`: Normal robot chatter + - `WARN`: Only important stuff + - `ERROR`: Silent mode (robot speaks only when things explode) + - Example: `LOG_LEVEL=DEBUG npx @steipete/macos-automator-mcp@latest` -For detailed instructions on local development, project structure (including the `knowledge_base`), and contribution guidelines, please see [DEVELOPMENT.md](DEVELOPMENT.md). +- **`KB_PARSING`**: When should the robot load its brain? + - `lazy` (default): Loads knowledge on-demand (fast startup, lazy robot) + - `eager`: Loads everything at startup (slower start, ready-to-go robot) + - Example: `KB_PARSING=eager ./start.sh` -## Development +## 👨‍🔬 Robot Scientists Welcome!
-See [DEVELOPMENT.md](./DEVELOPMENT.md) for details on the project structure, building, and testing. +Want to upgrade your robot? Check out [DEVELOPMENT.md](DEVELOPMENT.md) for the full technical manual on teaching new tricks to your automation assistant. -## Local Knowledge Base +## 🧠 Teach Your Robot New Tricks (Local Knowledge Base) -You can supplement the built-in knowledge base with your own local tips and shared handlers. Create a directory structure identical to the `knowledge_base` in this repository (or a subset of it). +Your robot can learn custom skills! Create your own automation recipes and watch your robot evolve. By default, the application will look for this local knowledge base at `~/.macos-automator/knowledge_base`. You can customize this path by setting the `LOCAL_KB_PATH` environment variable. @@ -287,81 +365,60 @@ Or, if you are running the validator script, you can use the `--local-kb-path` a This allows for personalization and extension of the available automation scripts and tips without modifying the core application files. -## Contributing +## 🤝 Join the Robot Revolution! -Contributions are welcome! Please submit issues and pull requests to the [GitHub repository](https://github.com/steipete/macos-automator-mcp). +Found a bug? Got a cool automation idea? Your robot army needs YOU! Submit issues and pull requests to the [GitHub repository](https://github.com/steipete/macos-automator-mcp). -## Automation Capabilities +## 💪 Robot Superpowers Showcase -This server provides powerful macOS automation capabilities through AppleScript and JavaScript for Automation (JXA). Here are some of the most useful examples: +Here's what your new silicon sidekick can do out of the box: -### Terminal Automation -- **Run commands in new Terminal tabs:** +### 🖥️ Terminal Tamer +- **Command Line Wizardry:** Open new tabs, run commands, capture output - your robot speaks fluent bash!
``` - { "input": { "kb_script_id": "terminal_app_run_command_new_tab", "input_data": { "command": "ls -la" } } } + { "input": { "kb_script_id": "terminal_app_run_command_new_tab", "input_data": { "command": "echo '🤖 Hello World!'" } } } ``` -- **Execute commands with sudo and provide password securely** -- **Capture command output for processing** -### Browser Control -- **Chrome/Safari automation:** - ``` - { "input": { "kb_script_id": "chrome_open_url_new_tab_profile", "input_data": { "url": "https://example.com", "profile_name": "Default" } } } - ``` +### 🌐 Browser Bot +- **Web Automation Master:** Control Chrome and Safari like a puppet master! ``` { "input": { "kb_script_id": "safari_get_front_tab_url" } } ``` -- **Execute JavaScript in browser context:** - ``` - { "input": { "kb_script_id": "chrome_execute_javascript", "input_data": { "javascript_code": "document.title" } } } - ``` -- **Extract page content, manipulate forms, and automate workflows** -- **Take screenshots of web pages** +- **JavaScript Injection:** Make web pages dance to your robot's tune +- **Screenshot Sniper:** Capture any webpage faster than you can say "cheese" -### System Interaction -- **Toggle system settings (dark mode, volume, network):** +### ⚙️ System Sorcerer +- **Dark Mode Toggle:** Because robots have sensitive optical sensors ``` { "input": { "kb_script_id": "systemsettings_toggle_dark_mode_ui" } } ``` -- **Get/set clipboard content:** - ``` - { "input": { "kb_script_id": "system_clipboard_get_file_paths" } } - ``` -- **Open/control system dialogs and alerts** -- **Create and manage system notifications** +- **Clipboard Commander:** Copy, paste, and manipulate like a pro +- **Notification Ninja:** Send alerts that actually get noticed -### File Operations -- **Create, move, and manipulate files/folders:** - ``` - { "input": { "kb_script_id": "finder_create_new_folder_desktop", "input_data": { "folder_name": "My Project" } } } +### 📁 File System Feng Shui +-
**Folder Creator 3000:** Organize your digital life with robotic precision ``` -- **Read and write text files:** + { "input": { "kb_script_id": "finder_create_new_folder_desktop", "input_data": { "folder_name": "Robot Paradise" } } } ``` - { "input": { "kb_script_id": "fileops_read_text_file", "input_data": { "file_path": "~/Documents/notes.txt" } } } - ``` -- **List and filter files in directories** -- **Get file metadata and properties** +- **Text File Telepathy:** Read and write files faster than humanly possible -### Application Integration -- **Calendar/Reminders management:** - ``` - { "input": { "kb_script_id": "calendar_create_event", "input_data": { "title": "Meeting", "start_date": "2023-06-01 10:00", "end_date": "2023-06-01 11:00" } } } - ``` -- **Email automation with Mail.app:** - ``` - { "input": { "kb_script_id": "mail_send_email_direct", "input_data": { "recipient": "user@example.com", "subject": "Hello", "body_content": "Message content" } } } - ``` -- **Control music playback:** +### 📱 App Whisperer +- **Calendar Conductor:** Schedule meetings while you sleep +- **Email Automator:** Send emails without lifting a finger +- **Music Maestro:** DJ your playlists programmatically ``` { "input": { "kb_script_id": "music_playback_controls", "input_data": { "action": "play" } } } ``` -- **Work with creative apps (Keynote, Pages, Numbers)** -Use the `get_scripting_tips` tool to explore all available automation capabilities organized by category. +🎯 **Pro Tip:** Use `get_scripting_tips` to discover all 200+ automation recipes! + +## 📜 Legal Stuff (Robot Rights) + +This project is licensed under the MIT License - which means your robot is free to roam! See the [LICENSE](LICENSE) file for the fine print. -## License +--- -This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details. +🤖 **Remember:** With great automation power comes great responsibility. Use your robot wisely!
macOS Automator Server MCP server diff --git a/axorc/AXorcist b/axorc/AXorcist new file mode 120000 index 0000000..8b8cd9e --- /dev/null +++ b/axorc/AXorcist @@ -0,0 +1 @@ +/Users/steipete/Projects/CodeLooper/AXorcist \ No newline at end of file diff --git a/axorc/axorc_runner.sh b/axorc/axorc_runner.sh new file mode 100755 index 0000000..8b7c469 --- /dev/null +++ b/axorc/axorc_runner.sh @@ -0,0 +1,78 @@ +#!/bin/bash +# Simple runner for axorc, taking a JSON file as input. +# AXORC_PATH should be the path to your axorc executable. +# If not set, it defaults to a path relative to this script. + +# Determine the directory of this script +SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" + +# Set AXORC_PATH relative to the script's directory if not already set +: ${AXORC_PATH:="$SCRIPT_DIR/AXorcist/.build/debug/axorc"} + +# Check if AXORC_PATH exists and is executable +if [ ! -x "$AXORC_PATH" ]; then + echo "Error: axorc executable not found or not executable at $AXORC_PATH" + echo "Please set AXORC_PATH environment variable or ensure it's built at the default location." + exit 1 +fi + +DEBUG_FLAG="" +POSITIONAL_ARGS=() + +# Parse arguments for --debug and file/json payload +while [[ $# -gt 0 ]]; do + case "$1" in + --debug) + DEBUG_FLAG="--debug" + shift # past argument + ;; + --file) + if [[ -z "$2" || ! -f "$2" ]]; then + echo "Error: File not provided or not found after --file argument." + exit 1 + fi + INPUT_JSON=$(cat "$2") + USE_STDIN_FLAG=true + shift # past argument + shift # past value + ;; + --json) + if [[ -z "$2" ]]; then + echo "Error: JSON string not provided after --json argument." + exit 1 + fi + INPUT_JSON="$2" + USE_STDIN_FLAG=true + shift # past argument + shift # past value + ;; + *) + POSITIONAL_ARGS+=("$1") # unknown option will be captured if axorc supports more + shift # past argument + ;; + esac +done + +if [ -z "$INPUT_JSON" ]; then + echo "Error: No JSON input provided via --file or --json." 
+ echo "Usage: $0 [--debug] --file /path/to/command.json OR $0 [--debug] --json '{"command":"ping"}'" + exit 1 +fi + +echo "--- DEBUG_RUNNER: INPUT_JSON content before piping --- BEGIN" +printf "%s\n" "$INPUT_JSON" +echo "--- DEBUG_RUNNER: INPUT_JSON content before piping --- END" +echo "--- DEBUG_RUNNER: AXORC_PATH: $AXORC_PATH" +echo "--- DEBUG_RUNNER: DEBUG_FLAG: $DEBUG_FLAG" + + +# Execute axorc with the input JSON +if [ "$USE_STDIN_FLAG" = true ]; then + printf '%s' "$INPUT_JSON" | "$AXORC_PATH" --stdin $DEBUG_FLAG "${POSITIONAL_ARGS[@]}" + AXORC_EXIT_CODE=$? + echo "--- DEBUG_RUNNER: axorc exit code: $AXORC_EXIT_CODE ---" +else + # This case should not be reached if --file or --json is mandatory + echo "Error: USE_STDIN_FLAG was not set, programming error in runner script." + exit 1 +fi diff --git a/knowledge_base/04_system/accessibility/accessibility_query.md b/knowledge_base/04_system/accessibility/accessibility_query.md new file mode 100644 index 0000000..252b615 --- /dev/null +++ b/knowledge_base/04_system/accessibility/accessibility_query.md @@ -0,0 +1,247 @@ +--- +title: 'macOS: Query UI Elements with Accessibility API' +category: 04_system +id: macos_accessibility_query +description: >- + Guide to using the accessibility_query tool to inspect and interact with UI elements + across any application using the macOS Accessibility API. +keywords: + - accessibility + - AX + - UI automation + - screen reader + - interface inspection + - user interface + - Safari + - buttons + - elements + - inspection + - UI testing + - macOS + - AXStaticText +language: javascript +isComplex: true +--- + +# Using the accessibility_query Tool + +The `accessibility_query` tool provides a way to inspect and interact with UI elements of any application on macOS by leveraging the native Accessibility API. This is particularly useful when you need to: + +1. Identify UI elements that aren't easily accessible through AppleScript or JXA +2. 
Extract text or other information from application UIs +3. Perform actions like clicking buttons or interacting with controls +4. Inspect the structure of application interfaces + +## How It Works + +The tool interfaces with the macOS Accessibility API framework, which is the same system that powers VoiceOver and other assistive technologies. It allows you to: + +- Query elements by their accessibility role and attributes +- Navigate through the UI hierarchy +- Retrieve detailed information about UI elements +- Perform actions on elements (like clicking) + +## Basic Usage + +The tool accepts JSON queries through the `accessibility_query` MCP tool. There are two main command types: + +1. `query` - Retrieve information about UI elements +2. `perform` - Execute an action on a UI element + +### Query Examples + +#### 1. Find all text in the frontmost Safari window: + +```json +{ + "cmd": "query", + "multi": true, + "locator": { + "app": "Safari", + "role": "AXStaticText", + "match": {}, + "pathHint": [ + "window[1]" + ] + }, + "attributes": [ + "AXRole", + "AXTitle", + "AXIdentifier", + "AXActions", + "AXPosition", + "AXSize", + "AXRoleDescription", + "AXLabel", + "AXTitleUIElement", + "AXHelp" + ] +} +``` + +#### 2. Find all clickable buttons in System Settings: + +```json +{ + "cmd": "query", + "multi": true, + "locator": { + "app": "System Settings", + "role": "AXButton", + "match": {}, + "pathHint": [ + "window[1]" + ] + }, + "requireAction": "AXPress" +} +``` + +#### 3. Find a specific button by title: + +```json +{ + "cmd": "query", + "locator": { + "app": "System Settings", + "role": "AXButton", + "match": { + "AXTitle": "General" + } + } +} +``` + +### Perform Examples + +#### 1. Click a button: + +```json +{ + "cmd": "perform", + "locator": { + "app": "System Settings", + "role": "AXButton", + "match": { + "AXTitle": "General" + } + }, + "action": "AXPress" +} +``` + +#### 2. 
Enter text in a text field: + +```json +{ + "cmd": "perform", + "locator": { + "app": "TextEdit", + "role": "AXTextField", + "match": { + "AXFocused": "true" + } + }, + "action": "AXSetValue", + "value": "Hello, world!" +} +``` + +## Advanced Usage + +### Finding Elements with `pathHint` + +The `pathHint` parameter helps navigate to a specific part of the UI hierarchy. Each entry has the format `"elementType[index]"` where index is 1-based: + +```json +"pathHint": ["window[1]", "toolbar[1]", "group[3]"] +``` + +This navigates to the first window, then its toolbar, then the third group within that toolbar. + +### Filtering with `requireAction` + +Use `requireAction` to only find elements that support a specific action: + +```json +"requireAction": "AXPress" +``` + +This will only return elements that can be clicked/pressed. + +### Common Accessibility Roles + +Here are some common accessibility roles you can use in queries: + +- `AXButton` - Buttons +- `AXStaticText` - Text labels +- `AXTextField` - Editable text fields +- `AXCheckBox` - Checkboxes +- `AXRadioButton` - Radio buttons +- `AXPopUpButton` - Dropdown buttons +- `AXMenu` - Menus +- `AXMenuItem` - Menu items +- `AXWindow` - Windows +- `AXScrollArea` - Scrollable areas +- `AXList` - Lists +- `AXTable` - Tables +- `AXLink` - Links (in web content) +- `AXImage` - Images + +### Common Accessibility Actions + +- `AXPress` - Click/press an element +- `AXShowMenu` - Show a contextual menu +- `AXDecrement` - Decrease a value (e.g., in a stepper) +- `AXIncrement` - Increase a value +- `AXPickerCancel` - Cancel a picker +- `AXCancel` - Cancel an operation +- `AXConfirm` - Confirm an operation + +## Troubleshooting + +### No Elements Found + +If you're not finding elements: + +1. Verify the application is running +2. Try using more general queries first, then narrow down +3. Make sure you're using the correct accessibility role +4. 
Try listing all windows with `"role": "AXWindow"` to see what's available + +### Permission Issues + +Ensure that the application running this tool has Accessibility permissions in System Settings > Privacy & Security > Accessibility. + +## Technical Notes + +- The accessibility interface runs in the background, so it doesn't interrupt your normal application usage +- For web content in browsers, web-specific accessibility attributes are available +- Some applications may have non-standard accessibility implementations +- The tool uses the Swift AXUIElement framework to interact with the accessibility API + +## Example: Extracting Text from a PDF in Preview + +```json +{ + "cmd": "query", + "multi": true, + "locator": { + "app": "Preview", + "role": "AXStaticText", + "match": {}, + "pathHint": [ + "window[1]", + "AXScrollArea[1]" + ] + }, + "attributes": [ + "AXValue", + "AXRole", + "AXPosition", + "AXSize" + ] +} +``` + +This query extracts all text elements from a PDF document open in Preview, along with their positions and sizes on the page. \ No newline at end of file diff --git a/package.json b/package.json index 03a2b52..5678d84 100644 --- a/package.json +++ b/package.json @@ -10,11 +10,13 @@ "files": [ "dist/**/*", "knowledge_base/**/*", + "axorc/axorc_runner.sh", + "axorc/axorc", "README.md", "LICENSE" ], "scripts": { - "build": "tsc", + "build": "tsc && (cd axorc/AXorcist && swift build -c release || echo 'Swift build failed, using fallback binary') && mkdir -p dist/axorc && (cp axorc/AXorcist/.build/release/axorc dist/axorc/axorc 2>/dev/null || cp axorc/axorc dist/axorc/axorc) && cp axorc/axorc_runner.sh dist/axorc/axorc_runner.sh && chmod +x dist/axorc/axorc_runner.sh dist/axorc/axorc", "dev": "tsx src/server.ts", "start": "node dist/server.js", "lint": "eslint . 
--ext .ts", diff --git a/src/AXQueryExecutor.ts b/src/AXQueryExecutor.ts new file mode 100644 index 0000000..1decf7d --- /dev/null +++ b/src/AXQueryExecutor.ts @@ -0,0 +1,196 @@ +// AXQueryExecutor.ts - Execute commands against the AX accessibility utility + +import path from 'node:path'; +import { spawn } from 'node:child_process'; +import { fileURLToPath } from 'node:url'; +import { Logger } from './logger.js'; +import type { AXQueryInput } from './schemas.js'; // Import AXQueryInput type + +// Get the directory of the current module +const __filename = fileURLToPath(import.meta.url); +const __dirname = path.dirname(__filename); +const logger = new Logger('AXQueryExecutor'); + +export interface AXQueryExecutionResult { + result: Record; + execution_time_seconds: number; + debug_logs?: string[]; +} + +export class AXQueryExecutor { + private axUtilityPath: string; + private scriptPath: string; + + constructor() { + // Determine if running from source or dist to set the correct base path + // __dirname will be like /path/to/project/src or /path/to/project/dist/src + const isProdBuild = __dirname.includes(path.join(path.sep, 'dist', path.sep)); + + if (isProdBuild) { + // In production (dist), axorc_runner.sh and axorc binary are directly in dist/ + // So, utility path is one level up from dist/src (i.e., dist/) + this.axUtilityPath = path.resolve(__dirname, '..'); + } else { + // In development (src), axorc_runner.sh and axorc binary are in project_root/axorc/ + // So, utility path is one level up from src/ and then into axorc/ + this.axUtilityPath = path.resolve(__dirname, '..', 'axorc'); + } + + this.scriptPath = path.join(this.axUtilityPath, 'axorc_runner.sh'); + logger.debug('AXQueryExecutor initialized', { + isProdBuild, + axUtilityPath: this.axUtilityPath, + scriptPath: this.scriptPath + }); + } + + /** + * Execute a query against the AX utility + * @param queryData The query to execute + * @returns The result of the query + */ + async execute(queryData: 
AXQueryInput): Promise<AXQueryExecutionResult> { + logger.debug('Executing AX query with input:', queryData); + const startTime = Date.now(); + + // Map to the keys expected by the Swift binary + const mappedQueryData = { + cmd: queryData.command, + multi: queryData.return_all_matches, + locator: { + app: (queryData.locator as { app: string }).app, + role: (queryData.locator as { role: string }).role, + match: (queryData.locator as { match: Record<string, string> }).match, + pathHint: (queryData.locator as { navigation_path_hint?: string[] }).navigation_path_hint, + }, + attributes: queryData.attributes_to_query, + requireAction: queryData.required_action_name, + action: queryData.action_to_perform, + // report_execution_time is not sent to the Swift binary + debug_logging: queryData.debug_logging, + max_elements: queryData.max_elements, + output_format: queryData.output_format + }; + logger.debug('Mapped AX query for Swift binary:', mappedQueryData); + + return new Promise((resolve, reject) => { + try { + // Get the query string from the mapped data + const queryString = JSON.stringify(mappedQueryData) + '\n'; + + logger.debug('Running AX utility through wrapper script', { path: this.scriptPath }); + logger.debug('Query to run: ', { query: queryString}); + + // Run the script with wrapper that handles SIGTRAP + const process = spawn(this.scriptPath, [], { + cwd: this.axUtilityPath, + stdio: ['pipe', 'pipe', 'pipe'] + }); + + let stdoutData = ''; + let stderrData = ''; + + // Listen for stdout + process.stdout.on('data', (data) => { + const str = data.toString(); + logger.debug('AX utility stdout:', { data: str }); + stdoutData += str; + }); + + // Listen for stderr + process.stderr.on('data', (data) => { + const str = data.toString(); + logger.debug('AX utility stderr:', { data: str }); + stderrData += str; + }); + + // Handle process errors + process.on('error', (error) => { + logger.error('Process error:', { error }); + const endTime = Date.now(); + const execution_time_seconds = parseFloat(((endTime 
- startTime) / 1000).toFixed(3)); + const errorToReject = new Error(`Process error: ${error.message}`) as Error & { execution_time_seconds?: number }; + errorToReject.execution_time_seconds = execution_time_seconds; + reject(errorToReject); + }); + + // Handle process exit + process.on('exit', (code, signal) => { + logger.debug('Process exited:', { code, signal }); + const endTime = Date.now(); + const execution_time_seconds = parseFloat(((endTime - startTime) / 1000).toFixed(3)); + + // Check for log file if we had issues + if (code !== 0 || signal) { + logger.debug('Checking log file for more information'); + try { + // We won't actually read it here, but we'll mention it in the error + const logPath = path.join(this.axUtilityPath, 'axorc_runner.log'); + stderrData += `\nCheck log file at ${logPath} for more details.`; + } catch { + // Ignore errors reading the log + } + } + + // If we got any JSON output, try to parse it + if (stdoutData.trim()) { + try { + const parsedJson = JSON.parse(stdoutData) as (Record<string, unknown> & { debug_logs?: string[] }); + // Separate the core result from potential debug_logs + const { debug_logs, ...coreResult } = parsedJson; + return resolve({ result: coreResult, execution_time_seconds, debug_logs }); + } catch (error) { + logger.error('Failed to parse JSON output', { error, stdout: stdoutData }); + // Fall through to error handling below if JSON parsing fails + } + } + + let errorMessage = ''; + if (signal) { + errorMessage = `Process terminated by signal ${signal}: ${stderrData}`; + } else if (code !== 0) { + errorMessage = `Process exited with code ${code}: ${stderrData}`; + } else { + // Attempt to parse stderr as JSON ErrorResponse if stdout was empty but exit was 0 + try { + const errorJson = JSON.parse(stderrData.split('\n').filter(line => line.startsWith("{\"error\":")).join('') || stderrData); + if (errorJson.error) { + errorMessage = `AX tool reported error: ${errorJson.error}`; + const errorToReject = new Error(errorMessage) as 
Error & { execution_time_seconds?: number; debug_logs?: string[] }; + errorToReject.execution_time_seconds = execution_time_seconds; + errorToReject.debug_logs = errorJson.debug_logs; // Capture debug logs from error JSON + return reject(errorToReject); + } + } catch { + // stderr was not a JSON error response, proceed with generic message + } + errorMessage = `Process completed but no valid JSON output on stdout. Stderr: ${stderrData}`; + } + const errorToReject = new Error(errorMessage) as Error & { execution_time_seconds?: number; debug_logs?: string[] }; + errorToReject.execution_time_seconds = execution_time_seconds; + // If stderrData might contain our JSON error object with debug_logs, try to parse it + try { + const errorJson = JSON.parse(stderrData.split('\n').filter(line => line.startsWith("{\"error\":")).join('') || stderrData); + if (errorJson.debug_logs) { + errorToReject.debug_logs = errorJson.debug_logs; + } + } catch { /* ignore if stderr is not our JSON error */ } + reject(errorToReject); + }); + + // Write the query to stdin and close + logger.debug('Sending query to AX utility:', { query: queryString }); + process.stdin.write(queryString); + process.stdin.end(); + + } catch (error) { + logger.error('Failed to execute AX utility:', { error }); + const endTime = Date.now(); + const execution_time_seconds = parseFloat(((endTime - startTime) / 1000).toFixed(3)); + const errorToReject = new Error(`Failed to execute AX utility: ${error instanceof Error ? 
error.message : String(error)}`) as Error & { execution_time_seconds?: number }; + errorToReject.execution_time_seconds = execution_time_seconds; + reject(errorToReject); + } + }); + } +} \ No newline at end of file diff --git a/src/schemas.ts b/src/schemas.ts index 108b8b7..907d166 100644 --- a/src/schemas.ts +++ b/src/schemas.ts @@ -65,5 +65,95 @@ export const GetScriptingTipsInputSchema = z.object({ export type GetScriptingTipsInput = z.infer<typeof GetScriptingTipsInputSchema>; +// AX Query Input Schema +export const AXQueryInputSchema = z.object({ + command: z.enum(['query', 'perform']).describe('The operation to perform. (Formerly cmd)'), + + // Fields for lenient parsing if locator is flattened + app: z.string().optional().describe('Top-level app name (used if locator is a string and app is not specified within a locator object)'), + role: z.string().optional().describe('Top-level role (used if locator is a string/flattened)'), + match: z.record(z.string()).optional().describe('Top-level match (used if locator is a string/flattened)'), + + locator: z.union([ + z.object({ + app: z.string().describe('Bundle ID or display name of the application to query'), + role: z.string().describe('Accessibility role to match, e.g., "AXButton", "AXStaticText"'), + match: z.record(z.string()).describe('Attributes to match for the element'), + navigation_path_hint: z.array(z.string()).optional().describe('Optional path to navigate within the application hierarchy, e.g., ["window[1]", "toolbar[1]"]. (Formerly pathHint)'), + }), + z.string().describe('Bundle ID or display name of the application to query (used if role/match are provided at top level and this string serves as the app name)') + ]).describe('Specifications to find the target element(s). Can be a full locator object or just an app name string (if role/match are top-level).'), + + return_all_matches: z.boolean().optional().describe('When true, returns all matching elements rather than just the first match. Default is false. 
(Formerly multi)'), + attributes_to_query: z.array(z.string()).optional().describe('Attributes to query for matched elements. If not provided, common attributes will be included. (Formerly attributes)'), + required_action_name: z.string().optional().describe('Filter elements to only those supporting this action, e.g., "AXPress". (Formerly requireAction)'), + action_to_perform: z.string().optional().describe('Only used with command: "perform" - The action to perform on the matched element. (Formerly action)'), + report_execution_time: z.boolean().optional().default(false).describe( + 'If true, the tool will return an additional message containing the formatted script execution time. Defaults to false.', + ), + limit: z.number().int().positive().optional().default(500).describe( + 'Maximum number of lines to return in the output. Defaults to 500. Output will be truncated if it exceeds this limit.' + ), + max_elements: z.number().int().positive().optional().describe( + 'For return_all_matches: true queries, specifies the maximum number of UI elements to fully process and return. If omitted, a default (e.g., 200) is used internally by the ax binary. Helps control performance for very large result sets.' + ), + debug_logging: z.boolean().optional().default(false).describe( + 'If true, enables detailed debug logging from the ax binary, which will be returned as part of the response. Defaults to false.' + ), + output_format: z.enum(['smart', 'verbose', 'text_content']).optional().default('smart').describe( + "Controls the format and verbosity of the attribute output. \n" + + "'smart': (Default) Omits empty/placeholder values. Key-value pairs. \n" + + "'verbose': Includes all attributes, even empty/placeholders. Key-value pairs. Useful for debugging. \n" + + "'text_content': Returns only concatenated text values of common textual attributes (e.g., AXValue, AXTitle, AXDescription). No keys. Ideal for fast text extraction." 
+ ) +}).refine( + (data) => { + // If command is 'perform', action_to_perform must be provided + return data.command !== 'perform' || (!!data.action_to_perform); + }, + { + message: "When command is 'perform', an action_to_perform must be provided", + path: ["action_to_perform"], + } +).superRefine((data, ctx) => { + if (typeof data.locator === 'string') { // Case 1: locator is a string (app name) + if (data.role === undefined) { + ctx.addIssue({ + code: z.ZodIssueCode.custom, + message: "If 'locator' is a string (app name), top-level 'role' must be provided.", + path: ['role'], // Path refers to the top-level role + }); + } + // data.match will default to {} if undefined later in the handler + // data.app (top-level) is ignored if data.locator (string) is present, as the locator string *is* the app name. + } else { // Case 2: locator is an object + // Ensure top-level app, role, match are not present if locator is a full object, to avoid ambiguity. + // This is a stricter interpretation. Alternatively, we could prioritize the locator object's fields. + if (data.app !== undefined) { + ctx.addIssue({ + code: z.ZodIssueCode.custom, + message: "Top-level 'app' should not be provided if 'locator' is a detailed object. Define 'app' inside the 'locator' object.", + path: ['app'], + }); + } + if (data.role !== undefined) { + ctx.addIssue({ + code: z.ZodIssueCode.custom, + message: "Top-level 'role' should not be provided if 'locator' is a detailed object. Define 'role' inside the 'locator' object.", + path: ['role'], + }); + } + if (data.match !== undefined) { + ctx.addIssue({ + code: z.ZodIssueCode.custom, + message: "Top-level 'match' should not be provided if 'locator' is a detailed object. Define 'match' inside the 'locator' object.", + path: ['match'], + }); + } + } +}); + +export type AXQueryInput = z.infer<typeof AXQueryInputSchema>; + // Output is always { content: [{ type: "text", text: "string_output" }] } // No specific Zod schema needed for output beyond what MCP SDK handles. 
\ No newline at end of file diff --git a/src/server.ts b/src/server.ts index 2c031cc..5f551e2 100644 --- a/src/server.ts +++ b/src/server.ts @@ -8,8 +8,9 @@ import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js' import * as sdkTypes from '@modelcontextprotocol/sdk/types.js'; // import { ZodError } from 'zod'; // ZodError is not directly used from here, handled by SDK or refined errors import { Logger } from './logger.js'; -import { ExecuteScriptInputSchema, GetScriptingTipsInputSchema } from './schemas.js'; +import { ExecuteScriptInputSchema, GetScriptingTipsInputSchema, AXQueryInputSchema, type AXQueryInput } from './schemas.js'; import { ScriptExecutor } from './ScriptExecutor.js'; +import { AXQueryExecutor } from './AXQueryExecutor.js'; import type { ScriptExecutionError, ExecuteScriptResponse } from './types.js'; // import pkg from '../package.json' with { type: 'json' }; // Import package.json // REMOVED import { getKnowledgeBase, getScriptingTipsService, conditionallyInitializeKnowledgeBase } from './services/knowledgeBaseService.js'; // Import KB functions @@ -47,6 +48,7 @@ const serverInfoMessage = `MacOS Automator MCP v${pkg.version}, started at ${SER const logger = new Logger('macos_automator_server'); const scriptExecutor = new ScriptExecutor(); +const axQueryExecutor = new AXQueryExecutor(); // Define raw shapes for tool registration (required by newer SDK versions) const ExecuteScriptInputShape = { @@ -71,6 +73,32 @@ const GetScriptingTipsInputShape = { limit: z.number().int().positive().optional(), } as const; +const AXQueryInputShape = { + command: z.enum(['query', 'perform']), + // Top-level fields for lenient parsing + app: z.string().optional(), + role: z.string().optional(), + match: z.record(z.string()).optional(), + + locator: z.union([ + z.object({ + app: z.string(), + role: z.string(), + match: z.record(z.string()), + navigation_path_hint: z.array(z.string()).optional(), + }), + z.string() + ]), + return_all_matches: 
z.boolean().optional(), + attributes_to_query: z.array(z.string()).optional(), + required_action_name: z.string().optional(), + action_to_perform: z.string().optional(), + report_execution_time: z.boolean().optional().default(false), + limit: z.number().int().positive().optional().default(500), + debug_logging: z.boolean().optional().default(false), + output_format: z.enum(['smart', 'verbose', 'text_content']).optional().default('smart'), +} as const; + async function main() { if (!IS_E2E_TESTING) { logger.info("[Server Startup] Current working directory", { cwd: process.cwd() }); @@ -396,6 +424,197 @@ async function main() { } ); + // ADD THE NEW accessibility_query TOOL HERE + server.tool( + 'accessibility_query', + `Query and interact with the macOS accessibility interface to inspect UI elements of applications. This tool provides a powerful way to explore and manipulate the user interface elements of any application using the native macOS accessibility framework. + +This tool exposes the complete macOS accessibility API capabilities, allowing detailed inspection of UI elements and their properties. It's particularly useful for automating interactions with applications that don't have robust AppleScript support or when you need to inspect the UI structure in detail. + +**Input Parameters:** + +* \`command\` (enum: 'query' | 'perform', required): The operation to perform. + * \`query\`: Retrieves information about UI elements. + * \`perform\`: Executes an action on a UI element (like clicking a button). + +* \`locator\` (object, required): Specifications to find the target element(s). + * \`app\` (string, required): The application to target, specified by either bundle ID or display name (e.g., "Safari", "com.apple.Safari"). + * \`role\` (string, required): The accessibility role of the target element (e.g., "AXButton", "AXStaticText"). + * \`match\` (object, required): Key-value pairs of attributes to match. Can be empty (\`{}\`) if not needed. 
+ * \`navigation_path_hint\` (array of strings, optional): Path to navigate within the application hierarchy (e.g., \`["window[1]", "toolbar[1]"]\`). + +* \`return_all_matches\` (boolean, optional): When \`true\`, returns all matching elements rather than just the first match. Default is \`false\`. + +* \`attributes_to_query\` (array of strings, optional): Specific attributes to query for matched elements. If not provided, common attributes will be included. Examples: \`["AXRole", "AXTitle", "AXValue"]\` + +* \`required_action_name\` (string, optional): Filter elements to only those supporting a specific action (e.g., "AXPress" for clickable elements). + +* \`action_to_perform\` (string, optional, required when \`command="perform"\`): The accessibility action to perform on the matched element (e.g., "AXPress" to click a button). + +* \`report_execution_time\` (boolean, optional): If true, the tool will return an additional message containing the formatted script execution time. Defaults to false. + +* \`limit\` (integer, optional): Maximum number of lines to return in the output. Defaults to 500. Output will be truncated if it exceeds this limit. + +* \`max_elements\` (integer, optional): For \`return_all_matches: true\` queries, this specifies the maximum number of UI elements the \`ax\` binary will fully process and return attributes for. If omitted, an internal default (e.g., 200) is used. This helps manage performance when querying UIs with a very large number of matching elements (like numerous text fields on a complex web page). This is different from \`limit\`, which truncates the final text output based on lines. + +* \`debug_logging\` (boolean, optional): If true, enables detailed debug logging from the underlying \`ax\` binary. This diagnostic information will be included in the response, which can be helpful for troubleshooting complex queries or unexpected behavior. Defaults to false. 
+ +* \`output_format\` (enum: 'smart' | 'verbose' | 'text_content', optional, default: 'smart'): Controls the format and verbosity of the attribute output from the \`ax\` binary. + * \`'smart'\`: (Default) Optimized for readability. Omits attributes with empty or placeholder values. Returns key-value pairs. + * \`'verbose'\`: Maximum detail. Includes all attributes, even empty/placeholders. Key-value pairs. Best for debugging element properties. + * \`'text_content'\`: Highly compact for text extraction. Returns only concatenated text values of common textual attributes (e.g., AXValue, AXTitle). No keys are returned. Ideal for quickly getting all text from elements; the \`attributes_to_query\` parameter is ignored in this mode. + +**Example Queries (Note: key names have changed to snake_case):** + +1. **Find all text elements in the front Safari window:** + \`\`\`json + { + "command": "query", + "return_all_matches": true, + "locator": { + "app": "Safari", + "role": "AXStaticText", + "match": {}, + "navigation_path_hint": ["window[1]"] + } + } + \`\`\` + +2. **Find and click a button with a specific title:** + \`\`\`json + { + "command": "perform", + "locator": { + "app": "System Settings", + "role": "AXButton", + "match": {"AXTitle": "General"} + }, + "action_to_perform": "AXPress" + } + \`\`\` + +3. 
**Get detailed information about the focused UI element:** + \`\`\`json + { + "command": "query", + "locator": { + "app": "Mail", + "role": "AXTextField", + "match": {"AXFocused": "true"} + }, + "attributes_to_query": ["AXRole", "AXTitle", "AXValue", "AXDescription", "AXHelp", "AXPosition", "AXSize"] + } + \`\`\` + +**Note:** Using this tool requires that the application running this server has the necessary Accessibility permissions in macOS System Settings > Privacy & Security > Accessibility.`, + AXQueryInputShape, + async (args: unknown) => { + let inputFromZod: AXQueryInput; + try { + inputFromZod = AXQueryInputSchema.parse(args); + logger.info('accessibility_query called with raw Zod-parsed input:', inputFromZod); + + // Normalize the input to the canonical structure AXQueryExecutor expects + let canonicalInput: AXQueryInput; + + if (typeof inputFromZod.locator === 'string') { + logger.debug('Normalizing malformed input (locator is string). Top-level data:', { appLocatorString: inputFromZod.locator, role: inputFromZod.role, match: inputFromZod.match }); + // Zod superRefine should have already ensured inputFromZod.role is defined. + // The top-level inputFromZod.app is ignored here because inputFromZod.locator (the string) is the app. + canonicalInput = { + // Spread all other fields from inputFromZod first + ...inputFromZod, + // Then explicitly define the locator object + locator: { + app: inputFromZod.locator, // The string locator is the app name + role: inputFromZod.role!, // Role from top level (assert non-null due to Zod refine) + match: inputFromZod.match || {}, // Match from top level, or default to empty + navigation_path_hint: undefined // No path hint in this malformed case typically + }, + // Nullify the top-level fields that are now part of the canonical locator + // to avoid confusion if they were passed, though AXQueryExecutor won't use them. 
+ app: undefined, + role: undefined, + match: undefined + }; + } else { + // Well-formed case: locator is an object. Zod superRefine ensures top-level app/role/match are undefined. + logger.debug('Input is well-formed (locator is object).'); + canonicalInput = inputFromZod; + } + + // logger.info('accessibility_query using canonical input for executor:', JSON.parse(JSON.stringify(canonicalInput))); // Commented out due to persistent linter issue + + const result = await axQueryExecutor.execute(canonicalInput); + + // For cleaner output, especially for multi-element queries, format the response + let formattedOutput: string; + + if (inputFromZod.command === 'query' && inputFromZod.return_all_matches === true) { + // For multi-element queries, format the results more readably + if ('elements' in result) { + formattedOutput = JSON.stringify(result, null, 2); + } else { + formattedOutput = JSON.stringify(result, null, 2); + } + } else { + // For single element queries or perform actions + formattedOutput = JSON.stringify(result, null, 2); + } + + // Apply line limit + let finalOutputText = formattedOutput; + const lines = finalOutputText.split('\n'); + if (inputFromZod.limit !== undefined && lines.length > inputFromZod.limit) { + finalOutputText = lines.slice(0, inputFromZod.limit).join('\n'); + const truncationNotice = `\n\n--- Output truncated to ${inputFromZod.limit} lines. Original length was ${lines.length} lines. 
---`; + finalOutputText += truncationNotice; + } + + const responseContent: Array<{ type: 'text'; text: string }> = [{ type: 'text', text: finalOutputText }]; + + // Add debug logs if they exist in the result + if (result.debug_logs && Array.isArray(result.debug_logs) && result.debug_logs.length > 0) { + const debugHeader = "\n\n--- AX Binary Debug Logs ---"; + const logsString = result.debug_logs.join('\n'); + responseContent.push({ type: 'text', text: `${debugHeader}\n${logsString}` }); + } + + if (inputFromZod.report_execution_time) { + const ms = result.execution_time_seconds * 1000; + let timeMessage = "Script executed in "; + if (ms < 1) { // Less than 1 millisecond + timeMessage += "<1 millisecond."; + } else if (ms < 1000) { // 1ms up to 999ms + timeMessage += `${ms.toFixed(0)} milliseconds.`; + } else if (ms < 60000) { // 1 second up to 59.999 seconds + timeMessage += `${(ms / 1000).toFixed(2)} seconds.`; + } else { + const totalSeconds = ms / 1000; + const minutes = Math.floor(totalSeconds / 60); + const remainingSeconds = Math.round(totalSeconds % 60); + timeMessage += `${minutes} minute(s) and ${remainingSeconds} seconds.`; + } + responseContent.push({ type: 'text', text: `${timeMessage}` }); + } + + return { content: responseContent }; + } catch (error: unknown) { + const err = error as Error; + logger.error('Error in accessibility_query tool handler', { message: err.message }); + // If the error object from AXQueryExecutor contains debug_logs, include them + let errorMessage = `Failed to execute accessibility query: ${err.message}`; + const errorWithLogs = err as (Error & { debug_logs?: string[] }); // Cast here + if (errorWithLogs.debug_logs && Array.isArray(errorWithLogs.debug_logs) && errorWithLogs.debug_logs.length > 0) { + const debugHeader = "\n\n--- AX Binary Debug Logs (from error) ---"; + const logsString = errorWithLogs.debug_logs.join('\n'); + errorMessage += `\n${debugHeader}\n${logsString}`; + } + throw new 
sdkTypes.McpError(sdkTypes.ErrorCode.InternalError, errorMessage); + } + } + ); + const transport = new StdioServerTransport(); await server.connect(transport); diff --git a/start.sh b/start.sh index 8d08786..1b6849c 100755 --- a/start.sh +++ b/start.sh @@ -2,6 +2,7 @@ # start.sh export LOG_LEVEL="${LOG_LEVEL:-INFO}" +export PATH="/Users/mitsuhiko/.volta/bin:$PATH" SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" PROJECT_ROOT="$SCRIPT_DIR" @@ -9,10 +10,16 @@ PROJECT_ROOT="$SCRIPT_DIR" DIST_SERVER_JS="$PROJECT_ROOT/dist/server.js" SRC_SERVER_TS="$PROJECT_ROOT/src/server.ts" +# IMPORTANT: Running from dist/ is strongly recommended +# There are module resolution issues with tsx/ESM when running from src/ directly +# If changes are needed, use `npm run build` to compile before running if [ -f "$DIST_SERVER_JS" ]; then - # echo "INFO: Compiled version found. Running from dist/server.js" >&2 # Silenced + echo "INFO: Using compiled version (dist/server.js)" >&2 exec node "$DIST_SERVER_JS" else + echo "WARN: Compiled version not found. This may cause module resolution issues." >&2 + echo "WARN: Consider running 'npm run build' first." >&2 + # echo "INFO: Making sure tsx is available..." >&2 # Silenced if ! command -v tsx &> /dev/null && ! [ -f "$PROJECT_ROOT/node_modules/.bin/tsx" ]; then echo "WARN: tsx command not found locally or globally. Attempting to install via npm..." >&2 @@ -35,4 +42,4 @@ else # echo "INFO: Running from src/server.ts using global tsx" >&2 # Silenced exec tsx "$SRC_SERVER_TS" fi -fi \ No newline at end of file +fi
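
The `accessibility_query` handler in `src/server.ts` accepts a lenient input where `locator` is a plain app-name string, and `AXQueryExecutor` then renames the snake_case fields to the keys the Swift binary reads (`cmd`, `multi`, `pathHint`). The following is a minimal sketch of those two steps, not the code from this diff: the `LocatorObject` and `LenientInput` types and the `normalizeLocator` and `toBinaryPayload` helpers are hypothetical, simplified stand-ins that cover only a subset of the schema fields.

```typescript
// Simplified, hypothetical types; the real definitions live in src/schemas.ts.
interface LocatorObject {
  app: string;
  role: string;
  match: Record<string, string>;
  navigation_path_hint?: string[];
}

interface LenientInput {
  command: 'query' | 'perform';
  locator: LocatorObject | string;
  role?: string;
  match?: Record<string, string>;
  return_all_matches?: boolean;
}

// Illustrative helper (not in the diff): turn a string locator into the object form.
function normalizeLocator(input: LenientInput): LocatorObject {
  if (typeof input.locator !== 'string') {
    return input.locator; // Already the canonical object form.
  }
  if (input.role === undefined) {
    // Mirrors the superRefine rule: a string locator needs a top-level role.
    throw new Error("If 'locator' is a string, top-level 'role' must be provided.");
  }
  // The string itself is the app name; match defaults to an empty object.
  return { app: input.locator, role: input.role, match: input.match ?? {} };
}

// Illustrative helper: rename snake_case fields to the keys sent to the binary.
function toBinaryPayload(input: LenientInput) {
  const locator = normalizeLocator(input);
  return {
    cmd: input.command,
    multi: input.return_all_matches,
    locator: {
      app: locator.app,
      role: locator.role,
      match: locator.match,
      pathHint: locator.navigation_path_hint,
    },
  };
}

const payload = toBinaryPayload({
  command: 'query',
  locator: 'Safari',
  role: 'AXStaticText',
  return_all_matches: true,
});
console.log(JSON.stringify(payload));
```

Keeping normalization separate from key mapping means the executor only ever sees the object form of `locator`, which is the assumption its type casts rely on.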