| 1 | +--- |
| 2 | +description: |
| 3 | +globs: |
| 4 | +alwaysApply: false |
| 5 | +--- |
| 6 | +# macOS Accessibility (`ax`) Binary Rules & Knowledge |
| 7 | + |
| 8 | +This document outlines the functionality, build process, testing procedures, and technical details of the `ax` Swift command-line utility, designed for interacting with the macOS Accessibility framework. |
| 9 | + |
| 10 | +## 1. `ax` Binary Overview |
| 11 | + |
| 12 | +* **Purpose**: Provides a JSON-based interface to query UI elements and perform actions using the macOS Accessibility API. It's intended to be called by other processes (like the MCP server). |
| 13 | +* **Communication**: Reads JSON commands from `stdin`. Success responses are written as JSON to `stdout`; error responses are written as JSON to `stderr`. |
| 14 | +* **Core Commands**: |
| 15 | + * `query`: Retrieves information about UI elements. |
| 16 | + * `perform`: Executes an action on a UI element. |
| 17 | +* **Key Input Fields (JSON)**: |
| 18 | + * `cmd` (string): "query" or "perform". |
| 19 | + * `locator` (object): Specifies the target element(s). |
| 20 | + * `app` (string): Bundle ID or localized name of the target application (e.g., "com.apple.TextEdit", "Safari"). |
| 21 | + * `role` (string): The accessibility role of the target element (e.g., "AXWindow", "AXButton", "*"). |
| 22 | + * `match` (object): Key-value pairs of attributes to match (e.g., `{"AXMain": "true"}`). Values are strings. |
| 23 | + * `pathHint` (array of strings, optional): A path to navigate the UI tree (e.g., `["window[1]", "toolbar[1]"]`). |
| 24 | + * `attributes` (array of strings, optional): For `query`, specific attributes to retrieve. Defaults to a common set if omitted. |
| 25 | + * `action` (string, optional): For `perform`, the action to execute (e.g., "AXPress"). |
| 26 | + * `multi` (boolean, optional): For `query`, if `true`, returns all matching elements. Defaults to `false`. |
| 27 | + * `requireAction` (string, optional): For `query`, filters results to elements supporting a specific action. |
| 28 | + * `debug_logging` (boolean, optional): If `true`, includes detailed internal debug logs in the response. |
| 29 | +* **Key Output Fields (JSON)**: |
| 30 | + * Success (`query`): `{ "attributes": { "AXTitle": "...", ... } }` |
| 31 | + * Success (`query`, `multi: true`): `{ "elements": [ { "AXTitle": "..." }, ... ] }` |
| 32 | + * Success (`perform`): `{ "status": "ok" }` |
| 33 | + * Error: `{ "error": "Error message description" }` |
| 34 | + * `debug_logs` (array of strings, optional): Included in success or error responses if `debug_logging` was true. |
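
The input fields listed above map naturally onto `Codable` types. The following is a minimal sketch of how a caller or the binary might model them; the type names `Locator` and `CommandEnvelope` and the helper `decodeCommand` are hypothetical and are not taken from `main.swift`.

```swift
import Foundation

/// Hypothetical mirror of the `locator` object described above.
struct Locator: Codable {
    let app: String                 // bundle ID or localized name
    let role: String                // e.g. "AXWindow", "AXButton", "*"
    let match: [String: String]     // attribute values are always strings
    let pathHint: [String]?         // e.g. ["window[1]", "toolbar[1]"]
}

/// Hypothetical mirror of the top-level command object.
struct CommandEnvelope: Codable {
    enum Verb: String, Codable { case query, perform }

    let cmd: Verb
    let locator: Locator
    let attributes: [String]?       // query only
    let action: String?             // perform only
    let multi: Bool?                // query only
    let requireAction: String?      // query only
    let debugLogging: Bool?         // JSON key is "debug_logging"

    enum CodingKeys: String, CodingKey {
        case cmd, locator, attributes, action, multi, requireAction
        case debugLogging = "debug_logging"
    }
}

/// Decodes one line of JSON (as read from stdin) into a command.
func decodeCommand(_ line: String) throws -> CommandEnvelope {
    try JSONDecoder().decode(CommandEnvelope.self, from: Data(line.utf8))
}
```

Optional fields that are absent from the JSON decode to `nil`, which lets the consumer apply the documented defaults (for example, treating a missing `multi` as `false`).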
| 35 | + |
| 36 | +## 2. Functionality - How it Works |
| 37 | + |
| 38 | +The `ax` binary is implemented in Swift in `ax/Sources/AXHelper/main.swift`. |
| 39 | + |
| 40 | +* **Application Targeting**: |
| 41 | + * `getApplicationElement(bundleIdOrName: String)`: This function is the entry point to an application's accessibility tree. |
| 42 | + * It first tries to find the application using its bundle identifier (e.g., "com.apple.Safari") via `NSRunningApplication.runningApplications(withBundleIdentifier:)`. |
| 43 | + * If not found, it iterates through all running applications and attempts to match by the application's localized name (e.g., "Safari") via `NSRunningApplication.localizedName`. |
| 44 | + * Once the `NSRunningApplication` instance is found, `AXUIElementCreateApplication(pid)` is used to get the root `AXUIElement` for that application. |
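
The lookup order described above (bundle identifier first, then localized name) could be sketched roughly as follows. This is an illustration only and requires macOS: the function names are hypothetical, and the use of `NSWorkspace.shared.runningApplications` for the name fallback is an assumption about how the real `getApplicationElement` enumerates running applications.

```swift
import AppKit
import ApplicationServices

/// Hypothetical helper: resolves a running application by bundle ID, then by localized name.
func findRunningApplication(bundleIdOrName: String) -> NSRunningApplication? {
    // 1. Try the string as a bundle identifier, e.g. "com.apple.Safari".
    if let app = NSRunningApplication
        .runningApplications(withBundleIdentifier: bundleIdOrName).first {
        return app
    }
    // 2. Fall back to the localized name, e.g. "Safari" (assumed enumeration source).
    return NSWorkspace.shared.runningApplications.first {
        $0.localizedName == bundleIdOrName
    }
}

/// Hypothetical helper: returns the root accessibility element for the application, if running.
func applicationElement(bundleIdOrName: String) -> AXUIElement? {
    guard let app = findRunningApplication(bundleIdOrName: bundleIdOrName) else {
        return nil
    }
    return AXUIElementCreateApplication(app.processIdentifier)
}
```

Creating the application element does not itself require Accessibility permission; reading attributes from it does.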
| 45 | + |
| 46 | +* **Element Location**: |
| 47 | + * **`search(element:locator:depth:maxDepth:)`**: |
| 48 | + * Used for single-element queries (when `multi` is `false` or not set). |
| 49 | + * Performs a depth-first search starting from a given `element` (usually the application element or one found via `pathHint`). |
| 50 | + * It checks if an element's `AXRole` matches `locator.role`. |
| 51 | + * Then, it verifies that all attribute-value pairs in `locator.match` correspond to the element's actual attributes. This matching logic handles: |
| 52 | + * **Boolean attributes** (e.g., `AXMain`, `AXFocused`): Compares against string "true" or "false". |
| 53 | + * **Numeric attributes**: Attempts to parse `wantStr` (from `locator.match`) as an `Int` and compares numerically. |
| 54 | + * **String attributes**: Performs direct string comparison. |
| 55 | + * If a match is found, the `AXUIElement` is returned. Otherwise, it recursively searches children. |
| 56 | + * **`collectAll(element:locator:requireAction:hits:depth:maxDepth:)`**: |
| 57 | + * Used for multi-element queries (`multi: true`). |
| 58 | + * Recursively traverses the accessibility tree starting from `element`. |
| 59 | + * Matches elements against `locator.role` (supports `"*"` or empty for wildcard) and `locator.match` (using robust boolean, numeric, and string comparison similar to `search`). |
| 60 | + * If `requireAction` is specified, it further filters elements to those supporting the given action using `elementSupportsAction`. |
| 61 | + * It aggregates all matching `AXUIElement`s into the `hits` array. |
| 62 | + * To discover children, it queries a comprehensive list of attributes known to contain child elements: |
| 63 | + * Standard: `kAXChildrenAttribute` ("AXChildren") |
| 64 | + * Web-specific: "AXLinks", "AXButtons", "AXControls", "AXDOMChildren", etc. |
| 65 | + * Application-specific: `kAXWindowsAttribute` ("AXWindows") |
| 66 | + * General containers: "AXContents", "AXVisibleChildren", etc. |
| 67 | + * Includes deduplication of found elements based on their `ObjectIdentifier`. |
| 68 | + * **`navigateToElement(from:pathHint:)`**: |
| 69 | + * Processes the `pathHint` array (e.g., `["window[1]", "toolbar[1]"]`). |
| 70 | + * Each component (e.g., "window[1]") is parsed into a role ("window") and a 1-based index, which is converted to a 0-based index (here, 0). |
| 71 | + * It navigates the tree by finding children of the current element that match the role and selecting the one at the specified index. |
| 72 | + * Special handling for "window" role uses the `AXWindows` attribute for direct access. |
| 73 | + * The element found at the end of the path is used as the starting point for `search` or `collectAll`. |
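
The `pathHint` parsing step described above can be sketched in isolation. The type `PathComponent` and the function `parsePathComponent` are hypothetical names, and rejecting components without a bracketed index (or with an index below 1) is an assumption; the real `navigateToElement` may be more lenient.

```swift
import Foundation

/// One parsed pathHint component: "window[1]" becomes role "window", index 0.
struct PathComponent: Equatable {
    let role: String
    let index: Int   // 0-based, ready for array subscripting
}

/// Parses "role[n]" where n is 1-based; returns nil for malformed input.
func parsePathComponent(_ component: String) -> PathComponent? {
    guard component.hasSuffix("]"),
          let open = component.lastIndex(of: "[") else { return nil }

    let role = String(component[..<open])
    // Characters strictly between "[" and the trailing "]".
    let digits = component[component.index(after: open)..<component.index(before: component.endIndex)]

    guard !role.isEmpty, let oneBased = Int(digits), oneBased >= 1 else { return nil }
    return PathComponent(role: role, index: oneBased - 1)
}
```

With the component parsed, navigation reduces to filtering the current element's children by role and picking the child at `index`, returning a failure if the index is out of range.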
| 74 | + |
| 75 | +* **Attribute Retrieval**: |
| 76 | + * `getElementAttributes(element:attributes:)`: Fetches attributes for a given `AXUIElement`. |
| 77 | + * If the input `attributes` list is empty or nil, it discovers all available attributes for the element using `AXUIElementCopyAttributeNames`. |
| 78 | + * It then iterates through the attributes to retrieve their values using `AXUIElementCopyAttributeValue`. |
| 79 | + * Handles various `CFTypeRef` return types and converts them to Swift/JSON-compatible representations: |
| 80 | + * `CFString` -> `String` |
| 81 | + * `CFBoolean` -> `Bool` |
| 82 | + * `CFNumber` -> `Int` (or "Number (conversion failed)") |
| 83 | + * `CFArray` -> Array of strings (for "AXActions") or descriptive string like "Array with X elements". |
| 84 | + * `AXValue` (for `AXPosition`, `AXSize`): Extracts `CGPoint` or `CGSize` and converts to `{"x": Int, "y": Int}` or `{"width": Int, "height": Int}`. Uses `AXValueGetTypeID()`, `AXValueGetType()`, and `AXValueGetValue()`. |
| 85 | + * `AXUIElement` (for attributes like `AXTitleUIElement`): Attempts to extract a display string (e.g., its "AXValue" or "AXTitle"). |
| 86 | + * Includes a `ComputedName` by trying `AXTitle`, `AXTitleUIElement`, `AXValue`, `AXDescription`, `AXLabel`, `AXHelp`, `AXRoleDescription` in order of preference. |
| 87 | + * Includes `IsClickable` (boolean) if the element is an `AXButton` or has an `AXPress` action. |
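
The `ComputedName` fallback and the `IsClickable` rule described above can be illustrated on already-converted attribute values. This sketch assumes the attributes have been flattened to strings and that empty strings are skipped; both are assumptions, and the function names are hypothetical.

```swift
import Foundation

/// Returns the first non-empty candidate in the documented order of preference.
func computedName(from attributes: [String: String]) -> String? {
    let candidates = ["AXTitle", "AXTitleUIElement", "AXValue", "AXDescription",
                      "AXLabel", "AXHelp", "AXRoleDescription"]
    for key in candidates {
        // Skipping empty strings is an assumption of this sketch.
        if let value = attributes[key], !value.isEmpty {
            return value
        }
    }
    return nil
}

/// An element counts as clickable if it is a button or supports the press action.
func isClickable(role: String?, actions: [String]) -> Bool {
    role == "AXButton" || actions.contains("AXPress")
}
```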
| 88 | + |
| 89 | +* **Action Performing**: |
| 90 | + * `handlePerform(cmd:)` calls `AXUIElementPerformAction(element, actionName)` to execute the specified action on the located element. |
| 91 | + * `elementSupportsAction(element:action:)` checks if an element supports a given action by fetching `AXActionNames` and checking for the action's presence. |
| 92 | + |
| 93 | +* **Error Handling**: |
| 94 | + * Uses a custom `AXErrorString` Swift enum (`.notAuthorised`, `.elementNotFound`, `.actionFailed`). |
| 95 | + * Responds with a JSON `ErrorResponse` object: `{ "error": "message", "debug_logs": [...] }`. |
| 96 | + |
| 97 | +* **Debugging**: |
| 98 | + * `GLOBAL_DEBUG_ENABLED` (Swift constant, currently `true`): If true, all `debug()` messages are printed to `stderr` of the `ax` process. |
| 99 | + * `debug_logging` field in input JSON: If `true`, enables `commandSpecificDebugLoggingEnabled`. |
| 100 | + * `collectedDebugLogs` (Swift array): Stores debug messages if `commandSpecificDebugLoggingEnabled` is true. This array is then included in the `debug_logs` field of the JSON response (both success and error). |
| 101 | + * The `debug(_ message: String)` function handles appending to `collectedDebugLogs` and printing to `stderr`. |
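
The interaction between the global flag and the per-command flag can be sketched as follows. The real code uses globals (`GLOBAL_DEBUG_ENABLED`, `commandSpecificDebugLoggingEnabled`, `collectedDebugLogs`); wrapping them in a hypothetical `DebugLog` class here is only to keep the example self-contained. Whether collected entries carry the `DEBUG:` prefix is not stated, so this sketch stores the raw message.

```swift
import Foundation

/// Hypothetical container for the debug state described above.
final class DebugLog {
    /// Plays the role of GLOBAL_DEBUG_ENABLED: mirror every message to stderr.
    let globalDebugEnabled: Bool
    /// Plays the role of commandSpecificDebugLoggingEnabled (set from `debug_logging`).
    var commandSpecificDebugLoggingEnabled = false
    /// Plays the role of collectedDebugLogs (returned as `debug_logs`).
    private(set) var collectedDebugLogs: [String] = []

    init(globalDebugEnabled: Bool) {
        self.globalDebugEnabled = globalDebugEnabled
    }

    func debug(_ message: String) {
        // Collect for the JSON response only when the command asked for it.
        if commandSpecificDebugLoggingEnabled {
            collectedDebugLogs.append(message)
        }
        // Print a live trace to stderr when the global flag is on.
        if globalDebugEnabled {
            FileHandle.standardError.write(Data("DEBUG: \(message)\n".utf8))
        }
    }
}
```

Keeping the two switches independent means a caller can request `debug_logs` in the response even when the live `stderr` trace is disabled, and vice versa.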
| 102 | + |
| 103 | +## 3. Build Process & Optimization |
| 104 | + |
| 105 | +The `ax` binary is built using the `Makefile` located in the `ax/` directory. |
| 106 | + |
| 107 | +* **Makefile (`ax/Makefile`)**: |
| 108 | + * **Universal Binary**: Builds for both `arm64` and `x86_64` architectures. |
| 109 | + * **Optimization Flags**: |
| 110 | + * `-Xswiftc -Osize`: Instructs the Swift compiler to optimize for binary size. |
| 111 | + * `-Xlinker -Wl,-dead_strip`: Instructs the linker to perform dead code elimination. |
| 112 | + * **Symbol Stripping**: |
| 113 | + * `strip -x $(UNIVERSAL_BINARY_PATH)`: Aggressively removes symbols from the linked universal binary to further reduce size. |
| 114 | + * **Output**: The final, optimized, and stripped binary is placed at `ax/ax`. |
| 115 | + * **Targets**: |
| 116 | + * `all` (default): Ensures the old `ax/ax` binary is removed, then builds the new one. It calls `$(MAKE) $(FINAL_BINARY_PATH)` to trigger the dependent build steps. |
| 117 | + * `$(FINAL_BINARY_PATH)`: Copies the built and stripped universal binary from the Swift build directory to `ax/ax`. |
| 118 | + * `$(UNIVERSAL_BINARY_PATH)`: Contains the `swift build` and `strip` commands. |
| 119 | + * `clean`: Removes Swift build artifacts (`.build/`) and the `ax/ax` binary. |
| 120 | +* **Optimization Journey Summary**: |
| 121 | + * The combination of `-Xswiftc -Osize`, `-Xlinker -Wl,-dead_strip`, and `strip -x` proved most effective for size reduction (e.g., from an initial ~369KB down to ~336KB). |
| 122 | + * Link-Time Optimization (`-Xswiftc -lto=llvm-full` or `-Xswiftc -lto=llvm-thin`) was attempted but resulted in linker errors (`ld: file cannot be open()ed... main.o`). |
| 123 | + * UPX compression was explored. While it significantly reduced size (e.g., 338K to 130K with `--force-macos`), the resulting binary was malformed (`zsh: malformed Mach-o file`) and unusable. UPX was therefore abandoned. |
| 124 | + * Other flags like `-Xswiftc -Oz` (not recognized by `swift build`) and `-Xlinker -compress_text` (caused linker errors) were unsuccessful. |
| 125 | + |
| 126 | +## 4. Running & Testing |
| 127 | + |
| 128 | +The `ax` binary is designed to be invoked by a parent process (like the MCP server) but can also be tested manually from the command line. |
| 129 | + |
| 130 | +* **Runner Script (`ax/ax_runner.sh`)**: |
| 131 | + * This is the **recommended way to execute `ax` manually** for testing and debugging. |
| 132 | + * It's a simple Bash script that robustly determines its own directory and then executes the `ax/ax` binary, passing along any arguments. |
| 133 | + * The TypeScript `AXQueryExecutor.ts` uses this runner script. |
| 134 | + * Script content: |
| 135 | + ```bash |
| 136 | + #!/bin/bash |
| 137 | + SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)" |
| 138 | + exec "$SCRIPT_DIR/ax" "$@" |
| 139 | + ``` |
| 140 | + |
| 141 | +* **Manual Testing Workflow**: |
| 142 | + 1. **Ensure Target Application State**: Before running a test, verify that the target application is running and in the exact state you intend to query. For example, if you are querying for a window with `AXMain=true`, ensure the application has an actual document window open and focused, not just a file dialog or a menu bar. Mismatched application state is a common cause of "element not found" errors. |
| 143 | + 2. **Construct JSON Input**: Prepare your command as a single line of JSON. |
| 144 | + 3. **Execute via `ax_runner.sh`**: Pipe the JSON to the runner script. |
| 145 | + * Example: |
| 146 | + ```bash |
| 147 | + echo '{"cmd":"query","locator":{"app":"TextEdit","role":"AXWindow","match":{"AXMain":"true"}},"debug_logging":true}' | ./ax/ax_runner.sh |
| 148 | + ``` |
| 149 | + (You can also run `./ax/ax` directly, but the runner is slightly more robust for scripting.) |
| 150 | + 4. **Interpret Output**: |
| 151 | + * **`stdout`**: Receives the primary JSON response from `ax`. This will be a `QueryResponse`, `MultiQueryResponse`, or `PerformResponse` on success. |
| 152 | + * **`stderr`**: |
| 153 | + * If `ax` encounters an internal error or fails to parse the input, it will output an `ErrorResponse` JSON to `stderr` (e.g., `{"error":"No element matches the locator","debug_logs":[...]}`). |
| 154 | + * If `GLOBAL_DEBUG_ENABLED` is `true` in `main.swift` (which it is by default), all `debug(...)` messages from `ax` are continuously printed to `stderr`, prefixed with `DEBUG:`. This provides a live trace of `ax`'s internal operations. |
| 155 | + * The `debug_logs` array within the JSON response (on `stdout` for success, or `stderr` for `ErrorResponse`) contains logs collected specifically for that command if `"debug_logging": true` was in the input JSON. |
| 156 | + |
| 157 | +* **Example Test Queries**: |
| 158 | + 1. **Find TextEdit's main window (single element query)**: |
| 159 | + *Ensure TextEdit is running and has an actual document window open and active.* |
| 160 | + ```bash |
| 161 | + echo '{"cmd":"query","locator":{"app":"com.apple.TextEdit","role":"AXWindow","match":{"AXMain":"true"}},"multi":false,"debug_logging":true}' | ./ax/ax_runner.sh |
| 162 | + ``` |
| 163 | + 2. **List all elements in TextEdit (multi-element query)**: |
| 164 | + *Ensure TextEdit is running.* |
| 165 | + ```bash |
| 166 | + echo '{"cmd":"query","locator":{"app":"com.apple.TextEdit","role":"*","match":{}},"multi":true,"debug_logging":true}' | ./ax/ax_runner.sh |
| 167 | + ``` |
| 168 | + |
| 169 | +* **Permissions**: |
| 170 | + * **Crucial**: The application that executes `ax` (e.g., Terminal, your IDE, the Node.js process running the MCP server) **must** have "Accessibility" permissions granted in macOS "System Settings > Privacy & Security > Accessibility". |
| 171 | + * The `ax` binary itself calls `checkAccessibilityPermissions()` at startup. If permissions are not granted, it prints detailed instructions to `stderr` and exits. |
| 172 | + |
| 173 | +## 5. macOS Accessibility (AX) Intricacies & Swift Integration |
| 174 | + |
| 175 | +Working with the macOS Accessibility framework via Swift involves several specific considerations: |
| 176 | + |
| 177 | +* **Frameworks**: |
| 178 | + * `ApplicationServices`: Essential for `AXUIElement` and related C APIs. |
| 179 | + * `AppKit`: Used for `NSRunningApplication` (to get PIDs) and `NSWorkspace`. |
| 180 | +* **Element Hierarchy**: UI elements form a tree. Traversal typically involves getting an element's children via attributes like `kAXChildrenAttribute` ("AXChildren"), `kAXWindowsAttribute` ("AXWindows"), etc. |
| 181 | +* **Attributes (`AX...`)**: |
| 182 | + * Elements possess a wide range of attributes (e.g., `AXRole`, `AXTitle`, `AXSubrole`, `AXValue`, `AXFocused`, `AXMain`, `AXPosition`, `AXSize`, `AXIdentifier`). The presence of attributes can vary. |
| 183 | + * `CFTypeRef`: Attribute values are returned as `CFTypeRef`. Runtime type checking using `CFGetTypeID()` and `AXValueGetTypeID()` (for `AXValue` types) is necessary before safe casting. |
| 184 | + * `AXValue`: A special CoreFoundation type used for geometry (like `CGPoint` for `AXPosition`, `CGSize` for `AXSize`) and other structured data. Requires `AXValueGetValue()` to extract the underlying data. |
| 185 | +* **Actions (`AX...Action`)**: |
| 186 | + * Elements expose supported actions (e.g., `kAXPressAction` ("AXPress"), "AXShowMenu") via the `kAXActionsAttribute` ("AXActions") or `AXUIElementCopyActionNames()`. |
| 187 | + * Actions are performed using `AXUIElementPerformAction()`. |
| 188 | +* **Roles**: |
| 189 | + * `AXRole` (e.g., "AXWindow", "AXButton", "AXTextField") and `AXRoleDescription` (a human-readable string) describe the type/function of an element. |
| 190 | + * `AXRoleDescription` can sometimes be missing or less reliable than `AXRole`. |
| 191 | + * Using `"*"` or an empty string for `locator.role` acts as a wildcard in `collectAll`. |
| 192 | +* **Data Type Matching**: |
| 193 | + * When matching attributes from JSON input (where values are strings), the Swift code must correctly interpret these strings against the actual attribute types (e.g., string "true" for a `Bool` attribute, string "123" for a numeric attribute). Both `search` and `collectAll` implement logic for this. |
| 194 | +* **Bridging & Constants**: |
| 195 | + * Some C-based Accessibility constants (like `kAXWindowsAttribute`) might need to be defined as Swift constants if not directly available. |
| 196 | + * Private C functions like `AXUIElementGetTypeID_Impl()` might require `@_silgen_name` bridging. |
| 197 | +* **Debugging Tool**: |
| 198 | + * **Accessibility Inspector** (available in Xcode under "Xcode > Open Developer Tool > Accessibility Inspector") is an indispensable tool for visually exploring the accessibility hierarchy of any running application, viewing element attributes, and testing actions. |
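
The data-type matching described under "Data Type Matching" can be sketched independently of the Accessibility API. The enum `AttributeValue` is a hypothetical stand-in for an attribute value after its `CFTypeRef` has been inspected; exact (case-sensitive) comparison against "true"/"false" and treating a missing attribute as a non-match are assumptions of this sketch.

```swift
import Foundation

/// Hypothetical stand-in for an attribute value after CFTypeRef type checking.
enum AttributeValue {
    case bool(Bool)
    case number(Int)
    case string(String)
}

/// Compares one actual value against the string supplied in `locator.match`.
func attributeMatches(_ actual: AttributeValue, wantStr: String) -> Bool {
    switch actual {
    case .bool(let value):
        // Booleans are compared against the literal strings "true" / "false".
        return wantStr == (value ? "true" : "false")
    case .number(let value):
        // A wanted value that does not parse as Int can never match.
        return Int(wantStr) == value
    case .string(let value):
        return wantStr == value
    }
}

/// An element matches only if every requested attribute is present and equal.
func elementMatches(_ attributes: [String: AttributeValue],
                    match: [String: String]) -> Bool {
    match.allSatisfy { key, wantStr in
        guard let actual = attributes[key] else { return false }
        return attributeMatches(actual, wantStr: wantStr)
    }
}
```

An empty `match` object matches every element, which is consistent with the wildcard query shown in the testing section.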
| 199 | + |
| 200 | +This document should serve as a good reference for understanding and working with the `ax` binary. |