Commit 364f32f

Add accessbility debugging rule
1 parent 10ca087 commit 364f32f

1 file changed

+200
-0
lines changed

.cursor/rules/ax.mdc

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
---
description:
globs:
alwaysApply: false
---
6+
# macOS Accessibility (`ax`) Binary Rules & Knowledge
7+
8+
This document outlines the functionality, build process, testing procedures, and technical details of the `ax` Swift command-line utility, designed for interacting with the macOS Accessibility framework.
9+
10+
## 1. `ax` Binary Overview
11+
12+
* **Purpose**: Provides a JSON-based interface to query UI elements and perform actions using the macOS Accessibility API. It's intended to be called by other processes (like the MCP server).
13+
* **Communication**: Reads JSON commands from `stdin`, writes JSON responses to `stdout`, and writes JSON error responses to `stderr`.
14+
* **Core Commands**:
15+
* `query`: Retrieves information about UI elements.
16+
* `perform`: Executes an action on a UI element.
17+
* **Key Input Fields (JSON)**:
18+
* `cmd` (string): "query" or "perform".
19+
* `locator` (object): Specifies the target element(s).
20+
* `app` (string): Bundle ID or localized name of the target application (e.g., "com.apple.TextEdit", "Safari").
21+
* `role` (string): The accessibility role of the target element (e.g., "AXWindow", "AXButton", "*").
22+
* `match` (object): Key-value pairs of attributes to match (e.g., `{"AXMain": "true"}`). Values are strings.
23+
* `pathHint` (array of strings, optional): A path to navigate the UI tree (e.g., `["window[1]", "toolbar[1]"]`).
24+
* `attributes` (array of strings, optional): For `query`, specific attributes to retrieve. Defaults to a common set if omitted.
25+
* `action` (string, optional): For `perform`, the action to execute (e.g., "AXPress").
26+
* `multi` (boolean, optional): For `query`, if `true`, returns all matching elements. Defaults to `false`.
27+
* `requireAction` (string, optional): For `query`, filters results to elements supporting a specific action.
28+
* `debug_logging` (boolean, optional): If `true`, includes detailed internal debug logs in the response.
29+
* **Key Output Fields (JSON)**:
30+
* Success (`query`): `{ "attributes": { "AXTitle": "...", ... } }`
31+
* Success (`query`, `multi: true`): `{ "elements": [ { "AXTitle": "..." }, ... ] }`
32+
* Success (`perform`): `{ "status": "ok" }`
33+
* Error: `{ "error": "Error message description" }`
34+
* `debug_logs` (array of strings, optional): Included in success or error responses if `debug_logging` was true.
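
The input fields listed above can be modelled with `Codable` types. The following is a minimal sketch, assuming type names (`Locator`, `Command`) and optionality that are illustrative only; the actual declarations in `main.swift` may differ.

```swift
import Foundation

// Hypothetical types mirroring the documented input fields; the names and
// optionality are assumptions, not the declarations in main.swift.
struct Locator: Codable {
    let app: String
    let role: String
    let match: [String: String]
    let pathHint: [String]?
}

struct Command: Codable {
    let cmd: String
    let locator: Locator
    let attributes: [String]?
    let action: String?
    let multi: Bool?
    let requireAction: String?
    let debugLogging: Bool?

    // Only `debug_logging` is snake_case in the documented JSON.
    enum CodingKeys: String, CodingKey {
        case cmd, locator, attributes, action, multi, requireAction
        case debugLogging = "debug_logging"
    }
}

let json = """
{"cmd":"query","locator":{"app":"com.apple.TextEdit","role":"AXWindow","match":{"AXMain":"true"}},"debug_logging":true}
"""
let command = try! JSONDecoder().decode(Command.self, from: Data(json.utf8))
print(command.cmd, command.locator.role, command.multi ?? false)
```

Omitted optional fields decode to `nil`, so the caller can apply the documented defaults (for example, treating a missing `multi` as `false`).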
35+
36+
## 2. Functionality - How it Works
37+
38+
The `ax` binary is implemented in Swift in `ax/Sources/AXHelper/main.swift`.
39+
40+
* **Application Targeting**:
41+
* `getApplicationElement(bundleIdOrName: String)`: This function is the entry point to an application's accessibility tree.
42+
* It first tries to find the application using its bundle identifier (e.g., "com.apple.Safari") via `NSRunningApplication.runningApplications(withBundleIdentifier:)`.
43+
* If not found, it iterates through all running applications and attempts to match by the application's localized name (e.g., "Safari") via `NSRunningApplication.localizedName`.
44+
* Once the `NSRunningApplication` instance is found, `AXUIElementCreateApplication(pid)` is used to get the root `AXUIElement` for that application.
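
The bundle-ID-first, name-second lookup described above can be sketched as follows. `RunningApp`, `findApp`, and `applicationElement(for:)` are hypothetical names introduced for illustration. The macOS-only part reads the process list from `NSWorkspace.shared.runningApplications`, which is an assumption about how the list is obtained rather than a copy of `getApplicationElement`.

```swift
import Foundation

// Hypothetical, framework-free stand-in for the fields read from NSRunningApplication.
struct RunningApp {
    let bundleID: String?
    let name: String?
    let pid: Int32
}

// Bundle identifier first, localized name second, as described above.
func findApp(_ key: String, in apps: [RunningApp]) -> RunningApp? {
    if let byBundle = apps.first(where: { $0.bundleID == key }) {
        return byBundle
    }
    return apps.first(where: { $0.name == key })
}

#if canImport(AppKit)
import AppKit
import ApplicationServices

// On macOS, feed the lookup from the real process list and build the root element.
func applicationElement(for key: String) -> AXUIElement? {
    let apps = NSWorkspace.shared.runningApplications.map {
        RunningApp(bundleID: $0.bundleIdentifier, name: $0.localizedName, pid: $0.processIdentifier)
    }
    return findApp(key, in: apps).map { AXUIElementCreateApplication($0.pid) }
}
#endif

// Sample data so the lookup order can be exercised without a GUI session.
let sample = [
    RunningApp(bundleID: "com.apple.Safari", name: "Safari", pid: 101),
    RunningApp(bundleID: "com.apple.TextEdit", name: "TextEdit", pid: 202),
]
print(findApp("com.apple.TextEdit", in: sample)?.pid ?? -1)
print(findApp("Safari", in: sample)?.pid ?? -1)
```

Keeping the lookup separate from the AppKit calls makes the matching order easy to test in isolation.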
45+
46+
* **Element Location**:
47+
* **`search(element:locator:depth:maxDepth:)`**:
48+
* Used for single-element queries (when `multi` is `false` or not set).
49+
* Performs a depth-first search starting from a given `element` (usually the application element or one found via `pathHint`).
50+
* It checks if an element's `AXRole` matches `locator.role`.
51+
* Then, it verifies that all attribute-value pairs in `locator.match` correspond to the element's actual attributes. This matching logic handles:
52+
* **Boolean attributes** (e.g., `AXMain`, `AXFocused`): Compares against string "true" or "false".
53+
* **Numeric attributes**: Attempts to parse `wantStr` (from `locator.match`) as an `Int` and compares numerically.
54+
* **String attributes**: Performs direct string comparison.
55+
* If a match is found, the `AXUIElement` is returned. Otherwise, it recursively searches children.
56+
* **`collectAll(element:locator:requireAction:hits:depth:maxDepth:)`**:
57+
* Used for multi-element queries (`multi: true`).
58+
* Recursively traverses the accessibility tree starting from `element`.
59+
* Matches elements against `locator.role` (supports `"*"` or empty for wildcard) and `locator.match` (using robust boolean, numeric, and string comparison similar to `search`).
60+
* If `requireAction` is specified, it further filters elements to those supporting the given action using `elementSupportsAction`.
61+
* It aggregates all matching `AXUIElement`s into the `hits` array.
62+
* To discover children, it queries a comprehensive list of attributes known to contain child elements:
63+
* Standard: `kAXChildrenAttribute` ("AXChildren")
64+
* Web-specific: "AXLinks", "AXButtons", "AXControls", "AXDOMChildren", etc.
65+
* Application-specific: `kAXWindowsAttribute` ("AXWindows")
66+
* General containers: "AXContents", "AXVisibleChildren", etc.
67+
* Includes deduplication of found elements based on their `ObjectIdentifier`.
68+
* **`navigateToElement(from:pathHint:)`**:
69+
* Processes the `pathHint` array (e.g., `["window[1]", "toolbar[1]"]`).
70+
* Each component (e.g., "window[1]") is parsed into a role ("window") and an index; the 1-based index in the hint is converted to a 0-based index (here, 0).
71+
* It navigates the tree by finding children of the current element that match the role and selecting the one at the specified index.
72+
* Special handling for "window" role uses the `AXWindows` attribute for direct access.
73+
* The element found at the end of the path is used as the starting point for `search` or `collectAll`.
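
The path-hint handling above can be illustrated with a small sketch. `parsePathComponent`, `Node`, and `navigate` are hypothetical names. The in-memory `Node` tree stands in for real `AXUIElement`s, and comparing the hint's role name directly against `Node.role` is a simplifying assumption: the real code has to map names such as "window" onto accessibility roles.

```swift
import Foundation

// Parses one path-hint component such as "window[1]" into a role name and a
// 0-based index. The hint's index is treated as 1-based, matching the
// "window[1]" -> index 0 example above.
func parsePathComponent(_ component: String) -> (role: String, index: Int)? {
    guard component.hasSuffix("]"),
          let open = component.lastIndex(of: "[") else { return nil }
    let role = String(component[..<open])
    let digits = component[component.index(after: open)..<component.index(before: component.endIndex)]
    guard !role.isEmpty, let oneBased = Int(digits), oneBased >= 1 else { return nil }
    return (role, oneBased - 1)
}

// Hypothetical in-memory node used only to illustrate navigation.
final class Node {
    let role: String
    let children: [Node]
    init(role: String, children: [Node] = []) {
        self.role = role
        self.children = children
    }
}

// Walks the hint one component at a time, picking the index-th child whose role matches.
func navigate(from root: Node, pathHint: [String]) -> Node? {
    var current = root
    for component in pathHint {
        guard let (role, index) = parsePathComponent(component) else { return nil }
        let candidates = current.children.filter { $0.role == role }
        guard candidates.indices.contains(index) else { return nil }
        current = candidates[index]
    }
    return current
}

let toolbar = Node(role: "toolbar")
let root = Node(role: "application", children: [Node(role: "window", children: [toolbar])])
print(navigate(from: root, pathHint: ["window[1]", "toolbar[1]"]) === toolbar)
```

Returning `nil` on a malformed component or an out-of-range index lets the caller decide whether to fall back to a search from the application root.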
74+
75+
* **Attribute Retrieval**:
76+
* `getElementAttributes(element:attributes:)`: Fetches attributes for a given `AXUIElement`.
77+
* If the input `attributes` list is empty or nil, it discovers all available attributes for the element using `AXUIElementCopyAttributeNames`.
78+
* It then iterates through the attributes to retrieve their values using `AXUIElementCopyAttributeValue`.
79+
* Handles various `CFTypeRef` return types and converts them to Swift/JSON-compatible representations:
80+
* `CFString` -> `String`
81+
* `CFBoolean` -> `Bool`
82+
* `CFNumber` -> `Int` (or "Number (conversion failed)")
83+
* `CFArray` -> Array of strings (for "AXActions") or descriptive string like "Array with X elements".
84+
* `AXValue` (for `AXPosition`, `AXSize`): Extracts `CGPoint` or `CGSize` and converts to `{"x": Int, "y": Int}` or `{"width": Int, "height": Int}`. Uses `AXValueGetTypeID()`, `AXValueGetType()`, and `AXValueGetValue()`.
85+
* `AXUIElement` (for attributes like `AXTitleUIElement`): Attempts to extract a display string (e.g., its "AXValue" or "AXTitle").
86+
* Includes a `ComputedName` by trying `AXTitle`, `AXTitleUIElement`, `AXValue`, `AXDescription`, `AXLabel`, `AXHelp`, `AXRoleDescription` in order of preference.
87+
* Includes `IsClickable` (boolean) if the element is an `AXButton` or has an `AXPress` action.
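
The `ComputedName` and `IsClickable` rules above can be sketched on plain Swift values. The function names, the dictionary input, and the choice to skip empty strings are assumptions for illustration; the real code works on `AXUIElement` attribute values.

```swift
import Foundation

// Preference order quoted from the rule text above.
let computedNameOrder = ["AXTitle", "AXTitleUIElement", "AXValue", "AXDescription",
                         "AXLabel", "AXHelp", "AXRoleDescription"]

// Returns the first non-empty string among the preferred attributes.
// `attributes` is a hypothetical already-converted dictionary, not an AXUIElement.
func computedName(from attributes: [String: Any]) -> String? {
    for key in computedNameOrder {
        if let text = attributes[key] as? String, !text.isEmpty {
            return text
        }
    }
    return nil
}

// An element counts as clickable if it is a button or advertises AXPress.
func isClickable(role: String?, actions: [String]) -> Bool {
    return role == "AXButton" || actions.contains("AXPress")
}

print(computedName(from: ["AXTitle": "", "AXDescription": "Close"]) ?? "(none)")
print(isClickable(role: "AXGroup", actions: ["AXPress"]))
```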
88+
89+
* **Action Performing**:
90+
* `handlePerform(cmd:)` calls `AXUIElementPerformAction(element, actionName)` to execute the specified action on the located element.
91+
* `elementSupportsAction(element:action:)` checks if an element supports a given action by fetching `AXActionNames` and checking for the action's presence.
92+
93+
* **Error Handling**:
94+
* Uses a custom `AXErrorString` Swift enum (`.notAuthorised`, `.elementNotFound`, `.actionFailed`).
95+
* Responds with a JSON `ErrorResponse` object: `{ "error": "message", "debug_logs": [...] }`.
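
A minimal sketch of the error path, assuming the enum carries its message as a raw value and that `ErrorResponse` omits `debug_logs` when no logs were collected. The case names come from the text above; the raw messages, the `encodeError` helper, and the use of sorted keys are illustrative assumptions.

```swift
import Foundation

// Case names come from the rule text; the raw messages are illustrative assumptions.
enum AXErrorString: String, Error {
    case notAuthorised = "Accessibility permission not granted"
    case elementNotFound = "No element matches the locator"
    case actionFailed = "Action failed"
}

struct ErrorResponse: Codable {
    let error: String
    let debugLogs: [String]?

    enum CodingKeys: String, CodingKey {
        case error
        case debugLogs = "debug_logs"
    }
}

// Encodes an error as a single JSON line; sorted keys keep the output stable.
func encodeError(_ error: AXErrorString, debugLogs: [String]?) -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    let data = try! encoder.encode(ErrorResponse(error: error.rawValue, debugLogs: debugLogs))
    return String(decoding: data, as: UTF8.self)
}

print(encodeError(.elementNotFound, debugLogs: ["search started"]))
```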
96+
97+
* **Debugging**:
98+
* `GLOBAL_DEBUG_ENABLED` (Swift constant, currently `true`): If true, all `debug()` messages are printed to `stderr` of the `ax` process.
99+
* `debug_logging` field in input JSON: If `true`, enables `commandSpecificDebugLoggingEnabled`.
100+
* `collectedDebugLogs` (Swift array): Stores debug messages if `commandSpecificDebugLoggingEnabled` is true. This array is then included in the `debug_logs` field of the JSON response (both success and error).
101+
* The `debug(_ message: String)` function handles appending to `collectedDebugLogs` and printing to `stderr`.
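
The interaction between the two debug switches can be sketched as below. The source describes globals; this sketch wraps the same state in a hypothetical `DebugLog` class so that it is self-contained, and the `DEBUG:` prefix follows the `stderr` output format described in section 4.

```swift
import Foundation

// Sketch of the two debug switches described above, wrapped in a class so the
// example is self-contained; the real code is described as using globals.
final class DebugLog {
    let globalDebugEnabled: Bool                    // stands in for GLOBAL_DEBUG_ENABLED
    var commandSpecificDebugLoggingEnabled = false  // set from the "debug_logging" input field
    private(set) var collectedDebugLogs: [String] = []

    init(globalDebugEnabled: Bool) {
        self.globalDebugEnabled = globalDebugEnabled
    }

    func debug(_ message: String) {
        // Collected logs end up in the response's "debug_logs" array.
        if commandSpecificDebugLoggingEnabled {
            collectedDebugLogs.append(message)
        }
        // The global switch produces the live trace on stderr.
        if globalDebugEnabled {
            FileHandle.standardError.write(Data("DEBUG: \(message)\n".utf8))
        }
    }
}

let log = DebugLog(globalDebugEnabled: true)
log.debug("not collected")
log.commandSpecificDebugLoggingEnabled = true
log.debug("collected for the response")
print(log.collectedDebugLogs)
```

The two switches are independent: a message can reach `stderr` without being collected, and the reverse holds when the global switch is off.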
102+
103+
## 3. Build Process & Optimization
104+
105+
The `ax` binary is built using the `Makefile` located in the `ax/` directory.
106+
107+
* **Makefile (`ax/Makefile`)**:
108+
* **Universal Binary**: Builds for both `arm64` and `x86_64` architectures.
109+
* **Optimization Flags**:
110+
* `-Xswiftc -Osize`: Instructs the Swift compiler to optimize for binary size.
111+
* `-Xlinker -Wl,-dead_strip`: Instructs the linker to perform dead code elimination.
112+
* **Symbol Stripping**:
113+
* `strip -x $(UNIVERSAL_BINARY_PATH)`: Aggressively removes symbols from the linked universal binary to further reduce size.
114+
* **Output**: The final, optimized, and stripped binary is placed at `ax/ax`.
115+
* **Targets**:
116+
* `all` (default): Ensures the old `ax/ax` binary is removed, then builds the new one. It calls `$(MAKE) $(FINAL_BINARY_PATH)` to trigger the dependent build steps.
117+
* `$(FINAL_BINARY_PATH)`: Copies the built and stripped universal binary from the Swift build directory to `ax/ax`.
118+
* `$(UNIVERSAL_BINARY_PATH)`: Contains the `swift build` and `strip` commands.
119+
* `clean`: Removes Swift build artifacts (`.build/`) and the `ax/ax` binary.
120+
* **Optimization Journey Summary**:
121+
* The combination of `-Xswiftc -Osize`, `-Xlinker -Wl,-dead_strip`, and `strip -x` proved most effective for size reduction (e.g., from an initial ~369KB down to ~336KB).
122+
* Link-Time Optimization (`-Xswiftc -lto=llvm-full` or `-Xswiftc -lto=llvm-thin`) was attempted but resulted in linker errors (`ld: file cannot be open()ed... main.o`).
123+
* UPX compression was explored. While it significantly reduced size (e.g., 338K to 130K with `--force-macos`), the resulting binary was malformed (`zsh: malformed Mach-o file`) and unusable. UPX was therefore abandoned.
124+
* Other flags like `-Xswiftc -Oz` (not recognized by `swift build`) and `-Xlinker -compress_text` (caused linker errors) were unsuccessful.
125+
126+
## 4. Running & Testing
127+
128+
The `ax` binary is designed to be invoked by a parent process (like the MCP server) but can also be tested manually from the command line.
129+
130+
* **Runner Script (`ax/ax_runner.sh`)**:
131+
* This is the **recommended way to execute `ax` manually** for testing and debugging.
132+
* It's a simple Bash script that robustly determines its own directory and then executes the `ax/ax` binary, passing along any arguments.
133+
* The TypeScript `AXQueryExecutor.ts` uses this runner script.
134+
* Script content:
135+
```bash
#!/bin/bash
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &> /dev/null && pwd)"
exec "$SCRIPT_DIR/ax" "$@"
```
140+
141+
* **Manual Testing Workflow**:
142+
1. **Ensure Target Application State**: Before running a test, **critically verify** that the target application is running and is in the specific state you intend to query. For example, if you are querying for a window with `AXMain=true`, ensure the application has an actual document window open and focused, not just a file dialog or a menu bar. Mismatched application state is a common reason for "element not found" errors.
143+
2. **Construct JSON Input**: Prepare your command as a single line of JSON.
144+
3. **Execute via `ax_runner.sh`**: Pipe the JSON to the runner script.
145+
* Example:
146+
```bash
echo '{"cmd":"query","locator":{"app":"TextEdit","role":"AXWindow","match":{"AXMain":"true"}},"debug_logging":true}' | ./ax/ax_runner.sh
```
149+
(You can also run `./ax/ax` directly, but the runner is slightly more robust for scripting.)
150+
4. **Interpret Output**:
151+
* **`stdout`**: Receives the primary JSON response from `ax`. This will be a `QueryResponse`, `MultiQueryResponse`, or `PerformResponse` on success.
152+
* **`stderr`**:
153+
* If `ax` encounters an internal error or fails to parse the input, it will output an `ErrorResponse` JSON to `stderr` (e.g., `{"error":"No element matches the locator","debug_logs":[...]}`).
154+
* If `GLOBAL_DEBUG_ENABLED` is `true` in `main.swift` (which it is by default), all `debug(...)` messages from `ax` are continuously printed to `stderr`, prefixed with `DEBUG:`. This provides a live trace of `ax`'s internal operations.
155+
* The `debug_logs` array within the JSON response (on `stdout` for success, or `stderr` for `ErrorResponse`) contains logs collected specifically for that command if `"debug_logging": true` was in the input JSON.
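
The workflow above (JSON on `stdin`, response on `stdout`, errors and trace on `stderr`) can also be driven from a parent process. The following is a sketch using Foundation's `Process`; `runTool` is a hypothetical helper, and `/bin/cat` stands in for `ax/ax_runner.sh` so the example runs without the binary or Accessibility permissions.

```swift
import Foundation

// Runs an executable, writes `input` to its stdin, and captures stdout and stderr.
// Suitable for small payloads only: a child that fills the stderr pipe before
// stdout is drained would block.
func runTool(at url: URL, input: String) throws -> (stdout: String, stderr: String, status: Int32) {
    let process = Process()
    process.executableURL = url
    let stdinPipe = Pipe(), stdoutPipe = Pipe(), stderrPipe = Pipe()
    process.standardInput = stdinPipe
    process.standardOutput = stdoutPipe
    process.standardError = stderrPipe

    try process.run()
    stdinPipe.fileHandleForWriting.write(Data(input.utf8))
    stdinPipe.fileHandleForWriting.closeFile()   // signal end of input

    let out = stdoutPipe.fileHandleForReading.readDataToEndOfFile()
    let err = stderrPipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return (String(decoding: out, as: UTF8.self),
            String(decoding: err, as: UTF8.self),
            process.terminationStatus)
}

// `/bin/cat` is a stand-in for the runner script; it simply echoes the request.
let request = #"{"cmd":"query","locator":{"app":"TextEdit","role":"AXWindow","match":{"AXMain":"true"}}}"#
let result = try! runTool(at: URL(fileURLWithPath: "/bin/cat"), input: request + "\n")
print(result.stdout, terminator: "")
```

With the real runner, `result.stdout` would hold the success response and `result.stderr` the `ErrorResponse` JSON plus any `DEBUG:` lines, so the caller needs to separate the two on `stderr`.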
156+
157+
* **Example Test Queries**:
158+
1. **Find TextEdit's main window (single element query)**:
159+
*Ensure TextEdit is running and has an actual document window open and active.*
160+
```bash
echo '{"cmd":"query","locator":{"app":"com.apple.TextEdit","role":"AXWindow","match":{"AXMain":"true"}},"return_all_matches":false,"debug_logging":true}' | ./ax/ax_runner.sh
```
163+
2. **List all elements in TextEdit (multi-element query)**:
164+
*Ensure TextEdit is running.*
165+
```bash
echo '{"cmd":"query","locator":{"app":"com.apple.TextEdit","role":"*","match":{}},"return_all_matches":true,"debug_logging":true}' | ./ax/ax_runner.sh
```
168+
169+
* **Permissions**:
170+
* **Crucial**: The application that executes `ax` (e.g., Terminal, your IDE, the Node.js process running the MCP server) **must** have "Accessibility" permissions granted in macOS "System Settings > Privacy & Security > Accessibility".
171+
* The `ax` binary itself calls `checkAccessibilityPermissions()` at startup. If permissions are not granted, it prints detailed instructions to `stderr` and exits.
172+
173+
## 5. macOS Accessibility (AX) Intricacies & Swift Integration
174+
175+
Working with the macOS Accessibility framework via Swift involves several specific considerations:
176+
177+
* **Frameworks**:
178+
* `ApplicationServices`: Essential for `AXUIElement` and related C APIs.
179+
* `AppKit`: Used for `NSRunningApplication` (to get PIDs) and `NSWorkspace`.
180+
* **Element Hierarchy**: UI elements form a tree. Traversal typically involves getting an element's children via attributes like `kAXChildrenAttribute` ("AXChildren"), `kAXWindowsAttribute` ("AXWindows"), etc.
181+
* **Attributes (`AX...`)**:
182+
* Elements possess a wide range of attributes (e.g., `AXRole`, `AXTitle`, `AXSubrole`, `AXValue`, `AXFocused`, `AXMain`, `AXPosition`, `AXSize`, `AXIdentifier`). The presence of attributes can vary.
183+
* `CFTypeRef`: Attribute values are returned as `CFTypeRef`. Runtime type checking using `CFGetTypeID()` and `AXValueGetTypeID()` (for `AXValue` types) is necessary before safe casting.
184+
* `AXValue`: A special CoreFoundation type used for geometry (like `CGPoint` for `AXPosition`, `CGSize` for `AXSize`) and other structured data. Requires `AXValueGetValue()` to extract the underlying data.
185+
* **Actions (`AX...Action`)**:
186+
* Elements expose supported actions (e.g., `kAXPressAction` ("AXPress"), "AXShowMenu") via the `kAXActionsAttribute` ("AXActions") or `AXUIElementCopyActionNames()`.
187+
* Actions are performed using `AXUIElementPerformAction()`.
188+
* **Roles**:
189+
* `AXRole` (e.g., "AXWindow", "AXButton", "AXTextField") and `AXRoleDescription` (a human-readable string) describe the type/function of an element.
190+
* `AXRoleDescription` can sometimes be missing or less reliable than `AXRole`.
191+
* Using `"*"` or an empty string for `locator.role` acts as a wildcard in `collectAll`.
192+
* **Data Type Matching**:
193+
* When matching attributes from JSON input (where values are strings), the Swift code must correctly interpret these strings against the actual attribute types (e.g., string "true" for a `Bool` attribute, string "123" for a numeric attribute). Both `search` and `collectAll` implement logic for this.
194+
* **Bridging & Constants**:
195+
* Some C-based Accessibility constants (like `kAXWindowsAttribute`) might need to be defined as Swift constants if not directly available.
196+
* Private C functions like `AXUIElementGetTypeID_Impl()` might require `@_silgen_name` bridging.
197+
* **Debugging Tool**:
198+
* **Accessibility Inspector** (available in Xcode under "Xcode > Open Developer Tool > Accessibility Inspector") is an indispensable tool for visually exploring the accessibility hierarchy of any running application, viewing element attributes, and testing actions.
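
The string-to-typed comparison described under "Data Type Matching" can be sketched on plain Swift values. `AttributeValue`, `matches`, and `matchesAll` are hypothetical names; the real code inspects `CFTypeRef` values, and treating a missing attribute as a non-match is an assumption.

```swift
import Foundation

// Hypothetical, already-decoded attribute value; the real code inspects CFTypeRef.
enum AttributeValue {
    case bool(Bool)
    case int(Int)
    case string(String)
}

// Compares the string from `locator.match` against a typed attribute value.
func matches(_ actual: AttributeValue, want wantStr: String) -> Bool {
    switch actual {
    case .bool(let flag):
        // Booleans are matched against the literal strings "true" / "false".
        return wantStr == (flag ? "true" : "false")
    case .int(let number):
        // Numeric attributes: parse the wanted string, fail if it is not an integer.
        guard let wanted = Int(wantStr) else { return false }
        return wanted == number
    case .string(let text):
        return text == wantStr
    }
}

// Every key in `match` must be present on the element and compare equal.
func matchesAll(_ attributes: [String: AttributeValue], match: [String: String]) -> Bool {
    return match.allSatisfy { entry in
        guard let actual = attributes[entry.key] else { return false }
        return matches(actual, want: entry.value)
    }
}

let window: [String: AttributeValue] = ["AXMain": .bool(true), "AXTitle": .string("Untitled")]
print(matchesAll(window, match: ["AXMain": "true"]))
```

An empty `match` object matches every element, which is what the wildcard multi-element query in section 4 relies on.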
199+
200+
This document should serve as a good reference for understanding and working with the `ax` binary.
