Currently, AI-generated actions (BasicAction) are limited to low-level mouse/keyboard operations. Many common automation tasks — opening URLs, launching apps, getting window information — require fragile multi-step sequences (move to dock icon → click → wait → move to address bar → click → type URL → press Enter). These break when screen layouts change.
macOS provides AppleScript as a robust, coordinate-independent way to control applications. By adding structured, parameterized app-control actions to BasicAction, the AI agent can accomplish high-level tasks in a single action instead of chaining error-prone mouse/keyboard steps.
Motivation
| Task | Current approach (mouse/keyboard) | With app-control actions |
| --- | --- | --- |
| Open a URL | move → click browser → wait → move to address bar → click → type URL → enter | openURL(url: "https://...") |
| Launch/activate an app | move to Dock → click (or Spotlight → type → enter) | activateApp(name: "Safari") |
| Get frontmost app name | Not possible | getFrontmostApp → returns info for decision-making |
Key benefits:
Reliability: No dependency on screen coordinates or UI layout
Efficiency: Single action instead of 3-7 step sequences
Capability: Enables read-back (window titles, app states) that pure mouse/keyboard cannot achieve
Proposed New Actions
Tier 1 — High value, simple parameters, safe
| Action | Parameters | What it does | AppleScript backing |
| --- | --- | --- | --- |
| openURL | url: String | Opens a URL in the default browser | open location "..." |
| activateApp | name: String | Brings an app to front (launches if needed) | tell application "..." to activate |
| quitApp | name: String | Gracefully quits an app | tell application "..." to quit |
| getFrontmostApp | none | Returns the name of the frontmost application | tell application "System Events" to get name of first process whose frontmost is true |
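The Tier 1 rows amount to one fixed template per action. A minimal Swift sketch of that mapping follows. `AppControlAction`, `appleScriptLiteral`, and `appleScriptSource(for:)` are hypothetical names used for illustration, not existing SwiftAutoGUI API, and the escaping shown covers only backslashes and double quotes.

```swift
import Foundation

// Hypothetical stand-in for the Tier 1 cases proposed for BasicAction.
enum AppControlAction {
    case openURL(url: String)
    case activateApp(name: String)
    case quitApp(name: String)
    case getFrontmostApp
}

// Quote a value as an AppleScript string literal.
// Backslashes are escaped first so the quote escapes are not doubled.
func appleScriptLiteral(_ value: String) -> String {
    let escaped = value
        .replacingOccurrences(of: "\\", with: "\\\\")
        .replacingOccurrences(of: "\"", with: "\\\"")
    return "\"" + escaped + "\""
}

// Map each structured action to its fixed AppleScript template.
func appleScriptSource(for action: AppControlAction) -> String {
    switch action {
    case .openURL(let url):
        return "open location \(appleScriptLiteral(url))"
    case .activateApp(let name):
        return "tell application \(appleScriptLiteral(name)) to activate"
    case .quitApp(let name):
        return "tell application \(appleScriptLiteral(name)) to quit"
    case .getFrontmostApp:
        return "tell application \"System Events\" to get name of first process whose frontmost is true"
    }
}
```

Under these assumptions the AI only ever supplies the enum case and its parameters; the script text itself is never model-generated.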
Tier 2 — Medium complexity, very useful
| Action | Parameters | What it does | AppleScript backing |
| --- | --- | --- | --- |
| typeInApp | appName: String, text: String | Activate app then type; combines two common steps | activate + keystroke |
| getWindowList | appName: String? | Returns list of open window titles | System Events windows query |
| setVolume | level: Int (0-100) | Set system output volume | set volume output volume N |
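As a rough illustration of the Tier 2 templates, the sketch below builds the script text for each action. All function names are hypothetical. Treating a nil `appName` as "the frontmost process" is an assumption, as is clamping `level` rather than rejecting out-of-range values. A short delay between `activate` and `keystroke` may be needed in practice and is not shown.

```swift
import Foundation

// Quote a value as an AppleScript string literal (escapes backslash and double quote).
func quoted(_ value: String) -> String {
    let escaped = value
        .replacingOccurrences(of: "\\", with: "\\\\")
        .replacingOccurrences(of: "\"", with: "\\\"")
    return "\"" + escaped + "\""
}

// typeInApp: activate the target app, then send the text as keystrokes.
func typeInAppScript(appName: String, text: String) -> String {
    return [
        "tell application \(quoted(appName)) to activate",
        "tell application \"System Events\" to keystroke \(quoted(text))",
    ].joined(separator: "\n")
}

// getWindowList: window titles of the named process,
// or of the frontmost process when appName is nil (assumed behavior).
func windowListScript(appName: String?) -> String {
    let target = appName.map { "process \(quoted($0))" }
        ?? "(first process whose frontmost is true)"
    return "tell application \"System Events\" to get name of every window of \(target)"
}

// setVolume: clamp to the documented 0-100 range before building the command.
func setVolumeScript(level: Int) -> String {
    let clamped = min(max(level, 0), 100)
    return "set volume output volume \(clamped)"
}
```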
Tier 3 — Powerful but needs careful scoping
| Action | Parameters | What it does | Notes |
| --- | --- | --- | --- |
| tellApp | appName: String, command: String | Send a specific command to an app | See Security section: this is the most flexible but also the most dangerous |
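One of the scoping options listed under Security Considerations is a per-app allowlist. A minimal sketch of that idea follows, assuming a static table. The table contents and the names `tellAppAllowlist` and `tellAppScript` are purely illustrative, not a proposed final design.

```swift
import Foundation

// Hypothetical allowlist: app name -> commands the agent may send verbatim.
let tellAppAllowlist: [String: Set<String>] = [
    "Music": ["play", "pause", "next track"],
    "Safari": ["activate"],
]

// Returns AppleScript source only when (appName, command) is allowlisted.
// Both values come from the table, so nothing model-generated reaches the script unchecked.
func tellAppScript(appName: String, command: String) -> String? {
    guard let allowed = tellAppAllowlist[appName], allowed.contains(command) else {
        return nil
    }
    return "tell application \"\(appName)\" to \(command)"
}
```

Anything outside the table yields `nil`, which a caller could surface as a refused action or route to user confirmation.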
Security Considerations
The key design principle: the AI generates structured parameters, not arbitrary AppleScript code.
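Tier 1-2 actions map to fixed AppleScript templates into which only string/number parameters are injected, so they are safe by construction as long as those parameters are sanitized
Parameter sanitization is needed to prevent injection (e.g., appName containing " & do shell script "..."). Either pass parameters as Apple event arguments to a script handler instead of interpolating them into the script source (NSAppleScript's executeAppleEvent(_:error:) allows this), or apply strict validation (alphanumeric + spaces only for app names, URL scheme validation for URLs)
tellApp (Tier 3) is intentionally deferred. It requires one of the following: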
An allowlist of known-safe commands per app, or
User confirmation before execution, or
A separate "unsafe" action set that must be explicitly opted into
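The strict-validation option for Tier 1-2 parameters could look roughly like the sketch below. The function names and the default scheme allowlist (`http`, `https`) are assumptions. The URL check also rejects quotes, backslashes, and control characters outright, because some Foundation versions percent-encode invalid characters instead of failing to parse, so a scheme check alone would not catch them.

```swift
import Foundation

// Strict app-name check: non-empty, letters/digits/spaces only.
func isValidAppName(_ name: String) -> Bool {
    guard !name.isEmpty else { return false }
    return name.unicodeScalars.allSatisfy {
        CharacterSet.alphanumerics.contains($0) || $0 == " "
    }
}

// URL check: no characters that could break out of an AppleScript string literal,
// and the scheme must be on the allowlist.
func isAllowedURL(_ string: String, schemes: Set<String> = ["http", "https"]) -> Bool {
    let forbidden = CharacterSet(charactersIn: "\"\\")
        .union(.controlCharacters)
        .union(.newlines)
    guard string.unicodeScalars.allSatisfy({ !forbidden.contains($0) }) else {
        return false
    }
    guard let scheme = URLComponents(string: string)?.scheme?.lowercased() else {
        return false
    }
    return schemes.contains(scheme)
}
```

The app-name rule is deliberately conservative: names containing punctuation (hyphens, dots, apostrophes) are rejected and would need an explicit exception if they must be supported.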
Entitlements
AppleScript execution already requires specific entitlements (documented in AppleScript.swift):
com.apple.security.app-sandbox = false
com.apple.security.automation.apple-events = true
NSAppleEventsUsageDescription in Info.plist
These are existing requirements for Action.executeAppleScript. The new actions would share the same requirements and should be clearly documented.
Implementation Plan
Phase 1: Tier 1 actions (this issue)
Files to modify:
Sources/SwiftAutoGUI/ActionGenerator.swift
Add openURL, activateApp, quitApp, getFrontmostApp cases to BasicAction
Update toAction() to map each to Action.executeAppleScript(...) with template strings
Update Codable conformance (CodingKeys, ActionType, init(from:), encode(to:))
Sources/SwiftAutoGUI/OpenAIBackend.swift — Update system prompt and JSON schema
Sources/SwiftAutoGUI/OpenAIVisionBackend.swift — Update system prompt, parseAction, JSON schema, describeBasicAction
Tests/SwiftAutoGUITests/ActionGeneratorTests.swift — Add tests for new action types
FoundationModelsBackend.swift — No changes needed (@Generable auto-synthesizes)
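For the JSON-facing backends, decoding the new cases might look like the sketch below. The wire format (a `type` discriminator with flat `url` / `name` fields) is an assumption and may differ from the schema the OpenAI backends actually use. `AppControlAction` and `decodeAction` are hypothetical stand-ins for `BasicAction` and its parsing path.

```swift
import Foundation

// Hypothetical stand-in for BasicAction, limited to the Tier 1 cases.
enum AppControlAction: Decodable, Equatable {
    case openURL(url: String)
    case activateApp(name: String)
    case quitApp(name: String)
    case getFrontmostApp

    private enum CodingKeys: String, CodingKey {
        case type, url, name
    }

    // Discriminator values; an unknown "type" fails decoding.
    private enum ActionType: String, Decodable {
        case openURL, activateApp, quitApp, getFrontmostApp
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        switch try container.decode(ActionType.self, forKey: .type) {
        case .openURL:
            self = .openURL(url: try container.decode(String.self, forKey: .url))
        case .activateApp:
            self = .activateApp(name: try container.decode(String.self, forKey: .name))
        case .quitApp:
            self = .quitApp(name: try container.decode(String.self, forKey: .name))
        case .getFrontmostApp:
            self = .getFrontmostApp
        }
    }
}

// Decode one action from a JSON string; returns nil on malformed or unknown input.
func decodeAction(_ json: String) -> AppControlAction? {
    return try? JSONDecoder().decode(AppControlAction.self, from: Data(json.utf8))
}
```

A test in ActionGeneratorTests.swift could assert both the happy path and that an unknown `type` or a missing parameter is rejected rather than silently defaulted.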
Phase 2: Tier 2 actions (follow-up PR)
Phase 3: Evaluate Tier 3 with safety mechanism (separate discussion)
Design Notes
All Tier 1 parameters are String → fully @Generable compatible
The toAction() conversion constructs AppleScript strings internally — BasicAction stays clean and the AppleScript details are an implementation concern
getFrontmostApp returns a value, which is useful for agentic loops where the AI can observe state and decide next actions. The return value flows through Action.executeAppleScript's existing String? return path
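To illustrate the read-back point, the sketch below shows how a returned app name could drive the next decision in an agentic loop. `ScriptRunner` is a hypothetical closure type standing in for the `String?` result path of `Action.executeAppleScript`. `NextStep` and `nextStep(target:run:)` are likewise illustrative names. Injecting the runner keeps the decision logic testable without executing any AppleScript.

```swift
import Foundation

// Hypothetical stand-in for the String? result path of Action.executeAppleScript.
typealias ScriptRunner = (String) -> String?

enum NextStep: Equatable {
    case activate(String)   // target app is not frontmost (or state is unknown)
    case proceed            // target app is already frontmost
}

let frontmostAppScript =
    "tell application \"System Events\" to get name of first process whose frontmost is true"

// Observe the frontmost app, then decide what to do next.
func nextStep(target: String, run: ScriptRunner) -> NextStep {
    // Trim in case the runner returns trailing whitespace or a newline.
    let current = run(frontmostAppScript)?.trimmingCharacters(in: .whitespacesAndNewlines)
    return current == target ? .proceed : .activate(target)
}
```

With a stub runner such as `{ _ in "Finder" }`, `nextStep(target: "Safari", run:)` yields `.activate("Safari")`; a failed script (`nil`) is treated the same way.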