
Add GPT semantic command stack with guarded execution#198

Open
vovka wants to merge 2 commits into C-Loftus:main from vovka:main

Conversation


@vovka vovka commented Feb 15, 2026

Summary

This PR adds a new standalone semantic command stack inside GPT/semantic for talon-ai-tools.

The goal is to let users speak high-level requests (for example, “model semantic open google chrome and search for …”), generate a constrained action plan, preview it, and only execute after explicit confirmation.

Purpose

  • Add model semantic ... voice commands with preview/run/cancel/copy/repeat flows.
  • Preserve runtime safety by requiring explicit confirmation before execution and stopping at the first execution error.

Mechanics: How Semantic Commands Work

  • User says model semantic <request>.
  • Talon sends that text to the semantic runtime.
  • Runtime gathers context (active app/window, running apps, launchable apps).
  • Runtime builds a strict prompt with allowed actions + safety rules + user request.
  • Prompt is sent via existing AI-tools transport (llm CLI or API endpoint).
  • Model returns a JSON action plan (steps + optional summary).
  • Plan is parsed and schema-validated (with one repair retry on parse/schema failure).
  • Guardrails validate limits (step count, sleep budget, insert size, blocked key combos).
  • If valid, plan is stored as pending and shown in preview (no execution yet).
  • User can run plan, cancel plan, copy plan, or repeat last.
  • On run plan, executor runs steps in order with focus/settle synchronization.
  • Execution stops on first failure and reports the exact failed step/action.
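The run step at the end of this flow can be sketched in plain Python. This is a minimal illustration, not the PR's actual code: the handler-table shape and `execute_plan` name are assumptions, but it shows the stated behavior of running steps in order and stopping at the first failure while reporting the failed step and action.

```python
# Sketch of a guarded executor: run steps in order, stop at the first
# failure, and report exactly which step/action failed.
# The handler table and plan shape are illustrative assumptions.

def execute_plan(plan, handlers):
    """Run each step via its handler; raise RuntimeError naming the first failure."""
    for i, step in enumerate(plan["steps"], start=1):
        action = step["action"]
        handler = handlers.get(action)
        if handler is None:
            raise RuntimeError(f"step {i}: unknown action {action!r}")
        try:
            handler(**step.get("args", {}))
        except Exception as exc:
            raise RuntimeError(f"step {i} ({action}) failed: {exc}") from exc
    return len(plan["steps"])

# Demo handlers that just record what would have been done.
done = []
demo_handlers = {
    "insert_text": lambda text: done.append(("insert", text)),
    "key": lambda combo: done.append(("key", combo)),
}
```

In the real stack the handlers would call Talon actions (with the focus/settle synchronization mentioned above); recording handlers like these are also a convenient way to unit-test plans without touching the UI.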

Examples:

Speech: model semantic open chrome and search for recent trends
Typical plan:

{
  "steps": [
    {"action": "switch_app", "args": {"app_name": "Google Chrome"}},
    {"action": "new_tab", "args": {}},
    {"action": "focus_address", "args": {}},
    {"action": "insert_text", "args": {"text": "recent trends"}},
    {"action": "key", "args": {"combo": "enter"}}
  ],
  "summary": "Open Chrome and run a web search"
}
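A plan like the one above is parsed and schema-validated before anything runs, with one repair retry on failure. A minimal sketch of that step (function names and the `repair` callback, which would re-prompt the model in the real stack, are assumptions):

```python
import json

REQUIRED_STEP_KEYS = {"action", "args"}

def parse_plan(text):
    """Parse model output into a plan dict; raise ValueError on a bad shape."""
    plan = json.loads(text)
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    steps = plan.get("steps")
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must contain a non-empty 'steps' list")
    for step in steps:
        if not isinstance(step, dict) or not REQUIRED_STEP_KEYS <= step.keys():
            raise ValueError(f"malformed step: {step!r}")
    return plan

def parse_with_repair(text, repair):
    """Try once; on parse/schema failure, ask `repair` for a fixed reply once."""
    try:
        return parse_plan(text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        return parse_plan(repair(text, str(exc)))
```

The repair callback receives the bad output plus the error message, mirroring the "one repair retry" described above; a second failure propagates to the user instead of looping.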

Speech: model semantic open gedit
Typical plan:

{
  "steps": [
    {"action": "launch_app", "args": {"app_name": "gedit"}}
  ],
  "summary": "Launch text editor"
}

Speech: model semantic find budget in this page
Typical plan:

{
  "steps": [
    {"action": "find_text", "args": {"text": "budget"}}
  ],
  "summary": "Open in-page search for budget"
}

Speech: model semantic go to github.com in current tab
Typical plan:

{
  "steps": [
    {"action": "go_url", "args": {"url": "https://github.com"}}
  ],
  "summary": "Navigate current tab to GitHub"
}

Speech: model semantic select all and copy
Typical plan:

{
  "steps": [
    {"action": "select_all", "args": {}},
    {"action": "copy", "args": {}}
  ],
  "summary": "Copy all content"
}

Speech: model semantic type hello world and press enter
Typical plan:

{
  "steps": [
    {"action": "insert_text", "args": {"text": "hello world"}},
    {"action": "key", "args": {"combo": "enter"}}
  ],
  "summary": "Insert text and submit"
}

Speech: model semantic wait half a second then paste
Typical plan:

{
  "steps": [
    {"action": "sleep", "args": {"ms": 500}},
    {"action": "paste", "args": {}}
  ],
  "summary": "Delay then paste"
}
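Plans like the ones above must also clear the guardrail checks before being stored as pending. A minimal sketch of those limit checks follows; the specific numbers and blocked combos here are illustrative assumptions, not the PR's real values:

```python
# Illustrative guardrail limits; the PR's actual values may differ.
MAX_STEPS = 10
MAX_SLEEP_MS = 2000      # total sleep budget across the whole plan
MAX_INSERT_CHARS = 500   # per insert_text step
BLOCKED_COMBOS = {"cmd-q", "alt-f4", "ctrl-alt-delete"}

def validate_guardrails(plan):
    """Return a list of violation messages; an empty list means the plan passes."""
    errors = []
    steps = plan.get("steps", [])
    if len(steps) > MAX_STEPS:
        errors.append(f"too many steps: {len(steps)} > {MAX_STEPS}")
    sleep_total = sum(s["args"].get("ms", 0) for s in steps if s["action"] == "sleep")
    if sleep_total > MAX_SLEEP_MS:
        errors.append(f"sleep budget exceeded: {sleep_total}ms > {MAX_SLEEP_MS}ms")
    for s in steps:
        if s["action"] == "insert_text" and len(s["args"].get("text", "")) > MAX_INSERT_CHARS:
            errors.append("insert_text payload too large")
        if s["action"] == "key" and s["args"].get("combo", "").lower() in BLOCKED_COMBOS:
            errors.append(f"blocked key combo: {s['args']['combo']}")
    return errors
```

Collecting all violations (rather than failing on the first) lets the preview tell the user everything that is wrong with a rejected plan at once.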

User-facing behavior added

  • model semantic <user.text>
  • model semantic run plan
  • model semantic cancel plan
  • model semantic copy plan
  • model semantic repeat last
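These commands all operate on a single pending plan that never executes without confirmation. A minimal sketch of that state holder (class and method names are assumptions; the real stack wires these to the Talon voice commands above):

```python
import json

class PendingPlan:
    """Holds at most one validated plan; nothing executes until run()."""

    def __init__(self, executor):
        self._executor = executor   # callable(plan) that runs the steps
        self._pending = None
        self._last = None

    def propose(self, plan):
        self._pending = plan        # shown in the preview; not executed

    def run(self):
        if self._pending is None:
            raise RuntimeError("no pending plan")
        plan, self._pending = self._pending, None
        self._executor(plan)
        self._last = plan           # enables "repeat last"

    def cancel(self):
        self._pending = None

    def copy(self):
        """Return the pending plan as JSON text (e.g. for the clipboard)."""
        if self._pending is None:
            raise RuntimeError("no pending plan")
        return json.dumps(self._pending, indent=2)

    def repeat_last(self):
        if self._last is None:
            raise RuntimeError("nothing to repeat")
        self._executor(self._last)
```

Clearing `_pending` before executing ensures a plan can only ever be confirmed once; repeating it again requires the explicit "repeat last" command.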

@vovka vovka marked this pull request as ready for review February 17, 2026 18:58
@C-Loftus (Owner) commented

Thanks for your PR. While I appreciate the work, I am a bit worried about the size of the changes and the amount that appears to be AI-generated.

I think in order for me to review this thoroughly and feel comfortable merging, I would need to get buy-in from other users in the talon community (i.e. that it seems useful and intuitive enough to warrant the additional code), ensure it works across platforms, and get some more real-world examples. Something like a screen-recording demo would be ideal, if possible, since the example plans are a bit abstract.

