
Recursive Control is an innovative project designed to enable artificial intelligence (AI) to interact seamlessly with your computer, automating tasks, performing complex workflows, and enhancing productivity.
Our mission is to create an AI-driven interface that can autonomously control your computer, intelligently perform tasks, open applications, execute commands, and streamline workflows, effectively turning natural language into actionable operations.
- AI-Powered Interaction: Utilize AI models (such as GPT-based models) to interpret user input and intelligently execute actions.
- Automated Workflow Execution: Automate repetitive or complex sequences of computer actions.
- Natural Language Commands: Simply describe tasks in plain language, and let the AI handle execution.
- .NET Framework 4.8 or later
- Windows Operating System
- Azure OpenAI API Key (More models will be supported in the future)
Download the latest release from the Releases page and follow three easy steps.
- Run recursivecontrol.exe
- Set up your LLM
- Input your commands directly into the UI, and watch the AI automate your tasks.
- Clone this repository:
  git clone https://github.com/flowdevs-io/Recursive-Control.git
- Navigate to the cloned directory:
  cd Recursive-Control
- Restore dependencies and build the project:
  dotnet restore
  dotnet build
Recursive Control supports a modular plugin system, allowing you to extend its capabilities. Plugins can automate keyboard, mouse, window management, screen capture, command line, and more. You can find plugin implementations in the FlowVision/lib/Plugins/ directory. To add your own plugin, implement the required interface and register it in the application.
- CMDPlugin: Execute Windows command line instructions.
- PowershellPlugin: Run PowerShell scripts and commands.
- KeyboardPlugin: Automate keyboard input.
- MousePlugin: Automate mouse actions.
- ScreenCapturePlugin: Capture screenshots.
- WindowSelectionPlugin: Select and interact with application windows.
- PlaywrightPlugin: Automate web browsers using Playwright. Use LaunchBrowser to start, ExecuteScript to run JavaScript, and CloseBrowser when finished.
- RemoteControlPlugin: Listen for HTTP JSON commands and forward them to the AI executor. Start the server by enabling it in the ToolConfig. Send POST requests with { "command": "your text" } to the configured port.
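To illustrate the RemoteControlPlugin request shape, here is a minimal Python sketch that builds and sends a POST with the { "command": "your text" } body. The host, the port 8085, the root URL path, and the function names build_command_request and send_command are assumptions for illustration only; use the port configured in ToolConfig, and note that the actual endpoint path and response format may differ.

```python
import json
import urllib.request

# Assumed values for illustration only: use the port set in ToolConfig.
HOST = "localhost"
PORT = 8085  # hypothetical port, not taken from the project


def build_command_request(command, host=HOST, port=PORT):
    """Build a POST request carrying {"command": "<text>"} as JSON."""
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/",  # root path is an assumption
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_command(command, timeout=30.0):
    """Send the command and return the raw response body as text."""
    request = build_command_request(command)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With Recursive Control running and the remote control server enabled, calling send_command("Open Notepad") would forward that text to the AI executor. Building the request separately from sending it makes the payload easy to inspect before anything goes over the network.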
FlowVision.sln         # Solution file
FlowVision/            # Main application source
  lib/                 # Core libraries and plugins
    Classes/           # Helper and service classes
    Plugins/           # Built-in plugins
    UI/                # UI theming
  Models/              # Data models
  Properties/          # .NET project properties
content/               # Images and assets
General Structure
- The project is a Windows Forms application targeting .NET Framework 4.8. The solution (FlowVision.sln) loads a single project, FlowVision.
- Program.cs contains the entry point, which starts Form1.
- Core logic lives under FlowVision/lib/Classes/ and FlowVision/lib/Plugins/.
- Plugins include modules such as CMDPlugin, KeyboardPlugin, MousePlugin, and screen-capture tools.
- Configuration classes (APIConfig, ToolConfig, etc.) store user settings under %APPDATA%\FlowVision\... for persistence.
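To inspect the persisted settings outside the application, a small script can walk the FlowVision folder under %APPDATA% and load whatever JSON files it finds. This is only a sketch: the README does not name the individual files or subfolders, so the code searches recursively rather than guessing names, and the helper names flowvision_config_dir and load_json_settings are hypothetical.

```python
import json
import os
from pathlib import Path


def flowvision_config_dir(appdata=None):
    """Return the FlowVision settings folder under %APPDATA%."""
    base = appdata or os.environ.get("APPDATA")
    if not base:
        raise RuntimeError("APPDATA is not set; this sketch expects Windows.")
    return Path(base) / "FlowVision"


def load_json_settings(config_dir):
    """Load every *.json file below config_dir, keyed by relative path.

    File names and layout are not assumed; unreadable files are reported
    as a string instead of raising.
    """
    settings = {}
    for path in sorted(config_dir.rglob("*.json")):
        key = str(path.relative_to(config_dir))
        try:
            # utf-8-sig tolerates a byte-order mark if one is present.
            settings[key] = json.loads(path.read_text(encoding="utf-8-sig"))
        except (OSError, json.JSONDecodeError) as error:
            settings[key] = f"unreadable: {error}"
    return settings
```

Calling load_json_settings(flowvision_config_dir()) on a machine where the application has run returns a dictionary of the saved settings. Back up the folder before editing any file by hand.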
Important Components
- Plugin System: Explained above; it allows extending the toolset with keyboard/mouse automation, window management, PowerShell, etc. Plugins are stored in FlowVision/lib/Plugins/.
- ToolConfig: Holds feature toggles and prompt templates. Default values and prompts are defined here.
- MultiAgentActioner: Implements a multi-agent workflow using Semantic Kernel to coordinate a "coordinator," "planner," and "executor" agent.
- User Interface: Form1 presents a chat-like UI with text and speech input, uses ThemeManager for light/dark themes, and logs plugin operations using PluginLogger.
Getting Started
The README provides prerequisites, setup instructions, and folder layout.
Pointers for Next Steps
- Explore Plugin Development: Each plugin class uses Semantic Kernel's [KernelFunction] attributes to expose commands. Creating new plugins or modifying existing ones is a good way to extend functionality.
- Review Multi-Agent Logic: MultiAgentActioner demonstrates coordinating multiple models/agents. Understanding its workflow helps when adapting the app to other LLMs or custom behaviors.
- Understand Configuration Handling: Look into how ToolConfig and APIConfig store settings in JSON files under %APPDATA%. Learning this pattern is important for customizing the tool for different environments.
- UI Customization: The ThemeManager and MarkdownHelper classes show how theming and markdown rendering are done. This is useful if you want to adapt the interface.
- Security and Logging: Read SECURITY.md for guidelines on reporting issues and inspect PluginLogger for how plugin usage is tracked.
graph TD
Program[Program.cs] --> Form1
Form1 --> PluginSystem
Form1 --> MultiAgentActioner
PluginSystem --> CMDPlugin
PluginSystem --> KeyboardPlugin
PluginSystem --> MousePlugin
PluginSystem --> ScreenCapturePlugin
MultiAgentActioner --> CoordinatorAgent
MultiAgentActioner --> PlannerAgent
MultiAgentActioner --> ExecutorAgent
APIConfig -.-> Form1
ToolConfig -.-> Form1
- Control applications via natural language (e.g., "Open Excel and create a new spreadsheet")
- Capture and process screenshots for documentation
- Batch rename files or organize folders
- Use PlaywrightPlugin to automate websites, e.g., LaunchBrowser, NavigateTo, then ExecuteScript("return document.title;") to read the page title
- Content warning logging: Implement logging for content warnings to improve safety and transparency.
- Model Support: Add support for Gemini, Ollama, OpenAI, Bedrock, Phi-4, and Phi Silica models.
- Improved Speech Recognition: Move away from System.Speech.Recognition (which is slow and inaccurate for voice commands) and adopt real-time audio models from OpenAI or similar providers.
- Local Bounding-Box Search: Reduce token usage by running bounding-box (bbox) search locally (using Ollama, Phi Silica, or other novel SLMs).
- Managed LLM Integration: Develop a Recursive Control managed LLM that requires no user configuration, enabling usage-based billing or subscription plans.
- YOLO Bbox Parser Integration: Integrate a YOLO bbox parser using ONNX for advanced vision capabilities.
Our vision is Recursive Control running on every Windows computer, leveraging local SLMs, Recursive Control hosted LLMs, and embedded YOLO vision models. The ultimate aim is to make the integration so seamless that new PC users will no longer need a keyboard or mouse: they just interact with the latest LLM, and it turns words into commands. So easy that even our elders will use it.
- Ensure you have .NET Framework 4.8 or later installed
- Check your API key and network connection for LLM access
- For plugin errors, review the application logs in %APPDATA%\FlowVision\plugin_usage.log
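When troubleshooting plugin errors, it can help to look at only the most recent entries in the plugin log. The sketch below reads the last few lines of the plugin_usage.log file named above. The log's line format is not documented here, so the lines are returned as plain text without parsing; the helper names plugin_log_path and tail_lines are hypothetical.

```python
import os
from collections import deque
from pathlib import Path


def plugin_log_path(appdata=None):
    """Return the path to plugin_usage.log under %APPDATA%\\FlowVision."""
    base = appdata or os.environ.get("APPDATA")
    if not base:
        raise RuntimeError("APPDATA is not set; this sketch expects Windows.")
    return Path(base) / "FlowVision" / "plugin_usage.log"


def tail_lines(path, count=20):
    """Return the last `count` lines of a text file, without newlines."""
    with path.open("r", encoding="utf-8-sig", errors="replace") as handle:
        # A bounded deque keeps only the newest lines while streaming the file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

For example, tail_lines(plugin_log_path(), 50) returns up to the 50 most recent entries, which is usually enough to see which plugin call failed.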
We welcome contributions! Please feel free to submit issues, suggestions, or pull requests. Your collaboration is essential for making Recursive Control powerful and versatile.
- GitHub Issues for bug reports and feature requests
- Discussions for Q&A and ideas
- LinkedIn for updates and networking
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions, feedback, or collaboration inquiries, please connect with us through our GitHub repository, or via LinkedIn.
If you use Recursive Control in your research or project, please cite:
@software{recursive-control2025,
  author = {Trantham, Justin},
  title = {Recursive Control: AI Control for Windows Computers},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/flowdevs-io/Recursive-Control}
}