A modern, cross-platform, AI-powered voice assistant for your desktop. Instantly open or close apps, files, folders, and websites using natural language voice commands. Combines local fuzzy search, LLM intent extraction, and real-time web search for a truly human, productive experience.
- Voice-Activated: Control your desktop with your voice using advanced speech recognition.
- Open & Close Anything: Instantly open or close apps, files, folders, and Chrome tabs with natural language ("open terminal", "close downloads folder", "open kunaal updated resume file").
- AI-Powered Intent Extraction: Uses LLM (Mistral) for robust, context-aware command understanding.
- Fuzzy File Search: Finds files even with typos, word order changes, or partial names ("open ai internal 2 pdf file" →
AI Internal - 2.pdf). - Smart App & System App Handling: Recognizes and opens/closes system apps (Terminal, Finder, Settings, Activity Monitor, etc.) by name.
- Bulletproof Website Handling: Opens the correct website in Chrome using Google Custom Search API ("open cbit in chrome" →
https://www.cbit.ac.in). - Deep, Fast Indexing: Recursively indexes all user-facing folders (Downloads, Desktop, Documents, /Applications, etc.) for instant lookups.
- Natural Language: No rigid syntax—just speak as you would to a human assistant.
- Cross-Platform: macOS (full), Windows/Linux (open/close app logic can be extended).
open terminalclose terminalopen downloads folderclose downloads folderopen kunaal updated resume fileopen ai internal 2 pdf fileopen gmail in chromeclose gmail in chromeopen notionopen chatgptopen codechefopen python filesearch for best pizza near me
- True AI Desktop Agent: Combines LLM, fuzzy search, and real-time web search for human-like understanding and action.
- Enterprise-Ready: Handles ambiguous, typo-prone, and context-rich queries with ease.
- Productivity Booster: No more hunting for files, apps, or websites—just say what you want.
- Modern Stack: Python, SpeechRecognition, PyAudio, OpenAI/Mistral LLM, Google Custom Search API, AppleScript (macOS).
- Extensible: Easily add new commands, system integrations, or support for more platforms.
-
Clone the Repository
git clone https://github.com/Yuvakunaal/AI-Voice-Desktop-Assistant.git cd AI-Voice-Desktop-Assistant -
Install dependencies:
pip install -r requirements.txt
-
Set up environment variables:
- Create a .env file with your API keys:
OPENROUTER_API_KEY=your_openrouter_api_key_here GOOGLE_API_KEY=your_google_custom_search_api_key GOOGLE_CSE_ID=your_custom_search_engine_id
- Go to https://openrouter.ai/.
- Log in with your account.
- Go to Settings.
- Navigate to Keys.
- Click on Create Key, give it a name.
- Copy the generated key.
- Paste it as the value of the
OPENROUTER_API_KEYvariable in your project (.env).
- Create a .env file with your API keys:
-
Set up Google Custom Search API:
- Instructions here
- Add your API key and CSE ID in
commands/website_opener.py.
-
Run the assistant:
python3 main.py
-
Speak your command!
- Python 3.8+
- SpeechRecognition, PyAudio
- Mistral LLM (via OpenRouter)
- Google Custom Search API
- AppleScript (macOS automation)
- Fuzzy matching, deep file indexing
This project leverages advanced AI capabilities including:
- Natural Language Understanding: Mistral LLM processes voice commands with human-like comprehension.
- Intent Extraction: AI determines whether you want to open/close apps, files, folders, or websites.
- Smart Matching: Fuzzy search algorithms find the right files even with imperfect queries.
- Context Awareness: Understands relationships between words and resolves ambiguities.
- Voice Input: Speak your command naturally.
- Speech Recognition: Converts speech to text using Google's speech recognition.
- AI Intent Parsing: Mistral LLM analyzes your command and extracts intent + target.
- Smart Search: Finds the best matching app, file, folder, or website.
- Execution: Opens/closes the requested item using system automation.
- Web Integration: Uses Google Custom Search to find correct websites for ambiguous queries.
- Real-time Web Resolution: Finds correct websites even for obscure queries.
- Antonym Handling: Intelligently avoids conflicting file matches (e.g., "internal" vs "external").
- LLM Tie-Breaking: Uses AI to choose between multiple potential matches.
- System App Integration: Special handling for macOS system applications.
Kunaal – GenAI, AI, Python Enthusiast
- OpenRouter for providing access to Mistral LLM.
- Google for speech recognition and custom search APIs.
- The open-source community for the libraries and tools used.
⭐ If you like this project, please support by starring the repository!