AI Voice-Controlled Desktop Assistant

A modern, cross-platform, AI-powered voice assistant for your desktop. Instantly open or close apps, files, folders, and websites using natural language voice commands. Combines local fuzzy search, LLM intent extraction, and real-time web search for a truly human, productive experience.

🚀 Key Features

Voice-Activated: Control your desktop with your voice using advanced speech recognition.
Open & Close Anything: Instantly open or close apps, files, folders, and Chrome tabs with natural language ("open terminal", "close downloads folder", "open kunaal updated resume file").
AI-Powered Intent Extraction: Uses LLM (Mistral) for robust, context-aware command understanding.
Fuzzy File Search: Finds files even with typos, word order changes, or partial names ("open ai internal 2 pdf file" → AI Internal - 2.pdf).
Smart App & System App Handling: Recognizes and opens/closes system apps (Terminal, Finder, Settings, Activity Monitor, etc.) by name.
Bulletproof Website Handling: Opens the correct website in Chrome using Google Custom Search API ("open cbit in chrome" → https://www.cbit.ac.in).
Deep, Fast Indexing: Recursively indexes all user-facing folders (Downloads, Desktop, Documents, /Applications, etc.) for instant lookups.
Natural Language: No rigid syntax—just speak as you would to a human assistant.
Cross-Platform: macOS (full), Windows/Linux (open/close app logic can be extended).

🧠 Example Commands

open terminal
close terminal
open downloads folder
close downloads folder
open kunaal updated resume file
open ai internal 2 pdf file
open gmail in chrome
close gmail in chrome
open notion
open chatgpt
open codechef
open python file
search for best pizza near me

🏆 Why This Project Stands Out

True AI Desktop Agent: Combines LLM, fuzzy search, and real-time web search for human-like understanding and action.
Enterprise-Ready: Handles ambiguous, typo-prone, and context-rich queries with ease.
Productivity Booster: No more hunting for files, apps, or websites—just say what you want.
Modern Stack: Python, SpeechRecognition, PyAudio, OpenAI/Mistral LLM, Google Custom Search API, AppleScript (macOS).
Extensible: Easily add new commands, system integrations, or support for more platforms.

⚡️ Quick Start

Clone the Repository

git clone https://github.com/Yuvakunaal/AI-Voice-Desktop-Assistant.git
cd AI-Voice-Desktop-Assistant

Install dependencies:
```
pip install -r requirements.txt
```
Set up environment variables:
- Create a .env file with your API keys:
```
OPENROUTER_API_KEY=your_openrouter_api_key_here
GOOGLE_API_KEY=your_google_custom_search_api_key
GOOGLE_CSE_ID=your_custom_search_engine_id
```
🔑 How to Get OpenRouter API Key
1. Go to https://openrouter.ai/.
2. Log in with your account.
3. Go to Settings.
4. Navigate to Keys.
5. Click on Create Key, give it a name.
6. Copy the generated key.
7. Paste it as the value of the OPENROUTER_API_KEY variable in your project (.env).
Set up Google Custom Search API:
- Instructions here
- Add your API key and CSE ID in commands/website_opener.py.
Run the assistant:
```
python3 main.py
```
Speak your command!

🛠️ Tech Stack

Python 3.8+
SpeechRecognition, PyAudio
Mistral LLM (via OpenRouter)
Google Custom Search API
AppleScript (macOS automation)
Fuzzy matching, deep file indexing

🤖 AI-Powered Features

This project leverages advanced AI capabilities including:

Natural Language Understanding: Mistral LLM processes voice commands with human-like comprehension.
Intent Extraction: AI determines whether you want to open/close apps, files, folders, or websites.
Smart Matching: Fuzzy search algorithms find the right files even with imperfect queries.
Context Awareness: Understands relationships between words and resolves ambiguities.

🎯 How It Works

Voice Input: Speak your command naturally.
Speech Recognition: Converts speech to text using Google's speech recognition.
AI Intent Parsing: Mistral LLM analyzes your command and extracts intent + target.
Smart Search: Finds the best matching app, file, folder, or website.
Execution: Opens/closes the requested item using system automation.
Web Integration: Uses Google Custom Search to find correct websites for ambiguous queries.

🌟 Advanced Features

Real-time Web Resolution: Finds correct websites even for obscure queries.
Antonym Handling: Intelligently avoids conflicting file matches (e.g., "internal" vs "external").
LLM Tie-Breaking: Uses AI to choose between multiple potential matches.
System App Integration: Special handling for macOS system applications.

👨‍💻 Developer

Kunaal – GenAI, AI, Python Enthusiast

🙏 Acknowledgments

OpenRouter for providing access to Mistral LLM.
Google for speech recognition and custom search APIs.
The open-source community for the libraries and tools used.

⭐ If you like this project, please support by starring the repository!

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
commands		commands
helpers		helpers
.gitignore		.gitignore
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Voice-Controlled Desktop Assistant

🚀 Key Features

🧠 Example Commands

🏆 Why This Project Stands Out

⚡️ Quick Start

🔑 How to Get OpenRouter API Key

🛠️ Tech Stack

🤖 AI-Powered Features

🎯 How It Works

🌟 Advanced Features

👨‍💻 Developer

🙏 Acknowledgments

About

Uh oh!

Contributors

Uh oh!

Languages

Yuvakunaal/AI-Voice-Desktop-Assistant

Folders and files

Latest commit

History

Repository files navigation

AI Voice-Controlled Desktop Assistant

🚀 Key Features

🧠 Example Commands

🏆 Why This Project Stands Out

⚡️ Quick Start

🔑 How to Get OpenRouter API Key

🛠️ Tech Stack

🤖 AI-Powered Features

🎯 How It Works

🌟 Advanced Features

👨‍💻 Developer

🙏 Acknowledgments

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages