Skip to content

Control your computer with natural voice commands. Open or close apps, files, folders, and websites using advanced AI intent parsing and real-time web search. Combines speech recognition, LLM intelligence, and deep file indexing for seamless, human-like interaction. Speak it, and watch it happen.

Notifications You must be signed in to change notification settings

Yuvakunaal/AI-Voice-Desktop-Assistant

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI Voice-Controlled Desktop Assistant

AI-Powered OpenRouter-Mistral Voice-Assisted macOS

A modern, cross-platform, AI-powered voice assistant for your desktop. Instantly open or close apps, files, folders, and websites using natural language voice commands. Combines local fuzzy search, LLM intent extraction, and real-time web search for a truly human, productive experience.


🚀 Key Features

  • Voice-Activated: Control your desktop with your voice using advanced speech recognition.
  • Open & Close Anything: Instantly open or close apps, files, folders, and Chrome tabs with natural language ("open terminal", "close downloads folder", "open kunaal updated resume file").
  • AI-Powered Intent Extraction: Uses LLM (Mistral) for robust, context-aware command understanding.
  • Fuzzy File Search: Finds files even with typos, word order changes, or partial names ("open ai internal 2 pdf file" → AI Internal - 2.pdf).
  • Smart App & System App Handling: Recognizes and opens/closes system apps (Terminal, Finder, Settings, Activity Monitor, etc.) by name.
  • Bulletproof Website Handling: Opens the correct website in Chrome using Google Custom Search API ("open cbit in chrome" → https://www.cbit.ac.in).
  • Deep, Fast Indexing: Recursively indexes all user-facing folders (Downloads, Desktop, Documents, /Applications, etc.) for instant lookups.
  • Natural Language: No rigid syntax—just speak as you would to a human assistant.
  • Cross-Platform: macOS (full), Windows/Linux (open/close app logic can be extended).

🧠 Example Commands

  • open terminal
  • close terminal
  • open downloads folder
  • close downloads folder
  • open kunaal updated resume file
  • open ai internal 2 pdf file
  • open gmail in chrome
  • close gmail in chrome
  • open notion
  • open chatgpt
  • open codechef
  • open python file
  • search for best pizza near me

🏆 Why This Project Stands Out

  • True AI Desktop Agent: Combines LLM, fuzzy search, and real-time web search for human-like understanding and action.
  • Enterprise-Ready: Handles ambiguous, typo-prone, and context-rich queries with ease.
  • Productivity Booster: No more hunting for files, apps, or websites—just say what you want.
  • Modern Stack: Python, SpeechRecognition, PyAudio, OpenAI/Mistral LLM, Google Custom Search API, AppleScript (macOS).
  • Extensible: Easily add new commands, system integrations, or support for more platforms.

⚡️ Quick Start

  1. Clone the Repository

    git clone https://github.com/Yuvakunaal/AI-Voice-Desktop-Assistant.git
    cd AI-Voice-Desktop-Assistant
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up environment variables:

    • Create a .env file with your API keys:
      OPENROUTER_API_KEY=your_openrouter_api_key_here
      GOOGLE_API_KEY=your_google_custom_search_api_key
      GOOGLE_CSE_ID=your_custom_search_engine_id

    🔑 How to Get OpenRouter API Key

    1. Go to https://openrouter.ai/.
    2. Log in with your account.
    3. Go to Settings.
    4. Navigate to Keys.
    5. Click on Create Key, give it a name.
    6. Copy the generated key.
    7. Paste it as the value of the OPENROUTER_API_KEY variable in your project (.env).
  4. Set up Google Custom Search API:

  5. Run the assistant:

    python3 main.py
  6. Speak your command!


🛠️ Tech Stack

  • Python 3.8+
  • SpeechRecognition, PyAudio
  • Mistral LLM (via OpenRouter)
  • Google Custom Search API
  • AppleScript (macOS automation)
  • Fuzzy matching, deep file indexing

🤖 AI-Powered Features

This project leverages advanced AI capabilities including:

  • Natural Language Understanding: Mistral LLM processes voice commands with human-like comprehension.
  • Intent Extraction: AI determines whether you want to open/close apps, files, folders, or websites.
  • Smart Matching: Fuzzy search algorithms find the right files even with imperfect queries.
  • Context Awareness: Understands relationships between words and resolves ambiguities.

🎯 How It Works

  1. Voice Input: Speak your command naturally.
  2. Speech Recognition: Converts speech to text using Google's speech recognition.
  3. AI Intent Parsing: Mistral LLM analyzes your command and extracts intent + target.
  4. Smart Search: Finds the best matching app, file, folder, or website.
  5. Execution: Opens/closes the requested item using system automation.
  6. Web Integration: Uses Google Custom Search to find correct websites for ambiguous queries.

🌟 Advanced Features

  • Real-time Web Resolution: Finds correct websites even for obscure queries.
  • Antonym Handling: Intelligently avoids conflicting file matches (e.g., "internal" vs "external").
  • LLM Tie-Breaking: Uses AI to choose between multiple potential matches.
  • System App Integration: Special handling for macOS system applications.

👨‍💻 Developer

Kunaal – GenAI, AI, Python Enthusiast

🙏 Acknowledgments

  • OpenRouter for providing access to Mistral LLM.
  • Google for speech recognition and custom search APIs.
  • The open-source community for the libraries and tools used.

⭐ If you like this project, please support by starring the repository!

About

Control your computer with natural voice commands. Open or close apps, files, folders, and websites using advanced AI intent parsing and real-time web search. Combines speech recognition, LLM intelligence, and deep file indexing for seamless, human-like interaction. Speak it, and watch it happen.

Topics

Resources

Stars

Watchers

Forks

Contributors

Languages