This repository contains an offline voice assistant implemented in Python. It uses Vosk for local speech-to-text, pyttsx3 for TTS, and optionally the Ollama CLI as a local LLM decider. The project provides a CLI and a lightweight Tkinter GUI.
Features

- Real-time local STT using Vosk (optional; falls back to typed input when missing)
- Local LLM decider integration via the `ollama` CLI (optional)
- TTS via `pyttsx3` (optional)
- Safe execution mode (dry-run by default) and an option to allow direct execution of actions
- Clickable GUI launcher included: `run_voice_assistant_ui.bat`
Quick start (recommended)

- Create a virtual environment and activate it (Windows PowerShell):

  ```
  python -m venv .venv
  .\.venv\Scripts\Activate.ps1
  ```

- Install Python dependencies:

  ```
  pip install -r requirements.txt
  ```

- Make sure the Vosk model folder `vosk-model-en-in-0.5/` is present in the project root. If not, download it and place it in that folder.

- Run the GUI:

  ```
  python -m voice_assistant.gui
  # or double-click run_voice_assistant_ui.bat
  ```

Notes about Ollama
- Ollama is used via its CLI binary (`ollama`). It's not a pip package. Install Ollama and ensure `ollama` is available on PATH, or configure `ollama_path` in `config.yaml`.
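
For orientation, here is a minimal sketch of how a decider could shell out to the `ollama` binary and parse a JSON reply. The helper name, model name, and prompt text are illustrative assumptions, not the project's actual implementation.

```python
# Illustrative sketch of calling the ollama CLI as a decider; the helper name,
# model, and prompt are hypothetical, not the project's actual code.
import json
import shutil
import subprocess

def ollama_decide(utterance, model="llama3", ollama_path=None):
    """Ask a local model to map an utterance to a JSON action (None on failure)."""
    binary = ollama_path or shutil.which("ollama")
    if binary is None:
        return None  # Ollama not installed or not on PATH
    prompt = (
        "Reply with only a JSON object such as "
        '{"action": "open_url", "target": "https://github.com"} for: ' + utterance
    )
    out = subprocess.run([binary, "run", model, prompt],
                         capture_output=True, text=True, timeout=60)
    try:
        return json.loads(out.stdout.strip())
    except (json.JSONDecodeError, ValueError):
        return None  # fall back to the rule-based parser
```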
Configuration

- Edit `config.yaml` to change defaults (wake word, `allow_execution`, `use_ollama`, etc.). Command-line flags are available via `python -m voice_assistant`.
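
As a rough illustration of how these options can be read with safe fallbacks, the snippet below loads `config.yaml` with PyYAML. The exact key names (e.g. `wake_word`) and default values are assumptions based on the options mentioned in this README, not the project's actual schema.

```python
# Sketch of loading the documented options with safe defaults; key names and
# defaults are assumptions, and PyYAML is assumed to be installed.
import yaml

DEFAULTS = {
    "wake_word": "hey gng",       # key name assumed; value from the smoke-test example
    "always_listen": False,
    "allow_execution": False,     # dry-run by default
    "allow_tts": False,
    "use_ollama": False,
    "use_ollama_decider": False,
    "ollama_path": "ollama",
}

def load_config(path="voice_assistant/config.yaml"):
    with open(path, "r", encoding="utf-8") as fh:
        user = yaml.safe_load(fh) or {}
    return {**DEFAULTS, **user}
```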
Development

- The package is structured as a Python module, `voice_assistant/`. Run `python -m voice_assistant` for CLI usage or `python -m voice_assistant.gui` for the UI.
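
For context, here is a hypothetical sketch of how `python -m voice_assistant` could wire the flags mentioned in this README (`--test`, `--always-listen`, `--use-ollama`, `--ollama-path`, `--debug-llm`) to the runner; the real `__main__` module may differ.

```python
# Hypothetical sketch of voice_assistant/__main__.py; the actual module,
# and the run_interactive() signature, may differ.
import argparse

from voice_assistant.main import run_interactive, run_test

def main():
    parser = argparse.ArgumentParser(prog="voice_assistant")
    parser.add_argument("--test", action="store_true", help="one-shot smoke test")
    parser.add_argument("--always-listen", action="store_true")
    parser.add_argument("--use-ollama", action="store_true")
    parser.add_argument("--ollama-path")
    parser.add_argument("--debug-llm", action="store_true")
    args = parser.parse_args()
    if args.test:
        run_test()
    else:
        run_interactive()  # flags would be merged into the config here

if __name__ == "__main__":
    main()
```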
Troubleshooting

- If the GUI shows "Simulated input - type or speak", Vosk or sounddevice may not be installed in the Python used to launch the GUI. Make sure you run the GUI with the same Python that has the dependencies installed (the included `run_voice_assistant_ui.bat` tries local virtual environments first).
- To see which Python the GUI uses, run it from a terminal and check the output of `python -c "import sys; print(sys.executable)"`.
License

- This project is provided as-is. Add your preferred license if you plan to distribute.

Offline Voice Assistant - skeleton
This repository contains a minimal, dependency-free skeleton for an offline voice assistant. It's intended as a starting point to integrate:
- Vosk (offline speech recognition)
- Ollama + local LLM (intent parsing)
- pyttsx3 (offline TTS)
What I created:

- `voice_assistant/main.py` - runner with `run_interactive()` and `run_test()` (one-shot smoke test)
- `voice_assistant/stt.py` - simulated STT (reads stdin)
- `voice_assistant/nlp_model.py` - tiny rule-based intent parser
- `voice_assistant/executor.py` - safe executor (dry-run by default)
- `voice_assistant/config.yaml` - wake word and options
- `voice_assistant/memory.json` - persistent memory
- `voice_assistant/requirements.txt` - recommended optional deps
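
To show how these pieces fit together, here is a simplified sketch of one listen/parse/execute cycle. The module and function names come from this README, but the exact signatures and return shapes are assumptions.

```python
# Simplified sketch of one cycle through the skeleton's pipeline; signatures
# and return values are assumptions, not the project's actual code.
from voice_assistant import stt, nlp_model, executor

def run_once(allow_execution=False):
    text = stt.listen()                    # simulated STT: reads stdin in the skeleton
    intent = nlp_model.parse_intent(text)  # rule-based parser (or Ollama when enabled)
    result = executor.execute(intent, allow_execution=allow_execution)  # dry-run by default
    print(result)
```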
Quick smoke test (one-shot, no external deps):

From the project root (the parent of `voice_assistant/`) run:

```
python -c "from voice_assistant.main import run_test; run_test()"
```

Or run the package directly (recommended) from the project root:

```
python -m voice_assistant --test
```

This will simulate "hey gng open github" and print the parsed intent and the dry-run result.
Next steps to integrate full features:

- Install Vosk and replace `stt.listen()` with a real audio capture + recognizer pipeline (see the sketch after this list)
- Integrate Ollama or another local LLM in `nlp_model.parse_intent()` for better intent/entity extraction (now implemented; enable with `use_ollama: true` in `config.yaml` and install Ollama)
- LLM-driven decision mode: the LLM can also act as a decider that maps a user's utterance directly to an actionable JSON plan (open a URL, launch an app, tell the time). Enable with `use_ollama_decider: true` in `config.yaml`, or pass `--use-ollama --ollama-path <path> --debug-llm` to the CLI. When enabled, the LLM returns a JSON `action` object which the assistant validates and executes (or dry-runs if `allow_execution: false`).
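
For the first item, here is a minimal sketch of what a real `stt.listen()` could look like with Vosk plus `sounddevice`, assuming the bundled `vosk-model-en-in-0.5` model folder; this is illustrative, not the project's code.

```python
# Illustrative replacement for stt.listen() using Vosk + sounddevice; the
# model path and capture parameters are assumptions.
import json
import queue

import sounddevice as sd
from vosk import Model, KaldiRecognizer

def listen(model_path="vosk-model-en-in-0.5", samplerate=16000):
    """Block until one utterance is recognized and return its text."""
    model = Model(model_path)
    recognizer = KaldiRecognizer(model, samplerate)
    audio = queue.Queue()

    def callback(indata, frames, time, status):
        audio.put(bytes(indata))  # queue raw 16-bit PCM chunks from the mic

    with sd.RawInputStream(samplerate=samplerate, blocksize=8000,
                           dtype="int16", channels=1, callback=callback):
        while True:
            data = audio.get()
            if recognizer.AcceptWaveform(data):
                result = json.loads(recognizer.Result())
                return result.get("text", "")
```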
Wake-word / always-on listening

- By default this skeleton uses a wake word (e.g., "hey gng"). If you find the wake word annoying, you can disable it and let the assistant process all input immediately by setting `always_listen: true` in `config.yaml` or by passing `--always-listen` to the CLI. Use this with care: enabling `always_listen` means every captured phrase will be interpreted as a command.
- Enable `allow_execution: true` in `config.yaml` carefully, and only when you're ready to allow system commands.
- Add TTS with `pyttsx3` in `executor.execute()` or a separate `tts.py` (now included). To enable voice output, set `allow_tts: true` in `config.yaml` and install `pyttsx3`. See the sketch below.
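
For the TTS item, here is a minimal sketch of what a standalone `tts.py` helper could look like with `pyttsx3`; the function name and rate value are illustrative, not the project's actual implementation.

```python
# Minimal offline TTS helper using pyttsx3; a sketch, not the bundled tts.py.
import pyttsx3

def speak(text, rate=175):
    engine = pyttsx3.init()           # picks the platform's offline TTS backend
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    engine.say(text)
    engine.runAndWait()               # blocks until speech finishes

if __name__ == "__main__":
    speak("Voice assistant ready.")
```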