A minimal, real-time voice assistant powered by WebRTC streaming, featuring ultra-low latency audio processing with professional-grade components.
- 🌊 WebRTC Streaming: Ultra-low latency audio streaming via FastRTC
- 🎯 Advanced VAD: Silero voice activity detection with configurable parameters
- 🎤 Speech Recognition: Whisper ASR for accurate speech-to-text transcription
- 🔊 Neural TTS: High-quality Kokoro text-to-speech with natural-sounding voices
- 🤖 AI Integration: Support for both Ollama (local) and OpenRouter GPT-5 Nano
- 💬 Context-Aware: Maintains conversation history for coherent responses
┌─────────────┐ WebRTC ┌──────────────────┐
│ Browser │◄──────────────►│ FastRTC │
│ (Client) │ Ultra-low │ Stream │
└─────────────┘ latency └──────────────────┘
│
▼
┌──────────────────┐
│ Silero VAD │
│ (Voice Activity)│
└──────────────────┘
│
▼
┌──────────────────┐
│ Whisper ASR │
│ (Speech-to-Text)│
└──────────────────┘
│
▼
┌──────────────────┐
│ LLM Processing │
│ (Ollama/OpenAI) │
└──────────────────┘
│
▼
┌──────────────────┐
│ Kokoro TTS │
│ (Text-to-Speech)│
└──────────────────┘
│
▼
┌──────────────────┐
│ Audio Stream │
│ (Back to client)│
└──────────────────┘
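The flow in the diagram can be sketched as a chain of plain Python functions. This is an illustrative stub only: the function names and the chunk format are hypothetical, and the real stages are Silero VAD, Whisper, Ollama/OpenRouter, and Kokoro TTS.

```python
# Illustrative stub of the pipeline: each stage is a plain function.
# The chunk format and function bodies are hypothetical stand-ins.

def detect_speech(chunks):
    # Silero VAD: keep only chunks flagged as speech
    return [c for c in chunks if c["is_speech"]]

def transcribe(chunks):
    # Whisper ASR: speech chunks -> text
    return " ".join(c["text"] for c in chunks)

def generate_reply(text, history):
    # LLM (Ollama or OpenRouter): prompt with history for context
    history.append({"role": "user", "content": text})
    reply = f"You said: {text}"  # stand-in for the model call
    history.append({"role": "assistant", "content": reply})
    return reply

def synthesize(text):
    # Kokoro TTS: text -> audio samples (stubbed as bytes here)
    return text.encode("utf-8")

def run_pipeline(chunks, history):
    speech = detect_speech(chunks)
    text = transcribe(speech)
    reply = generate_reply(text, history)
    return synthesize(reply)
```

Keeping each stage behind a plain function boundary like this is also what makes it easy to swap the LLM backend without touching the audio path.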
- Python: 3.12 or higher
- uv: Modern Python package manager
- System Dependencies:
- Build tools (gcc/g++)
- Python development headers
- espeak-ng for phonemization
- AI Backend: Either:
  - Ollama with the `gemma3:4b` model (local), OR
  - OpenRouter API key for GPT-5 Nano (cloud)
```bash
git clone https://github.com/voidstarr/minimal-voice-assistant.git
cd minimal-voice-assistant
```

The setup script will automatically:
- Install `uv` if not present
- Install system dependencies (gcc, python3-dev, espeak)
- Create a virtual environment
- Install all Python dependencies
- Generate SSL certificates for HTTPS
```bash
chmod +x setup.sh
./setup.sh
```

Choose ONE of the following options:
- Install Ollama from ollama.ai
- Pull the required model: `ollama pull gemma3:4b`
- Create a `.env` file: `echo "OPENROUTER_API_KEY=your_api_key_here" > .env`
- Get your API key from OpenRouter
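However the key is supplied, the app needs it at runtime. A minimal loader might look like the following (hypothetical helper; the real loading code lives in `voice_assistant.py`):

```python
import os

def load_openrouter_key(env=None):
    # Fail fast with a clear message when the key is missing
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set; create a .env file")
    return key
```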
```bash
chmod +x run.sh
./run.sh
```

The assistant will start on https://localhost:7860
1. Open the Interface: Navigate to https://localhost:7860 in your browser and accept the self-signed certificate warning (this is expected for local development)
2. Grant Microphone Access: Allow browser microphone permissions when prompted
3. Start Speaking: The assistant will automatically detect when you start and stop speaking using advanced VAD
4. Receive Responses: The AI will process your speech and respond with natural voice
Edit `voice_assistant.py` to adjust VAD parameters:

```python
self.vad_options = SileroVadOptions(
    threshold=0.5,                # Speech detection sensitivity
    min_speech_duration_ms=250,   # Minimum speech length
    max_speech_duration_s=30.0,   # Maximum speech length
    min_silence_duration_ms=500,  # Silence before processing
    window_size_samples=1024,     # VAD processing window
    speech_pad_ms=200,            # Padding around speech
)
```

Change the voice in the `generate_tts` method:
```python
samples, sample_rate = self.kokoro.create(
    text, voice="af_heart", speed=1.0, lang="en-us"
)
```

Available voices depend on your Kokoro model configuration.
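To audition a voice, the `(samples, sample_rate)` pair returned by Kokoro can be dumped to a WAV file. This sketch assumes float samples in [-1, 1], which is what kokoro-onnx typically produces:

```python
import struct
import wave

def write_wav(path, samples, sample_rate):
    # Write mono 16-bit PCM; clamps float samples to [-1, 1]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(pcm)
```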
minimal-voice-assistant/
├── voice_assistant.py # Main application
├── pyproject.toml # Project dependencies
├── setup.sh # Setup script
├── run.sh # Run script
├── models/ # TTS models
│ └── kokoro-v1.0.onnx # Kokoro TTS model
├── ssl_certs/ # SSL certificates
│ ├── cert.pem
│ └── key.pem
└── README.md # This file
Issue: Browser shows security warning
Solution: This is expected for self-signed certificates. Click "Advanced" and "Proceed" to continue. For production, use proper SSL certificates.
Issue: No audio detected
Solution:
- Ensure HTTPS is enabled (required for WebRTC)
- Grant microphone permissions in browser
- Check browser console for errors
- Verify microphone works in other applications
Issue: "Ollama connection failed" error
Solution:
```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve

# Pull the model
ollama pull gemma3:4b
```

Issue: Phonemizer or espeak errors
Solution: Reinstall system dependencies:
```bash
# Ubuntu/Debian
sudo apt-get install espeak espeak-data libespeak-dev

# Fedora/RHEL
sudo dnf install espeak espeak-devel
```

Issue: TTS or ASR models fail to load
Solution:
- Ensure `models/kokoro-v1.0.onnx` is present
- Check sufficient RAM (4GB+ recommended)
- Verify Python 3.12+ is installed
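The `curl http://localhost:11434/api/tags` check above can also be done programmatically. This sketch assumes Ollama's usual response shape, `{"models": [{"name": ...}]}`:

```python
import json

def has_model(tags_json, wanted="gemma3:4b"):
    # Parse the /api/tags response and check the model is pulled
    names = [m["name"] for m in json.loads(tags_json).get("models", [])]
    return wanted in names
```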
If you prefer not to use the setup script:
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment
uv venv

# Activate environment
source .venv/bin/activate  # Linux/Mac
# or
.venv\Scripts\activate     # Windows

# Install dependencies
uv pip install -e .
```

For development without HTTPS (note: WebRTC may not work):
```python
# In voice_assistant.py, modify the launch call:
interface.launch(
    server_name="0.0.0.0",
    server_port=7860,
    share=False
)
```

- Latency: ~200-500ms end-to-end (depends on LLM)
- CPU Usage: Moderate (optimized for CPU-only operation)
- RAM Usage: ~2-4GB (with models loaded)
- Network: Minimal (except for OpenRouter API calls)
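To see where the ~200-500 ms goes, each pipeline stage can be wrapped in a small timing decorator (generic sketch, not code from the project):

```python
import time
from functools import wraps

def timed(fn):
    # Print wall-clock time per call; handy for per-stage profiling
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        print(f"{fn.__name__}: {elapsed_ms:.1f} ms")
        return result
    return wrapper
```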
Contributions are welcome! Please feel free to submit issues or pull requests.