🎙️ Real-time Voice Assistant

A minimal, real-time voice assistant powered by WebRTC streaming, featuring ultra-low latency audio processing with professional-grade components.

✨ Features

  • 🌊 WebRTC Streaming: Ultra-low latency audio streaming via FastRTC
  • 🎯 Advanced VAD: Silero voice activity detection with configurable parameters
  • 🎤 Speech Recognition: Whisper ASR for accurate speech-to-text transcription
  • 🔊 Neural TTS: High-quality Kokoro text-to-speech with natural-sounding voices
  • 🤖 AI Integration: Support for both Ollama (local) and OpenRouter GPT-5 Nano
  • 💬 Context-Aware: Maintains conversation history for coherent responses

🏗️ Architecture

┌─────────────┐    WebRTC      ┌──────────────────┐
│   Browser   │◄──────────────►│   FastRTC        │
│  (Client)   │   Ultra-low    │   Stream         │
└─────────────┘   latency      └──────────────────┘
                                        │
                                        ▼
                                ┌──────────────────┐
                                │  Silero VAD      │
                                │  (Voice Activity)│
                                └──────────────────┘
                                        │
                                        ▼
                                ┌──────────────────┐
                                │  Whisper ASR     │
                                │  (Speech-to-Text)│
                                └──────────────────┘
                                        │
                                        ▼
                                ┌──────────────────┐
                                │  LLM Processing  │
                                │  (Ollama/OpenAI) │
                                └──────────────────┘
                                        │
                                        ▼
                                ┌──────────────────┐
                                │  Kokoro TTS      │
                                │  (Text-to-Speech)│
                                └──────────────────┘
                                        │
                                        ▼
                                ┌──────────────────┐
                                │  Audio Stream    │
                                │  (Back to client)│
                                └──────────────────┘
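Conceptually, each detected utterance flows through the stages above as a simple chain of transforms. A minimal sketch with stand-in stages (the class and function names here are illustrative, not the project's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    history: list = field(default_factory=list)  # alternating user/assistant turns

def handle_utterance(audio, conv, *, vad, asr, llm, tts):
    """Run one detected utterance through the VAD -> ASR -> LLM -> TTS chain."""
    if not vad(audio):                 # Silero VAD: is this really speech?
        return None
    text = asr(audio)                  # Whisper: speech -> text
    conv.history.append({"role": "user", "content": text})
    reply = llm(conv.history)          # Ollama / OpenRouter: history -> reply
    conv.history.append({"role": "assistant", "content": reply})
    return tts(reply)                  # Kokoro: reply -> audio samples

# Stubbed stages to show the data flow:
conv = Conversation()
out = handle_utterance(
    b"...pcm...", conv,
    vad=lambda a: True,
    asr=lambda a: "hello",
    llm=lambda h: "hi there",
    tts=lambda t: f"<audio:{t}>",
)
```

In the real application, FastRTC delivers the audio frames and the stages are backed by Silero, Whisper, the configured LLM, and Kokoro.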

📋 Requirements

  • Python: 3.12 or higher
  • uv: Modern Python package manager
  • System Dependencies:
    • Build tools (gcc/g++)
    • Python development headers
    • espeak-ng for phonemization
  • AI Backend: Either:
    • Ollama with gemma3:4b model (local), OR
    • OpenRouter API key for GPT-5 Nano (cloud)
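A quick runtime guard for the Python version requirement (an illustrative helper, not part of the project):

```python
import sys

def check_python(min_version=(3, 12)):
    """Return True if the running interpreter meets the project's minimum."""
    return sys.version_info[:2] >= min_version
```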

🚀 Quick Setup

1. Clone the Repository

git clone https://github.com/voidstarr/minimal-voice-assistant.git
cd minimal-voice-assistant

2. Run Setup Script

The setup script will automatically:

  • Install uv if not present
  • Install system dependencies (gcc, python3-dev, espeak-ng)
  • Create a virtual environment
  • Install all Python dependencies
  • Generate SSL certificates for HTTPS

chmod +x setup.sh
./setup.sh

3. Configure LLM Backend

Choose ONE of the following options:

Option A: Local Ollama (Recommended for Privacy)

  1. Install Ollama from ollama.ai
  2. Pull the required model:
    ollama pull gemma3:4b

Option B: OpenRouter Cloud API

  1. Get your API key from OpenRouter
  2. Create a .env file:
    echo "OPENROUTER_API_KEY=your_api_key_here" > .env
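If you load the key in Python rather than relying on your shell, a minimal reader might look like this (a sketch; tools such as python-dotenv populate os.environ from the .env file the same way):

```python
import os

def get_openrouter_key():
    """Read the OpenRouter key from the environment, failing loudly if absent."""
    key = os.environ.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set; see the .env setup above")
    return key
```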

4. Run the Assistant

chmod +x run.sh
./run.sh

The assistant will be available at https://localhost:7860

🎯 Usage

  1. Open the Interface: Navigate to https://localhost:7860 in your browser

    • Accept the self-signed certificate warning (this is expected for local development)
  2. Grant Microphone Access: Allow browser microphone permissions when prompted

  3. Start Speaking: The assistant will automatically detect when you start and stop speaking using advanced VAD

  4. Receive Responses: The AI will process your speech and respond with natural voice

⚙️ Configuration

Voice Activity Detection (VAD)

Edit voice_assistant.py to adjust VAD parameters:

self.vad_options = SileroVadOptions(
    threshold=0.5,                    # Speech detection sensitivity
    min_speech_duration_ms=250,       # Minimum speech length
    max_speech_duration_s=30.0,       # Maximum speech length
    min_silence_duration_ms=500,      # Silence before processing
    window_size_samples=1024,         # VAD processing window
    speech_pad_ms=200                 # Padding around speech
)
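For intuition, the millisecond-based parameters correspond to sample counts at the VAD's operating rate (Silero models conventionally run at 16 kHz; a rough sketch):

```python
SAMPLE_RATE = 16_000  # Hz; Silero VAD's usual operating rate

def ms_to_samples(ms, rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a sample count at the given rate."""
    return int(rate * ms / 1000)

min_speech = ms_to_samples(250)   # 4000 samples of speech before a segment counts
min_silence = ms_to_samples(500)  # 8000 samples of silence ends the segment
pad = ms_to_samples(200)          # 3200 samples kept on each side of a segment
```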

TTS Voice

Change the voice in the generate_tts method:

samples, sample_rate = self.kokoro.create(
    text, voice="af_heart", speed=1.0, lang="en-us"
)

Available voices depend on your Kokoro model configuration.
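To inspect the generated audio offline, the returned samples can be written to a WAV file with only the standard library (a sketch assuming mono float samples in [-1, 1]; `save_wav` is not part of the project):

```python
import struct
import wave

def save_wav(samples, sample_rate, path):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)          # mono
        f.setsampwidth(2)          # 16-bit PCM
        f.setframerate(sample_rate)
        clamped = (max(-1.0, min(1.0, s)) for s in samples)
        f.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in clamped))
```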

📁 Project Structure

minimal-voice-assistant/
├── voice_assistant.py      # Main application
├── pyproject.toml          # Project dependencies
├── setup.sh                # Setup script
├── run.sh                  # Run script
├── models/                 # TTS models
│   └── kokoro-v1.0.onnx   # Kokoro TTS model
├── ssl_certs/              # SSL certificates
│   ├── cert.pem
│   └── key.pem
└── README.md               # This file

🔧 Troubleshooting

SSL Certificate Warnings

Issue: Browser shows security warning

Solution: This is expected for self-signed certificates. Click "Advanced" and "Proceed" to continue. For production, use proper SSL certificates.

Microphone Not Working

Issue: No audio detected

Solution:

  • Ensure HTTPS is enabled (required for WebRTC)
  • Grant microphone permissions in browser
  • Check browser console for errors
  • Verify microphone works in other applications

Ollama Connection Failed

Issue: "Ollama connection failed" error

Solution:

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve

# Pull the model
ollama pull gemma3:4b
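The same health check can be scripted in Python with only the standard library (`ollama_status` is an illustrative helper, not part of the project):

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

def ollama_status(base="http://localhost:11434"):
    """Return the list of installed model names, or None if Ollama is unreachable."""
    try:
        with urlopen(f"{base}/api/tags", timeout=2) as resp:
            tags = json.load(resp)
    except (URLError, OSError):
        return None
    return [m["name"] for m in tags.get("models", [])]
```

If the result is None, start the server with `ollama serve`; if `gemma3:4b` is missing from the list, pull it as shown above.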

espeak Not Found

Issue: Phonemizer or espeak errors

Solution: Reinstall system dependencies:

# Ubuntu/Debian
sudo apt-get install espeak-ng

# Fedora/RHEL
sudo dnf install espeak-ng

Model Loading Errors

Issue: TTS or ASR models fail to load

Solution:

  • Ensure models/kokoro-v1.0.onnx is present
  • Check sufficient RAM (4GB+ recommended)
  • Verify Python 3.12+ is installed

🛠️ Development

Manual Installation

If you prefer not to use the setup script:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment
uv venv

# Activate environment
source .venv/bin/activate  # Linux/Mac
# or
.venv\Scripts\activate     # Windows

# Install dependencies
uv pip install -e .

Running Without SSL

For development without HTTPS (note: browsers only allow microphone access over HTTPS or on localhost, so WebRTC audio may not work from other hosts):

# In voice_assistant.py, modify the launch call:
interface.launch(
    server_name="0.0.0.0",
    server_port=7860,
    share=False
)

📊 Performance

  • Latency: ~200-500ms end-to-end (depends on LLM)
  • CPU Usage: Moderate (optimized for CPU-only operation)
  • RAM Usage: ~2-4GB (with models loaded)
  • Network: Minimal (except for OpenRouter API calls)
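To see where the end-to-end latency budget goes, each stage can be wrapped with a simple timer (an illustrative sketch, not project code):

```python
import time

def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed milliseconds)."""
    t0 = time.perf_counter()
    out = fn(*args)
    return out, (time.perf_counter() - t0) * 1000
```

Wrapping the ASR, LLM, and TTS calls this way usually shows the LLM dominating the budget, which is why total latency "depends on LLM".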

🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.
