Get up and running with Velloris in 5 minutes! This guide will help you install and test all three modes.
Before you begin, make sure you have:
- Python 3.12+ installed (check with `python3 --version`)
- Git installed
- 16GB+ RAM recommended
- GPU (optional but recommended):
  - NVIDIA GPU with 16GB+ VRAM for realtime mode
  - 6GB+ VRAM for dubbing/creative modes
  - macOS M1/M2/M3/M4 works with MPS
- Ollama installed (only for creative mode)
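The basics of this checklist can be verified from a script. Below is a minimal sketch covering only the Python-version and Git checks (RAM and GPU probing is platform-specific); `check_prerequisites` is a hypothetical helper, not part of Velloris:

```python
import shutil
import sys

def check_prerequisites(min_python=(3, 12)):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    return problems

for problem in check_prerequisites():
    print("MISSING:", problem)
```

If the script prints nothing, the Python and Git requirements are satisfied.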
**macOS:**

```bash
# Clone repository
git clone https://github.com/randsley/Velloris.git
cd Velloris

# Run installation script
chmod +x install_macos.sh
./install_macos.sh

# Activate environment
source venv_py312/bin/activate
```

**Windows:**

```bash
# Clone repository
git clone https://github.com/randsley/Velloris.git
cd Velloris

# Run installation script
install_windows.bat

# Activate environment
venv_py312\Scripts\activate
```

**Linux:**

```bash
# Clone repository
git clone https://github.com/randsley/Velloris.git
cd Velloris

# Install dependencies (similar to Windows)
python3 -m venv venv_py312
source venv_py312/bin/activate
pip install -r requirements.txt
```

Verify the installation:

```bash
python3 main.py --show-config
```

Expected output:
```
=== Velloris Configuration ===

Platform:
  OS: Darwin (arm64)   # or Windows/Linux
  Python: 3.12.x
  CUDA Available: True/False
  MPS Available: True/False

Audio:
  Input SR: 16000 Hz
  Output SR: 24000 Hz
  Buffer: 2.0s

Models:
  Device: cuda/mps/cpu
  Dtype: bfloat16/float32

Application:
  Default Mode: realtime
  Available Modes: realtime, dubbing, creative
```
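The `Device:` line in the output comes from automatic backend selection. The logic typically looks like the sketch below; this is an illustration, not Velloris's actual code, and it assumes PyTorch is installed, degrading to `cpu` when it is not:

```python
def pick_device() -> str:
    """Pick the best available device in the order the config output suggests:
    CUDA first, then MPS (Apple Silicon), then CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: CPU fallback
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps exists on recent PyTorch builds; guard for older ones
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())
```

This is also what `--device auto` conceptually resolves to; you can always override it with `--device cuda`, `--device mps`, or `--device cpu`.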
Velloris has three modes. Pick the one that fits your use case:
| Mode | Best For | Status | Latency Target |
|---|---|---|---|
| Realtime | Conversations, interactive chat | 🚧 Infrastructure Ready | 70-170ms (CUDA) |
| Dubbing | Narration, audiobooks, videos | ✅ Production Ready | N/A |
| Creative | Storytelling, emotional content | ✅ Production Ready | 1-3s |
**Realtime Mode**

Status: 🚧 Infrastructure Ready (99 tests passing) | Target: Windows/Linux CUDA

✅ What's Working:
- Complete audio I/O pipeline (microphone/speaker)
- Voice Activity Detection (Silero VAD)
- Background transcription (MLX-Whisper)
- Interruption/barge-in capability
- 99 comprehensive tests validating infrastructure
What still requires installation:
- Windows/Linux (CUDA): Requires PersonaPlex-7B installation
- macOS (Apple Silicon): Requires MacEcho integration (future)
When PersonaPlex-7B is installed on Windows/Linux with NVIDIA GPU:
Target Features:
- ⚡ Ultra-low latency (70-170ms)
- ✅ Full-duplex (natural interruptions)
- 🎭 16 voice options
- ✅ No Ollama needed
- 🎯 Real-time speech-to-speech
```bash
# Test the complete audio infrastructure
python3 main.py --mode realtime --persona "You are a helpful assistant" --voice NATF2
```

What happens (current):
- ✅ Audio I/O system initializes
- ✅ You speak into the microphone (captured and transcribed)
- ⚠️ S2S engine returns stub response (infrastructure validated)
- ✅ Interruption handling works correctly
- Press Ctrl+C to exit
For production voice synthesis, use Creative or Dubbing modes below.
Requirements:
- NVIDIA GPU (Ampere+: RTX 3000/4000, A100, H100)
- 16GB+ VRAM
- Windows or Linux
- CUDA 12.1+
Installation (PersonaPlex-7B):

```bash
# 1. Install system dependencies
# Ubuntu/Debian:
sudo apt install libopus-dev

# 2. Clone PersonaPlex repository
git clone https://github.com/NVIDIA/personaplex

# 3. Install PersonaPlex
pip install personaplex/moshi/.

# 4. Login to Hugging Face (accept model license)
huggingface-cli login

# 5. Run realtime mode
python3 main.py --mode realtime --persona "helpful assistant" --voice NATF2
```

Target Voices (when PersonaPlex installed):
- Female: `NATF0`, `NATF1`, `NATF2`, `NATF3`, `VARF0-4`
- Male: `NATM0`, `NATM1`, `NATM2`, `NATM3`, `VARM0-4`
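`VARF0-4` and `VARM0-4` are shorthand for five numbered voices each (`VARF0` through `VARF4`, and so on). A small hypothetical helper makes the expansion explicit, for example when scripting over every voice ID:

```python
def expand_voices(spec: str) -> list[str]:
    """Expand 'VARF0-4' -> ['VARF0', ..., 'VARF4']; plain IDs pass through."""
    prefix = spec.rstrip("0123456789-")   # e.g. 'VARF' from 'VARF0-4'
    rest = spec[len(prefix):]             # e.g. '0-4'
    if "-" in rest:
        lo, hi = (int(n) for n in rest.split("-"))
        return [f"{prefix}{i}" for i in range(lo, hi + 1)]
    return [spec]

print(expand_voices("VARF0-4"))  # ['VARF0', 'VARF1', 'VARF2', 'VARF3', 'VARF4']
print(expand_voices("NATM1"))    # ['NATM1']
```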
macOS Users: Realtime infrastructure is validated. MacEcho integration planned for future release. Use Creative or Dubbing modes for production.
**Dubbing Mode**

Status: ✅ Production Ready (User-verified quality)

Best for: Content creation, video narration, audiobooks, podcasts

Features:
- 🎨 Professional quality audio (MLX-Audio TTS)
- 🌍 10 languages supported
- 🎤 Voice cloning capability
- ✅ No Ollama needed
- ✅ Works on all platforms (CUDA, MPS, CPU)
```bash
python3 main.py --mode dubbing --script "Once upon a time in a digital landscape, AI models lived in harmony."
```

What happens:
- Velloris loads Qwen3-TTS (one-time, ~15 seconds)
- Generates high-quality narration
- Plays audio through speakers
- Audio saved to `output.wav` (if configured)

Voice cloning:

```bash
python3 main.py --mode dubbing --script "Your narration here" --voice-ref voices/my_voice.wav
```

Tip: Provide a 3-5 second reference audio for best results.
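If your reference recording is longer than a few seconds, you can cut it down to the suggested length with Python's standard-library `wave` module; a minimal sketch (the filenames are placeholders, and this assumes an uncompressed WAV input):

```python
import wave

def trim_wav(src: str, dst: str, seconds: float = 4.0) -> None:
    """Copy the first `seconds` of a WAV file, preserving its format."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(int(seconds * reader.getframerate()))
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # header frame count is patched on close
        writer.writeframes(frames)
```

For example, `trim_wav("full_take.wav", "voices/my_voice.wav")` keeps the first four seconds, and the result can be passed to `--voice-ref`.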
**Creative Mode**

Status: ✅ Production Ready (User-verified: "perfect audio")

Best for: Storytelling, creative writing, emotional content

Features:
- 🧠 LLM reasoning (Ollama)
- 🎭 Emotion control (natural language prompts)
- 🌍 Multilingual support
- 🎨 High-quality MLX-Audio TTS
- ⚠️ Requires Ollama running
- ✅ Works on all platforms (CUDA, MPS, CPU)
```bash
# Terminal 1: Start Ollama
ollama serve

# Terminal 2: Download model (first time only)
ollama pull llama3
```

```bash
# Terminal 2 (with Ollama running in Terminal 1):
python3 main.py --mode creative --script "Tell me a short story about a space explorer" --emotion "Speak with excitement"
```

What happens:
- Velloris connects to Ollama
- LLM generates creative response
- Qwen3-TTS synthesizes with emotion
- Plays audio through speakers
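Before launching creative mode you can confirm that Ollama is actually reachable. A standard-library sketch (11434 is Ollama's default port; the exact check is an assumption, not part of Velloris):

```python
import urllib.error
import urllib.request

def ollama_reachable(base_url: str = "http://127.0.0.1:11434",
                     timeout: float = 2.0) -> bool:
    """True if an HTTP server answers at base_url within `timeout` seconds."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status < 500
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure, ...

if not ollama_reachable():
    print("Ollama is not running; start it with `ollama serve`")
```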
**Troubleshooting**

Realtime mode returns a stub response:

Realtime mode infrastructure is complete (99 tests passing), but the S2S engines require separate installation:
- Windows/Linux: Install PersonaPlex-7B (see setup instructions above)
- macOS: MacEcho integration pending (use Creative or Dubbing modes)

Current options:

```bash
# ✅ Production: Creative mode (LLM + TTS)
python3 main.py --mode creative --script "Tell me a story" --emotion "excited"

# ✅ Production: Dubbing mode (high-quality TTS)
python3 main.py --mode dubbing --script "Test narration"
```

Ollama connection errors:

Ollama is only required for creative mode. Make sure it's running:

```bash
# Terminal 1: Start Ollama
ollama serve

# Terminal 2: Pull model and run
ollama pull llama3
python3 main.py --mode creative --script "Test"
```

No audio output:
- Check system volume
- Verify speaker/headphone connection
- Try CPU mode: `python3 main.py --mode dubbing --script "Test" --device cpu`
- Verify with: `python3 main.py --show-config`
Out of memory / VRAM errors:

- Close other GPU applications
- Use a smaller LLM: `--llm-model llama3:8b` (creative mode)
- Try CPU mode: `--device cpu`
- Dubbing mode uses less VRAM than creative mode
- README.md - Full documentation and usage guide
- ARCHITECTURE.md - Technical architecture and design
- MIGRATION.md - Upgrading from v1.x
- FAQ.md - Common questions answered
- EXAMPLES.md - More code examples
- Edit Configuration:

  ```bash
  cp .env.example .env
  nano .env  # Edit default settings
  ```

- Change Default Mode:

  ```bash
  # In .env file
  DEFAULT_MODE=realtime  # or dubbing, creative
  ```

- Customize Voice:

  ```bash
  # In .env file
  REALTIME_VOICE=NATM1  # Male voice
  REALTIME_PERSONA="You are a friendly tutor"
  ```
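The settings above are plain `KEY=value` lines. If you want to inspect them from a script without extra dependencies, a minimal stdlib parser sketch (Velloris's own loading mechanism may differ, e.g. it may use a dotenv library):

```python
def load_env(path: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    settings = {}
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop inline comments
            if "=" in line:
                key, _, value = line.partition("=")
                settings[key.strip()] = value.strip().strip('"')
    return settings
```

For example, `load_env(".env").get("DEFAULT_MODE", "realtime")` reads the default mode with a fallback.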
```bash
# All tests (99 total: 98 passing, 1 skipped)
pytest tests/ -v

# Integration tests only (17 tests)
pytest tests/test_pipeline.py -v

# Realtime infrastructure tests (40 tests)
pytest tests/test_realtime_callbacks.py tests/test_vad_interruption.py tests/test_realtime_e2e.py -v
```

```bash
# Test all modes
python3 main.py --mode realtime --device cpu
python3 main.py --mode dubbing --script "Welcome to Velloris" --device cpu
python3 main.py --mode creative --script "Hello" --device cpu  # Requires Ollama
```

- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Full Docs
```bash
# Show configuration
python3 main.py --show-config

# Realtime mode (fastest, no Ollama)
python3 main.py --mode realtime --persona "helpful assistant" --voice NATF2

# Dubbing mode (high quality, no Ollama)
python3 main.py --mode dubbing --script "Your text here"

# Creative mode (LLM reasoning, needs Ollama)
ollama serve  # Terminal 1
python3 main.py --mode creative --script "Tell a story" --emotion "excited"  # Terminal 2

# Device selection
python3 main.py --device auto  # Auto-detect (default)
python3 main.py --device cuda  # NVIDIA GPU
python3 main.py --device mps   # Apple Metal (M1/M2/M3/M4)
python3 main.py --device cpu   # CPU fallback
```

🎉 Congratulations! You're ready to use Velloris!
For more advanced usage, see the full documentation.