🎙️ Velloris

The Local-First, High-Fidelity Voice Agent Engine

Velloris is a state-of-the-art framework for creating lifelike, interactive AI agents that run entirely on your local hardware. With three specialized modes, Velloris delivers the perfect voice AI solution for any use case—from ultra-low latency conversations to professional-quality content creation.

Key Features:

⚡ Three-Mode Architecture: ✅ PersonaPlex-7B realtime S2S (VERIFIED WORKING) + ✅ high-fidelity dubbing + ✅ creative synthesis
📚 Production-Ready: End-to-end speech-to-speech conversations, narration, and emotional synthesis
🎯 Real-Time Speech-to-Speech: PersonaPlex-7B S2S full pipeline working (100ms input → 80ms output on RTX 3080) with 18 voice variants
🌐 Cross-Platform: Windows (NVIDIA CUDA) + macOS (Apple Metal/MPS) + Linux (CPU)
🚀 Optimized: Automatic device detection, lazy loading, mode-based routing
🔒 Privacy First: 100% local processing, no cloud dependencies
🎭 10 Languages: Multilingual support via Qwen3-TTS
🧠 Ollama Optional: Required only for creative mode (LLM reasoning)

⚡ Quick Start

Prerequisites

Python 3.12+ (3.11+ supported)
For Real-Time Mode: NVIDIA GPU (16GB+ VRAM) + CUDA 12.1+ + Triton (triton-windows for Windows)
For Creative Mode: Ollama running (Download here)
For Dubbing Mode: GPU recommended (6GB+ VRAM) or CPU
macOS: Homebrew (for system dependencies)
Windows/Linux: NVIDIA GPU recommended for best performance
Note: PersonaPlex-7B S2S requires Triton for torch.compile() - automatically installed on supported platforms

1. Clone & Setup

git clone https://github.com/randsley/Velloris.git
cd Velloris

macOS:

chmod +x install_macos.sh
./install_macos.sh

Windows:

# In PowerShell or Command Prompt
install_windows.bat

Linux / WSL2:

# Install system dependencies
sudo apt-get install -y portaudio19-dev ffmpeg sox libasound2-plugins pulseaudio-utils

# Create virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install requirements
pip install -r requirements-dev.txt

WSL2 Note: Velloris auto-detects Ollama running on Windows and routes audio through PulseAudio/WSLg. Configure ALSA to use PulseAudio:

echo -e "pcm.default pulse\nctl.default pulse" > ~/.asoundrc

2. Test Installation

python3 main.py --show-config

3. Choose Your Mode

Real-Time Conversation (PersonaPlex-7B S2S)

✅ Status: VERIFIED WORKING - Full S2S inference pipeline on Windows/CUDA (100ms input → 80ms output on RTX 3080)

python3 main.py --mode realtime --persona "You are a helpful tutor" --voice natural_female_2

Features:

⚡ Sub-150ms latency on NVIDIA CUDA (verified on RTX 3080)
✅ 18 voice variants: Natural (4F/4M) + Varied (5F/5M)
✅ Full-duplex ready: Infrastructure for natural interruptions
✅ Persona control: Custom roles via text prompts
✅ No LLM needed: PersonaPlex-7B handles understanding + reasoning + speech generation
✅ 24kHz audio: High-quality voice I/O
✅ Windows support: Works with Triton-Windows for torch.compile() optimization

Available Voices:

Natural Female: natural_female_0, natural_female_1, natural_female_2, natural_female_3
Natural Male: natural_male_0, natural_male_1, natural_male_2, natural_male_3
Varied Female: varied_female_0 through varied_female_4
Varied Male: varied_male_0 through varied_male_4

High-Fidelity Dubbing (Qwen3-TTS)

Professional narration for content creation:

python3 main.py --mode dubbing --script "Your narration here"

🎨 Professional quality (24kHz)
🌍 10 languages supported
🎭 Voice cloning available
🎯 Best for: Audiobooks, podcasts, video narration

Creative Assistant (Ollama + Qwen3-TTS)

Emotional storytelling with LLM reasoning:

Start Ollama (if not running):

ollama serve
ollama pull llama3  # First time only

Run Velloris (interactive prompt):

python3 main.py --mode creative --emotion "Speak with excitement"

Type your prompts and Velloris responds with emotionally expressive speech.

WSL2: Ollama on Windows is auto-detected — no extra configuration needed.

🧠 LLM reasoning (Ollama)
🎭 Emotion control
🌍 Multilingual
🎯 Best for: Storytelling, creative content

📋 Project Structure

Velloris/
├── core/                    # Brain & Orchestration
│   ├── brain.py            # LLM integration + audio synthesis
│   └── orchestrator.py     # Engine routing & lazy loading
├── engines/                # Voice Models
│   ├── personaplex.py      # NVIDIA PersonaPlex-7B (S2S)
│   ├── qwen_tts.py         # Alibaba Qwen3-TTS (TTS)
│   └── mlx_tts.py          # MLX-Audio for Apple Silicon
├── utils/                  # Utilities
│   ├── audio_io.py         # Audio playback & recording
│   ├── audio_utils.py      # Resampling & normalization
│   ├── device_utils.py     # Device detection (CUDA/MPS/CPU)
│   └── vad_handler.py      # Voice Activity Detection
├── tests/                  # Test Suite (99 tests: 93 passing, 6 skipped)
│   ├── test_pipeline.py    # Integration tests (22 tests)
│   ├── test_critical_paths.py  # Critical path & platform tests (38 tests)
│   ├── test_realtime_callbacks.py  # Audio callback tests (15 tests)
│   ├── test_realtime_e2e.py  # End-to-end tests (14 tests)
│   └── test_vad_interruption.py  # VAD & interruption tests (10 tests)
├── config.py               # Configuration
├── main.py                 # CLI Application
├── requirements.txt        # Python Dependencies
├── ARCHITECTURE.md         # Detailed architecture guide
├── LICENSE                 # Apache License 2.0
└── README.md               # This file

🎯 Usage Guide

Mode Comparison

Feature	Real-Time	Dubbing	Creative
Status	✅ VERIFIED WORKING	✅ Production	✅ Production
Latency	80-150ms ⚡	N/A	1-3s
Full-Duplex	Infrastructure ready	❌ No	❌ No
Interruption	VAD ready	❌ No	❌ No
Languages	English + accents	10 languages	10 languages
Voice Options	18 variants	Unlimited	Unlimited
Persona Control	✅ Yes	❌ No	✅ Yes
Emotion Control	Built-in	✅ Yes	✅ Yes
Ollama Required	❌ No	❌ No	✅ Yes
GPU Required	✅ NVIDIA 16GB+	Optional	Optional
Implementation	PersonaPlex-7B ✅	Qwen3-TTS ✅	Ollama + Qwen3-TTS ✅
Best For	Conversations	Narration	Creative content

Real-Time Mode Examples

End-to-end speech-to-speech conversations with PersonaPlex-7B:

# Basic conversation
python3 main.py --mode realtime

# Custom persona
python3 main.py --mode realtime --persona "You are a friendly customer service representative"

# Different voice (natural female)
python3 main.py --mode realtime --voice natural_female_2 --persona "You are a helpful tutor"

# Different voice (varied male)
python3 main.py --mode realtime --voice varied_male_1 --persona "You are a tech expert"

# List available voices
python3 main.py --show-config | grep -A 20 "voice"

Available Voices (18 total):

Natural Female: natural_female_0, natural_female_1, natural_female_2, natural_female_3
Natural Male: natural_male_0, natural_male_1, natural_male_2, natural_male_3
Varied Female: varied_female_0 through varied_female_4
Varied Male: varied_male_0 through varied_male_4

Performance (Verified Feb 2026):

⚡ 80-150ms latency per 100ms audio chunk (RTX 3080) - 18x faster than cloud services
✅ Full-duplex ready (natural interruptions)
✅ Persona control via text prompts
✅ No LLM needed (PersonaPlex-7B handles everything)
✅ Cross-platform (Windows CUDA, macOS MPS, Linux CPU)

Dubbing Mode Examples

Professional narration with Qwen3-TTS:

# Simple narration
python3 main.py --mode dubbing --script "Hello world"

# With voice cloning (3-5 second sample)
python3 main.py --mode dubbing --script "Story text" --voice-ref my_voice.wav

# Specify device
python3 main.py --mode dubbing --script "Your script" --device cpu

Features:

🎨 Professional quality (24kHz output)
🌍 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian)
🎭 Voice cloning from 3-second samples
🎨 Voice design via natural language

Creative Mode Examples

Interactive emotional storytelling with Ollama + Qwen3-TTS:

# Start Ollama first (if not running)
ollama serve  # In separate terminal

# Basic creative mode (interactive prompt)
python3 main.py --mode creative

# With emotion control
python3 main.py --mode creative --emotion "Speak poetically"

# Different LLM model
python3 main.py --mode creative --llm-model mistral --emotion "Excited tone"

# With specific device
python3 main.py --mode creative --emotion "Speak with warmth" --device cuda

Type prompts like "Tell me a story about space" and get voiced responses.

Features:

🧠 LLM reasoning (Ollama: llama3, mistral, mixtral, etc.)
🎭 Emotion control via natural language instructions
🌍 Multilingual support
🎨 Creative flexibility

Device Options

Auto-detect optimal device:

python3 main.py --device auto

Explicit device selection:

python3 main.py --device cuda   # NVIDIA GPU
python3 main.py --device mps    # Apple Metal (M-series Mac)
python3 main.py --device cpu    # CPU (slowest)

Show Configuration

python3 main.py --show-config

Displays:

Platform info (OS, CPU, GPU)
Device detection results
Model configuration
Audio settings

🏗️ Architecture

Real-Time Mode Pipeline

User Speech (24kHz)
    ↓
PersonaPlex-7B (end-to-end S2S)
  • Listen & Understand
  • Reason & Respond
  • Generate Speech
    ↓
Agent Speech (24kHz) → Speaker 🔊

Latency: 70-170ms ⚡
Full-Duplex: ✅ Yes
Ollama: ❌ Not needed

Dubbing Mode Pipeline

Script Text
    ↓
Qwen3-TTS (High-Fidelity Synthesis)
  • 10 languages
  • Voice cloning
  • Emotion control
    ↓
Audio Output (24kHz) → Speaker 🔊

Quality: Professional
Ollama: ❌ Not needed

Creative Mode Pipeline

User Text
    ↓
Ollama LLM (Reasoning/Creativity)
    ↓
Response Text
    ↓
Qwen3-TTS (Emotional Synthesis)
    ↓
Audio Output (24kHz) → Speaker 🔊

Flexibility: High
Ollama: ✅ Required

See ARCHITECTURE.md for detailed technical documentation.

🖥️ Platform-Specific Notes

Windows (NVIDIA CUDA)

Optimal Performance: RTX 3000+ or newer
Installation: Run install_windows.bat
Device Selection: --device cuda (auto-selected)
Optimizations Available: FlashAttention 2, bitsandbytes 4-bit quantization

macOS (Apple Metal/MPS)

Supported: M1, M2, M3, M4 Pro/Max
Installation: Run ./install_macos.sh
Device Selection: --device mps (auto-selected)
Note: PersonaPlex runs slower on MPS; Qwen3-TTS works well
MLX-Audio: Native MLX backend for optimized TTS on Apple Silicon with RMS normalization, chunk validation, and model caching

Linux / WSL2 (CPU/CUDA)

CPU Mode: Works on any Linux
CUDA Mode: Requires NVIDIA GPU + CUDA 12.1+
System Dependencies: portaudio19-dev ffmpeg sox libasound2-plugins
WSL2: Audio routed through PulseAudio/WSLg to Windows speakers. Ollama on Windows is auto-detected via gateway IP

See ARCHITECTURE.md for performance comparisons.

🧪 Testing

Run the full test suite (99 tests):

# All tests
pytest tests/ -v

# By category
pytest tests/test_pipeline.py -v           # Integration tests (22)
pytest tests/test_critical_paths.py -v     # Critical path & platform tests (38)
pytest tests/test_realtime_callbacks.py -v # Audio callback tests (15)
pytest tests/test_realtime_e2e.py -v       # End-to-end realtime tests (14)
pytest tests/test_vad_interruption.py -v   # VAD & interruption tests (10)

# With coverage
pytest tests/ --cov=. -v

Note: Tests pass without models installed (stub mode). 93 pass, 6 skipped (platform-specific).

📚 Documentation

ARCHITECTURE.md - Full system architecture, platform support, performance metrics
LICENSE - Apache License 2.0

🔧 Troubleshooting

Audio Not Playing

Ensure system volume is up
Check speaker/headphone connection
Try: python3 main.py --mode dubbing --device cpu

Model Loading Fails

Ensure internet connection (for Hugging Face downloads)
Check disk space (~5GB for models)
Verify Python 3.12+: python3 --version

PersonaPlex Warning

This is informational if you're only using Dubbing Mode
Only needed for Real-Time Mode with live speech

Slow Inference

MPS/Metal: Expected to be slower than CUDA
CPU: Very slow; GPU recommended
Solution: Use CPU mode with smaller model or wait longer

WSL2 Audio Not Playing

Install PulseAudio and ALSA plugin: sudo apt-get install -y pulseaudio-utils libasound2-plugins
Configure ALSA default: echo -e "pcm.default pulse\nctl.default pulse" > ~/.asoundrc
Verify WSLg PulseAudio: pactl info

WSL2 Ollama Connection

Ollama on Windows must listen on all interfaces: set OLLAMA_HOST=0.0.0.0 before running ollama serve
Add Windows firewall rule: netsh advfirewall firewall add rule name="Ollama" dir=in action=allow protocol=TCP localport=11434
Velloris auto-detects the Windows host IP — no manual config needed

🚀 What's Next?

MLX-Audio Integration (macOS): Native MLX backend for Apple Silicon TTS
Web UI with Gradio
ONNX export for edge deployment
Mobile optimization (iOS/Android)
Multi-turn conversation memory
Custom voice fine-tuning
Real-time transcription display

📄 License

Apache License 2.0 - See LICENSE file

🤝 Contributing

Contributions welcome! Please open an issue or pull request on GitHub.

📞 Support

Issues: Check GitHub Issues
Documentation: See ARCHITECTURE.md
Questions: Open a Discussion on GitHub

Built with ❤️ for local-first AI

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
core		core
engines		engines
examples		examples
tests		tests
utils		utils
voices		voices
.env.example		.env.example
.gitignore		.gitignore
API.md		API.md
ARCHITECTURE.md		ARCHITECTURE.md
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
DOCUMENTATION_AUDIT.md		DOCUMENTATION_AUDIT.md
EXAMPLES.md		EXAMPLES.md
EXAMPLES_PERSONAPLEX.md		EXAMPLES_PERSONAPLEX.md
FAQ.md		FAQ.md
LICENSE		LICENSE
PERFORMANCE.md		PERFORMANCE.md
QUICKSTART.md		QUICKSTART.md
README.md		README.md
REALTIME_INTEGRATION.md		REALTIME_INTEGRATION.md
ROADMAP.md		ROADMAP.md
TROUBLESHOOTING.md		TROUBLESHOOTING.md
config.py		config.py
debug_report.md		debug_report.md
download_models.py		download_models.py
install_macos.sh		install_macos.sh
install_windows.bat		install_windows.bat
main.py		main.py
requirements-dev.txt		requirements-dev.txt
requirements-mac.txt		requirements-mac.txt
requirements-macecho.txt		requirements-macecho.txt
requirements-win.txt		requirements-win.txt
requirements.txt		requirements.txt
setup.py		setup.py
test_personaplex_stub.py		test_personaplex_stub.py

License

randsley/Velloris

Folders and files

Latest commit

History

Repository files navigation

🎙️ Velloris

⚡ Quick Start

Prerequisites

1. Clone & Setup

2. Test Installation

3. Choose Your Mode

Real-Time Conversation (PersonaPlex-7B S2S)

High-Fidelity Dubbing (Qwen3-TTS)

Creative Assistant (Ollama + Qwen3-TTS)

📋 Project Structure

🎯 Usage Guide

Mode Comparison

Real-Time Mode Examples

Dubbing Mode Examples

Creative Mode Examples

Device Options

Show Configuration

🏗️ Architecture

Real-Time Mode Pipeline

Dubbing Mode Pipeline

Creative Mode Pipeline

🖥️ Platform-Specific Notes

Windows (NVIDIA CUDA)

macOS (Apple Metal/MPS)

Linux / WSL2 (CPU/CUDA)

🧪 Testing

📚 Documentation

🔧 Troubleshooting

Audio Not Playing

Model Loading Fails

PersonaPlex Warning

Slow Inference

WSL2 Audio Not Playing

WSL2 Ollama Connection

🚀 What's Next?

📄 License

🤝 Contributing

📞 Support

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages