This document outlines the planned features and improvements for Velloris.
Current release: v2.0.0 (released February 2026)
Major Features:
- ✅ Three-mode architecture (realtime, dubbing, creative)
- ✅ Proper PersonaPlex-7B integration (end-to-end S2S)
- ✅ Ollama made optional (only for creative mode)
- ✅ 10-15x performance improvement in realtime mode
- ✅ Voice cloning support (dubbing/creative modes)
- ✅ Comprehensive documentation
See CHANGELOG.md for full release notes.
Priority: High
- Model quantization support
  - Goal: 8-bit and 4-bit quantization for PersonaPlex-7B.
  - Outcome: Reduce VRAM requirements from 16 GB to 8 GB with a latency increase of no more than 10%.
- Streaming TTS in dubbing mode
  - Goal: Generate audio progressively instead of waiting for full synthesis.
  - Outcome: Reduce perceived latency for long scripts by 50% and enable real-time playback during generation.
- Batch processing optimization
  - Goal: Parallel processing for multiple scripts.
  - Outcome: Smart batching to maximize GPU utilization, with progress tracking and cancellation support.
Target: v2.1.0 release by April 2026
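
A rough, weights-only estimate helps sanity-check the VRAM goal for model quantization. The sketch below assumes about 7 billion parameters (inferred from the model name) and a 16-bit baseline. Neither figure is stated in this roadmap, and real usage adds activations, caches, and runtime overhead on top.

```python
# Weights-only memory estimate. Assumptions (not from the roadmap):
# ~7e9 parameters, inferred from the "7B" in the model name, and a 16-bit baseline.
PARAMS = 7_000_000_000


def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Return the memory needed to store the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions the weights alone come to about 14 GB at 16-bit, 7 GB at 8-bit, and 3.5 GB at 4-bit, which is roughly in line with the 16 GB to 8 GB target once overhead is included.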
Priority: Medium
- Audio post-processing pipeline
  - Goal: Implement a post-processing pipeline with noise reduction, normalization, compression, and EQ adjustment.
  - Outcome: Achieve broadcast-quality audio output.
- Voice mixing and effects
  - Goal: Add support for voice mixing and effects like reverb, echo, chorus, pitch shifting, speed adjustment, and background music mixing.
  - Outcome: Users can create more dynamic and engaging audio content.
- Multi-speaker support
  - Goal: Detect speaker changes in script and automatically assign different voices.
  - Outcome: Maintain speaker consistency in multi-speaker scripts.
Target: v2.1.1 release by May 2026
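
One way the multi-speaker goal could work is sketched below. It is an illustration only, not the planned implementation: the `NAME: text` script format, the `assign_voices` helper, and the voice names are all hypothetical. The key idea is that each speaker label is mapped to a voice once and reuses it for every later line.

```python
import re
from itertools import cycle
from typing import Dict, List, Optional, Tuple

# Hypothetical script convention: an upper-case label, a colon, then the line.
SPEAKER_LINE = re.compile(r"^\s*([A-Z][A-Z0-9_ ]*)\s*:\s*(.+)$")


def assign_voices(script: str, voices: List[str]) -> List[Tuple[str, str, str]]:
    """Return (speaker, voice, text) per line; a speaker always keeps one voice."""
    if not voices:
        raise ValueError("at least one voice is required")
    pool = cycle(voices)  # voices are reused if speakers outnumber them
    voice_for: Dict[str, str] = {}
    current: Optional[str] = None
    result: List[Tuple[str, str, str]] = []
    for raw in script.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        match = SPEAKER_LINE.match(raw)
        if match:
            current, text = match.group(1).strip(), match.group(2).strip()
        else:
            text = raw.strip()  # unlabeled line: continue the previous speaker
        if current is None:
            current = "NARRATOR"  # unlabeled text before any speaker label
        if current not in voice_for:
            voice_for[current] = next(pool)
        result.append((current, voice_for[current], text))
    return result


lines = assign_voices("ALICE: Hi\nBOB: Hello\nALICE: Bye", ["voice_a", "voice_b"])
```

A label-based parser like this only covers scripts that already mark speakers. Detecting speaker changes in unlabeled text would need a different approach.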
Priority: High
- Enhanced multilingual support
  - Goal: Improve quality for non-English languages, with a focus on French, German, and Spanish.
  - Outcome: Achieve human-level quality for the target languages.
  - Dependencies: Language-specific voice models.
- Dialect support
  - Goal: Add support for regional accents and cultural pronunciation nuances.
  - Outcome: User-selectable dialect variants for English (US, UK, Australian), Spanish (Spain, Mexico), and French (France, Canada).
Target: v2.2.0 release by July 2026
Priority: High
- RESTful API server
  - Goal: Implement a RESTful API server with endpoints for all three modes.
  - Outcome: Enable easy integration with other applications and services.
- Python SDK improvements
  - Goal: Improve the Python SDK with type hints, async/await support, context managers, and better error messages.
  - Outcome: A more developer-friendly and robust SDK.
- Streaming API
  - Goal: Implement a streaming API using Server-Sent Events (SSE) and WebSockets.
  - Outcome: Enable real-time, low-latency audio streaming.
Target: v2.2.1 release by August 2026
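
For the SSE half of the streaming API goal, the wire format is simple enough to sketch. SSE is a text protocol, so binary audio has to be encoded first. The example below uses base64 inside a JSON payload. The `audio` event name, the `seq` and `audio_b64` fields, and both helper functions are assumptions made for illustration, not a committed API.

```python
import base64
import json
from typing import List, Optional


def sse_event(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Frame one Server-Sent Event: optional fields, data lines, blank-line terminator."""
    lines: List[str] = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Each line of the payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def audio_chunk_event(seq: int, pcm: bytes) -> str:
    """Wrap a chunk of raw audio bytes as a hypothetical 'audio' event."""
    payload = json.dumps({"seq": seq, "audio_b64": base64.b64encode(pcm).decode("ascii")})
    return sse_event(payload, event="audio", event_id=str(seq))


frame = audio_chunk_event(0, b"\x00\x00")
```

Base64 adds roughly a third to the payload size, which is one reason a binary-capable transport such as WebSockets may suit the lowest-latency paths better.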
Priority: Medium
- Docker container
  - Goal: Provide an official Docker image with multi-architecture support (AMD64, ARM64).
  - Outcome: Simplify deployment and ensure consistency across different environments.
- Cloud deployment guides
  - Goal: Create deployment guides for popular cloud platforms (AWS, GCP, Azure) and services (RunPod, Vast.ai, Lambda Labs).
  - Outcome: Lower the barrier to entry for cloud-based deployments.
- Mobile support (experimental)
  - Goal: Develop experimental iOS and Android apps with on-device inference.
  - Outcome: Showcase the potential of Velloris on mobile devices.
  - Dependencies: Model quantization support (v2.1).
Target: v2.3.0 release by September 2026
Priority: Medium
- Emotion detection and matching
  - Goal: Detect emotion from user voice in realtime mode and generate a response with matching emotion.
  - Outcome: More natural and engaging conversations.
- Voice conversion
  - Goal: Implement real-time voice conversion, including cross-gender and age progression/regression.
  - Outcome: A powerful tool for content creators and privacy-conscious users.
- Speech-to-speech translation
  - Goal: Translate speech while preserving voice characteristics in a multi-language conversation mode.
  - Outcome: Break down language barriers in real-time communication.
Target: v3.0.0 release by Q4 2026
Priority: High
- Fine-tuning support
  - Goal: Add support for fine-tuning PersonaPlex-7B and Qwen3-TTS for custom voices and specialized domains.
  - Outcome: Users can create their own high-quality custom voices.
- Custom model support
  - Goal: Implement a plugin system for third-party models, including new TTS engines and LLMs.
  - Outcome: A more flexible and extensible platform.
- Zero-shot voice cloning
  - Goal: Improve voice cloning with minimal reference audio (e.g., a single sentence).
  - Outcome: Make voice cloning more accessible and easier to use.
Target: v3.1.0 release by Q1 2027
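
The plugin system for custom models has not been designed yet. A common pattern for this kind of extensibility is a name-to-factory registry, sketched below. Every name here (`TTSEngine`, `register_tts`, `create_tts`, `SilenceEngine`) is hypothetical and exists only to show the shape of the idea.

```python
from typing import Callable, Dict, Protocol


class TTSEngine(Protocol):
    """Hypothetical interface a third-party TTS plugin would implement."""

    def synthesize(self, text: str) -> bytes: ...


_REGISTRY: Dict[str, Callable[[], TTSEngine]] = {}


def register_tts(name: str):
    """Class decorator that records an engine factory under a unique name."""
    def decorator(factory: Callable[[], TTSEngine]) -> Callable[[], TTSEngine]:
        if name in _REGISTRY:
            raise ValueError(f"TTS engine {name!r} is already registered")
        _REGISTRY[name] = factory
        return factory
    return decorator


def create_tts(name: str) -> TTSEngine:
    """Instantiate a registered engine, with a helpful error for unknown names."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown TTS engine {name!r}; known: {sorted(_REGISTRY)}") from None
    return factory()


@register_tts("silence")
class SilenceEngine:
    """Toy engine: emits two zero bytes per input character."""

    def synthesize(self, text: str) -> bytes:
        return b"\x00\x00" * len(text)


audio = create_tts("silence").synthesize("abc")
```

Using a `Protocol` rather than a base class means third-party engines would not need to import or inherit from anything in the host package. They only need to match the method signature.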
Priority: Low (community-driven)
- Multi-tenancy support
- Monitoring and observability
- High availability
Target: v3.2.0 release by Q2 2027
Priority: Low (exploratory)
These features are under investigation and may or may not be implemented:
- Chain-of-thought speech
- Multi-turn planning
- Speaker diarization
- Acoustic scene analysis
- Vision integration
- Text + audio input
Features requested by the community. Vote on GitHub Discussions. The top 3 most upvoted features will be considered for the next release cycle.
- Web UI / GUI (150+ votes)
- More voice options (120+ votes)
- Voice editor (100+ votes)
- Plugin system (75+ votes)
- Mobile apps (60+ votes)
- Video dubbing (50+ votes)
We welcome contributions to any feature on this roadmap!
- Check open issues labeled `help wanted`
- Comment on the issue to claim it
- Fork, implement, and submit a PR
- See CONTRIBUTING.md for guidelines
- Open a discussion in GitHub Discussions
- Share your research findings or ideas
- Collaborate on experimental features
- Co-author papers on Velloris innovations
- Vote on features in GitHub Discussions
- Share your use cases and requirements
- Test beta features and provide feedback
- Report bugs and suggest improvements
Velloris follows Semantic Versioning:
- Major versions (v2.0, v3.0): Breaking changes, major new features
- Minor versions (v2.1, v2.2): New features, backward compatible
- Patch versions (v2.0.1, v2.0.2): Bug fixes, backward compatible
Expected release cadence:
- Major releases: every 6-12 months
- Minor releases: every 1-3 months
- Patch releases: as needed (critical bugs)
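
The Semantic Versioning rules listed above can be expressed as a small check that classifies the step between two tags. This is a minimal sketch. It assumes plain `vMAJOR.MINOR.PATCH` tags with no pre-release or build suffixes, and the helper names are illustrative.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse 'v2.1.0' or '2.1.0' into (major, minor, patch)."""
    text = tag[1:] if tag.startswith("v") else tag
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {tag!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def release_kind(old: str, new: str) -> str:
    """Classify the bump from old to new as 'major', 'minor', or 'patch'."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError(f"{new!r} is not newer than {old!r}")
    if n[0] != o[0]:
        return "major"  # breaking changes, major new features
    if n[1] != o[1]:
        return "minor"  # new features, backward compatible
    return "patch"      # bug fixes, backward compatible


kind = release_kind("v2.0.0", "v2.1.0")
```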
Features from the original v1.x roadmap that are now complete:
- ✅ Three-mode architecture (v2.0.0)
- ✅ Proper PersonaPlex usage (v2.0.0)
- ✅ Optional Ollama (v2.0.0)
- ✅ Voice cloning (v2.0.0)
- ✅ Comprehensive documentation (v2.0.0)
Features from v1.x that are no longer supported:
- Interactive mode (deprecated in v2.0)
| Version | Release Date | Status | Highlights |
|---|---|---|---|
| v2.0.0 | Feb 2026 | Current | Three-mode architecture, proper PersonaPlex usage |
| v1.0.0 | Jan 2026 | Deprecated | Initial release, interactive mode |
Have ideas for the roadmap? We'd love to hear from you!
- Feature requests: GitHub Issues
- Discussions: GitHub Discussions
- Voting: Upvote existing feature requests
Your feedback shapes the future of Velloris!
Last updated: February 2026
For the latest updates, see CHANGELOG.md.