Releases: Sinapsis-AI/sinapsis-speech
sinapsis-speech v0.4.0
The sinapsis-speech package continues to evolve, bringing broader compatibility and more powerful speech capabilities. This release introduces new integrations and support for cutting-edge TTS and STT models, along with extended support for ElevenLabs 2.0+.
🚀 New Integrations
Sinapsis Orpheus-CPP
Enables text-to-speech (TTS) using the Orpheus-TTS engine, providing high-quality neural voice synthesis.
OrpheusTTS
Converts text to speech using Orpheus.
Accepts text packets from an input container and returns synthesized audio.
Includes memory-safe error handling for GPU-intensive workloads.
📄 See the full setup in the README.
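As a rough illustration of the "memory-safe error handling" idea, the sketch below wraps a synthesis call so that one out-of-memory failure does not abort the whole batch. It is not the OrpheusTTS implementation: `synthesize_all` and the `synthesize` callable are hypothetical names, and a real GPU backend would likely need its own cache-clearing call in addition to `gc.collect()`.

```python
import gc
from typing import Callable, Iterable


def synthesize_all(
    texts: Iterable[str],
    synthesize: Callable[[str], bytes],  # hypothetical TTS callable, not a Sinapsis API
) -> tuple[list[bytes], list[str]]:
    """Run `synthesize` on each text and collect inputs that ran out of memory."""
    audio_chunks: list[bytes] = []
    skipped: list[str] = []
    for text in texts:
        try:
            audio_chunks.append(synthesize(text))
        except MemoryError:
            # Record the failed input, reclaim unreachable Python objects,
            # and continue with the next text instead of crashing.
            skipped.append(text)
            gc.collect()
    return audio_chunks, skipped
```

Returning the skipped inputs alongside the audio lets the caller retry them later, for example with shorter text chunks.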
Sinapsis Parakeet-TDT
Brings speech-to-text (STT) capabilities using NVIDIA’s Parakeet TDT 0.6B model.
ParakeetTDTInference
Transcribes audio input from containers or files.
Supports timestamp prediction.
Adds the resulting text packets back into the container.
📄 See the full setup in the README.
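To show what timestamp prediction can enable downstream, here is a minimal sketch that groups word-level timestamps into longer segments. The `TimedWord` type, the `to_segments` helper, and the 0.5 s gap threshold are assumptions made for illustration; they are not the output format of ParakeetTDTInference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    """Hypothetical word-level result: text plus start/end times in seconds."""
    text: str
    start: float
    end: float


def to_segments(
    words: list[TimedWord], max_gap: float = 0.5
) -> list[tuple[float, float, str]]:
    """Group consecutive words, starting a new segment when the pause exceeds max_gap."""
    segments: list[tuple[float, float, str]] = []
    current: list[TimedWord] = []
    for word in words:
        if current and word.start - current[-1].end > max_gap:
            # The pause is long enough: close the running segment.
            segments.append(
                (current[0].start, current[-1].end, " ".join(w.text for w in current))
            )
            current = []
        current.append(word)
    if current:
        segments.append(
            (current[0].start, current[-1].end, " ".join(w.text for w in current))
        )
    return segments
```

Each returned tuple holds a segment's start time, end time, and text, which maps naturally onto subtitle-style output.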
ElevenLabs 2.0+ Support
We now offer seamless compatibility with ElevenLabs v2.0 and above, unlocking improved voice fidelity and additional model options.
ElevenLabsTTS
Converts text to speech using ElevenLabs voice models.
ElevenLabsVoiceGeneration
Generates synthetic voices from text descriptions.
📄 Setup instructions available in the package README.
🔧 Full Package Overview
Sinapsis ElevenLabs – TTS + voice generation via ElevenLabs
Sinapsis F5-TTS – TTS with voice cloning
Sinapsis Kokoro – TTS with Kokoro 82M
Sinapsis Zonos – TTS and voice cloning using Zonos
Sinapsis Orpheus-CPP – NEW: TTS via Orpheus
Sinapsis Parakeet-TDT – NEW: STT via Parakeet TDT
sinapsis-speech v0.3.0
We are excited to introduce the sinapsis-kokoro package into the sinapsis-speech monorepo. sinapsis-kokoro brings high-quality text-to-speech (TTS) to your applications. Built on the Kokoro model, it offers a lightweight, efficient way to generate synthetic speech across a wide range of use cases.
Note:
This release also updates the webapp to follow the Sinapsis design.
Key Features
Text-to-Speech Synthesis
High-Quality Speech Generation: Kokoro delivers speech output comparable to larger models, ensuring clear and natural voice synthesis.
Versatile Use Cases: Perfect for applications like audiobooks, voice assistants, and accessibility tools.
Voice Customization
Multiple Voice Options: Choose from a variety of voices to suit different content types and user preferences.
Customizable Output: Adjust pitch, speed, and tone to create the desired auditory experience.
Deployment Flexibility
Apache-Licensed Weights: Easily deploy Kokoro in any environment, from production servers to personal projects.
Cross-Platform Compatibility: Works seamlessly across various platforms and devices.
Performance and Efficiency
Lightning-Fast Processing: Kokoro's lightweight architecture ensures quick synthesis, reducing latency in real-time applications.
Cost-Effective: Lower computational requirements make it an economical choice for continuous use.
sinapsis-speech v0.2.2
This release includes a minor fix to the text-to-speech webapps of the different packages.
sinapsis-speech v0.2.0
We present sinapsis-speech v0.2.0, which adds two new packages: sinapsis-zonos and sinapsis-f5tts. They expand the functionality provided by sinapsis-elevenlabs and show how easily new functionality can be integrated into the sinapsis framework.
sinapsis-zonos Package
High-Quality Voice Synthesis
Generate natural-sounding speech with customizable voice characteristics.
Supports multiple languages and voice styles for diverse use cases.
Advanced Voice Customization
Fine-tune voice attributes such as pitch, speed, and tone.
Create unique synthetic voices tailored to specific applications.
Efficient Workflow Integration
Seamlessly integrate with existing workflows for text-to-speech tasks.
Compatible with the core template system for flexible pipeline construction.
sinapsis-f5tts Package
Real-Time Text-to-Speech Synthesis
Generate speech on-the-fly with low latency.
Ideal for applications requiring immediate audio output.
Emotional Modulation
Add emotional tone to synthetic speech, such as happiness, sadness, or neutrality.
Enhance the expressiveness of generated speech for more engaging interactions.
Multi-Language Support
Synthesize speech in multiple languages and dialects.
Handle long-form text inputs for extended speech generation.
This release also includes two new webapps for testing these packages.
sinapsis-speech v0.1.0
The sinapsis-speech monorepo provides the sinapsis-elevenlabs package, whose templates offer a flexible and reusable framework for text-to-speech and voice generation workflows. This release introduces a core template system that allows developers to:
- Define Custom Templates: Create reusable blueprints for text-to-speech and voice synthesis tasks.
- Compose Complex Workflows: Combine multiple templates to build sophisticated data processing pipelines.
- Modify Attributes Dynamically: Update template attributes through a dedicated method while preserving metadata.
- Integrate with Data Containers: Process and transform data within a unified data container system.
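The four capabilities above can be pictured with a small, self-contained sketch. Every name in it (`Container`, `Template`, `update_attributes`, `UppercaseText`, `FakeTTS`, `run_pipeline`) is a hypothetical stand-in chosen for illustration and does not reproduce the actual sinapsis classes or method names; the "synthesis" step only encodes text as bytes rather than calling a voice model.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Container:
    """Hypothetical data container holding text inputs and audio outputs."""
    texts: list[str] = field(default_factory=list)
    audios: list[bytes] = field(default_factory=list)


class Template:
    """Hypothetical reusable processing step with configurable attributes."""

    def __init__(self, **attributes: Any) -> None:
        self.attributes = dict(attributes)
        self.metadata = {"template": type(self).__name__}

    def update_attributes(self, **changes: Any) -> None:
        # Only attribute values change; the metadata dict is left untouched.
        self.attributes.update(changes)

    def execute(self, container: Container) -> Container:
        raise NotImplementedError


class UppercaseText(Template):
    """Custom template: normalizes every text item to upper case."""

    def execute(self, container: Container) -> Container:
        container.texts = [text.upper() for text in container.texts]
        return container


class FakeTTS(Template):
    """Placeholder 'synthesis': tags each text with a voice and encodes it."""

    def execute(self, container: Container) -> Container:
        voice = self.attributes.get("voice", "default")
        container.audios = [f"{voice}:{text}".encode() for text in container.texts]
        return container


def run_pipeline(templates: list[Template], container: Container) -> Container:
    """Compose templates by passing one container through them in order."""
    for template in templates:
        container = template.execute(container)
    return container


tts = FakeTTS(voice="narrator")
tts.update_attributes(voice="assistant")  # dynamic change, metadata preserved
result = run_pipeline([UppercaseText(), tts], Container(texts=["hello"]))
print(result.audios)  # [b'assistant:HELLO']
```

Passing a single container through every step is what keeps the templates composable: each one reads and writes the same structure, so they can be reordered or swapped without changing the pipeline code.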
The sinapsis-elevenlabs package includes templates for:
- Text-to-speech: Template for converting text into speech using ElevenLabs' voice models.
- Voice generation: Template for generating custom synthetic voices based on user-provided descriptions.