A high-quality text-to-speech plugin for LiveKit Agents that integrates Kyutai TTS, with a streaming implementation for real-time voice synthesis.
- High-Quality Audio: Premium voice synthesis with natural-sounding speech
- Streaming Implementation: Real-time audio generation with low-latency streaming
- Excellent Performance: ~260ms time-to-first-byte (TTFB) on RTX 4090
- LiveKit Agents v1.2 or higher
- Kyutai TTS server instance
- Clone or download this plugin into the root directory of your LiveKit agents project
- Set up the Kyutai TTS server using taresh18/delayed-streams-modeling
- Ensure your server is running and accessible
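One quick way to confirm the last step is a plain TCP reachability check, sketched below using only the Python standard library. The helper names `host_and_port` and `is_reachable` are hypothetical, and `http://localhost:8000` is an assumed address. The check only confirms that something accepts connections on that host and port; it does not verify that the process is a Kyutai TTS server.

```python
import socket
from urllib.parse import urlparse


def host_and_port(base_url: str) -> tuple[str, int]:
    """Extract host and port from a URL, falling back to the scheme's default port."""
    parsed = urlparse(base_url)
    if parsed.hostname is None:
        raise ValueError(f"no host in URL: {base_url!r}")
    default_port = 443 if parsed.scheme in ("https", "wss") else 80
    return parsed.hostname, parsed.port or default_port


def is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the server's host and port succeeds."""
    host, port = host_and_port(base_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = "http://localhost:8000"  # assumed address; replace with your server URL
    print(f"{url} reachable: {is_reachable(url)}")
```

If the check fails, the usual suspects are a server that has not finished starting, a wrong port, or a firewall between the agent and the server.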
Use the delayed streams modeling server for optimal performance:
Repository: taresh18/delayed-streams-modeling
Follow the setup instructions in the repository to get your Kyutai TTS server running with streaming capabilities.
Initialize your agent session with the KyutaiTTS plugin:

    from livekit.agents import AgentSession
    from your_plugin_path import kyutTTS

    session = AgentSession(
        # ... other configuration
        tts=kyutTTS(
            base_url="<kyutai_server_url>",  # e.g., "http://localhost:8000"
            voice="expresso/ex04-ex02_happy_001_channel2_140s.wav",  # voice file path
        ),
    )

- Latency: ~260ms TTFB on RTX 4090 GPU
- Quality: High-quality voice synthesis with natural prosody
- Streaming: Real-time audio generation and playback
- Efficiency: Optimized for production use with streaming implementation
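To compare the latency figure above against your own hardware, time-to-first-byte can be measured generically as the time until the first non-empty chunk arrives from a stream. The sketch below is not necessarily how the ~260ms figure was obtained. `time_to_first_chunk` is a hypothetical helper, and `fake_stream` is a stand-in that simulates a delayed audio stream; in practice you would pass in whatever iterable of audio bytes your client produces.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds elapsed, first non-empty chunk) for a stream of audio bytes."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # skip empty chunks so they do not count as audio
            return time.perf_counter() - start, chunk
    raise RuntimeError("stream ended before any audio arrived")


def fake_stream(delay_s: float) -> Iterator[bytes]:
    """Hypothetical stand-in for a TTS stream: waits, then yields two chunks."""
    time.sleep(delay_s)
    yield b"\x00\x01"
    yield b"\x02\x03"


if __name__ == "__main__":
    ttfb, first = time_to_first_chunk(fake_stream(0.05))
    print(f"TTFB: {ttfb * 1000:.0f} ms, first chunk: {len(first)} bytes")
```

Because the timer starts before the stream is first pulled, the measurement includes request setup as well as synthesis time, which is usually what matters for perceived responsiveness.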