Commit b42239a

feat: Add Camb.ai TTS plugin for LiveKit Agents
- Implement TTS class with streaming audio synthesis
- Support MARS models (mars-8, mars-8-flash, mars-8-instruct)
- Add voice listing functionality via list_voices()
- Support configurable voice parameters (speed, language, voice_id)
- Include PCM, WAV, FLAC, and ADTS output formats
- Add examples and documentation
1 parent d774d26 commit b42239a

15 files changed, +1492 -0 lines changed
Lines changed: 259 additions & 0 deletions
@@ -0,0 +1,259 @@
# Camb.ai Plugin for LiveKit Agents

Text-to-Speech plugin for the [Camb.ai](https://camb.ai) TTS API, powered by MARS-8 technology.

## Features

- High-quality neural text-to-speech with MARS-8 series models
- Multiple model variants (mars-8, mars-8-flash, mars-8-instruct)
- User instructions for style and tone control
- Speed control and enhanced pronunciation
- Support for 140+ languages
- Real-time HTTP streaming
- Pre-built voice library

## Installation

```bash
pip install livekit-plugins-camb
```

## Prerequisites

You'll need a Camb.ai API key, which you can obtain from [Camb.ai Studio](https://studio.camb.ai/public/onboarding). Set it as an environment variable:

```bash
export CAMB_API_KEY=your_api_key_here
```
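
A missing key is easier to diagnose if it is checked before any synthesis call. The helper below is a minimal sketch using only the standard library; `require_camb_api_key` is a hypothetical name and not part of the plugin.

```python
import os


def require_camb_api_key(env=None):
    """Return the Camb.ai API key, or raise a clear error if it is missing.

    Hypothetical helper (not part of the plugin): it only reads CAMB_API_KEY.
    """
    env = os.environ if env is None else env
    key = env.get("CAMB_API_KEY", "").strip()
    if not key:
        raise RuntimeError("CAMB_API_KEY is not set; export it before creating TTS()")
    return key
```

Calling it once at startup turns a misconfigured environment into an immediate, readable error.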

## Quick Start

```python
import asyncio

from livekit.plugins.camb import TTS


async def main() -> None:
    # Initialize TTS (uses the CAMB_API_KEY env var)
    tts = TTS()

    # Synthesize speech and collect the stream into a single audio frame
    stream = tts.synthesize("Hello from Camb.ai!")
    audio_frame = await stream.collect()

    # Save to file
    with open("output.wav", "wb") as f:
        f.write(audio_frame.to_wav_bytes())


asyncio.run(main())
```

## List Available Voices

```python
from livekit.plugins.camb import list_voices

# list_voices is a coroutine, so call it from inside an async function
voices = await list_voices()
for voice in voices:
    print(f"{voice.name} ({voice.id}): {voice.gender}, {voice.language}")
```

## Select a Specific Voice

```python
tts = TTS(voice_id=2681)  # Use Attic voice
stream = tts.synthesize("Using a specific voice!")
```

## Model Selection

Camb.ai offers multiple MARS-8 models for different use cases:

```python
# Default balanced model
tts = TTS(model="mars-8")

# Faster inference
tts = TTS(model="mars-8-flash")

# Supports user instructions for style/tone
tts = TTS(
    model="mars-8-instruct",
    user_instructions="Speak in a friendly, conversational tone"
)
```

## Advanced Configuration

```python
tts = TTS(
    api_key="your-api-key",  # Or use CAMB_API_KEY env var
    voice_id=2681,  # Voice ID from list-voices (Attic voice)
    language="en-us",  # BCP-47 locale
    model="mars-8-instruct",  # MARS model variant
    speed=1.0,  # Speech rate (0.5-2.0)
    user_instructions="Speak energetically with clear enunciation",
    output_format="pcm_s16le",  # Audio format
    enhance_named_entities=True,  # Better pronunciation for names/places
)
```

## Usage with LiveKit Agents

```python
from livekit import agents, rtc
from livekit.plugins.camb import TTS


async def entrypoint(ctx: agents.JobContext):
    # Connect to room
    await ctx.connect()

    # Initialize TTS
    tts = TTS(language="en-us", speed=1.1)

    # Synthesize speech and collect it into a single audio frame
    stream = tts.synthesize("Hello from LiveKit with Camb.ai!")
    audio_frame = await stream.collect()

    # Publish to room (AudioSource and LocalAudioTrack live in livekit.rtc)
    source = rtc.AudioSource(tts.sample_rate, tts.num_channels)
    track = rtc.LocalAudioTrack.create_audio_track("tts", source)
    await ctx.room.local_participant.publish_track(track)
    await source.capture_frame(audio_frame)
```

## Configuration Options

### TTS Constructor Parameters

- **api_key** (str | None): Camb.ai API key (falls back to the CAMB_API_KEY env var)
- **voice_id** (int): Voice ID to use (default: 2681)
- **language** (str): BCP-47 locale (default: "en-us")
- **model** (SpeechModel): MARS model variant (default: "mars-8")
- **speed** (float): Speech rate, 0.5-2.0 (default: 1.0)
- **user_instructions** (str | None): Style/tone guidance (requires mars-8-instruct)
- **output_format** (OutputFormat): Audio format (default: "pcm_s16le")
- **enhance_named_entities** (bool): Enhanced pronunciation (default: False)
- **base_url** (str): API base URL
- **http_session** (aiohttp.ClientSession | None): Reusable HTTP session
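
Two constraints in this list can be checked before a request is sent: the speed range and the requirement that `user_instructions` be paired with `mars-8-instruct`. The sketch below assumes a hypothetical `CambOptions` dataclass that mirrors a few of the parameters. It is not the plugin's own class, and whether the plugin itself enforces these rules is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CambOptions:
    """Hypothetical mirror of some TTS constructor parameters, for validation only."""

    voice_id: int = 2681
    language: str = "en-us"
    model: str = "mars-8"
    speed: float = 1.0
    user_instructions: Optional[str] = None

    def validate(self) -> None:
        # Speed range taken from the 0.5-2.0 note in this README.
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed {self.speed} is outside 0.5-2.0")
        # Style guidance is documented as requiring the instruct model.
        if self.user_instructions and self.model != "mars-8-instruct":
            raise ValueError("user_instructions requires model='mars-8-instruct'")
```

Validating up front keeps configuration mistakes out of the network path, where they would otherwise surface as API errors.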

### Available Models

- **mars-8**: Default, balanced quality and speed
- **mars-8-flash**: Faster inference, lower latency
- **mars-8-instruct**: Supports user_instructions for style control
- **mars-7**: Previous generation model
- **mars-6**: Older generation model
- **auto**: Automatic model selection
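
The trade-offs among the three MARS-8 variants can be written as a small decision rule. This is an illustrative sketch only: `pick_model` is a hypothetical helper, and the priority order (instructions first, then latency) is an assumption rather than guidance from Camb.ai.

```python
def pick_model(user_instructions=None, low_latency=False):
    """Pick a MARS-8 variant name from two coarse requirements (hypothetical helper)."""
    if user_instructions:
        # Only the instruct variant is documented as accepting style guidance.
        return "mars-8-instruct"
    if low_latency:
        return "mars-8-flash"
    return "mars-8"
```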

### Output Formats

- **pcm_s16le**: 16-bit PCM (recommended for streaming)
- **pcm_s32le**: 32-bit PCM (highest quality)
- **wav**: WAV with headers
- **flac**: Lossless compression
- **adts**: ADTS streaming format
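
For the two raw PCM formats, the data rate follows directly from the sample width. The sketch below assumes the 24000 Hz mono output listed for the TTS class in this README. The helper names are hypothetical.

```python
# Bytes per sample for the raw PCM formats listed above.
BYTES_PER_SAMPLE = {"pcm_s16le": 2, "pcm_s32le": 4}


def pcm_bytes_per_second(output_format, sample_rate=24000, num_channels=1):
    """Raw PCM data rate in bytes per second."""
    return sample_rate * num_channels * BYTES_PER_SAMPLE[output_format]


def pcm_duration_seconds(num_bytes, output_format, sample_rate=24000, num_channels=1):
    """Duration in seconds of a raw PCM buffer of the given size."""
    return num_bytes / pcm_bytes_per_second(output_format, sample_rate, num_channels)
```

Under these assumptions, pcm_s32le doubles the bandwidth of pcm_s16le for the same audio duration.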

## API Reference

### TTS Class

Main text-to-speech interface.

**Methods:**
- `synthesize(text: str) -> ChunkedStream`: Synthesize text to speech
- `update_options(**kwargs)`: Update voice settings dynamically
- `aclose()`: Clean up resources

**Properties:**
- `model` (str): Current MARS model name
- `provider` (str): Provider name ("Camb.ai")
- `sample_rate` (int): Audio sample rate (24000 Hz)
- `num_channels` (int): Number of audio channels (1)
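
The `sample_rate` and `num_channels` values are what a consumer needs to interpret raw `pcm_s16le` bytes. As a sketch, the standard-library `wave` module can wrap such bytes in a WAV container. `pcm_s16le_to_wav` is a hypothetical helper, not part of the plugin.

```python
import io
import wave


def pcm_s16le_to_wav(pcm: bytes, sample_rate: int = 24000, num_channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM bytes in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(num_channels)
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes wide
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```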

### list_voices Function

```python
async def list_voices(
    api_key: str | None = None,
    base_url: str = "https://client.camb.ai/apis",
) -> list[VoiceInfo]
```

Returns a list of available voices with metadata.

### VoiceInfo

Voice metadata object with:
- **id** (int): Unique voice identifier
- **name** (str): Human-readable voice name
- **gender** (str | None): Voice gender
- **language** (str | None): BCP-47 locale
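
These fields are enough to narrow a voice list to one locale before choosing a `voice_id`. The sketch below uses a stand-in `VoiceInfo` dataclass with the same four fields so that it runs without the plugin. The real class may differ, and `voices_for_language` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceInfo:
    """Stand-in with the fields listed above; not the plugin's own class."""

    id: int
    name: str
    gender: Optional[str] = None
    language: Optional[str] = None


def voices_for_language(voices, language):
    """Return voices whose locale matches `language`, ignoring case."""
    wanted = language.lower()
    return [v for v in voices if v.language and v.language.lower() == wanted]
```

Voices with no `language` set are skipped, since the field is optional.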

## Multi-Language Support

Camb.ai supports 140+ languages. Specify the language with a BCP-47 locale:

```python
# French
tts = TTS(language="fr-fr", voice_id=...)

# Spanish
tts = TTS(language="es-es", voice_id=...)

# Japanese
tts = TTS(language="ja-jp", voice_id=...)
```
204+
205+
## Dynamic Options
206+
207+
Update TTS settings without recreating the instance:
208+
209+
```python
210+
tts = TTS()
211+
212+
# Change voice
213+
tts.update_options(voice_id=12345)
214+
215+
# Change speed and model
216+
tts.update_options(speed=1.2, model="mars-8-flash")
217+
218+
# Add user instructions
219+
tts.update_options(
220+
model="mars-8-instruct",
221+
user_instructions="Speak warmly and enthusiastically"
222+
)
223+
```
224+
225+
## Error Handling
226+
227+
The plugin handles errors according to LiveKit conventions:
228+
229+
```python
230+
from livekit.agents import APIStatusError, APIConnectionError, APITimeoutError
231+
232+
try:
233+
stream = tts.synthesize("Hello!")
234+
audio = await stream.collect()
235+
except APIStatusError as e:
236+
print(f"API error: {e.status_code} - {e.message}")
237+
except APIConnectionError as e:
238+
print(f"Connection error: {e}")
239+
except APITimeoutError as e:
240+
print(f"Request timed out: {e}")
241+
```
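
Connection and timeout errors are often transient, so callers may want to retry them. The helper below is a generic sketch of exponential backoff using only `asyncio`. `with_retries` is a hypothetical name, and the exception types to retry are passed in by the caller rather than assumed.

```python
import asyncio


async def with_retries(make_call, retryable, attempts=3, base_delay=0.5):
    """Await make_call(), retrying on `retryable` exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await make_call()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)
```

With the plugin, a call might look like `await with_retries(lambda: tts.synthesize("Hello!").collect(), (APIConnectionError,))`. This is untested against the real API and shown only to illustrate the pattern.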
242+
243+
## Future Features
244+
245+
Coming soon:
246+
- GCP Vertex AI integration
247+
- Voice cloning via custom voice creation
248+
- Voice generation from text descriptions
249+
- WebSocket streaming for real-time applications
250+
251+
## Links
252+
253+
- [Camb.ai Documentation](https://camb.mintlify.app/)
254+
- [LiveKit Agents Documentation](https://docs.livekit.io/agents/)
255+
- [GitHub Repository](https://github.com/livekit/agents)
256+
257+
## License
258+
259+
Apache License 2.0