|
| 1 | +# Camb.ai Plugin for LiveKit Agents |
| 2 | + |
| 3 | +Text-to-Speech plugin for [Camb.ai](https://camb.ai) TTS API, powered by MARS-8 technology. |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- High-quality neural text-to-speech with MARS-8 series models |
| 8 | +- Multiple model variants (mars-8, mars-8-flash, mars-8-instruct) |
| 9 | +- User instructions for style and tone control |
| 10 | +- Speed control and enhanced pronunciation |
| 11 | +- Support for 140+ languages |
| 12 | +- Real-time HTTP streaming |
| 13 | +- Pre-built voice library |
| 14 | + |
| 15 | +## Installation |
| 16 | + |
| 17 | +```bash |
| 18 | +pip install livekit-plugins-camb |
| 19 | +``` |
| 20 | + |
| 21 | +## Prerequisites |
| 22 | + |
| 23 | +You'll need a Camb.ai API key. Set it as an environment variable: |
| 24 | + |
| 25 | +```bash |
| 26 | +export CAMB_API_KEY=your_api_key_here |
| 27 | +``` |
| 28 | + |
| 29 | +If you don't have a key yet, you can obtain one from [Camb.ai Studio](https://studio.camb.ai/public/onboarding). |
| 30 | + |
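As a minimal sketch of the key lookup described above, the helper below prefers an explicitly passed key and falls back to the `CAMB_API_KEY` environment variable. `resolve_api_key` is a hypothetical name used for illustration; it is not part of the plugin, and the plugin's own lookup may behave differently.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None, env_var: str = "CAMB_API_KEY") -> str:
    """Return the explicit key if given, otherwise the value of env_var.

    Hypothetical helper: raises ValueError early instead of failing on the first request.
    """
    key = (explicit_key or os.environ.get(env_var, "")).strip()
    if not key:
        raise ValueError(f"No API key provided and {env_var} is not set")
    return key
```

Failing at startup like this tends to give a clearer message than an authentication error on the first synthesis call.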
| 31 | +## Quick Start |
| 32 | + |
| 33 | +```python |
| 34 | +from livekit.plugins.camb import TTS |
| 35 | + |
| 36 | +# Initialize TTS (uses CAMB_API_KEY env var) |
| 37 | +tts = TTS() |
| 38 | + |
| 39 | +# Synthesize speech (await must run inside an async function) |
| 40 | +stream = tts.synthesize("Hello from Camb.ai!") |
| 41 | +audio_frame = await stream.collect() |
| 42 | + |
| 43 | +# Save to file |
| 44 | +with open("output.wav", "wb") as f: |
| 45 | +    f.write(audio_frame.to_wav_bytes()) |
| 46 | +``` |
| 47 | + |
| 48 | +## List Available Voices |
| 49 | + |
| 50 | +```python |
| 51 | +from livekit.plugins.camb import list_voices |
| 52 | + |
| 53 | +voices = await list_voices()  # call from within an async function |
| 54 | +for voice in voices: |
| 55 | +    print(f"{voice.name} ({voice.id}): {voice.gender}, {voice.language}") |
| 56 | +``` |
| 57 | + |
| 58 | +## Select a Specific Voice |
| 59 | + |
| 60 | +```python |
| 61 | +tts = TTS(voice_id=2681) # Use Attic voice |
| 62 | +stream = tts.synthesize("Using a specific voice!") |
| 63 | +``` |
| 64 | + |
| 65 | +## Model Selection |
| 66 | + |
| 67 | +Camb.ai offers multiple MARS-8 models for different use cases: |
| 68 | + |
| 69 | +```python |
| 70 | +# Default balanced model |
| 71 | +tts = TTS(model="mars-8") |
| 72 | + |
| 73 | +# Faster inference |
| 74 | +tts = TTS(model="mars-8-flash") |
| 75 | + |
| 76 | +# Supports user instructions for style/tone |
| 77 | +tts = TTS( |
| 78 | +    model="mars-8-instruct", |
| 79 | +    user_instructions="Speak in a friendly, conversational tone" |
| 80 | +) |
| 81 | +``` |
| 82 | + |
| 83 | +## Advanced Configuration |
| 84 | + |
| 85 | +```python |
| 86 | +tts = TTS( |
| 87 | +    api_key="your-api-key",  # Or use CAMB_API_KEY env var |
| 88 | +    voice_id=2681,  # Voice ID from list-voices (Attic voice) |
| 89 | +    language="en-us",  # BCP-47 locale |
| 90 | +    model="mars-8-instruct",  # MARS model variant |
| 91 | +    speed=1.0,  # Speech rate (0.5-2.0) |
| 92 | +    user_instructions="Speak energetically with clear enunciation", |
| 93 | +    output_format="pcm_s16le",  # Audio format |
| 94 | +    enhance_named_entities=True,  # Better pronunciation for names/places |
| 95 | +) |
| 96 | +``` |
| 97 | + |
| 98 | +## Usage with LiveKit Agents |
| 99 | + |
| 100 | +```python |
| 101 | +from livekit import agents, rtc |
| 102 | +from livekit.plugins.camb import TTS |
| 103 | + |
| 104 | +async def entrypoint(ctx: agents.JobContext): |
| 105 | +    # Connect to room |
| 106 | +    await ctx.connect() |
| 107 | + |
| 108 | +    # Initialize TTS |
| 109 | +    tts = TTS(language="en-us", speed=1.1) |
| 110 | + |
| 111 | +    # Synthesize and publish |
| 112 | +    stream = tts.synthesize("Hello from LiveKit with Camb.ai!") |
| 113 | +    audio_frame = await stream.collect() |
| 114 | + |
| 115 | +    # Publish to room (AudioSource and LocalAudioTrack live in livekit.rtc) |
| 116 | +    source = rtc.AudioSource(tts.sample_rate, tts.num_channels) |
| 117 | +    track = rtc.LocalAudioTrack.create_audio_track("tts", source) |
| 118 | +    await ctx.room.local_participant.publish_track(track) |
| 119 | +    await source.capture_frame(audio_frame) |
| 120 | +``` |
| 121 | + |
| 122 | +## Configuration Options |
| 123 | + |
| 124 | +### TTS Constructor Parameters |
| 125 | + |
| 126 | +- **api_key** (str | None): Camb.ai API key |
| 127 | +- **voice_id** (int): Voice ID to use (default: 2681) |
| 128 | +- **language** (str): BCP-47 locale (default: "en-us") |
| 129 | +- **model** (SpeechModel): MARS model variant (default: "mars-8") |
| 130 | +- **speed** (float): Speech rate (default: 1.0) |
| 131 | +- **user_instructions** (str | None): Style/tone guidance (requires mars-8-instruct) |
| 132 | +- **output_format** (OutputFormat): Audio format (default: "pcm_s16le") |
| 133 | +- **enhance_named_entities** (bool): Enhanced pronunciation (default: False) |
| 134 | +- **base_url** (str): API base URL |
| 135 | +- **http_session** (aiohttp.ClientSession | None): Reusable HTTP session |
| 136 | + |
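Two constraints appear in this README: the speed range of 0.5-2.0 (from the comment in Advanced Configuration) and the note that `user_instructions` requires `mars-8-instruct`. The sketch below checks both before a `TTS` instance is built. `validate_options` is a hypothetical helper; whether the plugin enforces these rules itself is not stated here.

```python
from typing import List, Optional

MIN_SPEED, MAX_SPEED = 0.5, 2.0      # range taken from the Advanced Configuration comment
INSTRUCT_MODEL = "mars-8-instruct"   # the only model documented as accepting user_instructions


def validate_options(
    model: str = "mars-8",
    speed: float = 1.0,
    user_instructions: Optional[str] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the options look consistent."""
    problems: List[str] = []
    if not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append(f"speed {speed} is outside {MIN_SPEED}-{MAX_SPEED}")
    if user_instructions and model != INSTRUCT_MODEL:
        problems.append(f"user_instructions requires model={INSTRUCT_MODEL!r}, got {model!r}")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every misconfiguration at once.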
| 137 | +### Available Models |
| 138 | + |
| 139 | +- **mars-8**: Default, balanced quality and speed |
| 140 | +- **mars-8-flash**: Faster inference, lower latency |
| 141 | +- **mars-8-instruct**: Supports user_instructions for style control |
| 142 | +- **mars-7**: Previous generation model |
| 143 | +- **mars-6**: Older generation model |
| 144 | +- **auto**: Automatic model selection |
| 145 | + |
| 146 | +### Output Formats |
| 147 | + |
| 148 | +- **pcm_s16le**: 16-bit PCM (recommended for streaming) |
| 149 | +- **pcm_s32le**: 32-bit PCM (highest quality) |
| 150 | +- **wav**: WAV with headers |
| 151 | +- **flac**: Lossless compression |
| 152 | +- **adts**: ADTS streaming format |
| 153 | + |
| 154 | +## API Reference |
| 155 | + |
| 156 | +### TTS Class |
| 157 | + |
| 158 | +Main text-to-speech interface. |
| 159 | + |
| 160 | +**Methods:** |
| 161 | +- `synthesize(text: str) -> ChunkedStream`: Synthesize text to speech |
| 162 | +- `update_options(**kwargs)`: Update voice settings dynamically |
| 163 | +- `aclose()`: Clean up resources |
| 164 | + |
| 165 | +**Properties:** |
| 166 | +- `model` (str): Current MARS model name |
| 167 | +- `provider` (str): Provider name ("Camb.ai") |
| 168 | +- `sample_rate` (int): Audio sample rate (24000 Hz) |
| 169 | +- `num_channels` (int): Number of audio channels (1) |
| 170 | + |
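Because `pcm_s16le` output carries no header, the sample rate and channel count listed above are needed to interpret it. The following sketch wraps raw PCM bytes in a WAV container using only the standard library, assuming 24000 Hz mono 16-bit audio as documented in the properties. The function names are illustrative and not part of the plugin.

```python
import io
import wave

SAMPLE_RATE = 24000   # TTS.sample_rate as listed above
NUM_CHANNELS = 1      # TTS.num_channels as listed above
SAMPLE_WIDTH = 2      # bytes per sample for pcm_s16le


def pcm_s16le_to_wav(pcm: bytes) -> bytes:
    """Wrap raw little-endian 16-bit PCM in a WAV container."""
    if len(pcm) % (SAMPLE_WIDTH * NUM_CHANNELS):
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(NUM_CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


def duration_seconds(pcm: bytes) -> float:
    """Playback duration implied by the byte count."""
    return len(pcm) / (SAMPLE_WIDTH * NUM_CHANNELS * SAMPLE_RATE)
```

For example, 48000 bytes of `pcm_s16le` at these settings correspond to one second of audio.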
| 171 | +### list_voices Function |
| 172 | + |
| 173 | +```python |
| 174 | +async def list_voices( |
| 175 | +    api_key: str | None = None, |
| 176 | +    base_url: str = "https://client.camb.ai/apis", |
| 177 | +) -> list[VoiceInfo] |
| 178 | +``` |
| 179 | + |
| 180 | +Returns a list of available voices with their metadata. |
| 181 | + |
| 182 | +### VoiceInfo |
| 183 | + |
| 184 | +Voice metadata object with: |
| 185 | +- **id** (int): Unique voice identifier |
| 186 | +- **name** (str): Human-readable voice name |
| 187 | +- **gender** (str | None): Voice gender |
| 188 | +- **language** (str | None): BCP-47 locale |
| 189 | + |
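Since `language` is optional on each voice, picking a voice for a given locale needs a small amount of care. The sketch below filters a list of voices by locale, using a stand-in `Voice` dataclass that mirrors the `VoiceInfo` fields listed above. Both `Voice` and `voices_for_language` are hypothetical names for illustration, and the matching rule (full locale or primary subtag) is an assumption rather than plugin behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Voice:
    """Stand-in with the same fields as VoiceInfo (hypothetical, for illustration)."""
    id: int
    name: str
    gender: Optional[str] = None
    language: Optional[str] = None


def voices_for_language(voices: List[Voice], language: str) -> List[Voice]:
    """Case-insensitive match on the full locale or on the primary subtag ("fr" matches "fr-fr")."""
    wanted = language.lower()
    matches = []
    for voice in voices:
        if voice.language is None:
            continue  # voices without a locale cannot be matched
        locale = voice.language.lower()
        if locale == wanted or locale.split("-")[0] == wanted:
            matches.append(voice)
    return matches
```

The `id` of a matching voice could then be passed as `voice_id` together with the same `language` value.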
| 190 | +## Multi-Language Support |
| 191 | + |
| 192 | +Camb.ai supports 140+ languages. Specify the language with a BCP-47 locale: |
| 193 | + |
| 194 | +```python |
| 195 | +# French |
| 196 | +tts = TTS(language="fr-fr", voice_id=...) |
| 197 | + |
| 198 | +# Spanish |
| 199 | +tts = TTS(language="es-es", voice_id=...) |
| 200 | + |
| 201 | +# Japanese |
| 202 | +tts = TTS(language="ja-jp", voice_id=...) |
| 203 | +``` |
| 204 | + |
| 205 | +## Dynamic Options |
| 206 | + |
| 207 | +Update TTS settings without recreating the instance: |
| 208 | + |
| 209 | +```python |
| 210 | +tts = TTS() |
| 211 | + |
| 212 | +# Change voice |
| 213 | +tts.update_options(voice_id=12345) |
| 214 | + |
| 215 | +# Change speed and model |
| 216 | +tts.update_options(speed=1.2, model="mars-8-flash") |
| 217 | + |
| 218 | +# Add user instructions |
| 219 | +tts.update_options( |
| 220 | +    model="mars-8-instruct", |
| 221 | +    user_instructions="Speak warmly and enthusiastically" |
| 222 | +) |
| 223 | +``` |
| 224 | + |
| 225 | +## Error Handling |
| 226 | + |
| 227 | +The plugin handles errors according to LiveKit conventions: |
| 228 | + |
| 229 | +```python |
| 230 | +from livekit.agents import APIStatusError, APIConnectionError, APITimeoutError |
| 231 | + |
| 232 | +try: |
| 233 | +    stream = tts.synthesize("Hello!") |
| 234 | +    audio = await stream.collect() |
| 235 | +except APIStatusError as e: |
| 236 | +    print(f"API error: {e.status_code} - {e.message}") |
| 237 | +except APITimeoutError as e:  # subclass of APIConnectionError, so catch it first |
| 238 | +    print(f"Request timed out: {e}") |
| 239 | +except APIConnectionError as e: |
| 240 | +    print(f"Connection error: {e}") |
| 241 | +``` |
| 242 | + |
| 243 | +## Future Features |
| 244 | + |
| 245 | +Coming soon: |
| 246 | +- GCP Vertex AI integration |
| 247 | +- Voice cloning via custom voice creation |
| 248 | +- Voice generation from text descriptions |
| 249 | +- WebSocket streaming for real-time applications |
| 250 | + |
| 251 | +## Links |
| 252 | + |
| 253 | +- [Camb.ai Documentation](https://camb.mintlify.app/) |
| 254 | +- [LiveKit Agents Documentation](https://docs.livekit.io/agents/) |
| 255 | +- [GitHub Repository](https://github.com/livekit/agents) |
| 256 | + |
| 257 | +## License |
| 258 | + |
| 259 | +Apache License 2.0 |