third_party/ElevenLabs/README.md
+139 -6 (139 additions & 6 deletions)
@@ -16,18 +16,45 @@ We recommend following this sequence to get the most out of this cookbook:

 ### Step 1: Set Up Your Environment

-1. **Get your API keys:**
-   - ElevenLabs API key: [elevenlabs.io/app/developers/api-keys](https://elevenlabs.io/app/developers/api-keys)
-   - Anthropic API key: [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys)
+1. **Create a virtual environment:**
+   ```bash
+   # Navigate to the ElevenLabs directory
+   cd /path/to/claude-cookbooks/third_party/ElevenLabs
+
+   # Create virtual environment
+   python -m venv venv
+
+   # Activate it
+   source venv/bin/activate # On macOS/Linux
+   # OR
+   venv\Scripts\activate # On Windows
+   ```
+
+2. **Get your API keys:**
+   - **ElevenLabs API key:** [elevenlabs.io/app/developers/api-keys](https://elevenlabs.io/app/developers/api-keys)
+
+     When creating your API key, ensure it has the following minimum permissions:
+     - Text to speech
+     - Speech to text
+     - Read access on voices
+     - Read access on models
+
+   - **Anthropic API key:** [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys)

-2. **Configure your environment:**
+3. **Configure your environment:**
    ```bash
    cp .env.example .env
-   # Edit .env and add your API keys
    ```

-3. **Install dependencies:**
+   Edit `.env` and add your API keys:
+   ```
+   ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
+   ANTHROPIC_API_KEY=sk-ant-api03-...
+   ```
+
+4. **Install dependencies:**
    ```bash
+   # With venv activated
    pip install -r requirements.txt
    ```

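Before running `pip install`, it can help to confirm that the virtual environment created in step 1 is actually active. The following is a minimal sketch using only the standard library; the function name `in_virtualenv` is illustrative and not part of the cookbook.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv.

    Inside an environment created with `python -m venv`, sys.prefix points
    at the environment directory while sys.base_prefix still points at the
    base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"venv active: {sys.prefix}")
    else:
        print("No venv active; run the activate command from step 1 first.")
```

If this reports no active environment, `pip install -r requirements.txt` would install into the global interpreter instead of `venv`.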
@@ -65,6 +92,112 @@ The script demonstrates production-ready implementations of:
 - WebSocket-based streaming for minimal latency
 - Custom audio queue for seamless playback

+## Troubleshooting
+
+### Audio Popping or Crackling
+
+**Symptom:** You may occasionally hear brief pops, clicks, or audio dropouts during playback.
+
+**Explanation:**
+
+This occurs because the script uses MP3 format audio, which is required for the ElevenLabs free tier. When streaming MP3 data in real-time chunks, FFmpeg occasionally receives incomplete frames that cannot be decoded. This typically happens:
+- At the start of streaming (first chunk may be too small)
+- During brief network delays
+- At the end of audio generation (final chunk may be partial)
+
+The script automatically handles these failed chunks by skipping them (using a try-except pattern in the audio decoding logic), which prevents errors from appearing in the console but may result in brief audio gaps that manifest as pops or clicks.
+
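The skip-on-failure behaviour described in the explanation can be sketched roughly as follows. This illustrates the pattern only and is not the cookbook's actual code: `decode_chunk` stands in for whatever FFmpeg-backed decoder the script uses, and `fake_decode` is a toy decoder invented here so the example runs without FFmpeg.

```python
from typing import Callable, Iterable, Iterator


def decode_stream(
    chunks: Iterable[bytes],
    decode_chunk: Callable[[bytes], bytes],
) -> Iterator[bytes]:
    """Yield decoded audio for each chunk, skipping chunks that fail to decode."""
    for chunk in chunks:
        try:
            yield decode_chunk(chunk)
        except Exception:
            # An incomplete frame cannot be decoded. Dropping it keeps
            # playback going, at the cost of a short gap (heard as a pop).
            continue


def fake_decode(chunk: bytes) -> bytes:
    """Toy decoder (hypothetical): rejects chunks shorter than 4 bytes."""
    if len(chunk) < 4:
        raise ValueError("incomplete frame")
    return chunk.upper()


if __name__ == "__main__":
    incoming = [b"ab", b"frame-one", b"frame-two", b"x"]
    # The two short chunks are skipped; the other two are decoded.
    print(list(decode_stream(incoming, fake_decode)))
```

Real code would likely catch the decoder's specific exception type rather than `Exception`, so that unrelated bugs are not silently swallowed.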
+**Impact:**
+- Audio playback continues normally
+- Brief pops or clicks are usually imperceptible or minor
+- The WebSocket connection remains stable
+- No functionality is lost
+
+**Solution:**
+
+This is expected behavior when using MP3 format on the free tier. If you want to eliminate audio popping entirely:
+1. Upgrade to a paid ElevenLabs tier
+2. Modify the script to use `pcm_44100` format instead of MP3
+3. PCM format provides cleaner streaming without decoding issues
+
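With PCM there are no compressed frames to decode, so a chunk boundary can at worst split a single sample in two. Below is a minimal sketch of handling that case. It assumes `pcm_44100` delivers 16-bit signed little-endian mono samples, which should be checked against the ElevenLabs documentation; the class name `PcmAssembler` is hypothetical.

```python
import struct
from typing import Tuple


class PcmAssembler:
    """Turn arbitrarily sized byte chunks into whole 16-bit samples."""

    BYTES_PER_SAMPLE = 2  # assumption: 16-bit samples

    def __init__(self) -> None:
        self._leftover = b""

    def feed(self, chunk: bytes) -> Tuple[int, ...]:
        """Return all complete samples; keep a trailing odd byte for later."""
        data = self._leftover + chunk
        usable = len(data) - (len(data) % self.BYTES_PER_SAMPLE)
        self._leftover = data[usable:]
        count = usable // self.BYTES_PER_SAMPLE
        # "<h" = little-endian signed 16-bit integer (assumed sample format).
        return struct.unpack(f"<{count}h", data[:usable])


if __name__ == "__main__":
    assembler = PcmAssembler()
    print(assembler.feed(b"\x01\x00\x02"))  # one full sample, one byte held back
    print(assembler.feed(b"\x00\xff\xff"))  # held byte completes the next sample
```

Unlike the MP3 path, nothing is ever dropped here: a partial sample is simply carried over to the next chunk.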
+### API Key Issues
+
+**Symptom:** `AssertionError: ELEVENLABS_API_KEY is not set` or `AssertionError: ANTHROPIC_API_KEY is not set`
+
+**Solution:**
+1. Verify you've copied `.env.example` to `.env`: `cp .env.example .env`
+2. Edit `.env` and ensure both API keys are set correctly
+3. Check for typos or extra spaces in your API keys
+4. Confirm your ElevenLabs key has the required permissions (see Step 1)
+
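The checks in this list can be partly automated. The sketch below is one possible approach, not the cookbook's own validation: the helper name `find_key_problems` is hypothetical, and the `your_` placeholder test is an assumption based on the `your_elevenlabs_api_key_here` value shown in Step 1.

```python
import os
from typing import List, Mapping

REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "ANTHROPIC_API_KEY")


def find_key_problems(env: Mapping[str, str]) -> List[str]:
    """Return a human-readable problem for each key that looks wrong."""
    problems = []
    for name in REQUIRED_KEYS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif value != value.strip():
            problems.append(f"{name} has leading or trailing whitespace")
        elif value.startswith("your_"):
            # Assumption: placeholders from .env.example start with "your_".
            problems.append(f"{name} still holds the .env.example placeholder")
    return problems


if __name__ == "__main__":
    for problem in find_key_problems(os.environ):
        print(problem)
```

Run as a script, this inspects the process environment, so it only sees values from `.env` if something has already loaded them (for example, by exporting them in the shell). It cannot verify key permissions; that still has to be checked in the ElevenLabs dashboard.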
+### Dependency Issues
+
+**Symptom:** Errors like `ImportError: PortAudio library not found` or audio playback failures
+
+**Solution:**
+
+**macOS:**
+```bash
+brew install portaudio ffmpeg
+```
+
+**Ubuntu/Debian:**
+```bash
+sudo apt-get install portaudio19-dev ffmpeg
+```
+
+**Windows:**
+- Install FFmpeg from [ffmpeg.org](https://ffmpeg.org/download.html)
+- Add FFmpeg to your system PATH
+- PortAudio typically installs automatically with sounddevice on Windows
+
+Then reinstall Python dependencies:
+```bash
+pip install -r requirements.txt
+```
+
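After installing, a quick check can confirm that the pieces are visible to Python. This is a minimal standard-library sketch; the function name `check_audio_dependencies` is hypothetical. It only verifies that `ffmpeg` is on `PATH` and that the `sounddevice` package can be located, not that PortAudio itself loads.

```python
import importlib.util
import shutil
from typing import Dict


def check_audio_dependencies() -> Dict[str, bool]:
    """Report whether ffmpeg and the sounddevice package can be found."""
    return {
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        # find_spec returns None when the package is not installed.
        "sounddevice installed": importlib.util.find_spec("sounddevice") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_audio_dependencies().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

A missing PortAudio library would only surface when `sounddevice` is actually imported, so a passing result here narrows the problem down rather than ruling it out.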
+### Microphone Permissions
+
+**Symptom:** `OSError: [Errno -9999] Unanticipated host error` or microphone not accessible
+
+**Solution:**
+- **macOS:** Go to System Preferences → Security & Privacy → Privacy → Microphone, and enable Terminal (or your Python IDE)
+- **Windows:** Go to Settings → Privacy → Microphone, and enable microphone access for Python/Terminal
+- **Linux:** Check your user is in the `audio` group: `sudo usermod -a -G audio $USER` (then log out and back in)
+
+Test your microphone setup:
+```bash
+python -c "import sounddevice as sd; print(sd.query_devices())"
+```
+
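The `query_devices()` output lists every audio device, including output-only ones. A small sketch for narrowing it to microphones follows; the helper name `input_device_names` is hypothetical, and it relies on each device entry being a dict with `name` and `max_input_channels` keys, as `sounddevice` returns them.

```python
from typing import Any, Dict, Iterable, List


def input_device_names(devices: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the names of devices that expose at least one input channel."""
    return [d["name"] for d in devices if d.get("max_input_channels", 0) > 0]


# Usage with the real library (requires sounddevice and PortAudio):
#   import sounddevice as sd
#   print(input_device_names(sd.query_devices()))
```

An empty result suggests that no microphone is visible to PortAudio at all, which points to a permissions or driver problem rather than a bug in the script.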
+### WebSocket Connection Failures
+
+**Symptom:** Connection errors, timeouts, or stream interruptions
+4. Ensure you're not exceeding API rate limits (see ElevenLabs dashboard for usage)
+
+If you continue to experience issues, check [ElevenLabs Status](https://status.elevenlabs.io/) for service updates.
+
+## Project Ideas
+
+Once you're comfortable with the voice assistant, here are some inspiring projects you can build:
+
+- **Meeting Note-Taker** - Record and transcribe meetings in real-time, then use Claude to generate summaries, action items, and key takeaways from the conversation.
+
+- **Language Learning Tutor** - Practice conversations in any language with real-time feedback. Claude can correct pronunciation, suggest better phrasing, and adapt difficulty to your skill level.
+
+- **Interactive Storyteller** - Create choose-your-own-adventure games where Claude narrates the story and responds to your spoken choices, with different voice characters for each role.
+
+- **Hands-Free Coding Assistant** - Describe code changes, bugs, or features verbally while keeping your hands on the keyboard. Perfect for rubber duck debugging or pair programming solo.
+
+- **Voice-Activated Smart Home** - Build natural conversation interfaces for controlling home devices. Ask complex questions like "Is it cold enough to turn on the heater?" instead of simple on/off commands.
+
+- **Personal Voice Journal** - Keep a daily journal by speaking your thoughts. Claude can organize entries by theme, track your mood over time, and surface relevant past entries when you need them.
+
 ## More About ElevenLabs

 Here are some helpful resources to deepen your understanding: