Commit 2b94c71

Adriaan-ANT and claude committed
Improve ElevenLabs cookbook documentation and error handling
- Add virtual environment setup instructions
- Document required ElevenLabs API key permissions
- Add troubleshooting section covering common issues
- Add project ideas to inspire users
- Suppress MP3 decoding errors with try-except pattern
- Document audio popping as expected free-tier behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
1 parent 8148beb commit 2b94c71

2 files changed: +166 -25 lines changed

third_party/ElevenLabs/README.md

Lines changed: 139 additions & 6 deletions
@@ -16,18 +16,45 @@ We recommend following this sequence to get the most out of this cookbook:
 
 ### Step 1: Set Up Your Environment
 
-1. **Get your API keys:**
-- ElevenLabs API key: [elevenlabs.io/app/developers/api-keys](https://elevenlabs.io/app/developers/api-keys)
-- Anthropic API key: [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys)
+1. **Create a virtual environment:**
+```bash
+# Navigate to the ElevenLabs directory
+cd /path/to/claude-cookbooks/third_party/ElevenLabs
+
+# Create virtual environment
+python -m venv venv
+
+# Activate it
+source venv/bin/activate # On macOS/Linux
+# OR
+venv\Scripts\activate # On Windows
+```
+
+2. **Get your API keys:**
+- **ElevenLabs API key:** [elevenlabs.io/app/developers/api-keys](https://elevenlabs.io/app/developers/api-keys)
+
+When creating your API key, ensure it has the following minimum permissions:
+- Text to speech
+- Speech to text
+- Read access on voices
+- Read access on models
+
+- **Anthropic API key:** [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys)
 
-2. **Configure your environment:**
+3. **Configure your environment:**
 ```bash
 cp .env.example .env
-# Edit .env and add your API keys
 ```
 
-3. **Install dependencies:**
+Edit `.env` and add your API keys:
+```
+ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
+ANTHROPIC_API_KEY=sk-ant-api03-...
+```
+
+4. **Install dependencies:**
 ```bash
+# With venv activated
 pip install -r requirements.txt
 ```
 
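Reviewer note: the `.env` step in this hunk can be sanity-checked before the script is run. The sketch below is illustrative only and is not part of this commit. `parse_env` and `missing_keys` are hypothetical helpers that use only the standard library, and the placeholder values are simply the ones shown in the README example above. It assumes a plain `KEY=value` file without multi-line values or shell expansion.

```python
# Hypothetical helper, not part of the cookbook: report which required
# variables are absent from .env-style text or still hold a placeholder.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "ANTHROPIC_API_KEY")
PLACEHOLDERS = {"", "your_elevenlabs_api_key_here", "sk-ant-api03-..."}


def parse_env(text):
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are unset or left as a placeholder."""
    values = parse_env(text)
    return [key for key in required if values.get(key, "") in PLACEHOLDERS]
```

A possible use is `missing_keys(open(".env").read())`, which returns an empty list when both keys look populated. It does not validate the keys against either API.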
@@ -65,6 +92,112 @@ The script demonstrates production-ready implementations of:
 - WebSocket-based streaming for minimal latency
 - Custom audio queue for seamless playback
 
+## Troubleshooting
+
+### Audio Popping or Crackling
+
+**Symptom:** You may occasionally hear brief pops, clicks, or audio dropouts during playback.
+
+**Explanation:**
+
+This occurs because the script uses MP3 format audio, which is required for the ElevenLabs free tier. When streaming MP3 data in real-time chunks, FFmpeg occasionally receives incomplete frames that cannot be decoded. This typically happens:
+- At the start of streaming (first chunk may be too small)
+- During brief network delays
+- At the end of audio generation (final chunk may be partial)
+
+The script automatically handles these failed chunks by skipping them (using a try-except pattern in the audio decoding logic), which prevents errors from appearing in the console but may result in brief audio gaps that manifest as pops or clicks.
+
+**Impact:**
+- Audio playback continues normally
+- Brief pops or clicks are usually imperceptible or minor
+- The WebSocket connection remains stable
+- No functionality is lost
+
+**Solution:**
+
+This is expected behavior when using MP3 format on the free tier. If you want to eliminate audio popping entirely:
+1. Upgrade to a paid ElevenLabs tier
+2. Modify the script to use `pcm_44100` format instead of MP3
+3. PCM format provides cleaner streaming without decoding issues
+
+### API Key Issues
+
+**Symptom:** `AssertionError: ELEVENLABS_API_KEY is not set` or `AssertionError: ANTHROPIC_API_KEY is not set`
+
+**Solution:**
+1. Verify you've copied `.env.example` to `.env`: `cp .env.example .env`
+2. Edit `.env` and ensure both API keys are set correctly
+3. Check for typos or extra spaces in your API keys
+4. Confirm your ElevenLabs key has the required permissions (see Step 1)
+
+### Dependency Issues
+
+**Symptom:** Errors like `ImportError: PortAudio library not found` or audio playback failures
+
+**Solution:**
+
+**macOS:**
+```bash
+brew install portaudio ffmpeg
+```
+
+**Ubuntu/Debian:**
+```bash
+sudo apt-get install portaudio19-dev ffmpeg
+```
+
+**Windows:**
+- Install FFmpeg from [ffmpeg.org](https://ffmpeg.org/download.html)
+- Add FFmpeg to your system PATH
+- PortAudio typically installs automatically with sounddevice on Windows
+
+Then reinstall Python dependencies:
+```bash
+pip install -r requirements.txt
+```
+
+### Microphone Permissions
+
+**Symptom:** `OSError: [Errno -9999] Unanticipated host error` or microphone not accessible
+
+**Solution:**
+- **macOS:** Go to System Preferences → Security & Privacy → Privacy → Microphone, and enable Terminal (or your Python IDE)
+- **Windows:** Go to Settings → Privacy → Microphone, and enable microphone access for Python/Terminal
+- **Linux:** Check your user is in the `audio` group: `sudo usermod -a -G audio $USER` (then log out and back in)
+
+Test your microphone setup:
+```bash
+python -c "import sounddevice as sd; print(sd.query_devices())"
+```
+
+### WebSocket Connection Failures
+
+**Symptom:** Connection errors, timeouts, or stream interruptions
+
+**Solution:**
+1. Check your internet connection is stable
+2. Verify firewall isn't blocking WebSocket connections (port 443)
+3. Try disabling VPN or proxy temporarily
+4. Ensure you're not exceeding API rate limits (see ElevenLabs dashboard for usage)
+
+If you continue to experience issues, check [ElevenLabs Status](https://status.elevenlabs.io/) for service updates.
+
+## Project Ideas
+
+Once you're comfortable with the voice assistant, here are some inspiring projects you can build:
+
+- **Meeting Note-Taker** - Record and transcribe meetings in real-time, then use Claude to generate summaries, action items, and key takeaways from the conversation.
+
+- **Language Learning Tutor** - Practice conversations in any language with real-time feedback. Claude can correct pronunciation, suggest better phrasing, and adapt difficulty to your skill level.
+
+- **Interactive Storyteller** - Create choose-your-own-adventure games where Claude narrates the story and responds to your spoken choices, with different voice characters for each role.
+
+- **Hands-Free Coding Assistant** - Describe code changes, bugs, or features verbally while keeping your hands on the keyboard. Perfect for rubber duck debugging or pair programming solo.
+
+- **Voice-Activated Smart Home** - Build natural conversation interfaces for controlling home devices. Ask complex questions like "Is it cold enough to turn on the heater?" instead of simple on/off commands.
+
+- **Personal Voice Journal** - Keep a daily journal by speaking your thoughts. Claude can organize entries by theme, track your mood over time, and surface relevant past entries when you need them.
+
 ## More About ElevenLabs
 
 Here are some helpful resources to deepen your understanding:
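Reviewer note: the troubleshooting text above says PCM avoids the decoding failures seen with MP3. A minimal sketch of why is given below. It is not code from this commit, and `PcmChunkConverter` is a hypothetical name. It assumes `pcm_44100` delivers signed 16-bit little-endian samples, which should be confirmed against the ElevenLabs documentation. Under that assumption there are no frames to decode: a chunk boundary can at worst split one sample, and the stray byte can be carried into the next chunk.

```python
import struct


class PcmChunkConverter:
    """Convert arbitrarily sized PCM byte chunks to floats in [-1.0, 1.0).

    Assumes signed 16-bit little-endian samples (an assumption about the
    pcm_44100 output format, not something stated in this commit).
    """

    BYTES_PER_SAMPLE = 2

    def __init__(self):
        # Holds a trailing byte when a chunk ends in the middle of a sample.
        self._leftover = b""

    def feed(self, chunk):
        data = self._leftover + chunk
        usable = len(data) - (len(data) % self.BYTES_PER_SAMPLE)
        self._leftover = data[usable:]
        count = usable // self.BYTES_PER_SAMPLE
        ints = struct.unpack(f"<{count}h", data[:usable])
        # Same scaling the script applies after MP3 decoding.
        return [sample / 32768.0 for sample in ints]
```

The script itself works with NumPy arrays; the standard-library `struct` module is used here only to keep the sketch dependency-free.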

third_party/ElevenLabs/stream_voice_assistant_websocket.py

Lines changed: 27 additions & 19 deletions
@@ -109,29 +109,37 @@ def add(self, audio_data):
         Args:
             audio_data: Raw MP3 audio bytes
         """
-        # Decode MP3 to PCM
-        audio_segment = AudioSegment.from_mp3(io.BytesIO(audio_data))
+        try:
+            # Decode MP3 to PCM
+            audio_segment = AudioSegment.from_mp3(io.BytesIO(audio_data))
 
-        # Convert to numpy array
-        samples = np.array(audio_segment.get_array_of_samples(), dtype=np.int16)
-        samples = samples.astype(np.float32) / 32768.0
+            # Convert to numpy array
+            samples = np.array(audio_segment.get_array_of_samples(), dtype=np.int16)
+            samples = samples.astype(np.float32) / 32768.0
 
-        if not self.playing:
-            self.sample_rate = audio_segment.frame_rate
-            self.channels = audio_segment.channels
+            if not self.playing:
+                self.sample_rate = audio_segment.frame_rate
+                self.channels = audio_segment.channels
 
-        # Reshape based on number of channels
-        if self.channels > 1:
-            samples = samples.reshape((-1, self.channels))
-        else:
-            samples = samples.reshape((-1, 1))
-
-        with self.buffer_lock:
-            self.buffer.extend(samples.tobytes())
+            # Reshape based on number of channels
+            if self.channels > 1:
+                samples = samples.reshape((-1, self.channels))
+            else:
+                samples = samples.reshape((-1, 1))
 
-        # Start playback after pre-buffering
-        if not self.playing and len(self.buffer) >= self.PRE_BUFFER_SIZE:
-            self.start_playback()
+            with self.buffer_lock:
+                self.buffer.extend(samples.tobytes())
+
+            # Start playback after pre-buffering
+            if not self.playing and len(self.buffer) >= self.PRE_BUFFER_SIZE:
+                self.start_playback()
+        except:
+            # Silently skip invalid MP3 chunks that fail to decode
+            # This is common when streaming MP3 data in real-time, as chunks may contain
+            # incomplete frames. Skipping these prevents console errors but may cause
+            # brief audio pops. To eliminate popping, upgrade to a paid ElevenLabs tier
+            # and use pcm_44100 format instead of MP3.
+            pass
 
     def start_playback(self):
         """Start the audio output stream."""
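Reviewer note: the bare `except:` added in this hunk also catches `KeyboardInterrupt` and `SystemExit`, and because the `try` wraps the whole method body it hides unrelated bugs in the reshape, buffer, and playback steps as well. A narrower variant is sketched below as one possible alternative, not as the commit's method. `ChunkDecoder` and `decode_or_skip` are hypothetical names. In the real script the error type to pass in would likely be pydub's `CouldntDecodeError`, which is an assumption to verify against the installed pydub version.

```python
import logging

logger = logging.getLogger("audio_queue")


class ChunkDecoder:
    """Wrap a decode function and skip only the errors it is expected to raise."""

    def __init__(self, decode, decode_errors=(ValueError,)):
        self._decode = decode                # callable: bytes -> decoded audio
        self._decode_errors = decode_errors  # exception types treated as "bad chunk"
        self.skipped = 0                     # count of skipped chunks, for diagnostics

    def decode_or_skip(self, chunk):
        """Return the decoded chunk, or None if it failed with an expected error."""
        try:
            return self._decode(chunk)
        except self._decode_errors as exc:
            self.skipped += 1
            # Debug level keeps the console quiet by default, as the commit intends.
            logger.debug("Skipping undecodable chunk (%d bytes): %s", len(chunk), exc)
            return None
```

Only the decode call sits inside the `try`, so failures elsewhere still surface, and the `skipped` counter gives a way to see how often pops are caused by dropped chunks.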
