
Commit 7a98841

elevenlabs stt support including scribe v2 realtime
1 parent 9a58cd3 commit 7a98841

4 files changed (+733, -0 lines)

plugins/elevenlabs/README.stt.md

Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,213 @@
# ElevenLabs STT Plugin for LiveKit Agents

This plugin adds speech-to-text to LiveKit agents using the ElevenLabs Scribe API.

## Features

- **Multiple Model Support**: Choose between Scribe v1, Scribe v2, and Scribe v2 Realtime
- **Streaming & Non-Streaming**: Supports both batch and real-time transcription
- **Multi-Language**: Supports 35+ languages with automatic language detection
- **Audio Event Tagging**: Optional tagging of non-speech audio events (laughter, footsteps, etc.)
- **VAD Configuration**: Customizable voice activity detection (VAD) for streaming mode

## Installation

```bash
pnpm add @livekit/agents-plugin-elevenlabs
```

## Supported Models

### Scribe v1 (`scribe_v1`)
- **Type**: Non-streaming
- **Method**: HTTP POST
- **Use Case**: Batch transcription of pre-recorded audio
- **Features**: Audio event tagging, language detection

### Scribe v2 (`scribe_v2`)
- **Type**: Non-streaming
- **Method**: HTTP POST
- **Use Case**: Batch transcription with improved accuracy
- **Features**: Enhanced model, language detection

### Scribe v2 Realtime (`scribe_v2_realtime`)
- **Type**: Streaming
- **Method**: WebSocket
- **Use Case**: Real-time conversation transcription
- **Features**: Interim results, VAD-based segmentation, manual commit support
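
To make the split between the batch and streaming models concrete, here is a small lookup table that mirrors the list above. `MODEL_INFO` and `isStreamingModel` are hypothetical names used for illustration, not exports of the plugin; the `STTModels` union is copied from `models.ts`.

```typescript
// Local copy of the STTModels union from models.ts.
type STTModels = 'scribe_v1' | 'scribe_v2' | 'scribe_v2_realtime';

// Hypothetical summary of the "Supported Models" list; not a plugin export.
interface ModelInfo {
  streaming: boolean;
  transport: 'http-post' | 'websocket';
}

const MODEL_INFO: Record<STTModels, ModelInfo> = {
  scribe_v1: { streaming: false, transport: 'http-post' },
  scribe_v2: { streaming: false, transport: 'http-post' },
  scribe_v2_realtime: { streaming: true, transport: 'websocket' },
};

// True only for the model that transcribes a live stream over WebSocket.
function isStreamingModel(model: STTModels): boolean {
  return MODEL_INFO[model].streaming;
}
```

Because `MODEL_INFO` is typed as `Record<STTModels, ModelInfo>`, adding a model to the union without adding an entry here becomes a compile error.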

## Quick Start

### Non-Streaming (Scribe v1)

```typescript
import { STT } from '@livekit/agents-plugin-elevenlabs';

const stt = new STT({
  apiKey: process.env.ELEVEN_API_KEY, // optional: falls back to the ELEVEN_API_KEY env var
  model: 'scribe_v1',
  languageCode: 'en',
  tagAudioEvents: true,
});
```

### Streaming (Scribe v2 Realtime)

```typescript
import { STT } from '@livekit/agents-plugin-elevenlabs';

const stt = new STT({
  model: 'scribe_v2_realtime',
  sampleRate: 16000,
  languageCode: 'en',
  commitStrategy: 'vad', // auto-commit when the speaker stops talking
  vadSilenceThresholdSecs: 1.0, // seconds of silence before a commit
});
```
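
With `scribe_v2_realtime`, the stream produces interim results that are later committed, either automatically on silence (`commitStrategy: 'vad'`) or explicitly (`'manual'`). The sketch below shows one way an application might track both. `TranscriptAccumulator` is a hypothetical helper, not part of the plugin, and it assumes each interim result carries the full text of the current, not-yet-committed segment.

```typescript
// Hypothetical helper, not part of the plugin: tracks interim (partial)
// transcripts and the segments that have been committed so far.
class TranscriptAccumulator {
  private committed: string[] = [];
  private interim = '';

  // Assumes each interim result carries the full text of the current segment,
  // so it replaces (rather than appends to) the previous interim result.
  onInterim(text: string): void {
    this.interim = text;
  }

  // Called when a commit happens (VAD-detected silence or a manual commit).
  // Finalizes the current segment and returns it ('' if nothing was pending).
  commit(): string {
    const segment = this.interim.trim();
    if (segment.length > 0) {
      this.committed.push(segment);
    }
    this.interim = '';
    return segment;
  }

  // Committed segments followed by any pending interim text.
  fullText(): string {
    return [...this.committed, this.interim.trim()]
      .filter((s) => s.length > 0)
      .join(' ');
  }
}
```

Keeping interim text separate from committed text lets a UI redraw the pending part freely while treating committed segments as final.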

## Configuration Options

### Common Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `apiKey` | `string` | `process.env.ELEVEN_API_KEY` | ElevenLabs API key |
| `baseURL` | `string` | `https://api.elevenlabs.io/v1` | API base URL |
| `model` | `STTModels` | `'scribe_v1'` | Model to use |
| `languageCode` | `string` | `undefined` | Language code; when unset, the language is auto-detected |
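
The defaults in this table can be read as a small resolution step. The sketch below is an assumption about how such defaults might be applied, not the plugin's actual code: `resolveCommonOptions` is a hypothetical name, and throwing when no key is found is assumed. The environment is passed in as a plain object so the function stays testable (in Node you would pass `process.env`).

```typescript
// Local copy of the STTModels union from models.ts.
type STTModels = 'scribe_v1' | 'scribe_v2' | 'scribe_v2_realtime';

// Hypothetical shape mirroring the "Common Options" table.
interface CommonOptions {
  apiKey?: string;
  baseURL?: string;
  model?: STTModels;
  languageCode?: string;
}

interface ResolvedCommonOptions {
  apiKey: string;
  baseURL: string;
  model: STTModels;
  languageCode?: string; // undefined means "auto-detect"
}

// Applies the documented defaults; throwing on a missing key is an assumption.
function resolveCommonOptions(
  opts: CommonOptions,
  env: Record<string, string | undefined>,
): ResolvedCommonOptions {
  const apiKey = opts.apiKey ?? env['ELEVEN_API_KEY'];
  if (!apiKey) {
    throw new Error('ElevenLabs API key is required: pass apiKey or set ELEVEN_API_KEY');
  }
  return {
    apiKey,
    baseURL: opts.baseURL ?? 'https://api.elevenlabs.io/v1',
    model: opts.model ?? 'scribe_v1',
    languageCode: opts.languageCode,
  };
}
```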

### Non-Streaming Options (v1, v2)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `tagAudioEvents` | `boolean` | `true` | Tag non-speech audio events, such as `(laughter)`, in the transcript |
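
If you want both the spoken text and the tagged events, you can post-process the transcript. This sketch assumes events appear inline as parenthesized tags such as `(laughter)`, as the table suggests; verify the actual transcript format before relying on it. `extractAudioEvents` and `stripAudioEvents` are hypothetical helpers.

```typescript
// Hypothetical helpers; assumes audio events appear inline as "(event)" tags.

// Returns the tagged events in order of appearance, e.g. ["laughter"].
function extractAudioEvents(text: string): string[] {
  const events: string[] = [];
  const tag = /\(([^()]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = tag.exec(text)) !== null) {
    events.push(match[1].trim());
  }
  return events;
}

// Returns the transcript with the tags removed and spacing tidied up.
function stripAudioEvents(text: string): string {
  return text
    .replace(/\([^()]+\)/g, ' ') // drop each "(event)" tag
    .replace(/\s+([.,!?;:])/g, '$1') // no space before punctuation
    .replace(/\s+/g, ' ') // collapse runs of whitespace
    .trim();
}
```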

### Streaming Options (v2_realtime)

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `sampleRate` | `number` | `16000` | Audio sample rate in Hz: 16000, 22050, or 44100 |
| `numChannels` | `number` | `1` | Number of audio channels |
| `commitStrategy` | `'vad' \| 'manual'` | `'vad'` | When transcripts are committed: automatically after detected silence (`'vad'`) or on explicit request (`'manual'`) |
| `vadSilenceThresholdSecs` | `number` | `undefined` | Silence duration before a VAD commit, in seconds (0.3 to 3.0) |
| `vadThreshold` | `number` | `undefined` | VAD speech-detection threshold (0.1 to 0.9); higher is stricter |
| `minSpeechDurationMs` | `number` | `undefined` | Minimum speech duration, in ms (50 to 2000) |
| `minSilenceDurationMs` | `number` | `undefined` | Minimum silence duration, in ms (50 to 2000) |
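
The documented ranges can be checked before opening a stream. This is a hypothetical client-side validator (`validateStreamingOptions` is not a plugin export); whether the plugin or the ElevenLabs service enforces these limits, and how, is not stated here. Unset options are skipped, and `sampleRate` falls back to its documented default of 16000.

```typescript
// Hypothetical shape mirroring the "Streaming Options" table.
interface StreamingOptions {
  sampleRate?: number;
  numChannels?: number;
  commitStrategy?: 'vad' | 'manual';
  vadSilenceThresholdSecs?: number;
  vadThreshold?: number;
  minSpeechDurationMs?: number;
  minSilenceDurationMs?: number;
}

const SUPPORTED_SAMPLE_RATES: number[] = [16000, 22050, 44100];

// Documented inclusive ranges: [option, min, max].
const RANGES: Array<[keyof StreamingOptions, number, number]> = [
  ['vadSilenceThresholdSecs', 0.3, 3.0],
  ['vadThreshold', 0.1, 0.9],
  ['minSpeechDurationMs', 50, 2000],
  ['minSilenceDurationMs', 50, 2000],
];

// Returns a list of problems; an empty list means the options look valid.
function validateStreamingOptions(opts: StreamingOptions): string[] {
  const errors: string[] = [];
  const sampleRate = opts.sampleRate ?? 16000; // documented default
  if (SUPPORTED_SAMPLE_RATES.indexOf(sampleRate) === -1) {
    errors.push(`sampleRate ${sampleRate} must be one of ${SUPPORTED_SAMPLE_RATES.join(', ')}`);
  }
  for (const [name, min, max] of RANGES) {
    const value = opts[name];
    // Unset options are skipped rather than checked.
    if (typeof value === 'number' && (value < min || value > max)) {
      errors.push(`${name} = ${value} is outside ${min} to ${max}`);
    }
  }
  return errors;
}
```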

## Supported Languages

The plugin supports 35+ languages, including:

- **English** (`en`)
- **Spanish** (`es`)
- **French** (`fr`)
- **German** (`de`)
- **Italian** (`it`)
- **Portuguese** (`pt`)
- **Polish** (`pl`)
- **Dutch** (`nl`)
- **Swedish** (`sv`)
- **Finnish** (`fi`)
- **Danish** (`da`)
- **Norwegian** (`no`)
- **Czech** (`cs`)
- **Romanian** (`ro`)
- **Slovak** (`sk`)
- **Ukrainian** (`uk`)
- **Greek** (`el`)
- **Turkish** (`tr`)
- **Russian** (`ru`)
- **Bulgarian** (`bg`)
- **Croatian** (`hr`)
- **Serbian** (`sr`)
- **Hungarian** (`hu`)
- **Lithuanian** (`lt`)
- **Latvian** (`lv`)
- **Estonian** (`et`)
- **Japanese** (`ja`)
- **Chinese** (`zh`)
- **Korean** (`ko`)
- **Hindi** (`hi`)
- **Arabic** (`ar`)
- **Persian** (`fa`)
- **Hebrew** (`he`)
- **Indonesian** (`id`)
- **Malay** (`ms`)
- **Thai** (`th`)
- **Vietnamese** (`vi`)
- **Tamil** (`ta`)
- **Urdu** (`ur`)
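
Since `STTLanguages` is a compile-time type, a runtime list is handy for validating user-supplied codes. The array below copies the codes listed above; `isSupportedLanguage` and `toLanguageCode` are hypothetical helpers. Returning `undefined` for an unsupported code lets you leave `languageCode` unset so that the language is auto-detected.

```typescript
// Runtime copy of the codes in STTLanguages (models.ts).
const SUPPORTED_LANGUAGES = [
  'en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'nl', 'sv', 'fi',
  'da', 'no', 'cs', 'ro', 'sk', 'uk', 'el', 'tr', 'ru', 'bg',
  'hr', 'sr', 'hu', 'lt', 'lv', 'et', 'ja', 'zh', 'ko', 'hi',
  'ar', 'fa', 'he', 'id', 'ms', 'th', 'vi', 'ta', 'ur',
] as const;

// Derives the same union as STTLanguages from the array.
type STTLanguages = (typeof SUPPORTED_LANGUAGES)[number];

// Type guard: narrows an arbitrary string to one of the listed codes.
function isSupportedLanguage(code: string): code is STTLanguages {
  return (SUPPORTED_LANGUAGES as readonly string[]).indexOf(code) !== -1;
}

// Hypothetical helper: maps a locale tag such as "pt-BR" to its base code,
// or returns undefined so the caller can leave languageCode unset.
function toLanguageCode(locale: string): STTLanguages | undefined {
  const base = locale.toLowerCase().split(/[-_]/)[0];
  return isSupportedLanguage(base) ? base : undefined;
}
```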

## Advanced Usage

### Custom VAD Parameters

Fine-tune voice activity detection for your use case:

```typescript
import { STT } from '@livekit/agents-plugin-elevenlabs';

const stt = new STT({
  model: 'scribe_v2_realtime',
  commitStrategy: 'vad',

  // Wait longer before committing (good for speakers who pause to think)
  vadSilenceThresholdSecs: 2.0,

  // Higher threshold = stricter about what counts as speech
  vadThreshold: 0.7,

  // Ignore very short speech bursts (fewer false positives)
  minSpeechDurationMs: 200,

  // Require longer silence to end speech (less fragmentation)
  minSilenceDurationMs: 500,
});
```
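
To build intuition for how `vadThreshold`, `minSpeechDurationMs`, and `minSilenceDurationMs` interact, here is a generic frame-based VAD segmentation sketch. It is for illustration only: the actual VAD presumably runs inside the ElevenLabs service, and its algorithm is not described in this README. `segmentSpeech` is a hypothetical function over per-frame speech probabilities, and commit timing (`vadSilenceThresholdSecs`) is not modeled.

```typescript
// Hypothetical generic VAD segmentation, for intuition only; this is NOT
// the algorithm ElevenLabs runs.
interface VadParams {
  vadThreshold: number; // frames with probability >= this count as speech
  minSpeechDurationMs: number; // shorter speech segments are discarded
  minSilenceDurationMs: number; // this much continuous silence ends a segment
}

interface Segment {
  startMs: number;
  endMs: number;
}

// probs: one speech probability per frame; frameMs: duration of each frame.
function segmentSpeech(probs: number[], frameMs: number, params: VadParams): Segment[] {
  const segments: Segment[] = [];
  let start = -1; // first frame of the open segment, or -1 if none is open
  let silentFrames = 0; // consecutive non-speech frames inside the open segment

  // Closes the open segment at endFrame (exclusive), keeping it only if long enough.
  const closeSegment = (endFrame: number): void => {
    if ((endFrame - start) * frameMs >= params.minSpeechDurationMs) {
      segments.push({ startMs: start * frameMs, endMs: endFrame * frameMs });
    }
    start = -1;
    silentFrames = 0;
  };

  for (let i = 0; i < probs.length; i++) {
    const isSpeech = probs[i] >= params.vadThreshold;
    if (start < 0) {
      if (isSpeech) start = i; // speech begins
    } else if (isSpeech) {
      silentFrames = 0; // the pause was too short to end the segment
    } else {
      silentFrames += 1;
      if (silentFrames * frameMs >= params.minSilenceDurationMs) {
        closeSegment(i - silentFrames + 1); // segment ends where the silence began
      }
    }
  }
  // Flush a segment still open at the end, excluding trailing silence.
  if (start >= 0) closeSegment(probs.length - silentFrames);
  return segments;
}
```

With 100 ms frames, a 0.5 threshold, a 200 ms minimum speech duration, and a 300 ms minimum silence, the probabilities `[0.1, 0.9, 0.9, 0.9, 0.2, 0.2, 0.2, 0.8, 0.2, 0.2, 0.2]` yield a single segment from 100 ms to 400 ms: the three quiet frames close it, and the later one-frame burst is dropped as too short.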

### Multi-Language Support

Let ElevenLabs auto-detect the language:

```typescript
import { STT } from '@livekit/agents-plugin-elevenlabs';

const stt = new STT({
  model: 'scribe_v1',
  // languageCode is omitted, so the language is auto-detected
});

// audioBuffer: the AudioFrame (or AudioFrame[]) to transcribe
const event = await stt.recognize(audioBuffer);
const result = event.alternatives?.[0];
console.log('Detected language:', result?.language);
console.log('Text:', result?.text);
```

Or specify a language:

```typescript
const stt = new STT({
  model: 'scribe_v2_realtime',
  languageCode: 'es', // Spanish
});
```

## Model Comparison

| Feature | Scribe v1 | Scribe v2 | Scribe v2 Realtime |
|---------|-----------|-----------|-------------------|
| **Type** | Non-streaming | Non-streaming | Streaming |
| **Latency** | High (batch) | High (batch) | Low (real-time) |
| **Interim Results** | No | No | Yes |
| **Audio Event Tagging** | Yes | Yes | No |
| **VAD Configuration** | No | No | Yes |
| **Manual Commit** | No | No | Yes |
| **Best For** | Batch jobs with event detection | High-accuracy batch | Real-time conversations |

## Resources

- [ElevenLabs STT Documentation](https://elevenlabs.io/docs/api-reference/speech-to-text)
- [Scribe v2 Streaming Guide](https://elevenlabs.io/docs/cookbooks/speech-to-text/streaming)
- [LiveKit Agents Documentation](https://docs.livekit.io/agents/)

## License

Copyright 2025 LiveKit, Inc.

Licensed under the Apache License, Version 2.0.

plugins/elevenlabs/src/index.ts

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@
 // SPDX-License-Identifier: Apache-2.0
 import { Plugin } from '@livekit/agents';
 
+export * from './models.js';
+export * from './stt.js';
 export * from './tts.js';
 
 class ElevenLabsPlugin extends Plugin {

plugins/elevenlabs/src/models.ts

Lines changed: 47 additions & 0 deletions
@@ -21,3 +21,50 @@ export type TTSEncoding =
   //   | 'mp3_44100_128'
   //   | 'mp3_44100_192'
   'pcm_16000' | 'pcm_22050' | 'pcm_44100';
+
+export type STTModels = 'scribe_v1' | 'scribe_v2' | 'scribe_v2_realtime';
+
+export type STTAudioFormat = 'pcm_16000' | 'pcm_22050' | 'pcm_44100';
+
+export type STTCommitStrategy = 'vad' | 'manual';
+
+export type STTLanguages =
+  | 'en'
+  | 'es'
+  | 'fr'
+  | 'de'
+  | 'it'
+  | 'pt'
+  | 'pl'
+  | 'nl'
+  | 'sv'
+  | 'fi'
+  | 'da'
+  | 'no'
+  | 'cs'
+  | 'ro'
+  | 'sk'
+  | 'uk'
+  | 'el'
+  | 'tr'
+  | 'ru'
+  | 'bg'
+  | 'hr'
+  | 'sr'
+  | 'hu'
+  | 'lt'
+  | 'lv'
+  | 'et'
+  | 'ja'
+  | 'zh'
+  | 'ko'
+  | 'hi'
+  | 'ar'
+  | 'fa'
+  | 'he'
+  | 'id'
+  | 'ms'
+  | 'th'
+  | 'vi'
+  | 'ta'
+  | 'ur';
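
The `STTAudioFormat` values line up with the `sampleRate` values accepted by the streaming options (16000, 22050, and 44100 Hz). A hypothetical helper that maps one to the other, rejecting anything else, might look like this; `toAudioFormat` is not part of the diff.

```typescript
// Local copy of STTAudioFormat from models.ts.
type STTAudioFormat = 'pcm_16000' | 'pcm_22050' | 'pcm_44100';

// Hypothetical helper: maps the streaming sampleRate option to the
// matching STTAudioFormat value, throwing for unsupported rates.
function toAudioFormat(sampleRate: number): STTAudioFormat {
  switch (sampleRate) {
    case 16000:
      return 'pcm_16000';
    case 22050:
      return 'pcm_22050';
    case 44100:
      return 'pcm_44100';
    default:
      throw new Error(`Unsupported sample rate: ${sampleRate} Hz`);
  }
}
```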
