Commit 5fd6b1f

Initial implementation of Speechmatics vCon link
- Speechmatics API client with retry logic and error handling
- WTF format converter per draft-howe-vcon-wtf-extension-01
- Main link with dual output format support (native + WTF)
- Comprehensive unit tests (116 tests, 93% coverage)
- Full documentation and configuration examples
0 parents  commit 5fd6b1f

14 files changed: +3820 -0 lines changed

.gitignore

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
*.so
*.egg
*.egg-info/
dist/
build/
.venv/
venv/
.env
*.log
.pytest_cache/
.coverage
htmlcov/
.mypy_cache/
.ruff_cache/
*.swp
*.swo
*~
.DS_Store
.idea/
.vscode/

LICENSE

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
MIT License

Copyright (c) 2025 Thomas McCarthy-Howe

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 318 additions & 0 deletions
@@ -0,0 +1,318 @@
# Speechmatics vCon Link

A vCon server link for transcribing audio dialogs using the [Speechmatics](https://www.speechmatics.com/) API. This link processes vCon objects and stores transcription results in both Speechmatics native format and the standardized [World Transcription Format (WTF)](https://datatracker.ietf.org/doc/draft-howe-vcon-wtf-extension/).

## Features

- Batch transcription of audio dialogs via Speechmatics API
- Dual output formats:
  - Speechmatics native JSON format (preserves all provider-specific data)
  - WTF format (standardized transcription schema for interoperability)
- Speaker diarization support
- Automatic language detection
- Idempotent processing (skip already transcribed dialogs)
- Retry logic with exponential backoff
- Comprehensive logging
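
The "retry logic with exponential backoff" feature can be pictured with a short sketch. This is not the link's actual client code: `backoff_delays`, `retry_with_backoff`, and every parameter and default below are hypothetical names chosen for illustration, and a real client would likely retry only on transient network or HTTP errors rather than on every exception.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(max_attempts: int, base_delay: float = 1.0,
                   factor: float = 2.0, max_delay: float = 30.0) -> List[float]:
    """Delays between attempts: base, base*factor, ... capped at max_delay."""
    return [min(base_delay * factor ** i, max_delay)
            for i in range(max_attempts - 1)]


def retry_with_backoff(func: Callable[[], T], max_attempts: int = 4,
                       base_delay: float = 1.0, jitter: float = 0.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func until it succeeds or max_attempts is exhausted (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(max_attempts, base_delay)
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:  # assumption: real code would narrow this to transient errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Optional random jitter spreads out retries from concurrent workers.
            sleep(delays[attempt] + random.uniform(0, jitter))
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to exercise in tests without real waiting.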
## Installation

Install directly from GitHub:

```bash
pip install git+https://github.com/yourusername/speechmatics-link.git
```

Install from a specific version/tag:

```bash
pip install git+https://github.com/yourusername/speechmatics-link.git@v0.1.0
```

Install for development:

```bash
git clone https://github.com/yourusername/speechmatics-link.git
cd speechmatics-link
pip install -e ".[dev]"
```

## Configuration

### vCon Server Configuration

Add to your vcon-server `config.yml`:

```yaml
links:
  speechmatics:
    module: speechmatics_vcon_link
    pip_name: git+https://github.com/yourusername/speechmatics-link.git@main
    options:
      api_key: ${SPEECHMATICS_API_KEY}
      save_native_format: true
      save_wtf_format: true
      model: "enhanced"
      enable_diarization: false
      skip_if_exists: true

chains:
  transcription_chain:
    links:
      - speechmatics
    ingress_lists:
      - incoming_calls
    storages:
      - postgres
    enabled: 1
```

### Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `api_key` | string | `$SPEECHMATICS_API_KEY` | Speechmatics API key (required) |
| `api_url` | string | `https://asr.api.speechmatics.com/v2` | Speechmatics API base URL |
| `save_native_format` | bool | `true` | Store transcription in Speechmatics native format |
| `save_wtf_format` | bool | `true` | Store transcription in WTF format |
| `model` | string | `"enhanced"` | Transcription model: "standard" or "enhanced" |
| `language` | string | `null` | Language code (null for auto-detect) |
| `enable_diarization` | bool | `false` | Enable speaker diarization |
| `diarization_max_speakers` | int | `null` | Maximum speakers for diarization |
| `poll_interval` | int | `5` | Seconds between job status checks |
| `max_poll_attempts` | int | `120` | Maximum polling attempts before timeout |
| `skip_if_exists` | bool | `true` | Skip dialogs with existing transcription |
| `redis_host` | string | `"localhost"` | Redis host |
| `redis_port` | int | `6379` | Redis port |
| `redis_db` | int | `0` | Redis database number |

### Environment Variables

Set your Speechmatics API key:

```bash
export SPEECHMATICS_API_KEY=your-api-key-here
```
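
To make the interaction between the options table and the environment variable concrete, here is a minimal sketch of how defaults and the `SPEECHMATICS_API_KEY` fallback could be merged. `DEFAULT_OPTIONS` and `resolve_options` are hypothetical names, not part of the package; the default values are copied from the table above, and the validation rules are assumptions.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Defaults copied from the Configuration Options table.
DEFAULT_OPTIONS: Dict[str, Any] = {
    "api_url": "https://asr.api.speechmatics.com/v2",
    "save_native_format": True,
    "save_wtf_format": True,
    "model": "enhanced",
    "language": None,
    "enable_diarization": False,
    "diarization_max_speakers": None,
    "poll_interval": 5,
    "max_poll_attempts": 120,
    "skip_if_exists": True,
    "redis_host": "localhost",
    "redis_port": 6379,
    "redis_db": 0,
}


def resolve_options(opts: Optional[Mapping[str, Any]] = None,
                    env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Merge user options over the defaults (hypothetical helper)."""
    env = os.environ if env is None else env
    merged = {**DEFAULT_OPTIONS, **(opts or {})}
    # Fall back to the environment when no key is given in the options.
    if not merged.get("api_key"):
        merged["api_key"] = env.get("SPEECHMATICS_API_KEY")
    if not merged["api_key"]:
        raise ValueError("api_key is required (option or SPEECHMATICS_API_KEY)")
    if merged["model"] not in ("standard", "enhanced"):
        raise ValueError("model must be 'standard' or 'enhanced'")
    return merged
```

For example, `resolve_options({"enable_diarization": True})` would keep every other default and read the key from the environment.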
## Output Formats

### Speechmatics Native Format

Stored as a vCon analysis entry with type `speechmatics_transcription`. It contains the complete Speechmatics API response, including:

- Word-level timing and confidence
- Punctuation
- Speaker labels (if diarization is enabled)
- Job metadata
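
Because results are stored as typed analysis entries, the `skip_if_exists` behavior can be sketched as a lookup over a vCon's `analysis` array. The sketch below assumes the vCon is available as a plain dict whose analysis entries carry `type` and `dialog` keys and whose audio dialogs have type `recording`; `has_analysis` and `dialogs_to_transcribe` are hypothetical helpers, and the link's real check may differ.

```python
from typing import Any, Dict, List, Sequence


def has_analysis(vcon: Dict[str, Any], dialog_index: int, analysis_type: str) -> bool:
    """True if an analysis entry of this type already references the dialog."""
    for entry in vcon.get("analysis", []):
        ref = entry.get("dialog")
        refs = ref if isinstance(ref, list) else [ref]  # index or list of indexes
        if entry.get("type") == analysis_type and dialog_index in refs:
            return True
    return False


def dialogs_to_transcribe(
    vcon: Dict[str, Any],
    analysis_types: Sequence[str] = ("speechmatics_transcription", "wtf_transcription"),
) -> List[int]:
    """Indexes of recording dialogs missing at least one requested analysis type."""
    return [
        i
        for i, dialog in enumerate(vcon.get("dialog", []))
        if dialog.get("type") == "recording"
        and not all(has_analysis(vcon, i, t) for t in analysis_types)
    ]
```

Running the same vCon through such a check twice yields an empty list the second time, which is what makes the processing idempotent.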
### WTF Format

Stored as a vCon analysis entry with type `wtf_transcription`. It follows the WTF schema defined in [draft-howe-vcon-wtf-extension-01](https://datatracker.ietf.org/doc/draft-howe-vcon-wtf-extension/):

```json
{
  "transcript": {
    "text": "Complete transcription text...",
    "language": "en-US",
    "duration": 65.2,
    "confidence": 0.95
  },
  "segments": [
    {
      "id": 0,
      "start": 0.5,
      "end": 4.8,
      "text": "Hello, this is Alice from customer service.",
      "confidence": 0.97,
      "speaker": "S1",
      "words": [0, 1, 2, 3, 4, 5, 6]
    }
  ],
  "words": [
    {
      "id": 0,
      "start": 0.5,
      "end": 0.8,
      "text": "Hello",
      "confidence": 0.98,
      "speaker": "S1",
      "is_punctuation": false
    }
  ],
  "speakers": {
    "S1": {
      "id": "S1",
      "label": "Speaker S1",
      "segments": [0, 1],
      "total_time": 4.3,
      "confidence": 0.97
    }
  },
  "metadata": {
    "created_at": "2025-01-02T12:15:30Z",
    "processed_at": "2025-01-02T12:16:35Z",
    "provider": "speechmatics",
    "model": "enhanced",
    "processing_time": 12.5,
    "audio": {
      "duration": 65.2,
      "format": "wav"
    }
  },
  "quality": {
    "audio_quality": "high",
    "average_confidence": 0.95,
    "low_confidence_words": 0,
    "multiple_speakers": true
  },
  "extensions": {
    "speechmatics": {
      "job": { ... },
      "format": "2.9"
    }
  }
}
```
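
As an illustration of how a consumer might read this structure, the sketch below derives per-speaker talk time from `segments` and lists low-confidence words from `words`. Both helpers are hypothetical and only rely on the field names shown in the example above; the `0.5` confidence threshold is an arbitrary assumption, not a value defined by the draft or by this link.

```python
from collections import defaultdict
from typing import Any, Dict, List


def speaker_talk_time(wtf: Dict[str, Any]) -> Dict[str, float]:
    """Sum segment durations (end - start) per speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in wtf.get("segments", []):
        totals[seg.get("speaker", "unknown")] += seg["end"] - seg["start"]
    return {speaker: round(total, 3) for speaker, total in totals.items()}


def low_confidence_words(wtf: Dict[str, Any], threshold: float = 0.5) -> List[str]:
    """Text of non-punctuation words whose confidence is below the threshold."""
    return [
        word["text"]
        for word in wtf.get("words", [])
        if not word.get("is_punctuation") and word["confidence"] < threshold
    ]
```

With the single segment in the example (0.5 s to 4.8 s, speaker `S1`), the first helper reports 4.3 seconds for `S1`, which matches the `total_time` shown under `speakers`.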
## Usage Examples

### Basic Usage

The link automatically processes audio dialogs in vCons:

```python
# The run() function is called by vcon-server
from speechmatics_vcon_link import run

result = run(
    vcon_uuid="your-vcon-uuid",
    link_name="speechmatics",
    opts={
        "api_key": "your-api-key",
        "save_native_format": True,
        "save_wtf_format": True,
    }
)
```

### Using the Client Directly

```python
from speechmatics_vcon_link.client import SpeechmaticsClient, TranscriptionConfig

client = SpeechmaticsClient(api_key="your-api-key")

# Configure transcription
config = TranscriptionConfig(
    language="en",
    operating_point="enhanced",
    enable_diarization=True,
)

# Transcribe audio
result = client.transcribe(
    audio_url="https://example.com/audio.wav",
    config=config,
)

print(result)
```

### Converting to WTF Format

```python
from speechmatics_vcon_link.converter import convert_to_wtf

# Convert Speechmatics response to WTF format
wtf_result = convert_to_wtf(
    speechmatics_response,
    created_at="2025-01-02T12:00:00Z",
    processing_time=12.5,
)

print(wtf_result["transcript"]["text"])
```
## Development

### Running Tests

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage
pytest --cov=speechmatics_vcon_link --cov-report=html

# Run specific test file
pytest tests/test_converter.py -v
```

### Code Formatting

```bash
black speechmatics_vcon_link/
```

### Project Structure

```
speechmatics-link/
  speechmatics_vcon_link/
    __init__.py          # Main link implementation
    client.py            # Speechmatics API client
    converter.py         # WTF format converter
  tests/
    test_link.py         # Link tests
    test_client.py       # Client tests
    test_converter.py    # Converter tests
    fixtures/            # Test data
  pyproject.toml         # Package config
  README.md              # This file
```
## Troubleshooting

### API Key Issues

- Ensure `SPEECHMATICS_API_KEY` is set or `api_key` is provided in options
- Verify that your API key has batch transcription permissions
- Check that the API key hasn't expired

### Audio URL Issues

- Audio must be reachable at an HTTP or HTTPS URL
- Supported formats: WAV, MP3, FLAC, OGG, WebM
- The URL must be publicly accessible or pre-signed
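
A quick pre-flight check along these lines can catch the first two problems before a job is submitted. `check_audio_url` is a hypothetical helper, not part of the package, and inferring the format from the file extension is only a heuristic: it cannot confirm that the URL is actually reachable or that the file content matches its name.

```python
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse

# Extensions for the formats listed above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".webm"}


def check_audio_url(url: str) -> List[str]:
    """Return a list of problems found in the URL; empty means it looks usable."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append(f"unsupported scheme: {parsed.scheme or '(none)'}")
    if not parsed.netloc:
        problems.append("missing host")
    # Query strings (for example signed-URL parameters) are not part of the path.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unrecognized audio extension: {suffix or '(none)'}")
    return problems
```

For instance, `check_audio_url("https://example.com/audio.wav")` returns an empty list, while a `file://` path is flagged for its scheme and missing host.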
### Timeout Errors

- Increase `max_poll_attempts` for longer audio files
- Check the Speechmatics service status
- Verify that the audio file isn't corrupted
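
To size `max_poll_attempts`, it helps to work out the polling budget. The arithmetic below assumes the link waits `poll_interval` seconds between status checks and ignores request latency; the helper names are hypothetical.

```python
import math


def max_wait_seconds(poll_interval: int = 5, max_poll_attempts: int = 120) -> int:
    """Approximate time the link waits before giving up on a job."""
    return poll_interval * max_poll_attempts


def attempts_needed(expected_seconds: float, poll_interval: int = 5) -> int:
    """Smallest max_poll_attempts that covers an expected processing time."""
    return math.ceil(expected_seconds / poll_interval)
```

With the defaults (`poll_interval` 5, `max_poll_attempts` 120) the budget is 5 × 120 = 600 seconds, or 10 minutes. Covering a job expected to take 30 minutes at the same interval would need 360 attempts.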
### Redis Connection Issues

- Verify that Redis is running and accessible
- Check the `redis_host`, `redis_port`, and `redis_db` settings
- Ensure the Redis JSON module is available

### Module Not Found

- Verify that the package is installed: `pip list | grep speechmatics`
- Check that the module name in the config matches `speechmatics_vcon_link`
- Restart vcon-server after installation
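
The same check can be made from inside the Python environment that vcon-server uses, which rules out the case where `pip` installed the package into a different interpreter. This is a small sketch using the standard library; `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if the named module can be located by the current interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names with a missing parent package or malformed names.
        return False


if __name__ == "__main__":
    print("speechmatics_vcon_link importable:",
          module_available("speechmatics_vcon_link"))
```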
## License

MIT License - see [LICENSE](LICENSE) for details.

## References

- [Speechmatics API Documentation](https://docs.speechmatics.com/)
- [vCon Specification](https://datatracker.ietf.org/wg/vcon/documents/)
- [WTF Extension Draft](https://datatracker.ietf.org/doc/draft-howe-vcon-wtf-extension/)
- [vCon Server Documentation](https://github.com/vcon-dev/vcon-server)
