| 1 | +# Speechmatics vCon Link |
| 2 | + |
| 3 | +A vCon server link for transcribing audio dialogs using the [Speechmatics](https://www.speechmatics.com/) API. This link processes vCon objects and stores transcription results in both Speechmatics native format and the standardized [World Transcription Format (WTF)](https://datatracker.ietf.org/doc/draft-howe-vcon-wtf-extension/). |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- Batch transcription of audio dialogs via Speechmatics API |
| 8 | +- Dual output formats: |
| 9 | + - Speechmatics native JSON format (preserves all provider-specific data) |
| 10 | + - WTF format (standardized transcription schema for interoperability) |
| 11 | +- Speaker diarization support |
| 12 | +- Automatic language detection |
| 13 | +- Idempotent processing (dialogs that already have a transcription are skipped) |
| 14 | +- Retry logic with exponential backoff |
| 15 | +- Comprehensive logging |
| 16 | + |
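The README does not spell out how the retry logic works. As a rough illustration only, exponential backoff can be sketched as below; `retry_with_backoff`, its parameters, and the default values are hypothetical and are not taken from this package.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,    # hypothetical default
    base_delay: float = 1.0,  # hypothetical default, in seconds
    max_delay: float = 30.0,  # hypothetical cap, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(base_delay * (2 ** attempt), max_delay)
            # Up to 10% random jitter avoids synchronized retries across workers.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. A production version would normally retry only on transient errors (timeouts, HTTP 429/5xx) rather than on every exception.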
| 17 | +## Installation |
| 18 | + |
| 19 | +Install directly from GitHub: |
| 20 | + |
| 21 | +```bash |
| 22 | +pip install git+https://github.com/yourusername/speechmatics-link.git |
| 23 | +``` |
| 24 | + |
| 25 | +Install from a specific version/tag: |
| 26 | + |
| 27 | +```bash |
| 28 | +pip install git+https://github.com/yourusername/speechmatics-link.git@v0.1.0 |
| 29 | +``` |
| 30 | + |
| 31 | +Install for development: |
| 32 | + |
| 33 | +```bash |
| 34 | +git clone https://github.com/yourusername/speechmatics-link.git |
| 35 | +cd speechmatics-link |
| 36 | +pip install -e ".[dev]" |
| 37 | +``` |
| 38 | + |
| 39 | +## Configuration |
| 40 | + |
| 41 | +### vCon Server Configuration |
| 42 | + |
| 43 | +Add to your vcon-server `config.yml`: |
| 44 | + |
| 45 | +```yaml |
| 46 | +links: |
| 47 | + speechmatics: |
| 48 | + module: speechmatics_vcon_link |
| 49 | + pip_name: git+https://github.com/yourusername/speechmatics-link.git@main |
| 50 | + options: |
| 51 | + api_key: ${SPEECHMATICS_API_KEY} |
| 52 | + save_native_format: true |
| 53 | + save_wtf_format: true |
| 54 | + model: "enhanced" |
| 55 | + enable_diarization: false |
| 56 | + skip_if_exists: true |
| 57 | + |
| 58 | +chains: |
| 59 | + transcription_chain: |
| 60 | + links: |
| 61 | + - speechmatics |
| 62 | + ingress_lists: |
| 63 | + - incoming_calls |
| 64 | + storages: |
| 65 | + - postgres |
| 66 | + enabled: 1 |
| 67 | +``` |
| 68 | + |
| 69 | +### Configuration Options |
| 70 | + |
| 71 | +| Option | Type | Default | Description | |
| 72 | +|--------|------|---------|-------------| |
| 73 | +| `api_key` | string | `$SPEECHMATICS_API_KEY` | Speechmatics API key (required) | |
| 74 | +| `api_url` | string | `https://asr.api.speechmatics.com/v2` | Speechmatics API base URL | |
| 75 | +| `save_native_format` | bool | `true` | Store transcription in Speechmatics native format | |
| 76 | +| `save_wtf_format` | bool | `true` | Store transcription in WTF format | |
| 77 | +| `model` | string | `"enhanced"` | Transcription model: "standard" or "enhanced" | |
| 78 | +| `language` | string | `null` | Language code (null for auto-detect) | |
| 79 | +| `enable_diarization` | bool | `false` | Enable speaker diarization | |
| 80 | +| `diarization_max_speakers` | int | `null` | Maximum speakers for diarization | |
| 81 | +| `poll_interval` | int | `5` | Seconds between job status checks | |
| 82 | +| `max_poll_attempts` | int | `120` | Maximum polling attempts before timeout | |
| 83 | +| `skip_if_exists` | bool | `true` | Skip dialogs with existing transcription | |
| 84 | +| `redis_host` | string | `"localhost"` | Redis host | |
| 85 | +| `redis_port` | int | `6379` | Redis port | |
| 86 | +| `redis_db` | int | `0` | Redis database number | |
| 87 | + |
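With the defaults above, polling gives up after roughly `poll_interval` × `max_poll_attempts` = 5 × 120 = 600 seconds (10 minutes), plus the time spent on the status requests themselves. The sketch below shows one way such a loop could look. `wait_for_job`, `get_status`, and the `"running"` status string are illustrative assumptions, not the link's actual code.

```python
import time
from typing import Callable


def max_wait_seconds(poll_interval: int = 5, max_poll_attempts: int = 120) -> int:
    """Upper bound on time spent sleeping between status checks."""
    return poll_interval * max_poll_attempts


def wait_for_job(
    get_status: Callable[[], str],  # hypothetical callable returning the job status
    poll_interval: int = 5,
    max_poll_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job leaves the assumed "running" state or attempts run out."""
    for _ in range(max_poll_attempts):
        status = get_status()
        if status != "running":  # assumed status string for an unfinished job
            return status
        sleep(poll_interval)
    raise TimeoutError(
        f"job still running after {max_wait_seconds(poll_interval, max_poll_attempts)}s"
    )
```

For long recordings, raising `max_poll_attempts` to 360 with the default interval would extend the budget to 30 minutes.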
| 88 | +### Environment Variables |
| 89 | + |
| 90 | +Set your Speechmatics API key: |
| 91 | + |
| 92 | +```bash |
| 93 | +export SPEECHMATICS_API_KEY=your-api-key-here |
| 94 | +``` |
| 95 | + |
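The key can come from either the `api_key` option or the environment variable. A minimal sketch of that lookup follows, assuming the explicit option takes precedence over the environment. That precedence is an assumption, not something this README states, and `resolve_api_key` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    opts: Mapping[str, object],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API key from opts, falling back to SPEECHMATICS_API_KEY."""
    env = os.environ if env is None else env
    # Assumed precedence: explicit option first, then the environment variable.
    key = opts.get("api_key") or env.get("SPEECHMATICS_API_KEY")
    if not key:
        raise ValueError(
            "No API key: set SPEECHMATICS_API_KEY or provide api_key in options"
        )
    return str(key)
```

Failing early with a clear message is usually easier to debug than letting the first API request return an authentication error.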
| 96 | +## Output Formats |
| 97 | + |
| 98 | +### Speechmatics Native Format |
| 99 | + |
| 100 | +Stored as a vCon analysis entry with type `speechmatics_transcription`. It contains the complete Speechmatics API response, including: |
| 101 | + |
| 102 | +- Word-level timing and confidence |
| 103 | +- Punctuation |
| 104 | +- Speaker labels (if diarization enabled) |
| 105 | +- Job metadata |
| 106 | + |
| 107 | +### WTF Format |
| 108 | + |
| 109 | +Stored as a vCon analysis entry with type `wtf_transcription`. It follows the WTF schema defined in [draft-howe-vcon-wtf-extension-01](https://datatracker.ietf.org/doc/draft-howe-vcon-wtf-extension/): |
| 110 | + |
| 111 | +```json |
| 112 | +{ |
| 113 | + "transcript": { |
| 114 | + "text": "Complete transcription text...", |
| 115 | + "language": "en-US", |
| 116 | + "duration": 65.2, |
| 117 | + "confidence": 0.95 |
| 118 | + }, |
| 119 | + "segments": [ |
| 120 | + { |
| 121 | + "id": 0, |
| 122 | + "start": 0.5, |
| 123 | + "end": 4.8, |
| 124 | + "text": "Hello, this is Alice from customer service.", |
| 125 | + "confidence": 0.97, |
| 126 | + "speaker": "S1", |
| 127 | + "words": [0, 1, 2, 3, 4, 5, 6] |
| 128 | + } |
| 129 | + ], |
| 130 | + "words": [ |
| 131 | + { |
| 132 | + "id": 0, |
| 133 | + "start": 0.5, |
| 134 | + "end": 0.8, |
| 135 | + "text": "Hello", |
| 136 | + "confidence": 0.98, |
| 137 | + "speaker": "S1", |
| 138 | + "is_punctuation": false |
| 139 | + } |
| 140 | + ], |
| 141 | + "speakers": { |
| 142 | + "S1": { |
| 143 | + "id": "S1", |
| 144 | + "label": "Speaker S1", |
| 145 | + "segments": [0, 1], |
| 146 | + "total_time": 4.3, |
| 147 | + "confidence": 0.97 |
| 148 | + } |
| 149 | + }, |
| 150 | + "metadata": { |
| 151 | + "created_at": "2025-01-02T12:15:30Z", |
| 152 | + "processed_at": "2025-01-02T12:16:35Z", |
| 153 | + "provider": "speechmatics", |
| 154 | + "model": "enhanced", |
| 155 | + "processing_time": 12.5, |
| 156 | + "audio": { |
| 157 | + "duration": 65.2, |
| 158 | + "format": "wav" |
| 159 | + } |
| 160 | + }, |
| 161 | + "quality": { |
| 162 | + "audio_quality": "high", |
| 163 | + "average_confidence": 0.95, |
| 164 | + "low_confidence_words": 0, |
| 165 | + "multiple_speakers": true |
| 166 | + }, |
| 167 | + "extensions": { |
| 168 | + "speechmatics": { |
| 169 | + "job": { ... }, |
| 170 | + "format": "2.9" |
| 171 | + } |
| 172 | + } |
| 173 | +} |
| 174 | +``` |
| 175 | + |
| 176 | +## Usage Examples |
| 177 | + |
| 178 | +### Basic Usage |
| 179 | + |
| 180 | +The link automatically processes audio dialogs in vCons: |
| 181 | + |
| 182 | +```python |
| 183 | +# The run() function is called by vcon-server |
| 184 | +from speechmatics_vcon_link import run |
| 185 | + |
| 186 | +result = run( |
| 187 | + vcon_uuid="your-vcon-uuid", |
| 188 | + link_name="speechmatics", |
| 189 | + opts={ |
| 190 | + "api_key": "your-api-key", |
| 191 | + "save_native_format": True, |
| 192 | + "save_wtf_format": True, |
| 193 | + } |
| 194 | +) |
| 195 | +``` |
| 196 | + |
| 197 | +### Using the Client Directly |
| 198 | + |
| 199 | +```python |
| 200 | +from speechmatics_vcon_link.client import SpeechmaticsClient, TranscriptionConfig |
| 201 | + |
| 202 | +client = SpeechmaticsClient(api_key="your-api-key") |
| 203 | + |
| 204 | +# Configure transcription |
| 205 | +config = TranscriptionConfig( |
| 206 | + language="en", |
| 207 | + operating_point="enhanced", |
| 208 | + enable_diarization=True, |
| 209 | +) |
| 210 | + |
| 211 | +# Transcribe audio |
| 212 | +result = client.transcribe( |
| 213 | + audio_url="https://example.com/audio.wav", |
| 214 | + config=config, |
| 215 | +) |
| 216 | + |
| 217 | +print(result) |
| 218 | +``` |
| 219 | + |
| 220 | +### Converting to WTF Format |
| 221 | + |
| 222 | +```python |
| 223 | +from speechmatics_vcon_link.converter import convert_to_wtf |
| 224 | + |
| 225 | +# Convert a Speechmatics JSON response (assumed to be already loaded into `speechmatics_response`) to WTF format |
| 226 | +wtf_result = convert_to_wtf( |
| 227 | + speechmatics_response, |
| 228 | + created_at="2025-01-02T12:00:00Z", |
| 229 | + processing_time=12.5, |
| 230 | +) |
| 231 | + |
| 232 | +print(wtf_result["transcript"]["text"]) |
| 233 | +``` |
| 234 | + |
| 235 | +## Development |
| 236 | + |
| 237 | +### Running Tests |
| 238 | + |
| 239 | +```bash |
| 240 | +# Install dev dependencies |
| 241 | +pip install -e ".[dev]" |
| 242 | + |
| 243 | +# Run all tests |
| 244 | +pytest |
| 245 | + |
| 246 | +# Run with coverage |
| 247 | +pytest --cov=speechmatics_vcon_link --cov-report=html |
| 248 | + |
| 249 | +# Run specific test file |
| 250 | +pytest tests/test_converter.py -v |
| 251 | +``` |
| 252 | + |
| 253 | +### Code Formatting |
| 254 | + |
| 255 | +```bash |
| 256 | +black speechmatics_vcon_link/ |
| 257 | +``` |
| 258 | + |
| 259 | +### Project Structure |
| 260 | + |
| 261 | +``` |
| 262 | +speechmatics-link/ |
| 263 | + speechmatics_vcon_link/ |
| 264 | + __init__.py # Main link implementation |
| 265 | + client.py # Speechmatics API client |
| 266 | + converter.py # WTF format converter |
| 267 | + tests/ |
| 268 | + test_link.py # Link tests |
| 269 | + test_client.py # Client tests |
| 270 | + test_converter.py # Converter tests |
| 271 | + fixtures/ # Test data |
| 272 | + pyproject.toml # Package config |
| 273 | + README.md # This file |
| 274 | +``` |
| 275 | + |
| 276 | +## Troubleshooting |
| 277 | + |
| 278 | +### API Key Issues |
| 279 | + |
| 280 | +- Ensure `SPEECHMATICS_API_KEY` is set or `api_key` is provided in options |
| 281 | +- Verify your API key has batch transcription permissions |
| 282 | +- Check that the API key has not expired |
| 283 | + |
| 284 | +### Audio URL Issues |
| 285 | + |
| 286 | +- Audio must be accessible via an HTTP or HTTPS URL |
| 287 | +- Supported formats: WAV, MP3, FLAC, OGG, WebM |
| 288 | +- The URL must be publicly accessible or be a signed URL |
| 289 | + |
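The first two points can be checked before a job is submitted. The sketch below is a heuristic pre-flight check under stated assumptions: `check_audio_url` is a hypothetical helper, and the extension test is only a hint, since a URL without a recognizable file extension may still serve valid audio. It does not verify that the URL is actually reachable.

```python
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse

# Extensions derived from the supported formats listed above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".webm"}


def check_audio_url(url: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append(f"unsupported scheme: {parsed.scheme or '(none)'}")
    if not parsed.netloc:
        problems.append("missing host")
    # urlparse keeps the query string (e.g. a signature) out of parsed.path,
    # so signed URLs are handled the same way as plain ones.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unrecognized extension: {suffix or '(none)'}")
    return problems
```

For example, `check_audio_url("https://example.com/audio.wav?sig=abc")` returns an empty list, while an `ftp://` URL or a `.aac` file is reported.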
| 290 | +### Timeout Errors |
| 291 | + |
| 292 | +- Increase `max_poll_attempts` (or `poll_interval`) for longer audio files |
| 293 | +- Check Speechmatics service status |
| 294 | +- Verify that the audio file isn't corrupted |
| 295 | + |
| 296 | +### Redis Connection Issues |
| 297 | + |
| 298 | +- Verify Redis is running and accessible |
| 299 | +- Check `redis_host`, `redis_port`, `redis_db` settings |
| 300 | +- Ensure the RedisJSON module is available |
| 301 | + |
| 302 | +### Module Not Found |
| 303 | + |
| 304 | +- Verify package is installed: `pip list | grep speechmatics` |
| 305 | +- Check module name in config matches: `speechmatics_vcon_link` |
| 306 | +- Restart vcon-server after installation |
| 307 | + |
| 308 | +## License |
| 309 | + |
| 310 | +MIT License - see [LICENSE](LICENSE) for details. |
| 311 | + |
| 312 | +## References |
| 313 | + |
| 314 | +- [Speechmatics API Documentation](https://docs.speechmatics.com/) |
| 315 | +- [vCon Specification](https://datatracker.ietf.org/wg/vcon/documents/) |
| 316 | +- [WTF Extension Draft](https://datatracker.ietf.org/doc/draft-howe-vcon-wtf-extension/) |
| 317 | +- [vCon Server Documentation](https://github.com/vcon-dev/vcon-server) |
| 318 | + |