feat: Add Whisper STT Extension using faster-whisper #1984

Nsuccess · 2026-01-14T17:29:00Z

Description

This PR adds a Whisper STT (Speech-to-Text) extension for the TEN Framework using the faster-whisper library.

Closes #1969

Features

✅ Complete ASR Implementation: Inherits from AsyncASRBaseExtension following TEN Framework patterns
✅ Optimized Performance: Uses faster-whisper (up to 4x faster than openai/whisper, per the faster-whisper project)
✅ Multiple Model Sizes: Support for tiny, base, small, medium, large-v1/v2/v3
✅ CPU & GPU Support: Configurable device and compute types (int8, float16, float32)
✅ Multi-Language: 99+ languages with automatic detection
✅ Translation: Translate speech to English
✅ VAD Filtering: Built-in voice activity detection using Silero VAD
✅ Production-Ready: Auto-reconnection, audio dumping, standardized logging
✅ Buffer Strategy: Keep mode with 10MB limit for timestamp accuracy
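
To put the 10MB keep-mode limit in terms of audio duration, here is a rough sketch. It assumes 16-bit mono PCM at the 16 kHz sample rate from the configuration example; the sample width and channel count are assumptions, not stated in this PR. `KeepBuffer` is a hypothetical name, and dropping the oldest bytes at the limit is an assumed policy, not necessarily what the extension does.

```python
from dataclasses import dataclass, field

BYTES_PER_SAMPLE = 2                 # assumption: 16-bit PCM
CHANNELS = 1                         # assumption: mono
MAX_BUFFER_BYTES = 10 * 1024 * 1024  # the 10MB limit from the feature list


def buffer_seconds(num_bytes: int, sample_rate: int = 16000) -> float:
    """Duration in seconds of num_bytes of raw PCM audio."""
    return num_bytes / (sample_rate * BYTES_PER_SAMPLE * CHANNELS)


@dataclass
class KeepBuffer:
    """Hypothetical buffer that keeps audio up to max_bytes.

    When the limit is exceeded the oldest bytes are dropped (assumed policy)
    and counted, so later timestamps can still be offset correctly.
    """

    max_bytes: int = MAX_BUFFER_BYTES
    data: bytearray = field(default_factory=bytearray)
    dropped_bytes: int = 0

    def append(self, chunk: bytes) -> None:
        self.data.extend(chunk)
        overflow = len(self.data) - self.max_bytes
        if overflow > 0:
            del self.data[:overflow]
            self.dropped_bytes += overflow

    def start_offset_seconds(self, sample_rate: int = 16000) -> float:
        """Time offset of the first byte still held in the buffer."""
        return buffer_seconds(self.dropped_bytes, sample_rate)
```

Under these assumptions, 10MB holds a little over five minutes of audio (`buffer_seconds(MAX_BUFFER_BYTES)` is about 327.68 seconds).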

Implementation Details

Architecture

Extension: WhisperSTTExtension - Main ASR extension class
Client: WhisperClient - Handles faster-whisper model and inference
Config: WhisperSTTConfig - Pass-through params design for flexibility
Reconnection: Exponential backoff with configurable max attempts
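
The reconnection behaviour described above can be sketched as follows. This is an illustration of exponential backoff with a capped number of attempts, not the code in `reconnect_manager.py`; the function names, the default delays, and the injected `sleep` parameter are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


def backoff_delays(
    base_delay: float = 0.5,
    factor: float = 2.0,
    max_delay: float = 8.0,
    max_attempts: int = 5,
) -> list[float]:
    """Delay before each retry: base * factor**i, capped at max_delay."""
    return [min(base_delay * factor**i, max_delay) for i in range(max_attempts)]


async def connect_with_backoff(
    connect: Callable[[], Awaitable[None]],
    *,
    delays: Sequence[float],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Try connect() once per entry in delays; return the attempt that succeeded.

    Raises ConnectionError after the last attempt fails.
    """
    last_error: Exception | None = None
    for attempt, delay in enumerate(delays, start=1):
        try:
            await connect()
            return attempt
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < len(delays):
                await sleep(delay)  # wait before the next attempt
    raise ConnectionError(f"gave up after {len(delays)} attempts") from last_error
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting, which fits the mock-based testing approach used elsewhere in this PR.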

Files Added (14 files, 1,324 lines)

whisper_stt_python/extension.py - Main extension implementation
whisper_stt_python/whisper_client.py - Faster-whisper client wrapper
whisper_stt_python/config.py - Configuration management
whisper_stt_python/reconnect_manager.py - Auto-reconnection logic
whisper_stt_python/addon.py - Extension entry point
whisper_stt_python/const.py - Constants
whisper_stt_python/manifest.json - Extension metadata
whisper_stt_python/property.json - Default configuration
whisper_stt_python/requirements.txt - Dependencies
whisper_stt_python/README.md - Comprehensive documentation
whisper_stt_python/tests/test_config.py - Config tests (10 tests)
whisper_stt_python/tests/test_extension.py - Extension tests (15 tests)

Testing

✅ 25 Unit Tests: Full coverage with mock-based testing
✅ Config Tests: Default values, JSON parsing, sensitive masking, language normalization
✅ Extension Tests: Initialization, connection, audio sending, finalize, callbacks
✅ No Real API Calls: All tests use mocks for fast, reliable execution
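
As an illustration of the mock-based approach, the sketch below replaces the model with a `MagicMock` so no Whisper weights are loaded. `Transcriber` and `make_fake_model` are hypothetical names for this example and are not taken from the PR's test files; the only detail borrowed from faster-whisper is that `transcribe` returns a `(segments, info)` pair.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


class Transcriber:
    """Hypothetical wrapper that receives the model as a dependency."""

    def __init__(self, model):
        self.model = model

    def transcribe(self, audio: bytes, language: str = "en") -> str:
        segments, _info = self.model.transcribe(audio, language=language)
        return " ".join(segment.text.strip() for segment in segments)


def make_fake_model(texts):
    """Build a mock whose transcribe() yields segments with the given texts."""
    model = MagicMock()
    segments = [SimpleNamespace(text=text) for text in texts]
    model.transcribe.return_value = (iter(segments), SimpleNamespace(language="en"))
    return model


def test_transcribe_joins_segments():
    model = make_fake_model([" hello", " world"])
    result = Transcriber(model).transcribe(b"\x00\x00", language="en")
    assert result == "hello world"
    # The mock records the call, so the arguments can be checked too.
    model.transcribe.assert_called_once_with(b"\x00\x00", language="en")
```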

Configuration Example

{
  "params": {
    "model": "base",
    "device": "cpu",
    "compute_type": "int8",
    "language": "en",
    "task": "transcribe",
    "sample_rate": 16000
  }
}
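
A minimal sketch of how such a pass-through `params` block might be mapped onto faster-whisper calls. `SttConfig` and `load_model` are hypothetical names rather than the PR's `WhisperSTTConfig`, and the fallback defaults are assumptions. The `WhisperModel` import is deferred so the parsing logic can run without faster-whisper installed.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SttConfig:
    """Hypothetical config holder that keeps params as a plain dict."""

    params: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_json(cls, text: str) -> "SttConfig":
        return cls(params=dict(json.loads(text).get("params", {})))

    def model_kwargs(self) -> dict[str, Any]:
        """Arguments for faster_whisper.WhisperModel(...)."""
        return {
            "model_size_or_path": self.params.get("model", "base"),
            "device": self.params.get("device", "cpu"),
            "compute_type": self.params.get("compute_type", "int8"),
        }

    def transcribe_kwargs(self) -> dict[str, Any]:
        """Arguments for WhisperModel.transcribe(...).

        sample_rate is left out on purpose: it describes the incoming
        audio stream and is not a transcribe() argument.
        """
        return {
            "language": self.params.get("language"),  # None lets Whisper detect it
            "task": self.params.get("task", "transcribe"),
        }


def load_model(config: SttConfig):
    """Create the model; requires faster-whisper to be installed."""
    from faster_whisper import WhisperModel  # deferred import

    return WhisperModel(**config.model_kwargs())
```

With the example above, `SttConfig.from_json(text).model_kwargs()` yields the `base` model on `cpu` with `int8` compute, and setting `"task": "translate"` would request English translation instead of transcription.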

- Implements text-to-speech using NVIDIA Riva Speech Skills
- Supports streaming synthesis with gRPC
- Includes comprehensive tests and documentation
- Follows TTS2 interface pattern

Closes TEN-framework#1964

- Implements text-to-speech using Speechmatics TTS API
- Supports low-latency streaming synthesis (sub-150ms)
- Includes 4 voice options (UK and US English)
- Comprehensive tests and documentation
- Follows TTS2 HTTP interface pattern

Closes TEN-framework#1965

- Implements ASR extension for OpenAI Whisper model
- Uses faster-whisper library (4x faster than openai/whisper)
- Supports all Whisper model sizes (tiny to large-v3)
- CPU and GPU execution with multiple compute types
- 99+ languages support with auto-detection
- Translation to English capability
- VAD filtering with Silero VAD
- Auto-reconnection with exponential backoff
- Audio dumping for debugging
- Keep mode buffer strategy for timestamp accuracy
- 25 unit tests with mock-based testing
- Comprehensive documentation and examples

Closes TEN-framework#1969

Nsuccess added 3 commits January 14, 2026 16:46

feat: Add NVIDIA Riva TTS extension (TEN-framework#1964)

0e25a95


Nsuccess requested review from halajohn and plutoless as code owners January 14, 2026 17:29
