Features • Quick Start • Usage • Development • Documentation • Resources • Contributing
Version 1.0.0 | Production Ready
```bash
# Python 3.9 or higher (required for type checking with mypy)
python3 --version  # Should be >= 3.9

# Optional: Install development tools
pip install --upgrade pip setuptools wheel
```

```bash
# Clone the repository
git clone https://github.com/umitkacar/Speech-To-Text.git
cd Speech-To-Text

# Install package (PyAudio optional - see below)
pip install -e .
```

Note: PyAudio is optional! You can use the CLI for configuration, language listing, etc. without it. For actual speech recognition, install PyAudio separately (see below).
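Because PyAudio is optional, code built on this package should probe for it rather than assume it is present. A minimal sketch (the helper name `audio_support_available` is illustrative, not part of the package):

```python
import importlib.util


def audio_support_available() -> bool:
    """Return True if the optional PyAudio dependency is installed."""
    return importlib.util.find_spec("pyaudio") is not None


if audio_support_available():
    print("PyAudio found: microphone capture is available.")
else:
    print("PyAudio missing: configuration and language listing still work, "
          "but speech recognition will not.")
```

`find_spec` only checks importability, so this works without actually importing (and possibly crashing on) a broken PyAudio build.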
```bash
# Install system dependencies for PyAudio
sudo apt-get update
sudo apt-get install -y portaudio19-dev python3-pyaudio

# Install with audio support
pip install -e ".[audio]"

# Or install Python dependencies separately
pip install SpeechRecognition PyAudio pyttsx3 typer rich
```
```bash
# Install system dependencies (Linux)
sudo apt-get update
sudo apt-get install -y \
    python3-pyaudio \
    portaudio19-dev \
    libportaudio2 \
    libportaudiocpp0 \
    libasound-dev \
    libasound2 \
    alsa-utils \
    alsa-oss
```

```bash
# Install Homebrew dependencies (macOS)
brew install portaudio

# Install Python packages
pip3 install SpeechRecognition PyAudio pyttsx3
```

```bash
# Install Python packages (Windows)
pip install SpeechRecognition PyAudio pyttsx3
```

Note: On Windows, you may need to install Visual Studio Build Tools for PyAudio.
For detailed installation instructions, see INSTALL.md.
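After installing, you can sanity-check the audio stack by probing for microphones. This sketch assumes the optional `speech_recognition`/PyAudio extras and falls back to an empty list when they are missing (the `list_microphones` helper is illustrative, not part of the package):

```python
def list_microphones() -> list[str]:
    """Return available microphone names, or [] if audio deps are absent."""
    try:
        import speech_recognition as sr  # requires the [audio] extra
        return sr.Microphone.list_microphone_names()
    except (ImportError, OSError, AttributeError):
        # speech_recognition/PyAudio/portaudio not installed -- see INSTALL.md
        return []


names = list_microphones()
print(f"{len(names)} microphone(s) detected")
```

An empty result with the extras installed usually points at a missing system-level portaudio library rather than a Python packaging problem.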
```bash
# Install the modern CLI
pip install -e .

# Listen once
speech-to-text-ai listen

# Continuous recognition
speech-to-text-ai continuous

# Interactive mode (with voice feedback)
speech-to-text-ai interactive

# List available devices
speech-to-text-ai devices

# Show all commands
speech-to-text-ai --help
```

```bash
# Basic usage
speech-to-text-ai listen

# Specify language
speech-to-text-ai listen --language tr-TR

# Save to file
speech-to-text-ai listen -l en-US -o transcript.txt

# Custom microphone and timeout
speech-to-text-ai listen --mic "USB Audio" --timeout 30
```

```bash
# Continuous recognition
speech-to-text-ai continuous -l en-US

# Save all results to file
speech-to-text-ai continuous -l tr-TR -o meeting_notes.txt

# Limit to 10 iterations
speech-to-text-ai continuous --max 10
```

```bash
# Start interactive mode
speech-to-text-ai interactive -l en-US

# With custom settings
speech-to-text-ai interactive -l tr-TR --mic "Built-in Microphone"
```

```python
from speech_to_text_ai import SpeechRecognizer, MicrophoneManager, TextToSpeech

# Initialize components
mic_manager = MicrophoneManager(device_name="default")
recognizer = SpeechRecognizer(language="en-US", mic_manager=mic_manager)

# Single recognition
result = recognizer.recognize_once()
if result.success:
    print(f"✅ Recognized: {result.text}")
else:
    print(f"❌ Error: {result.error}")

# Interactive mode with TTS
tts = TextToSpeech()
while True:
    result = recognizer.recognize_once()
    if result.success:
        print(f"You said: {result.text}")
        tts.speak(result.text)
```

```bash
# Clone repository
git clone https://github.com/umitkacar/Speech-To-Text.git
cd Speech-To-Text

# Install with development dependencies
pip install -e ".[dev,audio]"

# Install pre-commit hooks
pre-commit install
```

```bash
# Run all tests (sequential)
pytest

# Run tests in parallel (50% faster!)
pytest -n auto

# Run with coverage report
pytest --cov=src/speech_to_text_ai --cov-report=html

# Run specific test markers
pytest -m unit        # Only unit tests
pytest -m "not slow"  # Skip slow tests
```

```bash
# Run all pre-commit hooks
pre-commit run --all-files

# Individual checks
ruff check src/ tests/       # Linting
black src/ tests/            # Formatting
mypy src/speech_to_text_ai   # Type checking
pip-audit --desc             # Security audit

# Or use Hatch scripts
hatch run test           # Run tests
hatch run test-parallel  # Parallel tests
hatch run test-cov       # Tests with coverage
hatch run audit          # Security audit
```

Pre-commit Hooks (11 automated checks):
- ✅ Ruff (linting)
- ✅ Black (formatting)
- ✅ isort (import sorting)
- ✅ Mypy (type checking)
- ✅ Bandit (security scanning)
- ✅ pip-audit (dependency vulnerabilities)
- ✅ pytest-check (parallel testing)
- ✅ coverage-check (70% threshold)
- ✅ codespell (spell checking)
- ✅ mdformat (markdown)
- ✅ YAML formatter
Current Quality Metrics:
- ✅ 21/22 tests passing (1 skipped - PyAudio optional)
- ✅ Zero security vulnerabilities
- ✅ Zero type errors (mypy)
- ✅ 100% type coverage in core modules
- ✅ Zero linting errors (Ruff)
- README.md - This file (overview, quick start)
- INSTALL.md - Detailed installation guide
- CLI_USAGE.md - Complete CLI documentation
- DEVELOPMENT.md - Development setup and workflow
- CONTRIBUTING.md - Contribution guidelines
- QUALITY_CHECKLIST.md - Quality assurance checklist
- LESSONS_LEARNED.md - Project learnings and best practices
- CHANGELOG.md - Version history and changes
- Legacy Scripts - Original Python scripts (google_api_*.py)
| Project | Description | Tech |
|---|---|---|
| Whisper | OpenAI's robust speech recognition (SOTA 2024) | PyTorch, Transformers |
| Faster Whisper | Optimized Whisper implementation (4x faster) | CTranslate2 |
| Whisper.cpp | C++ port of Whisper (edge devices) | C++, WASM |
| Vosk | Offline speech recognition (100+ languages) | Kaldi |
| SpeechBrain | All-in-one speech toolkit | PyTorch |
| Wav2Vec 2.0 | Meta's self-supervised speech model | PyTorch |
| NeMo ASR | NVIDIA's conversational AI toolkit | PyTorch |
| Project | Description | Use Case |
|---|---|---|
| Deepgram SDK | Production-grade ASR API | Enterprise applications |
| AssemblyAI | Modern speech-to-text API | Real-time transcription |
| Azure Speech SDK | Microsoft's Speech Services | Cloud integration |
| Amazon Transcribe | AWS speech recognition | Scalable solutions |
| Project | Innovation | GitHub |
|---|---|---|
| Distil-Whisper | 6x faster Whisper variant | ⭐ Trending |
| Seamless M4T | Multilingual speech translation | Meta AI |
| MMS (Massively Multilingual Speech) | 1000+ languages support | Meta Research |
| Canary | NVIDIA's multilingual ASR | SOTA 2024 |
- Speech Recognition Course - DeepLearning.AI
- Whisper Tutorial Series - Latest tutorials
- ASR Papers - State-of-the-art research
- Hugging Face Audio - Pre-trained models
- Audio Processing - Modern audio manipulation
- Noise Reduction - AI-powered noise cancellation
- Speech Analytics - Audio feature extraction
- Librosa - Audio analysis library
| Service | Accuracy | Speed | Languages | Free Tier | Best For |
|---|---|---|---|---|---|
| Google Cloud Speech | ⭐⭐⭐⭐⭐ | Fast | 125+ | 60 min/month | General purpose |
| Deepgram | ⭐⭐⭐⭐⭐ | Very Fast | 30+ | $200 credit | Real-time apps |
| AssemblyAI | ⭐⭐⭐⭐⭐ | Fast | 15+ | 5 hours | Transcription |
| Azure Speech | ⭐⭐⭐⭐ | Medium | 100+ | 5 hours | Enterprise |
| Amazon Transcribe | ⭐⭐⭐⭐ | Fast | 35+ | 60 min/month | AWS ecosystem |
| Whisper (Self-hosted) | ⭐⭐⭐⭐⭐ | Medium | 99 | Free | Privacy-first |
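The trade-offs in the table above can also be encoded as data if you want to pick a service programmatically. The figures below are transcribed from the table; the `best_services` helper is purely illustrative:

```python
# Accuracy is the table's star rating (out of 5); languages is the listed minimum.
SERVICES = {
    "Google Cloud Speech": {"accuracy": 5, "languages": 125, "free_tier": "60 min/month"},
    "Deepgram": {"accuracy": 5, "languages": 30, "free_tier": "$200 credit"},
    "AssemblyAI": {"accuracy": 5, "languages": 15, "free_tier": "5 hours"},
    "Azure Speech": {"accuracy": 4, "languages": 100, "free_tier": "5 hours"},
    "Amazon Transcribe": {"accuracy": 4, "languages": 35, "free_tier": "60 min/month"},
    "Whisper (self-hosted)": {"accuracy": 5, "languages": 99, "free_tier": "unlimited"},
}


def best_services(min_accuracy: int = 5, min_languages: int = 0) -> list[str]:
    """Names of services meeting the accuracy and language-count thresholds."""
    return [name for name, s in SERVICES.items()
            if s["accuracy"] >= min_accuracy and s["languages"] >= min_languages]


print(best_services(min_accuracy=5, min_languages=90))
# → ['Google Cloud Speech', 'Whisper (self-hosted)']
```

Star ratings are coarse, so treat this as a first filter; latency and pricing usually decide the final choice.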
```
Speech-To-Text/
├── src/
│   └── speech_to_text_ai/       # Main package
│       ├── __init__.py          # Package initialization
│       ├── __main__.py          # Entry point
│       ├── cli.py               # CLI interface (Typer)
│       ├── core/                # Core modules
│       │   ├── recognizer.py    # Speech recognition engine
│       │   ├── microphone.py    # Microphone management
│       │   └── speaker.py       # Text-to-speech
│       ├── config/              # Configuration
│       │   └── settings.py      # Settings management
│       └── utils/                # Utilities
│           └── logger.py        # Logging setup
├── tests/                       # Test suite
│   ├── test_recognizer.py
│   ├── test_microphone.py
│   ├── test_speaker.py
│   └── test_config.py
├── legacy/                      # Original Python scripts
│   ├── google_api_1.py
│   ├── google_api_2.py
│   └── google_api_3_return.py
├── pyproject.toml               # Project metadata (Hatch)
├── .pre-commit-config.yaml      # Pre-commit hooks
├── Makefile                     # Development commands
├── README.md                    # This file
├── INSTALL.md                   # Installation guide
├── CLI_USAGE.md                 # CLI documentation
├── CONTRIBUTING.md              # Contribution guidelines
└── CODE_OF_CONDUCT.md           # Code of conduct
```
- ✅ Basic speech recognition
- ✅ Multi-language support (12 languages)
- ✅ Microphone integration
- ✅ Modern CLI with Typer + Rich
- ✅ Production-ready quality tooling
- ✅ Parallel testing with pytest-xdist
- ✅ Security scanning (zero vulnerabilities)
- ✅ Type safety with mypy (100% core coverage)
- ✅ Pre-commit hooks (11 automated checks)
- ✅ Comprehensive documentation
- 🚧 CI/CD Pipeline (GitHub Actions)
- 🚧 Docker containerization (multi-stage builds)
- 🚧 Increase test coverage (>70%)
- 📋 Integration tests (real API calls)
- 📋 Performance benchmarks (automated tracking)
- 🚧 Whisper integration (OpenAI SOTA model)
- 🚧 Real-time streaming (WebSocket support)
- 📋 GPU acceleration (CUDA support)
- 📋 Web interface (React dashboard)
- 📋 API endpoints (FastAPI/Flask)
- 📋 Multilingual models (Seamless M4T)
- 📋 Speaker diarization (Who spoke when)
- 📋 Emotion detection (Sentiment analysis)
```bash
# List all audio devices
python -c "import speech_recognition as sr; print(sr.Microphone.list_microphone_names())"
```

```bash
# Open mixer
alsamixer

# Command line mixer
amixer

# List capture devices to verify recording setup
arecord -l
```

Contributions are what make the open source community amazing! Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
Distributed under the MIT License. See LICENSE for more information.
- OpenAI Whisper - Inspiration for modern ASR
- SpeechRecognition - Core library
- Google Cloud Speech - API provider
- PyTTSx3 - Text-to-speech engine