Real-time speech-to-text and translation between English, Japanese, and Chinese - all in your terminal!
- 🎯 Three-Language Focus: English (🇺🇸), Japanese (🇯🇵), Chinese (🇨🇳)
- ⚡ Real-time Processing: Live audio capture and transcription
- 🔍 Auto-Language Detection: Automatically identifies spoken language
- 🔄 Instant Translation: Translates detected speech into the other two languages
- 💻 Terminal-Based: No GUI required - runs entirely in command line
- 🚀 High Performance: Uses faster-whisper for optimal speed
- 🎵 Smart Audio: Intelligent silence detection and chunking
- 🚫 Hallucination Detection: Filters out common AI-generated false transcriptions
- ⚙️ Configurable: Customizable settings via config file or command line
```bash
# Install Python requirements
pip install -r requirements.txt

# Install Ollama for translation (optional but recommended)
# Download from: https://ollama.ai/
```

```bash
# Basic usage with default settings
python main.py

# With custom configuration
python main.py --model large --chunk-duration 5

# Using configuration file
python main.py --config config.json
```

- Speak in English, Japanese, or Chinese
- The system will automatically detect the language
- Get instant transcriptions and translations
- Press `Ctrl+C` to stop
- Python 3.8+
- Microphone (built-in or external)
- 6GB+ RAM (8GB+ recommended)
- Windows 10/11, macOS, or Linux
```text
numpy>=1.21.0          # Numerical computing
sounddevice>=0.4.6     # Audio capture
faster-whisper>=1.1.0  # Speech recognition (recommended)
# OR openai-whisper    # Alternative implementation
scipy>=1.7.0           # Audio processing
soundfile>=0.12.0      # Audio I/O
requests>=2.25.0       # HTTP translation APIs
```
- Ollama: For local translation (recommended)
- CUDA: For GPU acceleration with faster-whisper
- PyTorch: For enhanced CUDA support
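If you plan to use `--use-cuda`, you can confirm that PyTorch actually sees a GPU before starting (this check assumes PyTorch is installed; it is not part of this project's code):

```python
import torch

# True only if a CUDA-capable GPU and a matching PyTorch build are present
print(torch.cuda.is_available())
```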
By default, models are downloaded automatically on first run into `./models/`.
To pre-download or manage models manually:

- Just run `python main.py` once; it will cache under `./models/`.
- Or trigger the download from Python:

```python
from faster_whisper import WhisperModel

# Downloads and caches the model under ./models/ on first call
model = WhisperModel("base", download_root="./models")
```

Expected structure for the base model:
```text
models/
  models--Systran--faster-whisper-base/
    snapshots/<hash>/
      config.json
      model.bin
      tokenizer.json
      vocabulary.txt
```

For the tiny model:

```text
models/
  models--Systran--faster-whisper-tiny/
    snapshots/<hash>/
      config.json
      model.bin
      tokenizer.json
      vocabulary.txt
```
If you already see these folders with `model.bin` inside, you're good to go.
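To verify the cache from Python, a quick check along these lines works (the glob pattern is taken from the layout shown above):

```python
from pathlib import Path

# Look for a cached model.bin under any snapshot of the base model
hits = list(Path("models").glob(
    "models--Systran--faster-whisper-base/snapshots/*/model.bin"
))
print("model cached" if hits else "model missing")
```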
| Option | Description | Default |
|---|---|---|
| `--model` | Whisper model size | `base` |
| `--languages` | Supported languages | `en ja zh` |
| `--chunk-duration` | Audio chunk length (seconds) | `3.0` |
| `--sample-rate` | Audio sample rate (Hz) | `16000` |
| `--silence-threshold` | Silence detection threshold (seconds) | `0.5` |
| `--use-cuda` | Enable CUDA acceleration | `False` |
| `--config` | Configuration file path | `None` |
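The `--chunk-duration` and `--silence-threshold` options control how incoming audio is segmented. As a rough illustration of energy-based silence detection over fixed-size chunks (a sketch only; the threshold value and function names are illustrative, not the project's actual implementation):

```python
import numpy as np

SAMPLE_RATE = 16000        # matches --sample-rate
CHUNK_DURATION = 3.0       # matches --chunk-duration
ENERGY_THRESHOLD = 0.01    # illustrative RMS cutoff for "silence"

def is_silent(chunk: np.ndarray) -> bool:
    """Treat a chunk as silence when its RMS energy falls below the cutoff."""
    rms = np.sqrt(np.mean(np.square(chunk, dtype=np.float64)))
    return rms < ENERGY_THRESHOLD

# A 3.0 s chunk at 16 kHz holds 48,000 samples
chunk = np.zeros(int(SAMPLE_RATE * CHUNK_DURATION), dtype=np.float32)
print(is_silent(chunk))  # True: an all-zero chunk is silent
```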
Example `config.json`:

```json
{
  "model_size": "base",
  "languages": ["en", "ja", "zh"],
  "chunk_duration": 3.0,
  "sample_rate": 16000,
  "silence_threshold": 0.5,
  "use_cuda": false,
  "translation_providers": {
    "ollama": {
      "enabled": true,
      "model": "schroneko/gemma-2-2b-jpn-it"
    }
  }
}
```
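For reference, a translation request to a local Ollama server can be made over its standard HTTP API; the prompt wording below is illustrative, not the project's actual prompt:

```python
import requests

# Ask the local Ollama server (default port 11434) for a translation
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "schroneko/gemma-2-2b-jpn-it",
        "prompt": "Translate to English: 私の声が聞こえますか?",
        "stream": False,
    },
    timeout=60,
)
print(resp.json()["response"])
```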
The system automatically detects and filters out common AI hallucination patterns:

- Japanese: ご視聴ありがとうございました, お疲れ様でした, 終わり, etc.
- English: "Thanks for watching!", "Goodbye", "The end", etc.
- Chinese: 那他说他说, 最初, 重新开始吧, etc.
- Mixed Languages: Cyrillic, Devanagari, Portuguese phrases
- Generic Responses: Very short, repetitive, or ending phrases
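As a minimal sketch of what phrase-based filtering like this looks like (the pattern lists and function name here are examples, not the project's actual code):

```python
# Known false outputs per language; illustrative subset only
HALLUCINATION_PATTERNS = {
    "ja": ["ご視聴ありがとうございました", "お疲れ様でした", "終わり"],
    "en": ["thanks for watching", "goodbye", "the end"],
    "zh": ["那他说他说", "重新开始吧"],
}

def is_hallucination(text: str, language: str) -> bool:
    """Return True if a transcription matches a known false-output pattern."""
    normalized = text.strip().lower()
    # Empty or extremely short chunks are usually spurious
    if len(normalized) <= 1:
        return True
    return any(p.lower() in normalized
               for p in HALLUCINATION_PATTERNS.get(language, []))
```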
Hallucination filtering can be tuned via the `detection_settings` block:

```json
{
  "detection_settings": {
    "enable_hallucination_detection": true,
    "strict_mode": false,
    "log_hallucinations": true,
    "skip_hallucinations": true
  }
}
```

- Business Meetings: Real-time translation during international calls
- Academic Conferences: Multilingual presentations and discussions
- Customer Support: Serve customers in their preferred language
- Pronunciation Practice: Get instant feedback on speech
- Translation Exercises: Compare expressions across languages
- Listening Comprehension: Practice with native speakers
- Interpreting Services: Real-time language bridge
- Documentation: Multilingual meeting transcripts
- Training Sessions: International team training
- Local Communication: Speak with locals in their language
- Navigation Help: Get directions and information
- Cultural Exchange: Bridge language barriers instantly
```bash
# Tiny model - fastest, least accurate
python main.py --model tiny

# Base model - balanced performance (default)
python main.py --model base

# Large model - most accurate, slower
python main.py --model large
```

```bash
# Only English and Japanese
python main.py --languages en ja

# Only Chinese and English
python main.py --languages zh en
```

```bash
# Use CUDA acceleration (if available)
python main.py --use-cuda

# Adjust audio chunk duration
python main.py --chunk-duration 2.0

# Custom silence threshold
python main.py --silence-threshold 0.3
```

```bash
# Check microphone permissions
# Verify audio device selection
# Test with: python -c "import sounddevice; print(sounddevice.query_devices())"
```

```bash
# Install Ollama: https://ollama.ai/
# Pull translation model: ollama pull schroneko/gemma-2-2b-jpn-it
# Verify Ollama is running: ollama list
```

```bash
# Use smaller model: --model tiny
# Reduce chunk duration: --chunk-duration 2.0
# Enable CUDA: --use-cuda
```
| Error | Solution |
|---|---|
| `No Whisper implementation found` | Install faster-whisper or openai-whisper |
| `Audio capture error` | Check microphone permissions and device |
| `Translation error` | Verify Ollama installation and model |
| `CUDA not available` | Install PyTorch with CUDA support |
| Model | Speed | Accuracy | Memory |
|---|---|---|---|
| `tiny` | ⚡⚡⚡ | ⭐⭐ | 💾 |
| `base` | ⚡⚡ | ⭐⭐⭐ | 💾💾 |
| `small` | ⚡ | ⭐⭐⭐⭐ | 💾💾💾 |
| `medium` | 🐌 | ⭐⭐⭐⭐⭐ | 💾💾💾💾 |
| `large` | 🐌🐌 | ⭐⭐⭐⭐⭐ | 💾💾💾💾💾 |
| Language | Detection Rate | Translation Quality |
|---|---|---|
| English | 98%+ | Excellent |
| Japanese | 95%+ | Very Good |
| Chinese | 95%+ | Very Good |
```bash
# Update Python packages
pip install --upgrade -r requirements.txt

# Update Ollama models
ollama pull schroneko/gemma-2-2b-jpn-it
```

- Whisper models are automatically downloaded on first run
- Models are cached locally in the `./models/` directory
- Update models by deleting the cache and restarting
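For example, to force a re-download of the base model (path taken from the cache layout above):

```python
import shutil

# Deleting the cached snapshot forces a fresh download on the next run
shutil.rmtree("models/models--Systran--faster-whisper-base",
              ignore_errors=True)
```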
- Support for additional languages
- Enhanced translation providers
- Audio format improvements
- Performance optimizations
- OpenAI Whisper: Speech recognition technology
- Faster Whisper: Performance-optimized Whisper implementation
- Ollama: Local language model inference
- SoundDevice: Audio capture and processing
- Issues: Check existing GitHub issues
- Documentation: Review this README thoroughly
- Community: Join our discussion forum
When reporting issues, please include:
- Operating system and version
- Python version
- Error messages and logs
- Steps to reproduce the issue
```text
(cribe) PS> python main.py
Using faster-whisper for enhanced performance
Loaded faster-whisper model: base on cpu
✓ Ollama translation available
✓ HTTP translation available
Starting Enhanced Terminal Transcription & Translation
Languages: US English, JP Japanese, CN Chinese
Sample Rate: 16000Hz
Chunk Duration: 3.0s
Silence Threshold: 0.5s
Press Ctrl+C to stop
Speak now! Audio is being captured in 3.0s chunks ...
Audio capture started
[16:48:20][JP Japanese] 私の音が聞きますか?
[US English] Do you hear my voice?
[CN Chinese] 我的声音听吗?
[16:48:24][JP Japanese] 聞いてるものが分かってますか?
[US English] Do you know what you're looking at?
[CN Chinese] 你听到的东西能理解吗?
Stopping transcription ...
Transcription stopped successfully
```

🎉 Ready to break down language barriers? Start transcribing and translating in real-time!