A real-time speech transcription system built on OpenAI Whisper, with continuous audio recording and multi-language support.
- Real-time transcription with continuous audio recording
- Multi-language support with auto-detection
- Segmented processing with smart overlap handling
- Interactive setup or command-line configuration
- Multiple recording methods: sounddevice, pyaudio, speech_recognition
- Pause/Resume functionality with Ctrl+C menu
- Audio segment saving for debugging and quality analysis
- Energy threshold configuration for speech detection
- Python 3.8 or higher
- Conda (recommended) or pip
Using Conda (Recommended):

```bash
# Create and activate conda environment
conda create -n whisper-transcription python=3.9
conda activate whisper-transcription

# Clone the repository
git clone https://github.com/lancer1911/real-time-whisper-transcription.git
cd real-time-whisper-transcription

# Install dependencies
pip install -r requirements.txt

# Optional: Install additional packages
pip install sounddevice faster-whisper
```

Using pip:
```bash
# Clone the repository
git clone https://github.com/lancer1911/real-time-whisper-transcription.git
cd real-time-whisper-transcription

# Install dependencies
pip install -r requirements.txt

# Optional: Install additional packages
pip install sounddevice faster-whisper
```

Note: The requirements.txt includes PyTorch with CUDA 11.6 support. For a CPU-only installation, modify the torch installation:
```bash
pip install torch --index-url https://download.pytorch.org/whl/cpu
```

macOS:
```bash
# Install portaudio for pyaudio
brew install portaudio
```

Ubuntu/Debian:
```bash
# Install portaudio and ffmpeg
sudo apt update
sudo apt install portaudio19-dev ffmpeg
```

Windows:
- Download and install FFmpeg from https://ffmpeg.org/download.html
- Add FFmpeg to PATH environment variable
```bash
# If using conda
conda activate whisper-transcription
```

Run with interactive setup:

```bash
python transcriber.py
```

Or configure everything on the command line:

```bash
# English transcription
python transcriber.py \
    --language en \
    --model small \
    --save_audio \
    --output_file transcription.txt \
    --device cpu \
    --microphone_index 4 \
    --energy_threshold 100 \
    --segment_duration 8.0 \
    --overlap_duration 1.0 \
    --debug

# Chinese transcription
python transcriber.py \
    --language zh \
    --model small \
    --save_audio \
    --output_file transcription.txt \
    --device cpu \
    --microphone_index 4 \
    --energy_threshold 100 \
    --segment_duration 8.0 \
    --overlap_duration 1.0 \
    --debug
```

Test your microphone setup and find optimal settings:
```bash
python audio_diagnostic.py
```

The diagnostic tool will:

- List all available microphone devices
- Test different energy thresholds (4000, 1000, 300, 100, 50)
- Analyze audio volume levels
- Provide recommended energy threshold values
Sample output:

```
1. Available microphone devices:
   0: Built-in Microphone
   1: USB Audio Device

2. Testing default microphone...
Testing energy threshold: 300
Volume - Max: 0.156, Average: 0.023
Threshold 300 working properly!

Recommended energy threshold: 300
```
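Conceptually, the energy threshold gates which audio chunks are treated as speech. The sketch below is illustrative only (`rms_energy` and `is_speech` are hypothetical names, not the diagnostic tool's code): a chunk counts as speech when its RMS energy exceeds the threshold, which is why noisy rooms need a higher value and quiet microphones a lower one.

```python
def rms_energy(samples):
    """Root-mean-square energy of a chunk of 16-bit PCM samples."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def is_speech(samples, energy_threshold=300):
    """Treat a chunk as speech when its RMS energy exceeds the threshold."""
    return rms_energy(samples) > energy_threshold

quiet = [10, -12, 8, -9]             # low-amplitude background noise
loud = [4000, -3500, 3800, -4200]    # voiced audio
print(is_speech(quiet), is_speech(loud))  # False True
```

If real speech is being skipped, lower the threshold (e.g. 100); if silence is being transcribed, raise it.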
Use the recommended settings:

```bash
python transcriber.py \
    --language en \
    --model small \
    --save_audio \
    --output_file transcription.txt \
    --device cpu \
    --microphone_index 4 \
    --energy_threshold 100 \
    --segment_duration 8.0 \
    --overlap_duration 1.0 \
    --debug
```

- `--language`: Target language (`en`, `zh`, `ja`, `ko`, `es`, `fr`, `de`, `auto`)
- `--model`: Whisper model size (`tiny`, `base`, `small`, `medium`, `large`)
- `--microphone_index`: Microphone device index
- `--energy_threshold`: Speech detection threshold (default: 300)
- `--recording_method`: Recording method (`sounddevice`, `pyaudio`, `speech_recognition`)
- `--segment_duration`: Segment length in seconds (default: 8.0)
- `--overlap_duration`: Overlap length in seconds (default: 2.0)
- `--output_file`: Transcription output file (default: `transcription.txt`)
- `--save_audio`: Save audio segments (recommended for debugging and analysis)
- `--debug`: Enable debug mode
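A minimal `argparse` sketch of how these flags could be declared. This is an illustration, not the project's actual parser; defaults not stated above (such as `--language` and `--device`) are assumptions.

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="Real-time Whisper transcription")
    p.add_argument("--language", default="auto",  # assumed default
                   choices=["en", "zh", "ja", "ko", "es", "fr", "de", "auto"])
    p.add_argument("--model", default="small",
                   choices=["tiny", "base", "small", "medium", "large"])
    p.add_argument("--device", default="cpu")  # assumed default
    p.add_argument("--microphone_index", type=int, default=None)
    p.add_argument("--energy_threshold", type=int, default=300)
    p.add_argument("--recording_method", default="sounddevice",
                   choices=["sounddevice", "pyaudio", "speech_recognition"])
    p.add_argument("--segment_duration", type=float, default=8.0)
    p.add_argument("--overlap_duration", type=float, default=2.0)
    p.add_argument("--output_file", default="transcription.txt")
    p.add_argument("--save_audio", action="store_true")
    p.add_argument("--debug", action="store_true")
    return p

args = build_parser().parse_args(["--language", "en", "--energy_threshold", "100"])
print(args.language, args.energy_threshold, args.segment_duration)  # en 100 8.0
```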
During transcription:
- Ctrl+C: Pause and access options menu
- Option 1: Resume recording
- Option 2: Stop and exit
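The pause/resume flow can be sketched as a loop that catches `KeyboardInterrupt` and shows the menu. The function names here are hypothetical, not taken from `transcriber.py`:

```python
def handle_interrupt(choice):
    """Map a menu selection to an action, mirroring the menu above."""
    if choice == "1":
        return "resume"
    if choice == "2":
        return "stop"
    return "invalid"

def run(transcribe_step, read_choice):
    """Main-loop sketch: Ctrl+C pauses transcription and opens the menu."""
    while True:
        try:
            transcribe_step()
        except KeyboardInterrupt:
            print("Paused. 1) Resume  2) Stop and exit")
            if handle_interrupt(read_choice()) == "stop":
                break

# Demo: the step raises KeyboardInterrupt (as Ctrl+C would),
# and the simulated user picks "2" to stop and exit.
def fake_step():
    raise KeyboardInterrupt

run(fake_step, lambda: "2")
```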
- Python 3.8+
- PyTorch (with CUDA 11.6 support)
- OpenAI Whisper
- NumPy
- PyAudio
- SpeechRecognition
- sounddevice (recommended for better audio recording)
- faster-whisper (experimental performance improvements)
- 4GB RAM minimum
- Microphone access
- Internet connection for initial model download
- CUDA-capable GPU (optional, for acceleration)
MIT License - see LICENSE file for details.