A modern GUI for the Faster Whisper speech recognition model, built with the Flet framework.
⚠️ This application has been tested and confirmed working on Windows. While it may work on other platforms (Linux, macOS), they have not been officially tested.
- 🔊 Transcribe audio files with a beautiful UI
- 🎤 Record audio directly and transcribe it
- 🎧 Live transcription with real-time speech recognition
- 🌍 Choose between English-only and multilingual models
- 🗣️ Select language for multilingual models (auto-detect available)
- 🌐 Translate any language to English text
- 📏 Select model size with VRAM/speed indicators
- 💻 GPU (CUDA) or CPU processing
- 📊 Real-time transcription status
- 🧩 Modular, maintainable codebase
- 📋 Copy results to clipboard with history access
- 🌓 Toggle between light and dark themes
| Size | Parameters | English-only | Multilingual | VRAM | Relative speed |
|------|------------|--------------|--------------|------|----------------|
| tiny | 39 M | ✅ tiny.en | ✅ tiny | ~1 GB | ~10x |
| base | 74 M | ✅ base.en | ✅ base | ~1 GB | ~7x |
| small | 244 M | ✅ small.en | ✅ small | ~2 GB | ~4x |
| medium | 769 M | ✅ medium.en | ✅ medium | ~5 GB | ~2x |
| large | 1550 M | ❌ N/A | ✅ large | ~10 GB | 1x |
| turbo | 809 M | ❌ N/A | ✅ turbo | ~6 GB | ~8x |
💡 English-only models typically perform better for English transcription.
For optimal real-time transcription speed, the application includes pre-quantized int8 versions of the tiny models:
- `int8_tiny_en`: Pre-quantized tiny.en model for English transcription
- `int8_tiny`: Pre-quantized tiny model for multilingual transcription
These models were created using CTranslate2's conversion tool with the following command:
```bash
ct2-transformers-converter --model openai/whisper-tiny.en --output_dir int8_tiny_en --copy_files tokenizer.json preprocessor_config.json --quantization int8
```
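The multilingual variant would have been produced with the analogous command (inferred from the English-only command above, not quoted from the project):

```bash
ct2-transformers-converter --model openai/whisper-tiny --output_dir int8_tiny --copy_files tokenizer.json preprocessor_config.json --quantization int8
```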
```bash
# Clone the repository
git clone https://github.com/alouiadel/whisper-realtime-echo.git
cd whisper-realtime-echo

# Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate       # Windows
source venv/bin/activate    # macOS/Linux

# Install dependencies
pip install -r requirements.txt
```
🔥 GPU acceleration: to use CUDA, install the appropriate PyTorch build by following the official PyTorch installation guide.
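For example, a CUDA 12.1 build can typically be installed like this (the exact index URL depends on your CUDA version, so verify it against the guide):

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu121
```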
```bash
# Run the application
python main.py
```
- 📂 Click "Select Audio File" or 🎤 "Start Recording"
- 🌐 Choose model type and size
- 📊 Select processing device (CPU/GPU)
- ▶️ Click "Transcribe" for file processing or 🎙️ "Start Live" for real-time transcription
- 📝 View results and use the toolbar to:
- 📋 Copy to clipboard
- 📜 Access transcript history
- 🌓 Toggle theme (top-right corner)
- Provides real-time speech-to-text from your microphone
- Automatically detects and processes speech in chunks
- Optimized for maximum responsiveness (see the API sketch after this list):
- Uses int8 quantization (reduces memory usage by ~40%)
- Reduced beam size (1 instead of 5)
- Greedy decoding with temperature=0
- Independent chunk processing
- Disabled timestamps generation
- Best practices:
- Works best in quiet environments
- Smaller models offer faster response times
- Use a clear, consistent speaking voice
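As a rough sketch of how these speed settings map onto the faster-whisper API (the file path and single-chunk call are simplified assumptions, not the app's exact code):

```python
from faster_whisper import WhisperModel

# Load the bundled pre-quantized model; compute_type="int8" matches the
# quantization applied during the CTranslate2 conversion described above.
model = WhisperModel("int8_tiny_en", device="cpu", compute_type="int8")

# Transcribe one audio chunk with the speed-oriented settings listed above:
# greedy decoding (beam_size=1, temperature=0) and no timestamp generation.
segments, _ = model.transcribe(
    "chunk.wav",  # hypothetical path to one recorded chunk
    beam_size=1,
    temperature=0.0,
    without_timestamps=True,
)
print("".join(segment.text for segment in segments))
```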
Whisper leverages ffmpeg to process audio, supporting a wide range of formats:
- Audio files: wav, mp3, m4a, ogg, flac, opus, amr
- Video files: mp4 (and other video formats with audio tracks)
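If you prefer to extract and normalize audio yourself before transcribing (entirely optional, since the app accepts these formats directly), a typical ffmpeg invocation looks like this:

```bash
# Extract a 16 kHz mono WAV track from a video file
ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav
```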
- View, copy, and reuse previous transcriptions
- Saved between sessions with timestamps and model info
- Most recent 50 entries are retained
- **Language Selection**: Helps Whisper optimize recognition but doesn't translate content. English audio will still be transcribed as English even when another language is selected.
- **Transcribe vs. Translate Modes** (see the API sketch after this list):
  - Transcribe: Converts speech to text in the original language (default)
  - Translate: Converts speech from any language to English text
- **Translation Best Practices**:
  - Automatically sets language to "Auto-detect" for optimal results
  - Works best with larger models (large or turbo) and clear audio
  - Some languages may require transcribing first, then using a separate translation service
- **System Requirements**: If crashes occur, try a smaller model or make sure your system meets the VRAM requirements in the model table above.
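As a minimal sketch of how translate mode maps onto the faster-whisper API (the model size and file name are illustrative assumptions, not the app's exact code):

```python
from faster_whisper import WhisperModel

# Any Whisper model can translate; larger ones are markedly more accurate.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# task="translate" emits English text; leaving language unset auto-detects it.
segments, info = model.transcribe("interview_fr.mp3", task="translate")

print(f"Detected language: {info.language} ({info.language_probability:.0%})")
print(" ".join(segment.text.strip() for segment in segments))
```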
This project is powered by Faster Whisper, a highly optimized implementation of OpenAI's Whisper. The Whisper model was trained on a large dataset of diverse audio and is capable of multilingual speech recognition, translation, and language identification.
This project is licensed under the MIT License - see the LICENSE file for details.