Turn your voice into text with AI-powered transcription
A modern, feature-rich speech-to-text application that combines the power of Google's Gemini AI with a beautiful web interface and floating recording controls. Perfect for content creators, developers, students, and anyone who needs fast, accurate transcription.
- One-click recording with customizable hotkeys
- Pause & resume without losing your audio
- Multiple microphone support with easy device switching
- Google Gemini AI for accurate, context-aware transcription
- Multi-language support: 120+ supported languages
- Customizable AI prompts for specialized use cases
- Beautiful UI with dark theme and responsive design
- Floating recording overlay that stays on top while you work
- System tray integration for background operation
- Real-time status and audio feedback
- Auto-paste transcribed text directly to your active application
- Auto-save to clipboard: the transcribed text is automatically copied to your clipboard
- Background operation - keeps running in system tray
- Install the dependencies: `pip install -r requirements.txt`
- Get a free Google Gemini API key
- Run the application: `python main.py`
- In the left sidebar, click **API Keys**
- Paste your key into the **Gemini API Key** field
- The key is saved automatically (you can change it anytime)

Tip: Alternatively, you can create a `.env` file with `GEMINI_API_KEY=your_key`.
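
As an illustration of the `.env` option, here is a minimal sketch of how a key could be looked up. This README does not describe how the application reads the file, so the function names `read_env_file` and `resolve_api_key`, and the choice to let a process environment variable take precedence over the file, are assumptions made for the example.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse KEY=value lines from a .env file into a dict (blank lines and # comments are skipped)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(env_path=".env", environ=None):
    """Return GEMINI_API_KEY from the environment, else from the .env file, else None.

    Hypothetical helper: the precedence order is an assumption, not the app's documented behavior.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("GEMINI_API_KEY") or read_env_file(env_path).get("GEMINI_API_KEY")
    return key or None
```

Calling `resolve_api_key()` from the project folder returns the key if either source provides one, and `None` otherwise, so a caller can fall back to prompting for the key in the UI.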
- Start Recording - Press `Ctrl+Shift+Space` to start/stop recording
- Speak Naturally - The floating overlay shows recording status
- AI Processing - Gemini AI transcribes with context awareness
- Auto-Paste - Text appears in your active application and is saved to clipboard
- Toggle Mode: Press once to start/stop (default: `Ctrl+Shift+Space`)
- Hold Mode: Hold to record, release to stop (default: `Ctrl`)
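
The difference between the two hotkey modes can be sketched as a small state machine. This is not the application's actual code: `HotkeyRecorder` and its methods are hypothetical names, and the key-repeat filtering is an assumption about how a held key might be handled.

```python
class HotkeyRecorder:
    """Tracks recording state for two hypothetical hotkey modes: "toggle" and "hold"."""

    def __init__(self, mode="toggle"):
        if mode not in ("toggle", "hold"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.recording = False
        self._key_down = False  # used to ignore OS key-repeat events

    def on_press(self):
        """Handle a hotkey press and return the resulting recording state."""
        if self._key_down:
            return self.recording  # repeated press event while the key is still held
        self._key_down = True
        if self.mode == "toggle":
            self.recording = not self.recording  # each distinct press flips the state
        else:
            self.recording = True  # hold mode records while the key is down
        return self.recording

    def on_release(self):
        """Handle a hotkey release and return the resulting recording state."""
        self._key_down = False
        if self.mode == "hold":
            self.recording = False  # releasing the key stops the recording
        return self.recording
```

In toggle mode a release leaves the state unchanged, so recording continues until the next press. In hold mode the recording lasts exactly as long as the key is held.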
- Silence Threshold: Adjust sensitivity for your environment
- Microphone Selection: Choose your preferred input device
- Ambient Calibration: Automatic noise floor detection
- Custom Prompts: Tailor transcription for your specific needs
- Language Preservation: Maintains original scripts and accents
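
To make the silence threshold and ambient calibration settings concrete, here is a rough sketch of one common approach: measure the RMS level of a few frames of background noise and set the threshold a fixed multiple above it. The function names, the `margin` multiplier, and the `floor` value are all assumptions for illustration. The application's real detection logic may differ.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(ambient_frames, margin=2.0, floor=0.001):
    """Estimate a silence threshold as `margin` times the mean RMS of ambient frames.

    `floor` keeps the threshold above zero in a perfectly quiet room.
    """
    levels = [rms(frame) for frame in ambient_frames]
    noise_floor = sum(levels) / len(levels) if levels else 0.0
    return max(noise_floor * margin, floor)


def is_silent(frame, threshold):
    """A frame counts as silence when its RMS level is below the threshold."""
    return rms(frame) < threshold
```

Raising `margin` makes the detector less sensitive, which suits noisy rooms. Lowering it helps pick up quiet speech.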
Smart Audio Transcript uses a modern hybrid architecture:
- Web Interface (Eel) - Settings and configuration
- Native Overlay (CustomTkinter) - Floating recording controls
- System Tray (pystray) - Background management
- Core Engine (Python) - Audio processing & AI integration
This gives you the best of both worlds: a modern web UI for settings and responsive native controls for recording.
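
This README does not say how the components talk to each other. One plausible pattern, shown here purely as an assumption, is for the core engine to publish status events that each UI surface consumes from its own thread-safe queue. `StatusBus` and `core_engine` are hypothetical names, and the example uses only the standard library rather than Eel, CustomTkinter, or pystray.

```python
import queue
import threading


class StatusBus:
    """Thread-safe fan-out: each subscriber gets its own queue of status events."""

    def __init__(self):
        self._subscribers = []
        self._lock = threading.Lock()

    def subscribe(self):
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def publish(self, status):
        with self._lock:
            targets = list(self._subscribers)
        for q in targets:
            q.put(status)


def core_engine(bus):
    """Stand-in for the core engine: emits the states a recording might pass through."""
    for status in ("recording", "transcribing", "idle"):
        bus.publish(status)


bus = StatusBus()
overlay_events = bus.subscribe()  # would be polled by the floating overlay
tray_events = bus.subscribe()     # would be polled by the tray icon

worker = threading.Thread(target=core_engine, args=(bus,))
worker.start()
worker.join()

# Each UI surface sees every status update, in the order it was published.
overlay_seen = [overlay_events.get_nowait() for _ in range(3)]
tray_seen = [tray_events.get_nowait() for _ in range(3)]
```

Giving each consumer its own queue means a slow UI component cannot cause another one to miss an update, which matters when the web view and the native overlay run on different event loops.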
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Made with ❤️ for the open source community
Transform your voice into text with the power of AI