A free, open-source Python tool that converts English audio files (MP3, WAV, etc.) to subtitle files using OpenAI's Whisper AI model.
Beautiful graphical interface with drag-and-drop support:
python app.py
See docs/DESKTOP_APP_GUIDE.md for details
For advanced users and automation:
python main.py audio.mp3 -m tiny
See usage examples below
- Free & Offline - Uses OpenAI's open-source Whisper model, no API keys required
- High Accuracy - State-of-the-art speech recognition technology
- Multiple Formats - Generates SRT subtitles and plain text transcriptions
- English Language - Optimized for English audio transcription
- Easy to Use - Desktop app with drag-and-drop OR command-line interface
- Beautiful GUI - Modern desktop application for non-technical users
- Python 3.8 or higher - Download Python
- FFmpeg - Required for audio processing
Option 1: Using Chocolatey (Recommended)
choco install ffmpeg
Option 2: Manual Installation
- Download FFmpeg from https://www.gyan.dev/ffmpeg/builds/
- Extract the archive
- Add the bin folder to your system PATH
To verify FFmpeg is installed:
ffmpeg -version
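The same check can be done from Python before starting a long transcription. This is a minimal sketch using only the standard library; the function name `ffmpeg_available` is hypothetical and is not part of this project.

```python
import shutil
import subprocess


def ffmpeg_available(executable="ffmpeg"):
    """Return True if the executable is on PATH and `-version` exits cleanly."""
    path = shutil.which(executable)
    if path is None:
        # Not on PATH at all.
        return False
    try:
        result = subprocess.run(
            [path, "-version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("FFmpeg found" if ffmpeg_available() else "FFmpeg not found on PATH")
```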
- Clone or download this repository
- Install Python dependencies
pip install -r requirements.txt
This will install:
- openai-whisper - The AI speech recognition model
- torch - PyTorch deep learning framework
- Other required dependencies
Note: First installation may take several minutes as it downloads PyTorch and other large dependencies.
Test the application with included sample files:
# Test with sample audio
python main.py samples/test.wav
# Test with sample video
python main.py samples/Facial_expression_the_202511041949_nzzse.mp4
Note: The samples/ directory contains sample media files and example output files for reference.
Convert an audio or video file to subtitles:
python main.py your_audio.mp3
Or convert an MP4 video:
python main.py video.mp4
For your_audio.mp3, this will create the following files (default output name: first 5 characters of the input filename + "_subtitle"):
- your_subtitle.srt - SRT subtitle file (standard format)
- your_subtitle.txt - Plain text with timestamps
- your_subtitle_full.txt - Full transcription without timestamps
Example: aymal_khalid_khan.mp3 → aymal_subtitle.srt
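The naming rule can be sketched as follows. The helper `default_output_stem` is hypothetical, and stripping a trailing underscore from the 5-character prefix is an assumption inferred from the two examples above (your_audio.mp3 gives your_subtitle, not your__subtitle); the actual logic in main.py may differ.

```python
from pathlib import Path


def default_output_stem(input_path, prefix_len=5, suffix="_subtitle"):
    """Derive an output name from the first few characters of the input name."""
    stem = Path(input_path).stem            # "aymal_khalid_khan.mp3" -> "aymal_khalid_khan"
    prefix = stem[:prefix_len].rstrip("_")  # assumption: avoid a doubled underscore
    return prefix + suffix


for name in ("aymal_khalid_khan.mp3", "your_audio.mp3"):
    print(name, "->", default_output_stem(name) + ".srt")
```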
Choose a specific model size:
python main.py audio.mp3 -m small
Available models (accuracy vs speed):
- tiny - Fastest, least accurate (~1GB RAM)
- base - Good balance (default, ~1GB RAM)
- small - Better accuracy (~2GB RAM)
- medium - High accuracy (~5GB RAM)
- large - Best accuracy (~10GB RAM)
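If you are unsure which model fits your machine, the memory figures above can be turned into a simple lookup. This is only an illustrative sketch: the table copies the approximate values listed above, and `largest_model_for` is a hypothetical helper, not a feature of the tool.

```python
# Approximate memory needs in GB, copied from the list above (smallest to largest).
MODEL_RAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_for(ram_gb):
    """Return the most accurate model whose listed requirement fits in ram_gb."""
    best = None
    for name, needed in MODEL_RAM_GB.items():
        if needed <= ram_gb:
            best = name  # later entries are larger and more accurate
    return best


print(largest_model_for(4))
```

The result can be passed to the -m option; leave some headroom for the operating system and other programs.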
Choose output format:
# Only SRT format
python main.py audio.mp3 -f srt
# Only plain text
python main.py audio.mp3 -f txt
# Both formats (default)
python main.py audio.mp3 -f both
Specify custom output filename:
python main.py audio.mp3 -o my_subtitles
Combine several options:
python main.py interview.mp3 -m medium -f srt -o interview_subtitles
Audio formats:
- MP3, WAV, M4A, FLAC, OGG, WMA, AAC
Video formats:
- MP4 (audio automatically extracted)
The tool supports any audio format that FFmpeg can decode, and automatically extracts audio from MP4 video files.
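For readers scripting around the tool, the command-line options shown earlier (-m, -f, -o, plus an input file) could be parsed roughly like this. It is a sketch with argparse, not the actual contents of main.py; the long option names (--model, --format, --output) are assumptions.

```python
import argparse

MODELS = ["tiny", "base", "small", "medium", "large"]
FORMATS = ["srt", "txt", "both"]


def build_parser():
    parser = argparse.ArgumentParser(description="Convert audio or video to subtitles")
    parser.add_argument("input", help="path to an audio or video file")
    # Long option names below are assumed; only the short forms appear in this README.
    parser.add_argument("-m", "--model", choices=MODELS, default="base")
    parser.add_argument("-f", "--format", choices=FORMATS, default="both")
    parser.add_argument("-o", "--output", default=None, help="output name without extension")
    return parser


args = build_parser().parse_args(
    ["interview.mp3", "-m", "medium", "-f", "srt", "-o", "interview_subtitles"]
)
print(args.input, args.model, args.format, args.output)
```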
Standard subtitle format with 3-5 words per segment:
1
00:00:00,000 --> 00:00:04,160
Watch the eyes. This is
2
00:00:04,160 --> 00:00:07,760
not a sign of tiredness.
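A minimal sketch of how entries like these can be produced from (start, end, text) tuples with times in seconds. The function names are hypothetical; the layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is the standard SRT structure.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as SRT entries separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


print(to_srt([(0.0, 4.16, "Watch the eyes. This is"),
              (4.16, 7.76, "not a sign of tiredness.")]))
```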
Simple text with timestamps (3-5 words per segment):
[00:00:00,000 --> 00:00:04,160]
Watch the eyes. This is
[00:00:04,160 --> 00:00:07,760]
not a sign of tiredness.
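One way to arrive at short segments like these is to group word-level timestamps into fixed-size chunks. The sketch below assumes a list of (word, start, end) tuples is available; how this tool actually splits segments is not documented here, and the per-word times in the example are invented for illustration.

```python
def group_words(words, max_words=5):
    """Group (word, start, end) tuples into segments of at most max_words words."""
    segments = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(word for word, _, _ in chunk)
        # Segment spans from the first word's start to the last word's end.
        segments.append((chunk[0][1], chunk[-1][2], text))
    return segments


# Illustrative word timings (not real model output).
words = [
    ("Watch", 0.0, 0.8), ("the", 0.8, 1.2), ("eyes.", 1.2, 2.4),
    ("This", 2.4, 3.4), ("is", 3.4, 4.16),
    ("not", 4.16, 4.9), ("a", 4.9, 5.2), ("sign", 5.2, 6.0),
    ("of", 6.0, 6.4), ("tiredness.", 6.4, 7.76),
]
for start, end, text in group_words(words):
    print(start, end, text)
```

With this simple chunking the final segment can end up shorter than three words; a real implementation might merge it into the previous segment.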
Complete transcription without timestamps, useful for reading the entire content.
- Audio Loading - FFmpeg loads and processes your audio file
- AI Transcription - Whisper AI model transcribes speech to text with timestamps
- Subtitle Generation - Creates properly formatted subtitle files
- File Saving - Outputs files in your chosen format(s)
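The steps above can be sketched with the openai-whisper Python API (`whisper.load_model` and `model.transcribe`). This is a simplified illustration, not the code in main.py; the function names are hypothetical, and running `transcribe_file` requires openai-whisper and FFmpeg to be installed.

```python
def transcribe_file(audio_path, model_name="base"):
    """Load a Whisper model and return (start, end, text) tuples for each segment."""
    import whisper  # imported lazily; provided by the openai-whisper package

    model = whisper.load_model(model_name)                 # downloads on first use
    result = model.transcribe(audio_path, language="en")   # audio is decoded via FFmpeg
    return [(seg["start"], seg["end"], seg["text"].strip())
            for seg in result["segments"]]


def full_text(segments):
    """Join segment texts into one transcription without timestamps."""
    return " ".join(text for _, _, text in segments)


# Example (requires the dependencies and a real audio file):
# segments = transcribe_file("samples/test.wav", model_name="tiny")
# print(full_text(segments))
```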
The samples/ directory includes:
- Sample media files: test.wav (audio) and Facial_expression_the_202511041949_nzzse.mp4 (video)
- Example outputs: Sample subtitle files showing expected output formats
Use these files to quickly test the application:
python main.py samples/test.wav
See samples/README.md for more details.
- Use clear audio recordings with minimal background noise
- Ensure speech is in English (this version is optimized for English)
- Higher quality audio = better transcription accuracy
- Use larger models (medium or large) for difficult audio
- Use smaller models (tiny or base) for faster processing
- Test first with sample files in the samples/ directory
Error: "ffmpeg not found"
- Install FFmpeg and ensure it's in your system PATH
Error: "Import whisper could not be resolved"
- Run: pip install -r requirements.txt
Slow processing
- Try a smaller model: python main.py audio.mp3 -m tiny
- First run downloads the model (one-time delay)
Out of memory
- Use a smaller model size
- Process shorter audio segments
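One way to process shorter segments is to split a long recording with FFmpeg's segment muxer before transcribing each piece. The sketch below is an assumption about workflow, not a feature of this tool; the function names, the 10-minute default, and the chunk_%03d.wav pattern are illustrative.

```python
import subprocess


def build_split_command(input_path, chunk_seconds=600, pattern="chunk_%03d.wav"):
    """Build an ffmpeg command that cuts the audio into fixed-length WAV chunks."""
    return [
        "ffmpeg", "-i", input_path,
        "-vn",                               # ignore any video stream
        "-f", "segment",                     # FFmpeg's segment muxer
        "-segment_time", str(chunk_seconds),
        pattern,
    ]


def split_audio(input_path, chunk_seconds=600):
    """Run the split; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_split_command(input_path, chunk_seconds), check=True)


print(" ".join(build_split_command("long_recording.mp3")))
```

Each chunk's timestamps restart at zero, so add the chunk's offset back if you merge the resulting subtitles.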
On first run, Whisper will automatically download the selected model:
- tiny - ~75 MB
- base - ~150 MB
- small - ~500 MB
- medium - ~1.5 GB
- large - ~3 GB
Models are cached locally, so subsequent runs skip the download and start much faster.
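To see which models are already downloaded, you can list the cache directory. This sketch assumes openai-whisper's default location (`~/.cache/whisper`, or under `XDG_CACHE_HOME` if set) and `.pt` model files; if the tool passes a custom download directory, the path will differ.

```python
import os
from pathlib import Path


def whisper_cache_dir():
    """Assumed default cache location used by openai-whisper."""
    base = os.getenv("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    return Path(base) / "whisper"


def cached_models(cache_dir=None):
    """Return the names of model files (*.pt) found in the cache directory."""
    directory = Path(cache_dir) if cache_dir else whisper_cache_dir()
    if not directory.is_dir():
        return []
    return sorted(path.stem for path in directory.glob("*.pt"))


print(cached_models())
```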
This project uses OpenAI Whisper, which is released under the MIT License.
To create a standalone .exe file for distribution:
pyinstaller AI-Subtitle-Generator.spec --clean
See docs/BUILD.md for complete build instructions including:
- Multiple build methods (PyInstaller, Auto-Py-to-Exe)
- Troubleshooting tips
- Build script usage
Want to share your .exe with others?
See docs/GITHUB_RELEASES_GUIDE.md for:
- How to create GitHub Releases
- Uploading the .exe file
- Writing release notes
- Version management
Quick steps:
- Build the .exe: pyinstaller AI-Subtitle-Generator.spec --clean
- Create a GitHub Release
- Upload dist/AI-Subtitle-Generator.exe as an asset
- Users can download from the Releases page
Feel free to open issues or submit pull requests for improvements!
- OpenAI Whisper - The amazing speech recognition model
- FFmpeg - Audio/video processing
Made with ❤️ for free subtitle generation