
🎙️ Free AI Subtitle Generator - Convert MP4 videos & MP3 audio to SRT subtitles using OpenAI Whisper. Features a desktop GUI with drag-and-drop, automatic audio extraction from video, optimized 3-5 word subtitle segments with timestamps, and multiple output formats (SRT, TXT). Works offline after the one-time model download, no API keys required. Perfect for content creators!


aymalkhalid/Openai-Whisper-AI-Subtitle-Generator-Mp4-Mp3-To-SRT


AI Subtitle Generator 🎙️➡️📝

A free, open-source Python tool that converts English audio (MP3, WAV, etc.) and MP4 video files to subtitle files using OpenAI's Whisper AI model.

🎨 Two Ways to Use

🖥️ Desktop App (Recommended!)

Beautiful graphical interface with drag-and-drop support:

python app.py

👉 See docs/DESKTOP_APP_GUIDE.md for details

⌨️ Command Line

For advanced users and automation:

python main.py audio.mp3 -m tiny

👉 See usage examples below


Features

  • ✨ Free & Offline - Uses OpenAI's open-source Whisper model, no API keys required
  • 🎯 High Accuracy - State-of-the-art speech recognition technology
  • 📝 Multiple Formats - Generates SRT subtitles and plain text transcriptions
  • 🌐 English Language - Optimized for English audio transcription
  • 🚀 Easy to Use - Desktop app with drag-and-drop OR command-line interface
  • 🎨 Beautiful GUI - Modern desktop application for non-technical users

Prerequisites

  1. Python 3.8 or higher - Download Python
  2. FFmpeg - Required for audio processing

Installing FFmpeg on Windows

Option 1: Using Chocolatey (Recommended)

choco install ffmpeg

Option 2: Manual Installation

  1. Download FFmpeg from https://www.gyan.dev/ffmpeg/builds/
  2. Extract the archive
  3. Add the bin folder to your system PATH

To verify FFmpeg is installed:

ffmpeg -version
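The same check can be done from Python, which is handy in scripts that call main.py. This is a minimal sketch using only the standard library; `ffmpeg_path` and `ffmpeg_version_line` are hypothetical helpers, not part of this project:

```python
import shutil
import subprocess


def ffmpeg_path():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None when FFmpeg is missing."""
    path = ffmpeg_path()
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    line = ffmpeg_version_line()
    print(line if line else "ffmpeg not found - add its bin folder to your PATH")
```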

Installation

  1. Clone or download this repository

  2. Install Python dependencies

pip install -r requirements.txt

This will install:

  • openai-whisper - The AI speech recognition model
  • torch - PyTorch deep learning framework
  • Other required dependencies

Note: The first installation may take several minutes because it downloads PyTorch and other large dependencies.
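To confirm the installation worked, you can ask Python whether the modules are importable without actually loading them. A small sketch; `missing_modules` is a hypothetical helper, and it assumes the import names `whisper` (installed by openai-whisper) and `torch`:

```python
import importlib.util

# Import names, which can differ from pip package names:
# the openai-whisper package provides the top-level module "whisper".
REQUIRED_MODULES = ("whisper", "torch")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing), "- run: pip install -r requirements.txt")
    else:
        print("All required modules are importable.")
```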

Usage

Quick Test with Sample Files

Test the application with included sample files:

# Test with sample audio
python main.py samples/test.wav

# Test with sample video
python main.py samples/Facial_expression_the_202511041949_nzzse.mp4

Note: The samples/ directory contains sample media files and example output files for reference.

Basic Usage

Convert an audio or video file to subtitles:

python main.py your_audio.mp3

Or convert MP4 video:

python main.py video.mp4

This creates the following files, named from the first 5 characters of the input filename followed by "_subtitle":

  • your_subtitle.srt - SRT subtitle file (standard format)
  • your_subtitle.txt - Plain text with timestamps
  • your_subtitle_full.txt - Full transcription without timestamps

Example: aymal_khalid_khan.mp3 → aymal_subtitle.srt
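The naming rule can be sketched as follows. This is an illustration, not the code in main.py: the `.rstrip("_")` step is an assumption added only so the sketch reproduces both examples above (`your_audio.mp3` and `aymal_khalid_khan.mp3`), and it ignores the `-o` option.

```python
from pathlib import Path


def output_stem(input_path):
    """First 5 characters of the file name (without extension) + "_subtitle".

    ASSUMPTION: trailing underscores in the 5-character prefix are dropped,
    so "your_audio.mp3" gives "your_subtitle" rather than "your__subtitle".
    """
    prefix = Path(input_path).stem[:5].rstrip("_")
    return f"{prefix}_subtitle"


def output_names(input_path):
    """The three file names listed above for the default output mode."""
    stem = output_stem(input_path)
    return [f"{stem}.srt", f"{stem}.txt", f"{stem}_full.txt"]


if __name__ == "__main__":
    print(output_names("aymal_khalid_khan.mp3"))
```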

Advanced Options

Choose a specific model size:

python main.py audio.mp3 -m small

Available models (accuracy vs speed):

  • tiny - Fastest, least accurate (~1GB RAM)
  • base - Good balance (default, ~1GB RAM)
  • small - Better accuracy (~2GB RAM)
  • medium - High accuracy (~5GB RAM)
  • large - Best accuracy (~10GB RAM)
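If you are unsure which model your machine can handle, a rough heuristic is to pick the most accurate model whose approximate memory figure fits. A hypothetical helper, with the figures copied from the list above (they are estimates, not hard limits):

```python
# Approximate memory figures from the model list, in GB, ordered from
# fastest/least accurate to slowest/most accurate.
MODEL_MEMORY_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def pick_model(available_gb):
    """Return the most accurate model whose approximate requirement fits, or None."""
    best = None
    for name, needed in MODEL_MEMORY_GB.items():  # dicts keep insertion order
        if needed <= available_gb:
            best = name
    return best


if __name__ == "__main__":
    print(pick_model(4))
```

Pass the result to `-m`, for example `python main.py audio.mp3 -m small`.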

Choose output format:

# Only SRT format
python main.py audio.mp3 -f srt

# Only plain text
python main.py audio.mp3 -f txt

# Both formats (default)
python main.py audio.mp3 -f both

Specify custom output filename:

python main.py audio.mp3 -o my_subtitles

Complete Example

python main.py interview.mp3 -m medium -f srt -o interview_subtitles
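If you script around the CLI, the documented flags map naturally onto `argparse`. This sketch re-creates only the options shown in this README (a positional input plus `-m`, `-f` and `-o`) and is not main.py's actual parser; long option names or other flags, if any, are not covered:

```python
import argparse

MODELS = ["tiny", "base", "small", "medium", "large"]


def build_parser():
    """Hypothetical re-creation of the documented options, for illustration only."""
    parser = argparse.ArgumentParser(description="Generate subtitles with Whisper")
    parser.add_argument("input", help="audio or video file (e.g. .mp3, .wav, .mp4)")
    parser.add_argument("-m", dest="model", choices=MODELS, default="base",
                        help="Whisper model size (default: base)")
    parser.add_argument("-f", dest="format", choices=["srt", "txt", "both"],
                        default="both", help="output format (default: both)")
    parser.add_argument("-o", dest="output", default=None,
                        help="custom output base name")
    return parser


if __name__ == "__main__":
    # The complete example above, parsed without touching sys.argv.
    args = build_parser().parse_args(
        ["interview.mp3", "-m", "medium", "-f", "srt", "-o", "interview_subtitles"]
    )
    print(args)
```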

Supported File Formats

Audio formats:

  • MP3, WAV, M4A, FLAC, OGG, WMA, AAC

Video formats:

  • MP4 (audio automatically extracted)

The tool supports any audio format that FFmpeg can decode, and automatically extracts audio from MP4 video files.
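Pulling the audio track out of an MP4 is an FFmpeg job. A sketch of one way to build such a command, assuming you want a 16 kHz mono WAV (the rate Whisper resamples audio to internally); the exact options this tool uses may differ, and `needs_extraction` and `extract_audio_command` are hypothetical names:

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4"}  # MP4 is the video format listed above


def needs_extraction(path):
    """True when the input is a video whose audio track must be pulled out first."""
    return Path(path).suffix.lower() in VIDEO_EXTENSIONS


def extract_audio_command(video_path, wav_path):
    """Build an ffmpeg command that drops the video stream and writes 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",              # overwrite wav_path if it already exists
        "-i", str(video_path),
        "-vn",             # ignore the video stream
        "-ac", "1",        # mix down to one channel
        "-ar", "16000",    # resample to 16 kHz
        str(wav_path),
    ]
```

Run it with `subprocess.run(extract_audio_command("clip.mp4", "clip.wav"), check=True)`. Whisper also decodes input files through FFmpeg itself, so an explicit extraction step is mainly useful when you want to keep or inspect the audio.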

Output Formats

SRT Format (.srt)

Standard subtitle format with 3-5 words per segment:

1
00:00:00,000 --> 00:00:04,160
Watch the eyes. This is

2
00:00:04,160 --> 00:00:07,760
not a sign of tiredness.
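This README does not say how the 3-5 word segments are formed. One plausible approach, shown here as an assumption rather than the tool's actual logic, is to split word-level timestamps into as few groups of at most five words as possible, with group sizes differing by at most one; for any input of three or more words this always gives groups of 3 to 5 words. Like the example, it does not break at sentence boundaries. The timings below are made up:

```python
import math


def segment_sizes(n_words, max_words=5):
    """Split n_words into as few groups of at most max_words as possible.

    Group sizes differ by at most one, so with max_words=5 any n_words >= 3
    produces groups of 3 to 5 words (e.g. 11 words -> [4, 4, 3]).
    """
    if n_words <= 0:
        return []
    groups = math.ceil(n_words / max_words)
    base, extra = divmod(n_words, groups)
    # The first `extra` groups get one additional word.
    return [base + 1 if i < extra else base for i in range(groups)]


def group_words(words, max_words=5):
    """Turn (word, start, end) tuples into (start, end, text) subtitle segments."""
    segments = []
    i = 0
    for size in segment_sizes(len(words), max_words):
        chunk = words[i:i + size]
        i += size
        text = " ".join(word for word, _, _ in chunk)
        # A segment spans from its first word's start to its last word's end.
        segments.append((chunk[0][1], chunk[-1][2], text))
    return segments


if __name__ == "__main__":
    # Illustrative timings only, loosely matching the SRT example.
    words = [
        ("Watch", 0.0, 0.5), ("the", 0.5, 0.7), ("eyes.", 0.7, 1.4),
        ("This", 2.6, 3.2), ("is", 3.2, 4.16), ("not", 4.16, 4.5),
        ("a", 4.5, 4.6), ("sign", 4.6, 5.0), ("of", 5.0, 5.2),
        ("tiredness.", 5.2, 7.76),
    ]
    for segment in group_words(words):
        print(segment)
```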

Plain Text Format (.txt)

Simple text with timestamps (3-5 words per segment):

[00:00:00,000 --> 00:00:04,160]
Watch the eyes. This is

[00:00:04,160 --> 00:00:07,760]
not a sign of tiredness.

Full Transcription (_full.txt)

Complete transcription without timestamps, useful for reading the entire content.
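Once segments exist as (start, end, text) tuples, writing the three outputs is plain string formatting. A self-contained sketch that reproduces the layouts shown above; the function names are hypothetical, and it assumes the plain-text file uses the same comma-millisecond timestamps as the SRT file, as in the example:

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Numbered SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


def to_timestamped_txt(segments):
    """The .txt layout: a bracketed time range above each line of text."""
    blocks = [f"[{srt_timestamp(s)} --> {srt_timestamp(e)}]\n{t}\n" for s, e, t in segments]
    return "\n".join(blocks)


def to_full_text(segments):
    """The _full.txt layout: all text joined, no timestamps."""
    return " ".join(text for _, _, text in segments) + "\n"


if __name__ == "__main__":
    segments = [
        (0.0, 4.16, "Watch the eyes. This is"),
        (4.16, 7.76, "not a sign of tiredness."),
    ]
    print(to_srt(segments))
```

To save a file, for example: `Path("aymal_subtitle.srt").write_text(to_srt(segments), encoding="utf-8")` (with `from pathlib import Path`).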

How It Works

  1. Audio Loading - FFmpeg loads and processes your audio file
  2. AI Transcription - Whisper AI model transcribes speech to text with timestamps
  3. Subtitle Generation - Creates properly formatted subtitle files
  4. File Saving - Outputs files in your chosen format(s)
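Steps 1 and 2 map onto the openai-whisper Python API roughly as below. This is a sketch, not the project's code: it assumes an openai-whisper release that supports `word_timestamps=True` in `transcribe`, and it only shows how word-level timings can be pulled out of the result so that steps 3 and 4 can group and write them. `flatten_words` and `transcribe_words` are hypothetical names:

```python
def flatten_words(result):
    """Collect (word, start, end) tuples from a Whisper transcription result.

    Expects the dict shape returned by model.transcribe(..., word_timestamps=True):
    {"segments": [{"words": [{"word": " Hi", "start": 0.0, "end": 0.3}, ...]}, ...]}
    """
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            text = w["word"].strip()  # Whisper words usually carry a leading space
            if text:
                words.append((text, float(w["start"]), float(w["end"])))
    return words


def transcribe_words(path, model_name="base"):
    """Steps 1-2: Whisper decodes the file through FFmpeg, then transcribes it."""
    import whisper  # imported lazily so flatten_words works without openai-whisper

    model = whisper.load_model(model_name)  # downloaded on first use, then cached
    result = model.transcribe(str(path), language="en", word_timestamps=True)
    return flatten_words(result)
```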

Sample Files

The samples/ directory includes:

  • Sample media files: test.wav (audio) and Facial_expression_the_202511041949_nzzse.mp4 (video)
  • Example outputs: Sample subtitle files showing expected output formats

Use these files to quickly test the application:

python main.py samples/test.wav

See samples/README.md for more details.

Tips for Best Results

  • 🎤 Use clear audio recordings with minimal background noise
  • 📒 Ensure speech is in English (this version is optimized for English)
  • 🔊 Higher quality audio = better transcription accuracy
  • 💪 Use larger models (medium or large) for difficult audio
  • ⚡ Use smaller models (tiny or base) for faster processing
  • 🧪 Test first with sample files in the samples/ directory

Troubleshooting

Error: "ffmpeg not found"

  • Install FFmpeg and ensure it's in your system PATH

Error: "No module named 'whisper'" (or the editor warning: Import "whisper" could not be resolved)

  • Run: pip install -r requirements.txt

Slow processing

  • Try a smaller model: python main.py audio.mp3 -m tiny
  • First run downloads the model (one-time delay)

Out of memory

  • Use a smaller model size
  • Process shorter audio segments
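To process a long recording in pieces, FFmpeg's segment muxer can cut it into fixed-length chunks that you then transcribe one at a time. A sketch under these assumptions: 10-minute chunks, 16 kHz mono WAV output, and hypothetical helper names. Chunk boundaries can split a word, and each chunk's timestamps start at zero, so they need shifting by the chunk's offset:

```python
def split_audio_command(input_path, chunk_seconds=600, pattern="chunk_%03d.wav"):
    """Build an ffmpeg command that cuts input_path into numbered chunks.

    Writes chunk_000.wav, chunk_001.wav, ... each about chunk_seconds long,
    re-encoded as 16 kHz mono WAV.
    """
    return [
        "ffmpeg", "-y",
        "-i", str(input_path),
        "-vn",                          # ignore any video stream (e.g. MP4 input)
        "-ac", "1", "-ar", "16000",     # mono, 16 kHz
        "-f", "segment",                # use FFmpeg's segment muxer
        "-segment_time", str(chunk_seconds),
        pattern,
    ]


def shift_segments(segments, offset_seconds):
    """Move (start, end, text) segments from chunk-local time to whole-file time."""
    return [(start + offset_seconds, end + offset_seconds, text)
            for start, end, text in segments]
```

For example, run `subprocess.run(split_audio_command("long.mp3"), check=True)`, then shift the segments from `chunk_003.wav` with `shift_segments(segments, 3 * 600)`.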

Model Download Information

On first run, Whisper will automatically download the selected model:

  • tiny - ~75 MB
  • base - ~150 MB
  • small - ~500 MB
  • medium - ~1.5 GB
  • large - ~3 GB

Models are cached, so subsequent runs are much faster.
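To see which models are already cached, you can look in Whisper's download folder. A sketch assuming recent openai-whisper behaviour, where `load_model` without a `download_root` argument stores checkpoints in `~/.cache/whisper` (or `$XDG_CACHE_HOME/whisper` when that variable is set); file names follow the checkpoint, so `large` may appear under a versioned name such as `large-v3`. The helper names are hypothetical:

```python
import os
from pathlib import Path


def whisper_cache_dir():
    """Default folder where openai-whisper's load_model keeps downloaded checkpoints."""
    base = os.getenv("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    return Path(base) / "whisper"


def cached_models():
    """Names of checkpoint files (*.pt) already downloaded, sorted alphabetically."""
    folder = whisper_cache_dir()
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.pt"))


if __name__ == "__main__":
    print(whisper_cache_dir(), cached_models())
```

To keep models somewhere else, `whisper.load_model("base", download_root="models")` stores and reads them from that folder instead.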

License

This project uses OpenAI Whisper, which is released under the MIT License.

Building the Executable

To create a standalone .exe file for distribution:

pyinstaller AI-Subtitle-Generator.spec --clean

👉 See docs/BUILD.md for complete build instructions, including:

  • Multiple build methods (PyInstaller, Auto-Py-to-Exe)
  • Troubleshooting tips
  • Build script usage

📦 Distributing the Executable

Want to share your .exe with others?

👉 See docs/GITHUB_RELEASES_GUIDE.md for:

  • How to create GitHub Releases
  • Uploading the .exe file
  • Writing release notes
  • Version management

Quick steps:

  1. Build the .exe: pyinstaller AI-Subtitle-Generator.spec --clean
  2. Create a GitHub Release
  3. Upload dist/AI-Subtitle-Generator.exe as an asset
  4. Users can download from the Releases page

Contributing

Feel free to open issues or submit pull requests for improvements!

Acknowledgments


Made with ❤️ for free subtitle generation
