AI Subtitle Generator 🎙️➡️📝

A free, open-source Python tool that converts English audio files (MP3, WAV, etc.) and MP4 video files to subtitle files using OpenAI's Whisper AI model.

🎨 Two Ways to Use

🖥️ Desktop App (Recommended!)

Beautiful graphical interface with drag-and-drop support:

python app.py

👉 See docs/DESKTOP_APP_GUIDE.md for details

⌨️ Command Line

For advanced users and automation:

python main.py audio.mp3 -m tiny

👉 See usage examples below

Features

✨ Free & Offline - Uses OpenAI's open-source Whisper model, no API keys required
🎯 High Accuracy - State-of-the-art speech recognition technology
📝 Multiple Formats - Generates SRT subtitles and plain text transcriptions
🌐 English Language - Optimized for English audio transcription
🚀 Easy to Use - Desktop app with drag-and-drop OR command-line interface
🎨 Beautiful GUI - Modern desktop application for non-technical users

Prerequisites

Python 3.8 or higher - available from python.org
FFmpeg - Required for audio processing

Installing FFmpeg on Windows

Option 1: Using Chocolatey (Recommended)

choco install ffmpeg

Option 2: Manual Installation

Download FFmpeg from https://www.gyan.dev/ffmpeg/builds/
Extract the archive
Add the bin folder to your system PATH

To verify FFmpeg is installed:

ffmpeg -version
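
The same check can be done from Python, which is handy in setup scripts. This is a minimal sketch using only the standard library; the function names are hypothetical and are not part of main.py or app.py.

```python
import shutil
import subprocess


def ffmpeg_available(executable="ffmpeg"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


def ffmpeg_version_line(executable="ffmpeg"):
    """Return the first line of `ffmpeg -version`, or None if it is missing."""
    if not ffmpeg_available(executable):
        return None
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

Calling ffmpeg_version_line() returns None when FFmpeg is not on PATH, which usually means the PATH step above still needs to be done.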

Installation

Clone or download this repository
Install Python dependencies

pip install -r requirements.txt

This will install:

openai-whisper - The AI speech recognition model
torch - PyTorch deep learning framework
Other required dependencies

Note: First installation may take several minutes as it downloads PyTorch and other large dependencies.

Usage

Quick Test with Sample Files

Test the application with included sample files:

# Test with sample audio
python main.py samples/test.wav

# Test with sample video
python main.py samples/Facial_expression_the_202511041949_nzzse.mp4

Note: The samples/ directory contains sample media files and example output files for reference.

Basic Usage

Convert an audio or video file to subtitles:

python main.py your_audio.mp3

Or convert MP4 video:

python main.py video.mp4

This will create the following files, named from the first 5 characters of the input filename plus "_subtitle":

your_subtitle.srt - SRT subtitle file (standard format)
your_subtitle.txt - Plain text with timestamps
your_subtitle_full.txt - Full transcription without timestamps

Example: aymal_khalid_khan.mp3 → aymal_subtitle.srt
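
The naming rule can be sketched as follows. This is an illustration, not the code in main.py: the function name is hypothetical, and dropping a trailing underscore from the prefix is an assumption made so that both examples above come out as shown.

```python
from pathlib import Path


def subtitle_base_name(input_path, prefix_len=5):
    """Build the output base name: first `prefix_len` characters + "_subtitle"."""
    stem = Path(input_path).stem
    # Assumption: a trailing underscore in the prefix is dropped, so that
    # "your_audio.mp3" gives "your_subtitle" rather than "your__subtitle".
    prefix = stem[:prefix_len].rstrip("_")
    return f"{prefix}_subtitle"


if __name__ == "__main__":
    print(subtitle_base_name("aymal_khalid_khan.mp3") + ".srt")
```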

Advanced Options

Choose a specific model size:

python main.py audio.mp3 -m small

Available models (accuracy vs speed):

tiny - Fastest, least accurate (~1GB RAM)
base - Good balance (default, ~1GB RAM)
small - Better accuracy (~2GB RAM)
medium - High accuracy (~5GB RAM)
large - Best accuracy (~10GB RAM)
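
If you are unsure which model fits your machine, the figures above can drive a simple choice. The sketch below is hypothetical helper code (not part of the tool) that picks the most accurate model whose approximate memory need fits a given budget.

```python
# Approximate memory needs in GB, copied from the model list above.
MODEL_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

# Ordered from least to most accurate.
MODEL_ORDER = ["tiny", "base", "small", "medium", "large"]


def pick_model(available_gb):
    """Return the most accurate model whose approximate memory need fits."""
    fitting = [m for m in MODEL_ORDER if MODEL_MEMORY_GB[m] <= available_gb]
    if not fitting:
        raise ValueError("Not enough memory for any model")
    return fitting[-1]
```

The result can be passed to the -m option, for example -m small on a machine with about 4 GB free.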

Choose output format:

# Only SRT format
python main.py audio.mp3 -f srt

# Only plain text
python main.py audio.mp3 -f txt

# Both formats (default)
python main.py audio.mp3 -f both

Specify custom output filename:

python main.py audio.mp3 -o my_subtitles

Complete Example

python main.py interview.mp3 -m medium -f srt -o interview_subtitles
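
For readers who want to see how such options are typically wired up, here is a minimal argparse sketch. The short flags and their values come from the examples above; the long option names (--model, --format, --output) and the help texts are assumptions and may differ from main.py.

```python
import argparse


def build_parser():
    """Build a parser mirroring the -m, -f and -o options shown above."""
    parser = argparse.ArgumentParser(
        description="Convert an audio or video file to subtitles"
    )
    parser.add_argument("input", help="Path to the audio or video file")
    parser.add_argument(
        "-m", "--model", default="base",
        choices=["tiny", "base", "small", "medium", "large"],
        help="Whisper model size",
    )
    parser.add_argument(
        "-f", "--format", default="both", choices=["srt", "txt", "both"],
        help="Output format",
    )
    parser.add_argument("-o", "--output", default=None, help="Output base name")
    return parser
```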

Supported File Formats

Audio formats:

MP3, WAV, M4A, FLAC, OGG, WMA, AAC

Video formats:

MP4 (audio automatically extracted)

The tool supports any audio format that FFmpeg can decode, and automatically extracts audio from MP4 video files.

Output Formats

SRT Format (.srt)

Standard subtitle format with 3-5 words per segment:

1
00:00:00,000 --> 00:00:04,160
Watch the eyes. This is

2
00:00:04,160 --> 00:00:07,760
not a sign of tiredness.
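
The SRT layout above (index, time range, text, blank line) is simple to produce by hand. The following sketch shows one way to format it from (start, end, text) tuples given in seconds; the function names are illustrative and not taken from the project.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text.strip()}\n")
    # Each block ends with a newline, so joining with "\n" leaves one blank
    # line between consecutive blocks.
    return "\n".join(blocks)
```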

Plain Text Format (.txt)

Simple text with timestamps (3-5 words per segment):

[00:00:00,000 --> 00:00:04,160]
Watch the eyes. This is

[00:00:04,160 --> 00:00:07,760]
not a sign of tiredness.
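
Short segments like these can be built by grouping word-level timings. The sketch below is an assumption about how such grouping might be done, not the project's actual logic: it takes (word, start, end) tuples and cuts them into segments of at most five words. It does not enforce the three-word minimum, so the last segment may be shorter.

```python
def group_words(words, max_words=5):
    """Group (word, start, end) tuples into (start, end, text) segments."""
    segments = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(word for word, _, _ in chunk)
        # A segment starts with its first word and ends with its last word.
        segments.append((chunk[0][1], chunk[-1][2], text))
    return segments
```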

Full Transcription (_full.txt)

Complete transcription without timestamps, useful for reading the entire content.

How It Works

Audio Loading - FFmpeg loads and processes your audio file
AI Transcription - Whisper AI model transcribes speech to text with timestamps
Subtitle Generation - Creates properly formatted subtitle files
File Saving - Outputs files in your chosen format(s)
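
The transcription step can be sketched with the openai-whisper package as shown below. This is a simplified illustration rather than the code in main.py. The function name and the transcribe_fn parameter are hypothetical; the parameter exists only so the function can be tried without loading a model.

```python
def transcribe_segments(audio_path, model_name="base", transcribe_fn=None):
    """Return a list of (start, end, text) tuples for the given media file."""
    if transcribe_fn is None:
        # Imported lazily so this module can be loaded without Whisper installed.
        import whisper

        model = whisper.load_model(model_name)

        def transcribe_fn(path):
            # Whisper uses FFmpeg internally to decode the file.
            return model.transcribe(path, language="en")

    result = transcribe_fn(audio_path)
    # Each Whisper segment is a dict with "start", "end" (seconds) and "text".
    return [(s["start"], s["end"], s["text"].strip()) for s in result["segments"]]
```

The returned tuples can then be written out in whichever subtitle or text format is needed.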

Sample Files

The samples/ directory includes:

Sample media files: test.wav (audio) and Facial_expression_the_202511041949_nzzse.mp4 (video)
Example outputs: Sample subtitle files showing expected output formats

Use these files to quickly test the application:

python main.py samples/test.wav

See samples/README.md for more details.

Tips for Best Results

🎤 Use clear audio recordings with minimal background noise
📢 Ensure speech is in English (this version is optimized for English)
🔊 Higher quality audio = better transcription accuracy
💪 Use larger models (medium or large) for difficult audio
⚡ Use smaller models (tiny or base) for faster processing
🧪 Test first with sample files in the samples/ directory

Troubleshooting

Error: "ffmpeg not found"

Install FFmpeg and ensure it's in your system PATH

Error: "Import whisper could not be resolved"

Run: pip install -r requirements.txt

Slow processing

Try a smaller model: python main.py audio.mp3 -m tiny
First run downloads the model (one-time delay)

Out of memory

Use a smaller model size
Process shorter audio segments
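
One way to get shorter segments is FFmpeg's segment muxer. The sketch below only builds the command; the function name, chunk length and output pattern are assumptions for illustration. Note that timestamps restart at zero in each chunk, so subtitles from later chunks need their times shifted by the chunk offset.

```python
import subprocess


def build_split_command(input_path, chunk_seconds=600,
                        output_pattern="chunk_%03d.wav"):
    """Build an ffmpeg command that splits the audio into fixed-length WAV chunks."""
    return [
        "ffmpeg",
        "-i", input_path,
        "-vn",                      # drop any video stream
        "-f", "segment",            # use the segment muxer
        "-segment_time", str(chunk_seconds),
        output_pattern,
    ]


def split_audio(input_path, chunk_seconds=600):
    """Run the split; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_split_command(input_path, chunk_seconds), check=True)
```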

Model Download Information

On first run, Whisper will automatically download the selected model:

tiny - ~75 MB
base - ~150 MB
small - ~500 MB
medium - ~1.5 GB
large - ~3 GB

Models are cached, so subsequent runs are much faster.
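
To see which models are already downloaded, you can list the cache folder. The sketch below assumes Whisper's default cache location (~/.cache/whisper, or $XDG_CACHE_HOME/whisper when that variable is set) and that the tool does not override it; the function names are hypothetical.

```python
import os
from pathlib import Path


def whisper_cache_dir():
    """Return the assumed default Whisper cache directory."""
    default = Path.home() / ".cache"
    return Path(os.getenv("XDG_CACHE_HOME", str(default))) / "whisper"


def cached_models(cache_dir=None):
    """Map cached model names to their file sizes in bytes."""
    cache_dir = Path(cache_dir) if cache_dir is not None else whisper_cache_dir()
    if not cache_dir.is_dir():
        return {}
    # Whisper stores each downloaded model as a single .pt file.
    return {p.stem: p.stat().st_size for p in sorted(cache_dir.glob("*.pt"))}
```

An empty result means the next run will download the selected model first.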

License

This project uses OpenAI Whisper, which is released under the MIT License.

Building the Executable

To create a standalone .exe file for distribution:

pyinstaller AI-Subtitle-Generator.spec --clean

👉 See docs/BUILD.md for complete build instructions including:

Multiple build methods (PyInstaller, Auto-Py-to-Exe)
Troubleshooting tips
Build script usage

📦 Distributing the Executable

Want to share your .exe with others?

👉 See docs/GITHUB_RELEASES_GUIDE.md for:

How to create GitHub Releases
Uploading the .exe file
Writing release notes
Version management

Quick steps:

Build the .exe: pyinstaller AI-Subtitle-Generator.spec --clean
Create a GitHub Release
Upload dist/AI-Subtitle-Generator.exe as an asset
Users can download from the Releases page

Contributing

Feel free to open issues or submit pull requests for improvements!

Acknowledgments

OpenAI Whisper - The amazing speech recognition model
FFmpeg - Audio/video processing

Made with ❤️ for free subtitle generation
