AI Sub is a powerful tool that leverages AI (currently Google Gemini) to produce English and Japanese subtitles for videos, translating between languages as necessary. It is primarily tested and designed for Hololive concert/cover videos, but might work on other content.
Here's an example of subtitles generated by AI Sub:
For more examples, please visit the showcase directory.
- Multimodal Context: Gemini's advanced multimodal capabilities enable it to analyze video content comprehensively, including on-screen text, for superior contextual understanding and more accurate subtitle generation.
- Cloud-Based Processing: All processing is efficiently handled on Google Gemini's infrastructure, eliminating the need for local GPUs or extensive computational resources on your machine.
- Timestamp Precision: Subtitle timestamps may exhibit a minor offset of a few seconds.
- Network Usage: Uploading entire video files to Google's services will consume network bandwidth.
- Video Segmentation: The input video is first split into 180-second segments. This duration is configurable via the `--split_seconds` flag.
- Concurrent Processing: Each video segment is then sent to the AI model (Google Gemini) for subtitle generation. You can adjust the number of concurrent processing threads with the `--num_processing_threads` flag to tune performance.
- Subtitle Compilation: All generated subtitle parts are then combined into a single, final subtitle file.
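The three-stage pipeline above can be sketched in Python. This is a minimal illustration only, not AI Sub's actual implementation: the function names are hypothetical, and `generate_subtitles` stands in for the real Gemini call.

```python
from concurrent.futures import ThreadPoolExecutor

def split_boundaries(duration_s, split_seconds=180):
    """Stage 1: compute (start, end) boundaries for each video segment."""
    return [(start, min(start + split_seconds, duration_s))
            for start in range(0, duration_s, split_seconds)]

def generate_subtitles(segment):
    """Stage 2 placeholder: in the real tool, this segment is sent to
    Google Gemini, which returns timed subtitle lines for it."""
    start, end = segment
    return f"[{start}-{end}] ..."

def subtitle_video(duration_s, split_seconds=180, num_threads=4):
    segments = split_boundaries(duration_s, split_seconds)
    # Segments are independent, so they can be subtitled concurrently;
    # pool.map preserves segment order in the results.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parts = list(pool.map(generate_subtitles, segments))
    # Stage 3: compile the per-segment parts into one subtitle document.
    return "\n".join(parts)
```

Because each segment is an independent request, raising the thread count mainly trades local memory and network bandwidth for wall-clock time.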
Follow these simple steps to acquire your API key:
- Sign in to Google AI Studio.
- Click "Create API Key."
- Copy and securely store your API key. Never disclose your API key publicly.
Prepare your Python virtual environment:
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate.bat`
pip install --upgrade ai-sub
Run the application with your video file:
ai-sub --api_key=YOUR_API_KEY "path/to/your/video.mp4"
Note: Replace `YOUR_API_KEY` with your actual Google Gemini API key and `"path/to/your/video.mp4"` with the full path to your video file.
- Timestamp Accuracy: Subtitle timestamps may exhibit inaccuracies; this is an inherent characteristic of the Gemini model.
  - Shorter video segments generally yield more accurate timestamps.
  - Requesting second-level timestamp precision generally yields more accurate results than requesting millisecond-level precision, so the current implementation requests second-level timestamps.
- AI Hallucinations: Like all AI models, Gemini may occasionally produce "hallucinations," i.e. inaccurate content. This is a known characteristic of current AI technology.
If you encounter issues related to these limitations, consider re-processing specific video segments as detailed in the "Re-processing Specific Video Segments" section below.
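Second-level timestamps map cleanly onto common subtitle timecode formats. Purely as an illustration (the output format AI Sub actually emits is not specified here), a whole-second value could be rendered as an SRT-style timecode:

```python
def srt_timestamp(total_seconds):
    """Format a whole-second timestamp as an SRT timecode (HH:MM:SS,mmm).
    Milliseconds are fixed at 000 because the model is asked only for
    second-level precision."""
    h, rem = divmod(int(total_seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d},000"
```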
Intermediate files generated during processing are stored in the temporary directory, which defaults to `tmp_<input_file_name>` but can be overridden with the `--temp_dir` CLI flag.
You can examine the `part_XXX.json` files in this directory to review the AI's results for individual segments.
To re-process a specific video segment, simply delete its corresponding `part_XXX.json` file.
On the next run, the script will automatically re-process only those segments whose `part_XXX.json` file is missing.