FlippFuzz/ai-sub

AI Sub: AI-Powered Subtitle Generation with Translation


Project Overview

AI Sub uses AI (currently Google Gemini) to generate English and Japanese subtitles for videos, translating between the two languages as needed. It is designed for and primarily tested on Hololive concert and cover videos, but may work on other content.


Showcase

Here's an example of subtitles generated by AI Sub:

[Image: Video Screenshot]

For more examples, please visit the showcase directory.


Pros and cons of using Gemini as the AI model

Pros:

  • Multimodal Context: Gemini is multimodal, so it can analyze the video itself, including on-screen text, and use that context to generate more accurate subtitles.
  • Cloud-Based Processing: All processing runs on Google Gemini's infrastructure, so no local GPU or other heavy compute is needed on your machine.

Cons:

  • Timestamp Precision: Subtitle timestamps may be off by a few seconds.
  • Network Usage: The video is uploaded to Google's services, which consumes network bandwidth.

How AI Sub Works

  • Video Segmentation: The input video is first split into segments of 180 seconds by default. The segment length is configurable via the --split_seconds flag.
  • Concurrent Processing: Each segment is sent to the AI model (Google Gemini) for subtitle generation. The number of segments processed concurrently is set with the --num_processing_threads flag.
  • Subtitle Compilation: The subtitle parts generated for each segment are combined into a single, final subtitle file.
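
The three steps above can be sketched in Python as follows. This is a minimal illustration, not AI Sub's actual implementation: `segment_bounds`, `fake_generate_subtitles`, and `process_video` are hypothetical names, the Gemini request is replaced by a stub, and the sketch assumes each segment's timestamps are relative to the start of that segment.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def segment_bounds(duration_seconds, split_seconds=180):
    """Return (start, end) pairs, in seconds, that cover the whole video."""
    count = math.ceil(duration_seconds / split_seconds)
    return [
        (i * split_seconds, min((i + 1) * split_seconds, duration_seconds))
        for i in range(count)
    ]


def fake_generate_subtitles(bounds):
    """Hypothetical stand-in for the per-segment request to the AI model.

    Returns cues as (start, end, text), with times relative to the segment.
    """
    start, end = bounds
    return [(0, min(2, end - start), f"line from segment starting at {start}s")]


def process_video(duration_seconds, split_seconds=180, num_processing_threads=4):
    """Split, process segments concurrently, then combine into one cue list."""
    bounds = segment_bounds(duration_seconds, split_seconds)
    with ThreadPoolExecutor(max_workers=num_processing_threads) as pool:
        # map() preserves input order, so parts come back in segment order.
        parts = list(pool.map(fake_generate_subtitles, bounds))

    combined = []
    for (segment_start, _), cues in zip(bounds, parts):
        for cue_start, cue_end, text in cues:
            # Shift segment-relative times to positions in the full video.
            combined.append((segment_start + cue_start, segment_start + cue_end, text))
    return combined
```

For a 400-second video with the default settings, `segment_bounds(400)` yields three segments: 0 to 180, 180 to 360, and 360 to 400 seconds.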

Getting Started: A Quick Guide

1. Obtain Your Google Gemini API Key

Follow these simple steps to acquire your API key:

  1. Sign in to Google AI Studio.
  2. Click "Create API Key."
  3. Copy and securely store your API key. Never disclose your API key publicly.

2. Set Up Your Python Environment (Python 3.10+ Required)

Create and activate a Python virtual environment, then install AI Sub:

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
pip install --upgrade ai-sub

3. Execute the Script

Run the application with your video file:

ai-sub --api_key=YOUR_API_KEY "path/to/your/video.mp4"

Note: Replace YOUR_API_KEY with your actual Google Gemini API key and "path/to/your/video.mp4" with the full path to your video file.
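
If you would rather launch AI Sub from a Python script, the sketch below assembles the same command line. The helper names `build_ai_sub_command` and `run_ai_sub` are hypothetical, and the sketch assumes the other flags named in this README (--split_seconds, --num_processing_threads, --temp_dir) accept the same `--flag=value` form shown for --api_key.

```python
import subprocess


def build_ai_sub_command(video_path, api_key, split_seconds=None,
                         num_processing_threads=None, temp_dir=None):
    """Build the argument list for the ai-sub CLI (flag syntax is assumed)."""
    command = ["ai-sub", f"--api_key={api_key}"]
    if split_seconds is not None:
        command.append(f"--split_seconds={split_seconds}")
    if num_processing_threads is not None:
        command.append(f"--num_processing_threads={num_processing_threads}")
    if temp_dir is not None:
        command.append(f"--temp_dir={temp_dir}")
    command.append(video_path)
    return command


def run_ai_sub(video_path, api_key, **options):
    """Run ai-sub and raise CalledProcessError if it exits with an error."""
    command = build_ai_sub_command(video_path, api_key, **options)
    # Passing a list (no shell) avoids quoting problems with spaces in paths.
    subprocess.run(command, check=True)
```

Reading the key from an environment variable, for example `os.environ["GEMINI_API_KEY"]` (a variable name chosen here for illustration), keeps it out of scripts you might share.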


Known Limitations

  1. Timestamp Accuracy: Subtitle timestamps may exhibit inaccuracies. This is an inherent characteristic of the Gemini AI model.

    • Observations indicate that shorter video segments generally lead to improved timestamp accuracy.
    • Requesting timestamps at second-level precision generally yields more accurate results than requesting millisecond-level precision, so the current implementation requests second-level timestamps.
  2. AI Hallucinations: Like all AI models, Gemini may occasionally produce "hallucinations" or inaccurate information. This is a known characteristic of current AI technology.

If you encounter issues related to these limitations, consider re-processing specific video segments as detailed in the "Re-processing Specific Video Segments" section below.
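
To make the second-level precision concrete, the sketch below formats whole-second timestamps as SRT-style cues. It is an illustration only: the function names are hypothetical, and both the SRT output format and the use of segment-relative times are assumptions rather than documented behavior of AI Sub.

```python
def to_srt_timestamp(total_seconds):
    """Format whole seconds as HH:MM:SS,mmm; milliseconds are always 000."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},000"


def to_srt_block(index, segment_start, cue_start, cue_end, text):
    """Build one SRT cue from times given relative to a segment's start."""
    start = to_srt_timestamp(segment_start + cue_start)
    end = to_srt_timestamp(segment_start + cue_end)
    return f"{index}\n{start} --> {end}\n{text}\n"
```

With whole-second timestamps, a cue can differ from the spoken line by up to a second from rounding alone, in addition to any error from the model itself.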


Re-processing Specific Video Segments

Intermediate files generated during processing are stored in a temporary directory, which defaults to tmp_<input_file_name> and can be changed with the --temp_dir CLI flag. Each segment's result is saved there as a part_XXX.json file, which you can open to review the AI's output for that segment. To re-process a segment, delete its part_XXX.json file. On the next run, the script re-processes only the segments whose part_XXX.json file is missing.
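
The deletion step can also be scripted. The sketch below lists the part files in a temporary directory and removes the ones you name; `list_part_files` and `discard_parts` are hypothetical helpers, and the temporary directory path is passed in explicitly because its exact default name depends on your input file.

```python
from pathlib import Path


def list_part_files(temp_dir):
    """Return the part_*.json files in temp_dir, sorted by name."""
    return sorted(Path(temp_dir).glob("part_*.json"))


def discard_parts(temp_dir, file_names):
    """Delete the named part files and return the names actually removed.

    Names that do not exist or do not look like part_*.json are skipped.
    """
    removed = []
    for name in file_names:
        path = Path(temp_dir) / name
        if path.is_file() and path.match("part_*.json"):
            path.unlink()
            removed.append(path.name)
    return removed
```

After removing the files for the segments you want redone, run ai-sub again with the same input and temporary directory so that only those segments are re-processed.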