A CLI tool that automatically generates Romaji karaoke subtitles for Japanese music videos.
- Auto-Transcription: Uses OpenAI Whisper (via stable-ts) to transcribe audio with word-level precision.
- Auto-Romanization: Converts Japanese text to Romaji using cutlet (MeCab).
- Karaoke Effects: Generates standard .ass karaoke tags ({\k}) synced to vocals.
- Burn-in: Automatically burns the subtitles into the video using FFmpeg.
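To make the karaoke effect concrete, here is a minimal sketch of how word timings can be turned into {\k} tags. It is not taken from this project's source: the function name to_karaoke_text and the (text, start, end) tuple layout are assumptions made for illustration. What it relies on is that a {\k} tag carries a duration in centiseconds, and that a silent gap between words can be filled with an empty {\k} so later syllables stay in sync.

```python
def to_karaoke_text(words):
    """Build the text of one ASS dialogue line from word timings.

    `words` is a list of (text, start_sec, end_sec) tuples (hypothetical layout).
    """
    parts = []
    # Timing is relative to the start of the first word on the line.
    cursor = words[0][1] if words else 0.0
    for text, start, end in words:
        # Fill any silence before this word with an empty \k tag.
        gap_cs = round((start - cursor) * 100)
        if gap_cs > 0:
            parts.append(f"{{\\k{gap_cs}}}")
        # \k durations are expressed in centiseconds.
        dur_cs = max(round((end - start) * 100), 0)
        parts.append(f"{{\\k{dur_cs}}}{text}")
        cursor = end
    return "".join(parts)


line = to_karaoke_text([("ko", 0.0, 0.5), ("e", 0.5, 0.75), ("ni", 1.0, 1.5)])
print(line)  # {\k50}ko{\k25}e{\k25}{\k50}ni
```

Each duration is rounded independently here, so a long line can drift by a few centiseconds; a real implementation might carry the rounding error forward.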
- Python 3.10+
- FFmpeg with libass support (Required for burning subtitles).
macOS (Apple Silicon):
Standard Homebrew FFmpeg does NOT support subtitle burning. You must compile it from source.
- Install Build Dependencies:

      brew install automake fdk-aac git lame libass libtool libvorbis libvpx \
        opus sdl shtool texi2html theora wget x264 x265 xvid nasm yasm pkg-config

- Clone FFmpeg:

      cd ~/Downloads
      git clone https://github.com/FFmpeg/FFmpeg.git
      cd FFmpeg

- Configure: Run this block in your terminal to configure the build with explicit Apple Silicon paths:

      export PKG_CONFIG_PATH="/opt/homebrew/lib/pkgconfig:$PKG_CONFIG_PATH"
      ./configure --prefix=/usr/local \
        --extra-cflags="-I/opt/homebrew/include" \
        --extra-ldflags="-L/opt/homebrew/lib" \
        --enable-gpl \
        --enable-nonfree \
        --enable-libass \
        --enable-libfdk-aac \
        --enable-libfreetype \
        --enable-libmp3lame \
        --enable-libopus \
        --enable-libtheora \
        --enable-libvorbis \
        --enable-libvpx \
        --enable-libx264 \
        --enable-libx265

- Build & Install:

      make -j$(sysctl -n hw.ncpu)
      sudo make install

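Once FFmpeg is installed, it may be worth confirming that the ffmpeg found on your PATH was actually built with libass. The sketch below is not part of this tool; the helper names has_libass and ffmpeg_buildconf are made up for illustration. It relies on ffmpeg -buildconf, which prints the configure flags the binary was built with.

```python
import shutil
import subprocess
from typing import Optional


def has_libass(buildconf_text: str) -> bool:
    """Return True if the build configuration lists --enable-libass."""
    # Compare whole tokens so a longer flag name cannot match by accident.
    return "--enable-libass" in buildconf_text.split()


def ffmpeg_buildconf(binary: str = "ffmpeg") -> Optional[str]:
    """Return the output of `ffmpeg -buildconf`, or None if the binary is missing."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-hide_banner", "-buildconf"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr
```

For a quick check, print(has_libass(ffmpeg_buildconf() or "")) should report True after a successful build with --enable-libass.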
- Clone the repository:

      git clone https://github.com/yourusername/kara-it.git
      cd kara-it

- Install Poetry (if not already installed):

      curl -sSL https://install.python-poetry.org | python3 -
      # Add Poetry to PATH (add to your shell config file: ~/.zshrc or ~/.bashrc)
      export PATH="$HOME/.local/bin:$PATH"

- Install Dependencies:

      # Install all dependencies and create virtual environment
      poetry install
      # Activate the virtual environment
      poetry shell

- Install MeCab Dictionary:

      python -m unidic_lite.download

Note: Poetry is configured to create the virtual environment in the project directory (.venv/). This keeps your dependencies isolated and makes them easy to manage.
Basic Usage (Auto Mode): Generates Romaji karaoke and burns it to a new video.

    python src/main.py generate "path/to/song.mp4"

Options:
- --karaoke / --no-karaoke: Enable/Disable {\k} tags (Default: Enabled).
- --burn / --no-burn: Burn subtitles into video (Default: Enabled).
- --format ass: Output format (Default: ass).
- --model base: Whisper model size (tiny, base, small, medium, large).
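For readers curious what the burn step amounts to, the following is a rough sketch of an FFmpeg invocation that renders an .ass file onto a video with the ass filter (available only when FFmpeg is built with libass). It is an assumption about the general approach, not a copy of what this tool runs: the function name burn_command and the _karaoke output suffix are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def burn_command(video: str, subs: str, output: Optional[str] = None) -> List[str]:
    """Build an ffmpeg argument list that burns `subs` into `video`."""
    if output is None:
        # Hypothetical naming scheme: song.mp4 -> song_karaoke.mp4
        src = Path(video)
        output = str(src.with_name(f"{src.stem}_karaoke{src.suffix}"))
    return [
        "ffmpeg",
        "-i", video,
        # Paths containing ':' or quotes need escaping inside a filter argument;
        # that is not handled in this sketch.
        "-vf", f"ass={subs}",
        # Video must be re-encoded to draw subtitles; audio can be copied as-is.
        "-c:a", "copy",
        output,
    ]


print(burn_command("song.mp4", "subs.ass"))
```

The list can be passed to subprocess.run(cmd, check=True) to perform the encode.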
Example:

    python -m src.main generate my_song.mp4 --model medium --karaoke

By default, the tool transcribes everything, including spoken words between songs. To remove these:
- Transcribe first:

      poetry run python -m src.main transcribe live_video.mp4 --output transcript.json

- Edit the JSON: Open transcript.json and manually remove the segments corresponding to the spoken parts.
- Continue the pipeline:

      poetry run python -m src.main romanize transcript.json --output romaji.json
      poetry run python -m src.main format romaji.json --output subs.ass
      poetry run python -m src.main burn live_video.mp4 subs.ass

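If the spoken parts fall in known time ranges, the manual edit in the second step could be scripted instead. The sketch below assumes transcript.json has a top-level "segments" list whose entries carry "start" and "end" times in seconds, as Whisper-style output usually does; the actual schema written by this tool is not documented here, so check your file first. The function name drop_segments is hypothetical.

```python
import json


def drop_segments(transcript: dict, cut_ranges: list) -> dict:
    """Return a copy of `transcript` without segments that overlap any cut range.

    `cut_ranges` is a list of (start_sec, end_sec) pairs to remove.
    """
    def overlaps(seg):
        # Two intervals overlap when each one starts before the other ends.
        return any(seg["start"] < hi and seg["end"] > lo for lo, hi in cut_ranges)

    kept = [seg for seg in transcript.get("segments", []) if not overlaps(seg)]
    return {**transcript, "segments": kept}


def filter_file(src: str, dst: str, cut_ranges: list) -> None:
    """Read `src`, drop the unwanted segments, and write the result to `dst`."""
    with open(src, encoding="utf-8") as f:
        transcript = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps the Japanese text readable in the file.
        json.dump(drop_segments(transcript, cut_ranges), f, ensure_ascii=False, indent=2)
```

For example, filter_file("transcript.json", "transcript_songs.json", [(95.0, 140.0)]) would drop everything overlapping the stretch from 1:35 to 2:20, after which the romanize step can be run on the new file.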