1 | | -# Lyrics Transcriber 🎶 |
| 1 | +# python-lyrics-transcriber (DEPRECATED) |
2 | 2 |
|
3 | | - |
4 | | - |
5 | | -[](https://github.com/nomadkaraoke/python-lyrics-transcriber/actions/workflows/test-and-publish.yml) |
6 | | -[](https://codecov.io/gh/nomadkaraoke/python-lyrics-transcriber) |
| 3 | +> **This project has been deprecated and archived.** |
7 | 4 |
|
8 | | -Create synchronized karaoke assets from an audio file with word‑level timing: fetch lyrics, transcribe audio, auto‑correct against references, review in a web UI, and export ASS, LRC, CDG, and video. |
| 5 | +The lyrics transcription functionality has been consolidated into [karaoke-gen](https://github.com/nomadkaraoke/karaoke-gen). |
9 | 6 |
|
10 | | -### What this project is now |
11 | | -- **Modular pipeline** orchestrated by `LyricsTranscriber` with clear configs |
12 | | -- **Transcription** via AudioShake (preferred) and Whisper on RunPod (fallback) |
13 | | -- **Lyrics providers**: Genius, Spotify, Musixmatch, or a local file |
14 | | -- **Rule‑based correction** with optional **LLM‑assisted** gap fixes |
15 | | -- **Human review** server + frontend for iterative corrections and previews |
16 | | -- **Outputs**: original/corrected text, corrections JSON, LRC, ASS, CDG(+MP3/ZIP), and video |
| 7 | +## Migration |
17 | 8 |
|
18 | | -## Features |
19 | | -- **Multi-transcriber orchestration** with caching per audio hash |
20 | | - - AudioShake API (priority 1) |
21 | | - - Whisper via RunPod + Dropbox upload (priority 2) |
22 | | -- **Lyrics fetching** with caching per artist/title |
23 | | - - Genius (token or RapidAPI) • Spotify (cookie or RapidAPI) • Musixmatch (RapidAPI) • Local file |
24 | | -- **Correction engine** |
25 | | - - Anchor/gap detection, multiple rule handlers (word count, syllables, relaxed, punctuation, extend‑anchor) |
26 | | - - Optional LLM handlers (Ollama local, or OpenRouter with `OPENROUTER_API_KEY`) |
27 | | -- **Review UI** (FastAPI) at `http://localhost:8000` |
28 | | - - Edit corrections, toggle handlers, add lyrics sources, generate preview video |
29 | | -- **Countdown intro for karaoke** (enabled by default) |
30 | | - - Automatically adds 3-second intro with "3... 2... 1..." for songs that start within 3 seconds |
31 | | - - Pads audio with silence and shifts all timestamps accordingly |
32 | | - - Helps karaoke singers prepare before vocals begin |
33 | | - - Disable with `--skip_countdown` |
34 | | -- **Rich outputs** |
35 | | - - Plain text (original/corrected), corrections `JSON`, `*.lrc` (MidiCo), `*.ass` (karaoke), `*.cdg` with `*.mp3` and ZIP, and MP4/MKV video |
36 | | - - Subtitle offset, line wrapping, styles via JSON |
| 9 | +For karaoke video generation with synchronized lyrics: |
| 10 | +- Use [karaoke-gen](https://github.com/nomadkaraoke/karaoke-gen) - the complete karaoke generation solution |
| 11 | +- Web app: https://gen.nomadkaraoke.com |
37 | 12 |
|
38 | | -## Install |
39 | | -``` |
40 | | -pip install lyrics-transcriber |
41 | | -``` |
| 13 | +## Historical Context |
42 | 14 |
|
43 | | -### System requirements |
44 | | -- Python 3.10–3.13 |
45 | | -- FFmpeg (required for audio probe and video rendering) |
46 | | -- spaCy English model (phrase analyzer used by correction): |
47 | | -``` |
48 | | -python -m spacy download en_core_web_sm |
49 | | -``` |
| 15 | +This library was originally developed as a standalone tool for creating synchronized lyrics files with word-level timestamps. It combined audio transcription (via AudioShake or Whisper), lyrics fetching from multiple sources (Genius, Spotify, Musixmatch, LRCLib), and intelligent correction algorithms to produce professional karaoke assets. |
50 | 16 |
|
51 | | -## Quick start (CLI) |
52 | | -Minimal run (transcribe + LRC/ASS, no video/CDG): |
53 | | -```bash |
54 | | -lyrics-transcriber /path/to/song.mp3 --skip_video --skip_cdg |
55 | | -``` |
| 17 | +The functionality has now been integrated into the karaoke-gen platform, which provides a complete end-to-end solution for karaoke video generation including: |
| 18 | +- Audio separation (vocals/instrumentals) |
| 19 | +- Lyrics transcription and correction |
| 20 | +- Human review interface |
| 21 | +- Video rendering with title screens |
| 22 | +- Distribution to YouTube, Dropbox, Google Drive |
56 | 23 |
|
57 | | -Use AudioShake and auto‑fetch lyrics (Genius + artist/title): |
58 | | -```bash |
59 | | -export AUDIOSHAKE_API_TOKEN=... # or pass --audioshake_api_token |
60 | | -export GENIUS_API_TOKEN=... |
61 | | -lyrics-transcriber /path/to/song.mp3 --artist "Artist" --title "Song" |
62 | | -``` |
| 24 | +## Final Version |
63 | 25 |
|
64 | | -Use Whisper on RunPod (fallback or standalone): |
65 | | -```bash |
66 | | -export RUNPOD_API_KEY=... |
67 | | -export WHISPER_RUNPOD_ID=... # your RunPod endpoint ID |
68 | | -lyrics-transcriber /path/to/song.mp3 --skip_cdg --skip_video |
69 | | -``` |
70 | | - |
71 | | -Provide a local lyrics file instead of fetching: |
72 | | -```bash |
73 | | -lyrics-transcriber /path/to/song.mp3 --lyrics_file /path/to/lyrics.txt |
74 | | -``` |
75 | | - |
76 | | -Render video/CDG (requires a styles JSON file): |
77 | | -```bash |
78 | | -lyrics-transcriber /path/to/song.mp3 \ |
79 | | - --output_styles_json /path/to/styles.json \ |
80 | | - --video_resolution 1080p |
81 | | -``` |
82 | | - |
83 | | -### Common flags |
84 | | -- **Song identification**: `--artist`, `--title`, `--lyrics_file` |
85 | | -- **APIs**: `--audioshake_api_token`, `--genius_api_token`, `--spotify_cookie`, `--runpod_api_key`, `--whisper_runpod_id` |
86 | | -- **Output**: `--output_dir`, `--cache_dir`, `--output_styles_json`, `--subtitle_offset` |
87 | | -- **Feature toggles**: `--skip_lyrics_fetch`, `--skip_transcription`, `--skip_correction`, `--skip_plain_text`, `--skip_lrc`, `--skip_cdg`, `--skip_video`, `--skip_countdown`, `--video_resolution {4k,1080p,720p,360p}` |
88 | | - |
89 | | -Run `lyrics-transcriber --help` for full usage. |
90 | | - |
91 | | -## Environment variables |
92 | | -These are read automatically (CLI flags override): |
93 | | -- `AUDIOSHAKE_API_TOKEN` |
94 | | -- `GENIUS_API_TOKEN`, `RAPIDAPI_KEY` |
95 | | -- `SPOTIFY_COOKIE_SP_DC` |
96 | | -- `RUNPOD_API_KEY`, `WHISPER_RUNPOD_ID` |
97 | | -- `WHISPER_DROPBOX_APP_KEY`, `WHISPER_DROPBOX_APP_SECRET`, `WHISPER_DROPBOX_REFRESH_TOKEN` |
98 | | -- `OPENROUTER_API_KEY` (optional LLM handler) |
99 | | -- `LYRICS_TRANSCRIBER_CACHE_DIR` (default `~/lyrics-transcriber-cache`) |
100 | | - |
101 | | -## Outputs |
102 | | -Generated files are written to `--output_dir` (default: CWD): |
103 | | -- `... (Lyrics Corrections).json` — full correction data and audit trail |
104 | | -- `... (Karaoke).ass` — styled karaoke subtitles (ASS) |
105 | | -- `... .lrc` — MidiCo compatible LRC |
106 | | -- `... (original).txt` and `... (corrected).txt` — plain text exports |
107 | | -- `... .cdg`, `... .mp3`, `... .zip` — CDG package (when enabled) |
108 | | -- `... (With Vocals).mkv` — video with lyrics overlay (when enabled) |
109 | | - |
110 | | -Notes |
111 | | -- If no `--output_styles_json` is provided, CDG and video are disabled automatically. |
112 | | -- `--subtitle_offset` shifts all word timings (ms) for late/early subtitles. |
113 | | - |
114 | | -## Review server (human‑in‑the‑loop) |
115 | | -If review is enabled (default), a local server starts during processing and opens the UI at `http://localhost:8000`: |
116 | | -- Inspect and adjust corrections |
117 | | -- Toggle correction handlers (rule‑based/LLM) |
118 | | -- Add another lyrics source (paste plain text) |
119 | | -- Generate a low‑res preview video on demand |
120 | | - |
121 | | -Frontend assets are bundled when installed from PyPI. For local dev, build the frontend once if needed: |
122 | | -``` |
123 | | -./scripts/build_frontend.sh |
124 | | -``` |
125 | | - |
126 | | -## Styles JSON (for CDG/Video) |
127 | | -Provide a JSON with at least a `karaoke` section (for video/ASS) and, if generating CDG, a `cdg` section. Example (minimal): |
128 | | -```json |
129 | | -{ |
130 | | - "karaoke": { |
131 | | - "ass_name": "Karaoke", |
132 | | - "font": "Oswald SemiBold", |
133 | | - "font_path": "lyrics_transcriber/output/fonts/Oswald-SemiBold.ttf", |
134 | | - "font_size": 120, |
135 | | - "primary_color": "255,165,0", |
136 | | - "secondary_color": "255,255,255", |
137 | | - "outline_color": "0,0,0", |
138 | | - "back_color": "0,0,0", |
139 | | - "bold": true, |
140 | | - "italic": false, |
141 | | - "underline": false, |
142 | | - "strike_out": false, |
143 | | - "scale_x": 100, |
144 | | - "scale_y": 100, |
145 | | - "spacing": 0, |
146 | | - "angle": 0, |
147 | | - "border_style": 1, |
148 | | - "outline": 3, |
149 | | - "shadow": 0, |
150 | | - "margin_l": 0, |
151 | | - "margin_r": 0, |
152 | | - "margin_v": 100, |
153 | | - "encoding": 1, |
154 | | - "background_color": "black", |
155 | | - "max_line_length": 36, |
156 | | - "top_padding": 180 |
157 | | - }, |
158 | | - "cdg": { |
159 | | - "font": "Oswald SemiBold", |
160 | | - "font_path": "lyrics_transcriber/output/fonts/Oswald-SemiBold.ttf" |
161 | | - } |
162 | | -} |
163 | | -``` |
164 | | - |
165 | | -## Using as a library |
166 | | -```python |
167 | | -from lyrics_transcriber import LyricsTranscriber |
168 | | -from lyrics_transcriber.core.controller import TranscriberConfig, LyricsConfig, OutputConfig |
169 | | - |
170 | | -transcriber = LyricsTranscriber( |
171 | | - audio_filepath="/path/to/song.mp3", |
172 | | - artist="Artist", # optional |
173 | | - title="Title", # optional |
174 | | - transcriber_config=TranscriberConfig( |
175 | | - audioshake_api_token="...", # or env |
176 | | - runpod_api_key="...", whisper_runpod_id="..." |
177 | | - ), |
178 | | - lyrics_config=LyricsConfig( |
179 | | - genius_api_token="...", spotify_cookie="...", rapidapi_key="...", |
180 | | - lyrics_file=None |
181 | | - ), |
182 | | - output_config=OutputConfig( |
183 | | - output_dir="./out", cache_dir="~/lyrics-transcriber-cache", |
184 | | - output_styles_json="/path/to/styles.json", # required for CDG/video |
185 | | - video_resolution="1080p", subtitle_offset_ms=0, |
186 | | - add_countdown=True # enable countdown for songs starting within 3s (default: True) |
187 | | - ), |
188 | | -) |
189 | | - |
190 | | -result = transcriber.process() |
191 | | -print(result.ass_filepath, result.lrc_filepath, result.video_filepath) |
192 | | - |
193 | | -# Check if countdown padding was added (useful for syncing other audio files) |
194 | | -if result.countdown_padding_added: |
195 | | - print(f"Countdown padding added: {result.countdown_padding_seconds}s") |
196 | | - print(f"Padded audio filepath: {result.padded_audio_filepath}") |
197 | | - # You can use this info to apply the same padding to instrumental tracks |
198 | | -``` |
199 | | - |
200 | | -## Docker |
201 | | -Build and run locally (includes FFmpeg and spaCy model): |
202 | | -```bash |
203 | | -docker build -t lyrics-transcriber:local . |
204 | | -docker run --rm -v "$PWD/input":/input -v "$PWD/output":/output \ |
205 | | - -e AUDIOSHAKE_API_TOKEN -e GENIUS_API_TOKEN -e RUNPOD_API_KEY -e WHISPER_RUNPOD_ID \ |
206 | | - lyrics-transcriber:local \ |
207 | | - --output_dir /output --skip_cdg --video_resolution 360p /input/song.mp3 |
208 | | -``` |
209 | | - |
210 | | -## Development |
211 | | -- Python 3.10–3.13, Poetry |
212 | | -- Install deps: `poetry install` |
213 | | -- Run tests: `poetry run pytest` |
214 | | -- Build frontend (if editing UI): `./scripts/build_frontend.sh` |
215 | | - |
216 | | -## Agentic AI (Experimental) |
217 | | - |
218 | | -Uses **LangChain + LangGraph** for AI-powered lyrics correction with automatic **Langfuse** observability. |
219 | | - |
220 | | -### Enabling |
221 | | -- CLI flags: `--use-agentic-ai` and `--ai-model provider/model` |
222 | | -- Or env: `USE_AGENTIC_AI=1`, `AGENTIC_AI_MODEL=ollama/gpt-oss:latest` |
223 | | - |
224 | | -### Model Format |
225 | | -Models use `provider/model` format for LangChain: |
226 | | -- **Ollama** (local): `ollama/gpt-oss:latest`, `ollama/llama3.2:latest` |
227 | | -- **OpenAI**: `openai/gpt-4`, `openai/gpt-4-turbo` |
228 | | -- **Anthropic**: `anthropic/claude-3-sonnet-20240229`, `anthropic/claude-3-opus-20240229` |
229 | | - |
230 | | -### Provider Configuration |
231 | | -- **API Keys**: Set provider-specific keys: |
232 | | - - OpenAI: `OPENAI_API_KEY` |
233 | | - - Anthropic: `ANTHROPIC_API_KEY` |
234 | | -- **Local/Privacy Mode**: `PRIVACY_MODE=1` (uses Ollama only) |
235 | | -- **Timeouts/Retries**: `AGENTIC_TIMEOUT_SECONDS=30`, `AGENTIC_MAX_RETRIES=2` |
236 | | -- **Circuit Breaker**: `AGENTIC_CIRCUIT_THRESHOLD=3`, `AGENTIC_CIRCUIT_OPEN_SECONDS=60` |
237 | | - |
238 | | -### Observability (Langfuse) |
239 | | -Automatic tracing via LangChain callbacks - just set: |
240 | | -```bash |
241 | | -export LANGFUSE_PUBLIC_KEY="pk-lf-..." |
242 | | -export LANGFUSE_SECRET_KEY="sk-lf-..." |
243 | | -export LANGFUSE_HOST="https://us.cloud.langfuse.com" # or https://cloud.langfuse.com for EU |
244 | | -``` |
245 | | - |
246 | | -Traces include: |
247 | | -- Full prompts and responses |
248 | | -- Token counts and latency |
249 | | -- Cost estimates (for paid APIs) |
250 | | -- Model performance metrics |
251 | | - |
252 | | -View metrics: `GET /api/v1/metrics` |
253 | | - |
254 | | -### Feedback Store |
255 | | -- SQLite DB persisted in cache dir (sessions, feedback) |
256 | | -- 3-year retention policy with automatic cleanup |
257 | | - |
258 | | -### Architecture |
259 | | -See `LANGCHAIN_MIGRATION.md` for details on the LangChain/LangGraph implementation. |
| 26 | +The last standalone version was **0.81.0** (December 2025). No further releases will be made to PyPI. |
260 | 27 |
|
261 | 28 | ## License |
262 | | -MIT. See `LICENSE`. |
263 | | - |
264 | | -## Credits |
265 | | -- Audio transcription by AudioShake and Whisper (RunPod) |
266 | | -- Lyrics via Genius, Spotify, Musixmatch; layout via `karaoke-lyrics-processor` |
267 | | -- UI/API: FastAPI, Vite/React frontend |
268 | 29 |
|
269 | | -## Support |
270 | | -Please open issues or PRs on the repo, or contact @beveradb. |
| 30 | +MIT License - see [LICENSE](LICENSE) for details. |