This is an early prototype for a tool that helps with singing. The goal is to imitate karaoke software with pitch visualization, but without any catalogue restriction.
The main pipeline is:
- Input as artist + song name
- -> YouTube lookup
- -> audio download
- -> vocals/instrumental split
- -> pitch tracking
- -> comparison with mic pitch tracking
- If an AssemblyAI API key is set, vocals are sent for lyrics transcription
Note: lyrics transcription through AssemblyAI isn't free. It costs roughly $0.01 per song, deducted from the $50 of free credits that new accounts receive.
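The last pipeline step compares two pitch tracks. A minimal sketch of one common way to do that is to measure the distance between the mic pitch and the reference pitch in cents. The function names and the 50-cent tolerance are assumptions for illustration, not taken from this project's code.

```python
import math


def cents_off(mic_hz: float, reference_hz: float) -> float:
    """Signed distance from the reference pitch, in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(mic_hz / reference_hz)


def is_close(mic_hz: float, reference_hz: float, tolerance_cents: float = 50.0) -> bool:
    """True if the mic pitch is within the (assumed) tolerance of the reference."""
    return abs(cents_off(mic_hz, reference_hz)) <= tolerance_cents
```

A positive result means the mic pitch is sharp relative to the reference, a negative one means it is flat.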
- Python 3.11 (this specific version, for Spleeter).
- Poetry
- `ffmpeg` and `libportaudio2`. On Ubuntu-based distributions, `sudo apt install ffmpeg libportaudio2`.
- (optional) For lyrics transcription, an AssemblyAI key should be set in the env variable `ASSEMBLY_AI_API_KEY`. It costs roughly $0.01 in credits per song.
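As a rough illustration of the optional key, the sketch below reads `ASSEMBLY_AI_API_KEY` from the environment and treats a missing or blank value as "transcription disabled". The helper name `get_assembly_ai_key` is hypothetical, and how the tool itself reads the variable is an assumption.

```python
import os


def get_assembly_ai_key(env=None):
    """Return the AssemblyAI key, or None if it is unset or blank.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("ASSEMBLY_AI_API_KEY", "").strip()
    return key or None


if __name__ == "__main__":
    if get_assembly_ai_key() is None:
        print("No AssemblyAI key set: lyrics transcription would be skipped.")
    else:
        print("AssemblyAI key found: lyrics transcription would be enabled.")
```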
Note: it shouldn't be Linux-specific, but installing the requirements is less convenient on Windows, especially as this depends on Spleeter, which pins its dependencies to versions that have no wheel for Windows. Maybe I'll try to open a PR on Spleeter to update its dependencies.
poetry run karaoke_helper "The Beatles" "Blackbird"
Which then looks like this:
The horizontal axis is the time (left to right), with the vertical red line being "now".
The vertical axis is how "high" a pitch is, and the color shows the intensity of that pitch at any given time. Each voice shows up as more than one line, because harmonics are also displayed.
To match the original singer's pitch exactly, your lowest line should line up with their lowest line. To sound "in tune", any of your lines needs to line up with any of the singer's lines. This may harmonize and add some color to the song, but it never sounds out of tune.
There's of course some noise, which should be ignored. Sometimes, electric guitar riffs aren't split properly from the vocals and their pitch can show up.
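The "any of your lines lines up with any of the singer's lines" rule can be sketched numerically. The example below models each voice as a fundamental plus its integer harmonics and checks whether any pair of lines falls within a small tolerance. The number of harmonics (4), the 30-cent tolerance, and the function names are assumptions for illustration only.

```python
import math


def harmonics(fundamental_hz: float, count: int = 4) -> list[float]:
    """Fundamental plus its first integer harmonics (f, 2f, 3f, ...)."""
    return [fundamental_hz * n for n in range(1, count + 1)]


def cents_between(a_hz: float, b_hz: float) -> float:
    """Absolute distance between two frequencies, in cents."""
    return abs(1200.0 * math.log2(a_hz / b_hz))


def lines_overlap(your_hz: float, singer_hz: float,
                  count: int = 4, tolerance_cents: float = 30.0) -> bool:
    """True if any of your lines is close to any of the singer's lines."""
    return any(
        cents_between(mine, theirs) <= tolerance_cents
        for mine in harmonics(your_hz, count)
        for theirs in harmonics(singer_hz, count)
    )
```

For instance, singing 330 Hz over a singer at 220 Hz (a fifth above) shares the 660 Hz line, so the two overlap, while singing a semitone above 220 Hz leaves every pair of lines about 100 cents apart.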