Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
✨ Motivation
After testing with several stereo tracks, we found that most do not ship with discrete vocal and instrumental stems. Adding an ML source-separation layer would let us:
- **Dynamic mixing** – route isolated vocals to the center channel, send drums/bass to the LFE, etc.
- **Karaoke / practice mode** – solo or mute vocals on the fly.
- **Accessibility** – raise vocal intelligibility for listeners with hearing loss.
🛠 Proposed Approach
| Item | Detail |
|---|---|
| Model | Start with the open‑source Hybrid Transformer Demucs v4 checkpoint (MIT‑licensed, PyTorch). |
| Inference mode | Offline (full track) first; near‑realtime streaming as a follow‑up. |
| Interface | New CLI flag --separate=2 or --separate=4, plus player UI toggle. |
| Output | Standard 32‑bit float WAV stems: vocals.wav, accompaniment.wav (or drums/bass/other). |
| Routing | Default mapping: vocals → center + height, accompaniment → L/R + surrounds. Configurable in config.yaml. |
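To make the routing row concrete, here is a minimal sketch of the proposed default mapping for a plain 5.1 bed. The function name `route_stems`, the channel order, and the -6 dB surround gain are illustrative assumptions, not an existing API; it assumes stems arrive as float32 arrays of shape `(channels=2, samples)`, as a Demucs-style separator would produce:

```python
import numpy as np

# 5.1 channel order assumed for this sketch: L, R, C, LFE, Ls, Rs
CHANNELS = ["L", "R", "C", "LFE", "Ls", "Rs"]

def route_stems(vocals: np.ndarray, accompaniment: np.ndarray) -> np.ndarray:
    """Build a 5.1 bed: vocals -> center, accompaniment -> L/R + surrounds.

    Both inputs are stereo float32 arrays of shape (2, samples).
    """
    n = vocals.shape[1]
    out = np.zeros((len(CHANNELS), n), dtype=np.float32)
    out[2] = vocals.mean(axis=0)           # C: mono fold-down of the vocal stem
    out[0], out[1] = accompaniment         # L/R: accompaniment passed through
    out[4], out[5] = accompaniment * 0.5   # Ls/Rs: accompaniment at -6 dB
    return out                             # LFE left silent in this sketch

# Toy stems: one second of stereo noise at 48 kHz stands in for real audio.
sr = 48_000
vocals = np.random.randn(2, sr).astype(np.float32)
accomp = np.random.randn(2, sr).astype(np.float32)
mix = route_stems(vocals, accomp)
print(mix.shape)  # (6, 48000)
```

The real mapping would be read from `config.yaml` rather than hard-coded; the point is only that routing reduces to filling rows of a channel matrix from the separated stems.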
📚 References that might be helpful
- Demucs v4 paper & code – https://arxiv.org/abs/2211.15662
- MUSDB18‑HQ dataset – https://sigsep.github.io/datasets/musdb.html
- nussl toolkit – https://github.com/interactiveaudiolab/nussl