Add an ML‑based “Vocal ⇆ Instrumental” Source‑Separation Layer #57

@Sam-Mucyo

Description

✨ Motivation

After testing with a handful of stereo tracks, we found that most don't ship with discrete vocal and instrumental stems.
By adding an ML source‑separation layer we can:

  • Dynamic mixing – route isolated vocals to the center channel, send drums/bass to LFE, etc.

  • Karaoke / practice mode – solo or mute vocals on the fly.

  • Accessibility – raise vocal intelligibility for listeners with hearing loss.

🛠 Proposed Approach

| Item | Detail |
| --- | --- |
| Model | Start with the open‑source Hybrid Transformer Demucs v4 checkpoint (MIT‑licensed, PyTorch). |
| Inference mode | Offline (full track) first; near‑realtime streaming as a follow‑up. |
| Interface | New CLI flag `--separate=2` or `--separate=4`, plus a player UI toggle. |
| Output | Standard 32‑bit float WAV stems: `vocals.wav`, `accompaniment.wav` (or `drums`/`bass`/`other`). |
| Routing | Default mapping: vocals → center + height; accompaniment → L/R + surrounds. Configurable in `config.yaml` (see the sketches below). |
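
For the offline path, here is a minimal sketch of what the separation step could look like, assuming the `demucs` PyTorch package plus `torchaudio` and `soundfile`. The `separate()` helper and its output file layout are made up for illustration; only `get_model`, `apply_model`, and the htdemucs source order come from Demucs itself.

```python
import soundfile as sf
import torch
import torchaudio
from demucs.apply import apply_model
from demucs.pretrained import get_model


def separate(path: str, stems: int = 2, device: str = "cpu") -> None:
    """Illustrative helper: 2 = vocals/accompaniment, 4 = raw Demucs sources."""
    model = get_model("htdemucs")  # Hybrid Transformer Demucs v4 checkpoint
    model.to(device)
    model.eval()

    wav, sr = torchaudio.load(path)
    if sr != model.samplerate:
        wav = torchaudio.functional.resample(wav, sr, model.samplerate)

    with torch.no_grad():
        # apply_model takes (batch, channels, samples) and returns
        # (batch, sources, channels, samples); htdemucs orders its
        # sources as ["drums", "bass", "other", "vocals"].
        sources = apply_model(model, wav[None].to(device))[0]

    named = dict(zip(model.sources, sources))
    if stems == 2:
        out = {
            "vocals": named["vocals"],
            "accompaniment": named["drums"] + named["bass"] + named["other"],
        }
    else:
        out = named

    for name, stem in out.items():
        # 32-bit float WAV stems, as proposed above (soundfile wants samples-first).
        sf.write(f"{name}.wav", stem.cpu().numpy().T, model.samplerate, subtype="FLOAT")
```

A `--separate=2` / `--separate=4` CLI flag would then just forward its value to `stems`.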
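And for the routing row, a sketch of what the `config.yaml` defaults might look like. The schema and channel names are hypothetical; only the vocals → center + height and accompaniment → L/R + surrounds defaults come from the table above.

```yaml
# Hypothetical schema; nothing here is final.
separation:
  stems: 2                # or 4 for drums/bass/other/vocals
  routing:
    vocals: [center, height]
    accompaniment: [front_left, front_right, surround_left, surround_right]
```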

📚 References that might be helpful
