Add an ML‑based “Vocal ⇆ Instrumental” Source‑Separation Layer #57

@Sam-Mucyo

Description

✨ Motivation

After testing with a handful of stereo tracks, we found that most don't ship with discrete vocal and instrumental stems.
By adding an ML source‑separation layer we can:

  • Dynamic mixing – route isolated vocals to the center channel, send drums/bass to LFE, etc.

  • Karaoke / practice mode – solo or mute vocals on the fly.

  • Accessibility – raise vocal intelligibility for listeners with hearing loss.

🛠 Proposed Approach

| Item | Detail |
| --- | --- |
| Model | Start with the open‑source Hybrid Transformer Demucs v4 checkpoint (MIT‑licensed, PyTorch). |
| Inference mode | Offline (full track) first; near‑realtime streaming as a follow‑up. |
| Interface | New CLI flag `--separate=2` or `--separate=4`, plus a player UI toggle. |
| Output | Standard 32‑bit float WAV stems: `vocals.wav`, `accompaniment.wav` (or `drums`/`bass`/`other`). |
| Routing | Default mapping: vocals → center + height; accompaniment → L/R + surrounds. Configurable in `config.yaml` (see the sketches below). |
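
For the offline path, here is a minimal sketch of what the separation step could look like, assuming the `demucs` PyTorch package plus `torchaudio` and `soundfile`. The `separate()` helper and its output file layout are made up for illustration; only `get_model`, `apply_model`, and the htdemucs source order come from Demucs itself.

```python
import soundfile as sf
import torch
import torchaudio
from demucs.apply import apply_model
from demucs.pretrained import get_model


def separate(path: str, stems: int = 2, device: str = "cpu") -> None:
    """Illustrative helper: 2 = vocals/accompaniment, 4 = raw Demucs sources."""
    model = get_model("htdemucs")  # Hybrid Transformer Demucs v4 checkpoint
    model.to(device)
    model.eval()

    wav, sr = torchaudio.load(path)
    if sr != model.samplerate:
        wav = torchaudio.functional.resample(wav, sr, model.samplerate)

    with torch.no_grad():
        # apply_model takes (batch, channels, samples) and returns
        # (batch, sources, channels, samples); htdemucs orders its
        # sources as ["drums", "bass", "other", "vocals"].
        sources = apply_model(model, wav[None].to(device))[0]

    named = dict(zip(model.sources, sources))
    if stems == 2:
        out = {
            "vocals": named["vocals"],
            "accompaniment": named["drums"] + named["bass"] + named["other"],
        }
    else:
        out = named

    for name, stem in out.items():
        # 32-bit float WAV stems, as proposed above (soundfile wants samples-first).
        sf.write(f"{name}.wav", stem.cpu().numpy().T, model.samplerate, subtype="FLOAT")
```

A `--separate=2` / `--separate=4` CLI flag would then just forward its value to `stems`.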
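And for the routing row, a sketch of what the `config.yaml` defaults might look like. The schema and channel names are hypothetical; only the vocals → center + height and accompaniment → L/R + surrounds defaults come from the table above.

```yaml
# Hypothetical schema; nothing here is final.
separation:
  stems: 2                # or 4 for drums/bass/other/vocals
  routing:
    vocals: [center, height]
    accompaniment: [front_left, front_right, surround_left, surround_right]
```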

📚 References that might be helpful
