This example demonstrates how to use the Multi-Model Audio Separator feature in OwnAudioSharp to process audio through multiple UVR MDX models in parallel and average their results for superior quality.
Multi-model processing in OwnAudioSharp uses an averaging pipeline. Instead of chaining models where one model's output is the next one's input, all models process the original audio independently in parallel. Their outputs (vocals and instrumentals) are then mathematically averaged together.
This technique is powerful because different models often have different "blind spots" or artifacts. By averaging them, you can cancel out specific artifacts and achieve a result that is cleaner and more balanced than any single model could produce.
```
                 Original Mix
                      │
         ┌────────────┼────────────┐
         ↓            ↓            ↓
    ┌─────────┐  ┌─────────┐  ┌─────────┐
    │ Model 1 │  │ Model 2 │  │ Model 3 │   ← All process original
    │ (Best)  │  │(Default)│  │(Karaoke)│     independently
    └─────────┘  └─────────┘  └─────────┘
         │            │            │
         ↓            ↓            ↓
      V₁ + I₁      V₂ + I₂      V₃ + I₃
         │            │            │
         └────────────┼────────────┘
                      ↓
               ┌─────────────┐
               │  AVERAGING  │
               └─────────────┘
                      │
         ┌────────────┴────────────┐
         ↓                         ↓
     Vocals_avg             Instrumental_avg
   (V₁+V₂+...)/N             (I₁+I₂+...)/N
```
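The averaging step itself is simple per-sample arithmetic. A minimal sketch of the idea, not the library's internal implementation, assuming each model's stem is available as a float buffer of equal length:

```csharp
// Illustrative only: average N stems of equal length, sample by sample.
// The library performs an equivalent step internally after all models have run.
static float[] AverageStems(IReadOnlyList<float[]> stems)
{
    int length = stems[0].Length;
    var result = new float[length];

    foreach (var stem in stems)
        for (int i = 0; i < length; i++)
            result[i] += stem[i];

    for (int i = 0; i < length; i++)
        result[i] /= stems.Count;   // (V1 + V2 + ... + Vn) / N

    return result;
}
```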
- Vocal Refinement: Average multiple vocal models to reduce "metallic" artifacts or robotic sounds.
- Instrumental Cleaning: Combine several instrumental models to get a backing track with minimal vocal bleed.
- Specialized Combination: Mix a vocal-focused model with an instrumental-focused model to get the "best of both worlds".
```bash
cd OwnAudio/Examples/Ownaudio.Example.MultimodelSeparator
dotnet run
```

The program will prompt you to choose one of the example pipelines.

```bash
# Run with custom input and output paths
dotnet run "path/to/song.mp3" "path/to/output"

# Show help
dotnet run --help
```

The easiest way to get started. Uses a helper method to average results from two models.
```csharp
var separator = MultiModelExtensions.CreateSimplePipeline(
    model1: InternalModel.Best,
    model2: InternalModel.Karaoke,
    outputDirectory: "output"
);

separator.Initialize();
var result = separator.Separate("song.mp3");
```

Use case: Basic two-model averaging for improved quality.
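Assuming the object returned by `Separate` carries the same information as the `ProcessingCompleted` event shown later (an assumption worth checking against the actual return type), the outcome can be inspected directly:

```csharp
// Assumption: the returned result exposes OutputPath and ProcessingTime,
// mirroring the ProcessingCompleted event payload. Verify against the actual API.
Console.WriteLine($"Finished in {result.ProcessingTime}");
Console.WriteLine($"Stems written to: {result.OutputPath}");
```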
Demonstrates a three-model pipeline with all intermediate results saved for comparison.
```csharp
var separator = MultiModelExtensions.CreateTriplePipeline(
    model1: InternalModel.Best,
    model2: InternalModel.Default,
    model3: InternalModel.Karaoke,
    outputDirectory: "output"
);
```

Use case: High-quality averaging with debugging outputs.
Full control over every aspect of the averaging, including per-model settings and specific intermediate saves.
```csharp
var options = new MultiModelSeparationOptions
{
    Models = new List<MultiModelInfo>
    {
        new MultiModelInfo
        {
            Name = "VocalExtraction",
            Model = InternalModel.Best,
            NFft = 6144,
            SaveIntermediateOutput = true
        },
        new MultiModelInfo
        {
            Name = "Enhancement",
            Model = InternalModel.Default,
            SaveIntermediateOutput = true
        }
    },
    SaveAllIntermediateResults = true
};
```

Use case: Production pipelines requiring fine-grained control and artifact analysis.
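The snippet above only builds the options object. The helper methods create a separator for you; for fully custom options the example constructs one directly. A sketch, assuming a hypothetical `MultiModelSeparator` constructor that accepts the options (check the example source for the actual type name):

```csharp
// Assumption: a separator type that takes MultiModelSeparationOptions directly.
// The class name here is hypothetical; consult the example code for the real entry point.
var separator = new MultiModelSeparator(options);
separator.Initialize();
var result = separator.Separate("song.mp3");
```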
Shows how to use your own ONNX model files from disk.
```csharp
var options = new MultiModelSeparationOptions
{
    Models = new List<MultiModelInfo>
    {
        new MultiModelInfo
        {
            Name = "CustomModel1",
            ModelPath = "models/Voc_FT.onnx"
        },
        new MultiModelInfo
        {
            Name = "CustomModel2",
            ModelPath = "models/Inst_HQ_3.onnx"
        }
    }
};
```

Use case: Using custom-trained or community models not embedded in the library.
This demo shows how the system automatically detects whether a model outputs vocals or instrumentals based on its name or metadata.
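As an illustration of what name-based detection can look like (this is not the library's actual logic, which may also inspect model metadata), a simple heuristic might be:

```csharp
// Illustrative heuristic only: guess the output stem from the model's file name.
static ModelOutputType GuessOutputType(string modelPath)
{
    var name = System.IO.Path.GetFileNameWithoutExtension(modelPath).ToLowerInvariant();

    if (name.Contains("inst") || name.Contains("karaoke"))
        return ModelOutputType.Instrumental;   // karaoke models produce the backing track

    if (name.Contains("voc"))
        return ModelOutputType.Vocals;

    // Ambiguous name: set OutputType explicitly rather than relying on detection.
    return ModelOutputType.Vocals;
}
```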
Demonstrates how to explicitly combine models with different outputs (e.g., one vocal-focused and one instrumental-focused).
```csharp
new MultiModelInfo
{
    Name = "VocalModel",
    ModelPath = "path/to/vocal_model.onnx",
    OutputType = ModelOutputType.Vocals        // Explicitly set output stem
},
new MultiModelInfo
{
    Name = "InstrumentalModel",
    ModelPath = "path/to/inst_model.onnx",
    OutputType = ModelOutputType.Instrumental
}
```

Properties of `MultiModelSeparationOptions`:

| Property | Type | Default | Description |
|---|---|---|---|
| `Models` | `List<MultiModelInfo>` | Required | List of models to run; each processes the original audio |
| `OutputDirectory` | `string` | `"separated_multimodel"` | Output directory for results |
| `EnableGPU` | `bool` | `true` | Enable GPU acceleration |
| `ChunkSizeSeconds` | `int` | `15` | Chunk size in seconds |
| `Margin` | `int` | `44100` | Margin size for overlapping chunks |
| `SaveAllIntermediateResults` | `bool` | `false` | Save output after each model |
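As a sketch of how these top-level settings combine with a model list, reusing only properties and values shown above:

```csharp
// Example configuration assembled from the properties in the table above.
var options = new MultiModelSeparationOptions
{
    OutputDirectory = "separated_multimodel",
    EnableGPU = true,            // CUDA on Windows/Linux, CoreML on macOS
    ChunkSizeSeconds = 15,       // smaller values lower memory use
    Margin = 44100,              // 1 second of overlap between chunks
    SaveAllIntermediateResults = false,
    Models = new List<MultiModelInfo>
    {
        new MultiModelInfo { Name = "Vocals",  Model = InternalModel.Best },
        new MultiModelInfo { Name = "Backing", Model = InternalModel.Karaoke }
    }
};
```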
Properties of `MultiModelInfo`:

| Property | Type | Default | Description |
|---|---|---|---|
| `Name` | `string` | `"Model"` | Display name for this model |
| `Model` | `InternalModel` | `None` | Embedded model to use |
| `ModelPath` | `string?` | `null` | Path to custom ONNX model file |
| `OutputType` | `ModelOutputType?` | `null` | Vocals or Instrumental (auto-detected if null) |
| `NFft` | `int` | `6144` | FFT size (auto-detected if 0) |
| `DimT` | `int` | `8` | Temporal dimension (power of 2) |
| `DimF` | `int` | `2048` | Frequency dimension |
| `DisableNoiseReduction` | `bool` | `false` | Disable noise reduction pass |
| `SaveIntermediateOutput` | `bool` | `false` | Save output from this model |
Subscribe to events for real-time progress updates:
```csharp
separator.ProgressChanged += (sender, progress) =>
{
    Console.WriteLine($"[{progress.CurrentModelName}] " +
                      $"Chunk {progress.ProcessedChunks}/{progress.TotalChunks} " +
                      $"({progress.OverallProgress:F1}%)");
};

separator.ProcessingCompleted += (sender, result) =>
{
    Console.WriteLine($"Completed in {result.ProcessingTime}");
    Console.WriteLine($"Output: {result.OutputPath}");
};
```

- Streaming: Uses a streaming pipeline to process audio in small chunks, keeping memory footprint low.
- Per Model: ~500-800 MB for 15-second chunks.
- Sequential Loading: Models are loaded and processed one by one against the original audio to save memory.
- CPU: ~10-15x realtime per model (on modern hardware).
- GPU: ~50-100x realtime per model.
- Total Time: Sum of all models' processing times + minor overhead for averaging.
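For example, a 4-minute track run through three models at roughly 10x realtime on the CPU needs about 3 × 24 s ≈ 72 s of model time plus a few seconds for averaging; on a GPU at ~50x realtime the same job finishes in well under 20 seconds.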
For Quality:

```csharp
ChunkSizeSeconds = 15,
Margin = 44100,    // 1 second margin for smooth blending
EnableGPU = true
```

For Speed:

```csharp
ChunkSizeSeconds = 10,
Margin = 22050,    // 0.5 second margin
EnableGPU = true
```

For Memory-Constrained Systems:

```csharp
ChunkSizeSeconds = 5,
Margin = 11025,    // 0.25 second margin
```

- Ensure you're using valid `InternalModel` enum values or valid paths to `.onnx` files.
- Reduce `ChunkSizeSeconds` (e.g., from 15 to 10).
- Reduce `Margin` size.
- Ensure GPU acceleration is enabled: `EnableGPU = true`.
- CoreML is used on macOS, CUDA on Windows/Linux.
- Check the `OutputType` of your models. If auto-detection fails, explicitly set it to `Vocals` or `Instrumental`.
- Save intermediate results (`SaveAllIntermediateResults = true`) to see which model in the average is causing issues.
- Model Compatibility: Only UVR MDX-style models are supported (STFT-based)
- Input Formats: WAV, MP3, FLAC (automatically resampled to 44.1kHz)
- Output Format: 16-bit stereo WAV at 44.1kHz
- GPU Support: CUDA on Windows/Linux, CoreML on macOS
- Ownaudio.Example.VocalRemover - Single model separation
- Ownaudio.Example.ChordDetect - Chord detection
- Ownaudio.Example.Matching - Audio matchering
- OwnAudioSharp Documentation
- CLAUDE.md - Development guidelines
- Main README - Project overview