Add native MLX DeepFilterNet speech enhancement (v1/v2/v3) #561

lucasnewman merged 23 commits into Blaizzy:main from
Conversation
- Add missing Union import in model.py
- Remove unused tree_unflatten import in weight_loader.py
- Add docstrings to EncoderV1, ErbDecoderV1, DfDecoderV1
- Update README examples to use renamed noisey_audio_10s.wav
- Move convert script into model package to match project conventions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move scratchpad/ and convert.py into deepfilternet/scripts/ to match project conventions. Rewrite benchmark.py to use argparse with relative paths instead of hardcoded absolute paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 8 integration tests that load real pretrained weights and run inference on the sample audio file:

- Offline: output length, range, noise reduction, non-silence
- Streaming: output length, range, noise reduction
- Offline vs streaming correlation (> 0.85)
- File roundtrip (enhance_file writes valid audio)
- PyTorch parity (correlation > 0.90, skipped if df not installed)

Tests are skipped gracefully when the model or audio is unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
I'm making an iOS app which needs noise reduction, so I wanted to add DeepFilterNet support. I wanted to use Swift/Metal directly instead of the Rust library bindings, so I spent the last few days getting the correlation between the Rust, PyTorch, and MLX versions as close as possible and performance tuning it. Ended up with a real-time factor of 0.271x, processing the 10s sample file in 2.713s. I've put more effort into performance tuning the swift-mlx version (PR coming) as that's my main focus, but thought it would be easier to start with porting it to the Python audio-MLX repo rather than jumping straight into Swift. I've cleaned up the final state, but left the full commit history. Initial changes were done mostly with the help of Codex-5.3, and then the cleanup, performance tuning, docs, testing, etc. were a mix of manual work and Opus 4.6.
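As a sanity check on the numbers quoted above (the processing time and audio duration are taken from the comment; variable names are illustrative):

```python
# Real-time factor (RTF) = processing time / audio duration.
# RTF < 1.0 means the model runs faster than real time.
processing_time_s = 2.713  # time to enhance the sample file
audio_duration_s = 10.0    # length of the sample file

rtf = processing_time_s / audio_duration_s
print(f"RTF = {rtf:.4f}x")  # 0.2713x, matching the quoted 0.271x
```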
Remove _REPO_ROOT/DEFAULT_MODEL_DIR/resolve_model_dir in favor of the same pattern used by MossFormer2: default to the HuggingFace repo ID, check if the path exists locally, otherwise download from HF hub. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kylehowells Thanks for the contribution! A few changes needed before this can be merged:
If you're open to making those changes, we can get this merged!
Additionally, I think this could be under the VAD module instead of STS 👌🏽
@Blaizzy the closest existing model in the repo is "MossFormer2 SE", which does basically the same thing and is currently in STS.
I think keeping it in STS is fine; it is a speech-in-speech-out model.
…update model repo

- Remove benchmark, compare_outputs, and deep_filter_pytorch dev scripts
- Remove model-specific CLI (sts/deepfilternet.py) and example wrapper
- Add generic sts/generate.py CLI: python -m mlx_audio.sts.generate --model <repo>
- Update default repo to mlx-community/DeepFilterNet-mlx with v1/v2/v3 subfolders
- Add subfolder/version params to from_pretrained for multi-version repo support
- Remove libDF runtime fallback (no PyTorch dependencies at runtime)
- Update README and model docs to reference new HuggingFace repo
- Keep conversion script and PyTorch parity test (skips when df not installed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Generate noisey_audio_10s_target.wav from official PyTorch DeepFilterNet
- Add test_target_parity: compares MLX output against pre-generated reference using correlation (>0.999), SER (>25 dB), MAE (<0.002), and RMS diff (<1%)
- Does not require PyTorch to run; works from the committed WAV file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
I actually didn't realise the mlx-community was open and I was able to upload them there. @lucasnewman I've updated the PR.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Makes sense! 👌🏽
@kylehowells Looking good! Can you run the formatter with
lucasnewman left a comment
Looks great, thank you! 🚀
Ran the formatter (it dragged in some output test files, so I reverted those) and pushed the reformatting changes.
DeepFilterNet is a widely used, low-complexity speech enhancement framework for full-band 48 kHz audio.
This PR adds a pure-MLX implementation covering all three model versions (v1, v2, v3), with both offline and true stateful streaming inference.
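As background on what "true stateful streaming" means here, the underlying pattern is to carry buffered state across calls so that audio can arrive in arbitrary-sized chunks. The class below is a minimal sketch of that pattern, not the PR's streaming.py: the name `StreamingFramer` and the hop size (10 ms at 48 kHz) are assumptions.

```python
import numpy as np


class StreamingFramer:
    """Buffers arbitrary-sized input chunks and emits fixed-size hops,
    carrying leftover samples between calls."""

    def __init__(self, hop: int = 480):  # assumed hop: 10 ms at 48 kHz
        self.hop = hop
        self.buf = np.zeros(0, dtype=np.float32)

    def push(self, chunk: np.ndarray) -> list:
        # Append the new chunk, then peel off as many full hops as
        # possible; whatever remains stays buffered for the next call.
        self.buf = np.concatenate([self.buf, chunk.astype(np.float32)])
        frames = []
        while self.buf.size >= self.hop:
            frames.append(self.buf[: self.hop])
            self.buf = self.buf[self.hop :]
        return frames
```

A real streaming enhancer additionally carries the network's recurrent state and overlap-add tails between hops, but the chunk-buffering skeleton is the same.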
The implementation was checked against the original PyTorch implementation (correlation 0.9997, SER 31.6 dB).
Description
A complete pure-MLX port of DeepFilterNet, organized under mlx_audio/sts/models/deepfilternet/:

- Converted weights and config.json are published to mlx-community/DeepFilterNet-mlx with v1/v2/v3 subfolders
- The conversion script (scripts/convert.py) is standalone and only needed to regenerate weights.

Changes in the codebase
New files (model package)
- mlx_audio/sts/models/deepfilternet/config.py — Dataclass configs for DF1/DF2/DF3
- mlx_audio/sts/models/deepfilternet/model.py — Main runtime: STFT, features, enhance_array, enhance_file, streaming wrappers
- mlx_audio/sts/models/deepfilternet/network.py — DF2/DF3 architecture (Encoder, ErbDecoder, DfDecoder, DeepFilterOp, DfNet)
- mlx_audio/sts/models/deepfilternet/network_df1.py — DF1 architecture with grouped GRU
- mlx_audio/sts/models/deepfilternet/streaming.py — True stateful streaming state machine
- mlx_audio/sts/models/deepfilternet/weight_loader.py — PyTorch→MLX weight mapping and loading (pure MLX, no torch dependency)
- mlx_audio/sts/models/deepfilternet/__init__.py — Public API exports
- mlx_audio/sts/models/deepfilternet/README.md — Quick start and usage docs

New files (CLI, example, scripts)
- mlx_audio/sts/generate.py — Generic STS CLI entrypoint supporting DeepFilterNet and MossFormer2
- examples/denoise/noisey_audio_10s.wav — Sample noisy audio for testing
- examples/denoise/noisey_audio_10s_target.wav — PyTorch reference output for parity testing
- mlx_audio/sts/models/deepfilternet/scripts/convert.py — Standalone PyTorch checkpoint converter

New files (tests)
- mlx_audio/sts/tests/test_deepfilternet.py — 19 tests: config, forward-pass shapes (DF1/DF2/DF3), runtime helpers, and end-to-end integration tests with real weights, including target parity validation and optional PyTorch parity

Modified files
- mlx_audio/sts/__init__.py — Added DeepFilterNet exports
- mlx_audio/sts/models/__init__.py — Added DeepFilterNet imports
- pyproject.toml — Registered generic mlx_audio.sts.generate CLI entrypoint
- README.md — Added DeepFilterNet to model table
- CONTRIBUTIONS.md — Added attribution

Changes outside the codebase
- Published converted weights to mlx-community/DeepFilterNet-mlx on HuggingFace, with v1/v2/v3 subfolders

Additional information
PyTorch parity results
Tested on the included 10-second sample audio (MLX vs pre-generated PyTorch reference):
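For readers unfamiliar with the parity metrics, correlation and signal-to-error ratio (SER) between a reference waveform and an estimate can be computed along these lines. This is an illustrative sketch, not the PR's test code:

```python
import numpy as np


def correlation(ref: np.ndarray, est: np.ndarray) -> float:
    # Pearson correlation between two waveforms (1.0 = identical shape).
    ref = ref - ref.mean()
    est = est - est.mean()
    return float(np.dot(ref, est) / (np.linalg.norm(ref) * np.linalg.norm(est)))


def ser_db(ref: np.ndarray, est: np.ndarray) -> float:
    # Signal-to-error ratio: reference power over residual power, in dB.
    err = ref - est
    return float(10.0 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2)))
```

At the reported SER of 31.6 dB, the residual energy is about 10^(-3.16), i.e. under 0.1% of the signal energy.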
Test coverage
Checklist
Benchmark Results
Original Noisy Audio:
noisy_audio_10s.mp3
PyTorch DeepFilterNet Cleaned Audio:
noisy_audio_10s_target.mp3
mlx-audio Cleaned Audio:
noisy_audio_10s_dfn3_reference.mp3
Audio File Comparisons