
Add native MLX DeepFilterNet speech enhancement (v1/v2/v3)#561

Merged
lucasnewman merged 23 commits into Blaizzy:main from kylehowells:main
Mar 11, 2026

Conversation

@kylehowells
Contributor

@kylehowells kylehowells commented Mar 10, 2026

DeepFilterNet is a widely-used, low-complexity speech enhancement framework for full-band 48 kHz audio.
This PR adds a pure-MLX implementation covering all three model versions (v1, v2, v3), with both offline and true stateful streaming inference.
The implementation was checked against the original PyTorch implementation (correlation 0.9997, SER 31.6 dB).

Description

A complete pure-MLX port of DeepFilterNet, organized under mlx_audio/sts/models/deepfilternet/:

  • Full inference pipeline: STFT/ISTFT with Vorbis window, ERB + DF feature extraction with EMA normalization, neural network inference, spectral reconstruction, and delay compensation
  • All three architectures: v1 (grouped GRU), v2 (enc_concat path), v3 (split DF+mask fusion) — architecture is selected automatically from config.json
  • True stateful streaming for v2/v3: per-hop processing with persistent analysis/synthesis memories, GRU hidden states, and lookahead alignment queues
  • HuggingFace integration: Downloads pretrained weights from mlx-community/DeepFilterNet-mlx with v1/v2/v3 subfolders
  • Performance optimizations: EMA normalization loops run in numpy to avoid per-frame MLX kernel-launch overhead, and conv weight transposes are cached
  • No PyTorch runtime dependency: All inference is pure MLX/numpy. The conversion script (scripts/convert.py) is standalone and only needed to regenerate weights.
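The Vorbis window mentioned in the STFT/ISTFT bullet has a simple closed form that is easy to sanity-check. A minimal numpy sketch (the helper name is hypothetical, not the PR's actual code):

```python
import numpy as np

def vorbis_window(n: int) -> np.ndarray:
    """Vorbis window: w[k] = sin(pi/2 * sin^2(pi * (k + 0.5) / n))."""
    k = np.arange(n)
    return np.sin(0.5 * np.pi * np.sin(np.pi * (k + 0.5) / n) ** 2)

# The window is power-complementary at 50% overlap (Princen-Bradley
# condition), which is what makes a windowed STFT/ISTFT round trip
# lossless apart from the usual edge frames:
w = vorbis_window(960)  # 20 ms frame at 48 kHz
assert np.allclose(w[:480] ** 2 + w[480:] ** 2, 1.0)
```

This property is what lets the synthesis stage reuse the same window without an extra normalization pass.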

Changes in the codebase

New files (model package)

  • mlx_audio/sts/models/deepfilternet/config.py — Dataclass configs for DF1/DF2/DF3
  • mlx_audio/sts/models/deepfilternet/model.py — Main runtime: STFT, features, enhance_array, enhance_file, streaming wrappers
  • mlx_audio/sts/models/deepfilternet/network.py — DF2/DF3 architecture (Encoder, ErbDecoder, DfDecoder, DeepFilterOp, DfNet)
  • mlx_audio/sts/models/deepfilternet/network_df1.py — DF1 architecture with grouped GRU
  • mlx_audio/sts/models/deepfilternet/streaming.py — True stateful streaming state machine
  • mlx_audio/sts/models/deepfilternet/weight_loader.py — PyTorch→MLX weight mapping and loading (pure MLX, no torch dependency)
  • mlx_audio/sts/models/deepfilternet/__init__.py — Public API exports
  • mlx_audio/sts/models/deepfilternet/README.md — Quick start and usage docs
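The stateful streaming design in streaming.py (persistent analysis/synthesis memories carried across hops) can be illustrated with a toy overlap-add streamer. This is a sketch of the pattern only, with an identity "enhancement" step in place of the network, and is not the PR's actual state machine:

```python
import numpy as np

class OverlapAddStreamer:
    """Toy per-hop streamer with persistent analysis/synthesis memories.
    Assumes frame == 2 * hop and a power-complementary (Vorbis) window."""

    def __init__(self, frame: int = 960, hop: int = 480):
        self.frame, self.hop = frame, hop
        k = np.arange(frame)
        self.window = np.sin(0.5 * np.pi * np.sin(np.pi * (k + 0.5) / frame) ** 2)
        self.analysis_mem = np.zeros(frame - hop)   # tail of previous input
        self.synthesis_mem = np.zeros(frame - hop)  # overlap-add carry

    def process_hop(self, hop_samples: np.ndarray) -> np.ndarray:
        frame = np.concatenate([self.analysis_mem, hop_samples])
        self.analysis_mem = frame[self.hop:]
        spec = np.fft.rfft(frame * self.window)
        # ... neural enhancement would modify `spec` here ...
        out = np.fft.irfft(spec) * self.window
        out[: self.frame - self.hop] += self.synthesis_mem
        self.synthesis_mem = out[self.hop:]
        return out[: self.hop]
```

With the identity step, the output is exactly the input delayed by one hop, which is the kind of delay the PR's delay-compensation step accounts for.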

New files (CLI, example, scripts)

  • mlx_audio/sts/generate.py — Generic STS CLI entrypoint supporting DeepFilterNet and MossFormer2
  • examples/denoise/noisey_audio_10s.wav — Sample noisy audio for testing
  • examples/denoise/noisey_audio_10s_target.wav — PyTorch reference output for parity testing
  • mlx_audio/sts/models/deepfilternet/scripts/convert.py — Standalone PyTorch checkpoint converter

New files (tests)

  • mlx_audio/sts/tests/test_deepfilternet.py — 19 tests: config, forward-pass shapes (DF1/DF2/DF3), runtime helpers, and end-to-end integration tests with real weights including target parity validation and optional PyTorch parity

Modified files

  • mlx_audio/sts/__init__.py — Added DeepFilterNet exports
  • mlx_audio/sts/models/__init__.py — Added DeepFilterNet imports
  • pyproject.toml — Registered generic mlx_audio.sts.generate CLI entrypoint
  • README.md — Added DeepFilterNet to model table
  • CONTRIBUTIONS.md — Added attribution

Additional information

PyTorch parity results

Tested on the included 10-second sample audio (MLX vs pre-generated PyTorch reference):

  • Correlation: 0.9997
  • Signal-to-Error Ratio: 31.6 dB
  • MAE: 0.001
  • RMS difference: < 0.5%
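The four metrics above can be reproduced in a few lines of numpy. A hypothetical helper (not the PR's test code) with the standard definitions:

```python
import numpy as np

def parity_metrics(ref: np.ndarray, out: np.ndarray) -> dict:
    """Correlation, signal-to-error ratio (dB), MAE, and relative RMS
    difference between a reference signal and a test output."""
    err = out - ref
    ser_db = 10.0 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2))
    corr = np.corrcoef(ref, out)[0, 1]
    mae = float(np.mean(np.abs(err)))
    rms = lambda a: np.sqrt(np.mean(a ** 2))
    rms_diff = abs(rms(out) - rms(ref)) / rms(ref)
    return {"corr": corr, "ser_db": ser_db, "mae": mae, "rms_diff": rms_diff}
```

For scale: an SER of 31.6 dB means the error energy is roughly 0.07% of the signal energy.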

Test coverage

  • 10 unit tests (config, forward shapes, runtime helpers) — run without model weights
  • 9 integration tests (noise reduction, output range, streaming parity, file roundtrip, target parity, optional PyTorch parity) — skip gracefully when model/deps unavailable
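The "skip gracefully" behavior can be sketched with stdlib unittest for illustration; the real tests live in test_deepfilternet.py, and the availability probe here (checking for the optional `df` PyTorch package) is an assumption about how such a guard might look:

```python
import importlib.util
import unittest

# Hypothetical availability probe; the real tests also check that the
# downloaded model weights and sample audio are present.
HAS_DF = importlib.util.find_spec("df") is not None

class TestDeepFilterNetIntegration(unittest.TestCase):
    @unittest.skipUnless(HAS_DF, "df (PyTorch DeepFilterNet) not installed")
    def test_pytorch_parity(self):
        # Placeholder body; the real test compares MLX vs PyTorch output.
        self.assertTrue(True)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDeepFilterNetIntegration)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Either way the suite reports success: the test runs when the dependency is present and is recorded as skipped, not failed, when it is absent.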

Checklist

  • Tests added/updated
  • Documentation updated

Benchmark Results

Original Noisy Audio:
noisy_audio_10s.mp3

PyTorch DeepFilterNet Cleaned Audio:
noisy_audio_10s_target.mp3

mlx-audio Cleaned Audio:
noisy_audio_10s_dfn3_reference.mp3

Audio File Comparisons

(images: correlation scatter, spectrogram comparison, waveform comparison, RMS comparison)

kylehowells and others added 18 commits March 10, 2026 18:37
- Add missing Union import in model.py
- Remove unused tree_unflatten import in weight_loader.py
- Add docstrings to EncoderV1, ErbDecoderV1, DfDecoderV1
- Update README examples to use renamed noisey_audio_10s.wav
- Move convert script into model package to match project conventions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move scratchpad/ and convert.py into deepfilternet/scripts/ to match
project conventions. Rewrite benchmark.py to use argparse with relative
paths instead of hardcoded absolute paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 8 integration tests that load real pretrained weights and run
inference on the sample audio file:
- Offline: output length, range, noise reduction, non-silence
- Streaming: output length, range, noise reduction
- Offline vs streaming correlation (> 0.85)
- File roundtrip (enhance_file writes valid audio)
- PyTorch parity (correlation > 0.90, skipped if df not installed)

Tests are skipped gracefully when the model or audio is unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kylehowells
Contributor Author

kylehowells commented Mar 10, 2026

I'm making an iOS app which needs noise reduction, so I wanted to add DeepFilterNet support. I wanted to use Swift/Metal directly instead of going through the Rust library bindings, so I spent the last few days getting the correlation between the Rust, PyTorch, and MLX versions as close as possible and performance tuning it.

Ended up with a real-time factor of 0.271x; it can process the 10s sample file in 2.713s.

I've put more effort into performance tuning the swift-mlx version (PR coming) as that's my main focus, but I thought it would be easier to start by porting it to the Python mlx-audio repo rather than jumping straight into Swift.

I've cleaned up the final state but left the full commit history. The initial changes were done mostly with the help of Codex-5.3, and the cleanup, performance tuning, docs, testing, etc. were a mix of manual work and Opus 4.6.

@kylehowells kylehowells marked this pull request as ready for review March 10, 2026 18:57
Remove _REPO_ROOT/DEFAULT_MODEL_DIR/resolve_model_dir in favor of the
same pattern used by MossFormer2: default to the HuggingFace repo ID,
check if the path exists locally, otherwise download from HF hub.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lucasnewman
Collaborator

@kylehowells Thanks for the contribution! A few changes needed before this can be merged:

  • We don't want any extraneous scripts along with the model -- we'd prefer to have your PR contain just the model itself and any relevant tests to verify the functionality.
  • Since you already have a converted model, you should upload the weights to the mlx-community section of huggingface and document the model repo id in the README. We don't need/want to support inline conversion from pytorch -- we actually strive to have no pytorch dependencies when possible.
  • We don't want command line tools tied to specific models. If you'd like, you can create a generate.py script in the sts area to exercise the functionality, so it can be run like python -m mlx_audio.sts.generate --model <repo_id> <...>. You'll want to closely follow the examples in the sts and stt directories.

If you're open to making those changes, we can get this merged!

Collaborator

@lucasnewman lucasnewman left a comment


See comment.

@Blaizzy
Owner

Blaizzy commented Mar 10, 2026

Additionally, I think this could be under VAD module instead of STS👌🏽

@kylehowells
Contributor Author

Additionally, I think this could be under VAD module instead of STS👌🏽

@Blaizzy the closest existing model in the repo is "MossFormer2 SE" which does basically the same thing, which is currently in STS.

@lucasnewman
Collaborator

Additionally, I think this could be under VAD module instead of STS👌🏽

@Blaizzy the closest existing model in the repo is "MossFormer2 SE" which does basically the same thing, which is currently in STS.

I think keeping it in STS is fine, it is a speech-in-speech-out model.

kylehowells and others added 2 commits March 11, 2026 00:08
…update model repo

- Remove benchmark, compare_outputs, and deep_filter_pytorch dev scripts
- Remove model-specific CLI (sts/deepfilternet.py) and example wrapper
- Add generic sts/generate.py CLI: python -m mlx_audio.sts.generate --model <repo>
- Update default repo to mlx-community/DeepFilterNet-mlx with v1/v2/v3 subfolders
- Add subfolder/version params to from_pretrained for multi-version repo support
- Remove libDF runtime fallback (no PyTorch dependencies at runtime)
- Update README and model docs to reference new HuggingFace repo
- Keep conversion script and PyTorch parity test (skips when df not installed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Generate noisey_audio_10s_target.wav from official PyTorch DeepFilterNet
- Add test_target_parity: compares MLX output against pre-generated reference
  using correlation (>0.999), SER (>25 dB), MAE (<0.002), and RMS diff (<1%)
- Does not require PyTorch to run — works from the committed WAV file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kylehowells
Contributor Author

kylehowells commented Mar 11, 2026

upload the weights to the mlx-community section of huggingface

I actually didn't realise the mlx-community was open and I was able to upload them there.

@lucasnewman I've updated the PR.

  • I've removed the other scripts and just left the model conversion script.
  • The pre-generated models are now on mlx-community.
  • Replaced the benchmarking and comparison tests with a new end-to-end test which generates a de-noised audio sample and checks it matches the expected model output.
  • Added the new python -m mlx_audio.sts.generate CLI instead of the DeepFilterNet specific one.

@Blaizzy
Owner

Blaizzy commented Mar 11, 2026

@Blaizzy the closest existing model in the repo is "MossFormer2 SE" which does basically the same thing, which is currently in STS.

Makes sense! 👌🏽

@lucasnewman
Collaborator

@kylehowells Looking good! Can you run the formatter with pre-commit run --all? Then we can merge.

Collaborator

@lucasnewman lucasnewman left a comment


Looks great, thank you! 🚀

@lucasnewman lucasnewman merged commit 1f7dea1 into Blaizzy:main Mar 11, 2026
10 checks passed
@kylehowells
Contributor Author

Ran the formatter (and dragged in some output test files, so reverted that) and pushed the reformatting changes.
