
Add native MLX DeepFilterNet speech enhancement (v1/v2/v3)#561

Merged
lucasnewman merged 23 commits into Blaizzy:main from kylehowells:main
Mar 11, 2026

Conversation

@kylehowells
Contributor

@kylehowells kylehowells commented Mar 10, 2026

DeepFilterNet is a widely-used, low-complexity speech enhancement framework for full-band 48 kHz audio.
This PR adds a pure-MLX implementation covering all three model versions (v1, v2, v3), with both offline and true stateful streaming inference.
The implementation was checked against the original PyTorch implementation (correlation 0.9997, SER 31.6 dB).

Description

A complete pure-MLX port of DeepFilterNet, organized under mlx_audio/sts/models/deepfilternet/:

  • Full inference pipeline: STFT/ISTFT with Vorbis window, ERB + DF feature extraction with EMA normalization, neural network inference, spectral reconstruction, and delay compensation
  • All three architectures: v1 (grouped GRU), v2 (enc_concat path), v3 (split DF+mask fusion) — architecture is selected automatically from config.json
  • True stateful streaming for v2/v3: per-hop processing with persistent analysis/synthesis memories, GRU hidden states, and lookahead alignment queues
  • HuggingFace integration: Downloads pretrained weights from mlx-community/DeepFilterNet-mlx with v1/v2/v3 subfolders
  • Performance optimizations: EMA normalization loops run in numpy to avoid per-frame MLX kernel-launch overhead, and conv weight transposes are cached
  • No PyTorch runtime dependency: All inference is pure MLX/numpy. The conversion script (scripts/convert.py) is standalone and only needed to regenerate weights.
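The Vorbis window mentioned in the STFT/ISTFT bullet has a simple closed form that is easy to sanity-check. A minimal numpy sketch (the helper name is hypothetical, not the PR's actual code):

```python
import numpy as np

def vorbis_window(n: int) -> np.ndarray:
    """Vorbis window: w[k] = sin(pi/2 * sin^2(pi * (k + 0.5) / n))."""
    k = np.arange(n)
    return np.sin(0.5 * np.pi * np.sin(np.pi * (k + 0.5) / n) ** 2)

# The window is power-complementary at 50% overlap (Princen-Bradley
# condition), which is what makes a windowed STFT/ISTFT round trip
# lossless apart from the usual edge frames:
w = vorbis_window(960)  # 20 ms frame at 48 kHz
assert np.allclose(w[:480] ** 2 + w[480:] ** 2, 1.0)
```

This property is what lets the synthesis stage reuse the same window without an extra normalization pass.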

Changes in the codebase

New files (model package)

  • mlx_audio/sts/models/deepfilternet/config.py — Dataclass configs for DF1/DF2/DF3
  • mlx_audio/sts/models/deepfilternet/model.py — Main runtime: STFT, features, enhance_array, enhance_file, streaming wrappers
  • mlx_audio/sts/models/deepfilternet/network.py — DF2/DF3 architecture (Encoder, ErbDecoder, DfDecoder, DeepFilterOp, DfNet)
  • mlx_audio/sts/models/deepfilternet/network_df1.py — DF1 architecture with grouped GRU
  • mlx_audio/sts/models/deepfilternet/streaming.py — True stateful streaming state machine
  • mlx_audio/sts/models/deepfilternet/weight_loader.py — PyTorch→MLX weight mapping and loading (pure MLX, no torch dependency)
  • mlx_audio/sts/models/deepfilternet/__init__.py — Public API exports
  • mlx_audio/sts/models/deepfilternet/README.md — Quick start and usage docs
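The stateful streaming design in streaming.py (persistent analysis/synthesis memories carried across hops) can be illustrated with a toy overlap-add streamer. This is a sketch of the pattern only, with an identity "enhancement" step in place of the network, and is not the PR's actual state machine:

```python
import numpy as np

class OverlapAddStreamer:
    """Toy per-hop streamer with persistent analysis/synthesis memories.
    Assumes frame == 2 * hop and a power-complementary (Vorbis) window."""

    def __init__(self, frame: int = 960, hop: int = 480):
        self.frame, self.hop = frame, hop
        k = np.arange(frame)
        self.window = np.sin(0.5 * np.pi * np.sin(np.pi * (k + 0.5) / frame) ** 2)
        self.analysis_mem = np.zeros(frame - hop)   # tail of previous input
        self.synthesis_mem = np.zeros(frame - hop)  # overlap-add carry

    def process_hop(self, hop_samples: np.ndarray) -> np.ndarray:
        frame = np.concatenate([self.analysis_mem, hop_samples])
        self.analysis_mem = frame[self.hop:]
        spec = np.fft.rfft(frame * self.window)
        # ... neural enhancement would modify `spec` here ...
        out = np.fft.irfft(spec) * self.window
        out[: self.frame - self.hop] += self.synthesis_mem
        self.synthesis_mem = out[self.hop:]
        return out[: self.hop]
```

With the identity step, the output is exactly the input delayed by one hop, which is the kind of delay the PR's delay-compensation step accounts for.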

New files (CLI, example, scripts)

  • mlx_audio/sts/generate.py — Generic STS CLI entrypoint supporting DeepFilterNet and MossFormer2
  • examples/denoise/noisey_audio_10s.wav — Sample noisy audio for testing
  • examples/denoise/noisey_audio_10s_target.wav — PyTorch reference output for parity testing
  • mlx_audio/sts/models/deepfilternet/scripts/convert.py — Standalone PyTorch checkpoint converter

New files (tests)

  • mlx_audio/sts/tests/test_deepfilternet.py — 19 tests: config, forward-pass shapes (DF1/DF2/DF3), runtime helpers, and end-to-end integration tests with real weights including target parity validation and optional PyTorch parity

Modified files

  • mlx_audio/sts/__init__.py — Added DeepFilterNet exports
  • mlx_audio/sts/models/__init__.py — Added DeepFilterNet imports
  • pyproject.toml — Registered generic mlx_audio.sts.generate CLI entrypoint
  • README.md — Added DeepFilterNet to model table
  • CONTRIBUTIONS.md — Added attribution

Additional information

PyTorch parity results

Tested on the included 10-second sample audio (MLX vs pre-generated PyTorch reference):

  • Correlation: 0.9997
  • Signal-to-Error Ratio: 31.6 dB
  • MAE: 0.001
  • RMS difference: < 0.5%
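The four metrics above can be reproduced in a few lines of numpy. A hypothetical helper (not the PR's test code) with the standard definitions:

```python
import numpy as np

def parity_metrics(ref: np.ndarray, out: np.ndarray) -> dict:
    """Correlation, signal-to-error ratio (dB), MAE, and relative RMS
    difference between a reference signal and a test output."""
    err = out - ref
    ser_db = 10.0 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2))
    corr = np.corrcoef(ref, out)[0, 1]
    mae = float(np.mean(np.abs(err)))
    rms = lambda a: np.sqrt(np.mean(a ** 2))
    rms_diff = abs(rms(out) - rms(ref)) / rms(ref)
    return {"corr": corr, "ser_db": ser_db, "mae": mae, "rms_diff": rms_diff}
```

For scale: an SER of 31.6 dB means the error energy is roughly 0.07% of the signal energy.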

Test coverage

  • 10 unit tests (config, forward shapes, runtime helpers) — run without model weights
  • 9 integration tests (noise reduction, output range, streaming parity, file roundtrip, target parity, optional PyTorch parity) — skip gracefully when model/deps unavailable
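The "skip gracefully" behavior can be sketched with stdlib unittest for illustration; the real tests live in test_deepfilternet.py, and the availability probe here (checking for the optional `df` PyTorch package) is an assumption about how such a guard might look:

```python
import importlib.util
import unittest

# Hypothetical availability probe; the real tests also check that the
# downloaded model weights and sample audio are present.
HAS_DF = importlib.util.find_spec("df") is not None

class TestDeepFilterNetIntegration(unittest.TestCase):
    @unittest.skipUnless(HAS_DF, "df (PyTorch DeepFilterNet) not installed")
    def test_pytorch_parity(self):
        # Placeholder body; the real test compares MLX vs PyTorch output.
        self.assertTrue(True)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDeepFilterNetIntegration)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Either way the suite reports success: the test runs when the dependency is present and is recorded as skipped, not failed, when it is absent.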

Checklist

  • Tests added/updated
  • Documentation updated

Benchmark Results

Original Noisy Audio:
noisy_audio_10s.mp3

PyTorch DeepFilterNet Cleaned Audio:
noisy_audio_10s_target.mp3

mlx-audio Cleaned Audio:
noisy_audio_10s_dfn3_reference.mp3

Audio File Comparisons

(images: correlation scatter, spectrogram comparison, waveform comparison, RMS comparison)

kylehowells and others added 18 commits March 10, 2026 18:37
- Add missing Union import in model.py
- Remove unused tree_unflatten import in weight_loader.py
- Add docstrings to EncoderV1, ErbDecoderV1, DfDecoderV1
- Update README examples to use renamed noisey_audio_10s.wav
- Move convert script into model package to match project conventions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move scratchpad/ and convert.py into deepfilternet/scripts/ to match
project conventions. Rewrite benchmark.py to use argparse with relative
paths instead of hardcoded absolute paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 8 integration tests that load real pretrained weights and run
inference on the sample audio file:
- Offline: output length, range, noise reduction, non-silence
- Streaming: output length, range, noise reduction
- Offline vs streaming correlation (> 0.85)
- File roundtrip (enhance_file writes valid audio)
- PyTorch parity (correlation > 0.90, skipped if df not installed)

Tests are skipped gracefully when the model or audio is unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kylehowells
Contributor Author

kylehowells commented Mar 10, 2026

I'm making an iOS app which needs noise reduction, so I wanted to add DeepFilterNet support. I wanted to use Swift/Metal directly instead of going through the Rust library bindings, so I spent the last few days getting the correlation between the Rust, PyTorch, and MLX versions as close as possible and performance tuning it.

Ended up with a real-time factor of 0.271x; it can process the 10s sample file in 2.713s.

I've put more effort into performance tuning the swift-mlx version (PR coming) as that's my main focus, but I thought it would be easier to start by porting it to the Python mlx-audio repo rather than jumping straight into Swift.

I've cleaned up the final state but left the full commit history. The initial changes were done mostly with the help of Codex-5.3, and the cleanup, performance tuning, docs, testing, etc. were a mix of manual work and Opus 4.6.

@kylehowells kylehowells marked this pull request as ready for review March 10, 2026 18:57
Remove _REPO_ROOT/DEFAULT_MODEL_DIR/resolve_model_dir in favor of the
same pattern used by MossFormer2: default to the HuggingFace repo ID,
check if the path exists locally, otherwise download from HF hub.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lucasnewman
Collaborator

@kylehowells Thanks for the contribution! A few changes needed before this can be merged:

  • We don't want any extraneous scripts along with the model -- we'd prefer to have your PR contain just the model itself and any relevant tests to verify the functionality.
  • Since you already have a converted model, you should upload the weights to the mlx-community section of huggingface and document the model repo id in the README. We don't need/want to support inline conversion from pytorch -- we actually strive to have no pytorch dependencies when possible.
  • We don't want command line tools tied to specific models. If you'd like, you can create a generate.py script in the sts area to exercise the functionality, so it can be run like python -m mlx_audio.sts.generate --model <repo_id> <...>. You'll want to closely follow the examples in the sts and stt directories.

If you're open to making those changes, we can get this merged!

Collaborator

@lucasnewman lucasnewman left a comment


See comment.

@Blaizzy
Owner

Blaizzy commented Mar 10, 2026

Additionally, I think this could be under VAD module instead of STS👌🏽

@kylehowells
Contributor Author

Additionally, I think this could be under VAD module instead of STS👌🏽

@Blaizzy the closest existing model in the repo is "MossFormer2 SE" which does basically the same thing, which is currently in STS.

@lucasnewman
Collaborator

Additionally, I think this could be under VAD module instead of STS👌🏽

@Blaizzy the closest existing model in the repo is "MossFormer2 SE" which does basically the same thing, which is currently in STS.

I think keeping it in STS is fine, it is a speech-in-speech-out model.

kylehowells and others added 2 commits March 11, 2026 00:08
…update model repo

- Remove benchmark, compare_outputs, and deep_filter_pytorch dev scripts
- Remove model-specific CLI (sts/deepfilternet.py) and example wrapper
- Add generic sts/generate.py CLI: python -m mlx_audio.sts.generate --model <repo>
- Update default repo to mlx-community/DeepFilterNet-mlx with v1/v2/v3 subfolders
- Add subfolder/version params to from_pretrained for multi-version repo support
- Remove libDF runtime fallback (no PyTorch dependencies at runtime)
- Update README and model docs to reference new HuggingFace repo
- Keep conversion script and PyTorch parity test (skips when df not installed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Generate noisey_audio_10s_target.wav from official PyTorch DeepFilterNet
- Add test_target_parity: compares MLX output against pre-generated reference
  using correlation (>0.999), SER (>25 dB), MAE (<0.002), and RMS diff (<1%)
- Does not require PyTorch to run — works from the committed WAV file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kylehowells
Contributor Author

kylehowells commented Mar 11, 2026

upload the weights to the mlx-community section of huggingface

I actually didn't realise the mlx-community was open and I was able to upload them there.

@lucasnewman I've updated the PR.

  • I've removed the other scripts and just left the model conversion script.
  • The pre-generated models are now on mlx-community.
  • Replaced the benchmarking and comparison tests with a new end-to-end test which generates a de-noised audio sample and checks it matches the expected model output.
  • Added the new python -m mlx_audio.sts.generate CLI instead of the DeepFilterNet specific one.

@Blaizzy
Owner

Blaizzy commented Mar 11, 2026

@Blaizzy the closest existing model in the repo is "MossFormer2 SE" which does basically the same thing, which is currently in STS.

Makes sense! 👌🏽

@lucasnewman
Collaborator

@kylehowells Looking good! Can you run the formatter with pre-commit run --all? Then we can merge.

Collaborator

@lucasnewman lucasnewman left a comment


Looks great, thank you! 🚀

@lucasnewman lucasnewman merged commit 1f7dea1 into Blaizzy:main Mar 11, 2026
10 checks passed
@kylehowells
Contributor Author

Ran the formatter (and dragged in some output test files, so reverted that) and pushed the reformatting changes.
