
fix(fish_qwen3_omni): support loading quantized/converted models in sanitize()#584

Merged
lucasnewman merged 1 commit into Blaizzy:main from
yoshphys:fix/fish-qwen3-omni-sanitize-quantized-weights
Mar 20, 2026

Conversation

@yoshphys
Contributor

Problem

sanitize() in fish_qwen3_omni/fish_speech.py skips any weight key that does not start with text_model.model. or audio_decoder., via else: continue. This means that when loading a model that was already converted to MLX format (e.g., produced by python -m mlx_audio.convert --quantize), all weight keys (which start with model.) are silently dropped, resulting in an empty dict.

The downstream effect:

  • apply_quantization() finds no .scales keys → skips quantization entirely
  • load_weights() is called with an empty dict → raises ValueError: Missing N parameters

This is the root cause of the bug reported in #578 ("Fish s2 pro Breaks when quantizing using convert").

Fix

Add a model.* passthrough case before the existing conditions so that keys already in MLX format are preserved as-is:

if key.startswith("model."):
    # Already in MLX format (e.g., previously converted/quantized model)
    remapped[key] = value
elif key.startswith("text_model.model."):
    ...

How to reproduce

python -m mlx_audio.convert \
  --hf-path mlx-community/fish-audio-s2-pro-bf16 \
  --mlx-path ./fish-audio-s2-pro-4bit \
  -q --q-bits 4 --q-group-size 64 --model-domain tts

python -c "
from mlx_audio.tts.utils import load_model
model = load_model('./fish-audio-s2-pro-4bit')  # raises ValueError before fix
"

Testing

After the fix, quantized models load and run correctly.

🤖 Generated with Claude Code

…anitize()

sanitize() was skipping all keys that did not start with
"text_model.model." or "audio_decoder.", including keys already in
MLX format ("model.*"). This caused quantized models produced by
mlx_audio.convert to fail to load with "Missing N parameters" error,
because sanitize() returned an empty dict for such models.

Fix: pass through "model.*" keys unchanged so that both the original
HuggingFace weights and previously converted/quantized MLX weights
are handled correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lucasnewman
Collaborator

@yoshphys Thanks! Did you test audio generation and does it sound good to your ear? Your agent can't hear very well so it needs human verification :)

@yoshphys
Contributor Author

@lucasnewman Thank you for paying attention to this PR. Sure! I wish I could make my agent hear the sound, but it can't, so I listened to the audio outputs myself after modifying the code. I generated a couple of sets of audio with different quantization parameters, 4-bit and 8-bit. In the 4-bit case, I noticed that a few parts of the audio with inline control had lower quality, but I guess that was because of the quantization. With the 8-bit parameter set, there were no problems and the audio quality was good.

Owner

@Blaizzy left a comment


Hey @yoshphys

Thanks for your contribution!

I would add a quant predicate to skip the codec, embedding, and projection layers during quantization, or use a higher quant for them. This would ensure high quality even at low quant precision.

I'm thinking about how we can automatically scan and suggest this with PRs such as #490, but for now it has to be manual.

cc: @lucasnewman
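
The quant-predicate idea above could be sketched roughly as follows. MLX's nn.quantize accepts a class_predicate callback that decides per-layer whether to quantize; the substrings used to identify the codec/embedding/projection layers below are illustrative assumptions, not this model's actual module names.

```python
# Illustrative skip list: substrings of layer paths that should stay
# in full precision (actual module names in fish_qwen3_omni may differ).
SKIP_SUBSTRINGS = ("audio_decoder", "codec", "embed", "lm_head")

def quant_predicate(path: str, module: object) -> bool:
    """Return True if the layer at `path` should be quantized.

    Intended for use as the class_predicate argument of mlx's
    nn.quantize, which calls it with (path, module) for each layer.
    """
    return not any(s in path for s in SKIP_SUBSTRINGS)
```

Usage would then be along the lines of `nn.quantize(model, group_size=64, bits=4, class_predicate=quant_predicate)`, keeping the sensitive layers at full precision while the rest of the model is quantized.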

@lucasnewman
Collaborator

The conversion actually works correctly, so that part should be good. I tested your change with an 8-bit conversion and it looks good, thank you!

@lucasnewman lucasnewman merged commit bdfaaf3 into Blaizzy:main Mar 20, 2026
10 checks passed
korale77 added a commit to korale77/mlx-audio that referenced this pull request Mar 25, 2026
…tized model loading

sanitize() drops weight keys not found in the model's current parameter
shapes. Since the model isn't quantized yet at sanitize time, quantization
metadata keys (.scales, .biases) are silently removed. Later,
apply_quantization() checks for these keys to decide which layers to
quantize -- finds nothing -- skips quantization -- and loading fails with
a shape mismatch.

Preserve .scales and .biases keys through sanitization, matching the
existing pattern in chatterbox/s3gen.

Same class of bug as Blaizzy#584 (fish_qwen3_omni sanitize fix).
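
That pattern could be sketched as follows. This is a hypothetical simplification based on the commit message (the real sanitize() and the chatterbox/s3gen code it mirrors may differ): keys matching a current parameter are kept, and quantization metadata keys are kept unconditionally even though they match no parameter shape at sanitize time.

```python
def sanitize(weights: dict, current_params: set) -> dict:
    """Keep keys present in the model, plus quantization metadata (sketch).

    At sanitize time the model is not yet quantized, so ".scales" and
    ".biases" keys have no matching parameter -- but dropping them would
    make apply_quantization() a no-op and break loading later.
    """
    out = {}
    for key, value in weights.items():
        if key in current_params or key.endswith((".scales", ".biases")):
            out[key] = value
    return out
```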